Can We Measure Reading Comprehension Separately From Knowledge?
Maybe "reading" tests should be rebranded as assessments of diverse content knowledge.
The American education system rests in large part on the assumption that reading comprehension can be tested separately from knowledge. But a recent symposium sponsored by the overseers of the nation’s preeminent reading test revealed how shaky that assumption is.
Every two years, a sampling of students across the country takes reading and math tests as part of the National Assessment of Educational Progress (NAEP), also called the Nation’s Report Card. It’s considered more reliable than annual state tests because it’s less subject to manipulation. But especially when it comes to reading, there hasn’t been much progress to report. Scores have been stagnant or declining for decades, and the gap between high and low achievers has widened.
Against that background, the NAEP governing board has launched an attempt to revise the reading test framework. One goal is to make the test fairer by compensating for differences in students’ background knowledge, a factor that everyone agrees can affect reading comprehension. That was the focus of last week’s symposium, open to the public but aimed at the board, which will vote on the updated framework in May.
The proposed new framework provides some sample test items, including a story at the fourth-grade level about a girl who signs up to play the violin in a talent show. In a traditional reading test, the student would just get the story, along with some directions. But what if she isn’t sure what a violin is or has never heard of a talent show?
The revised NAEP framework, designed for computer-based tests, tries to compensate for that lack of background knowledge so that it won’t interfere with measuring a student’s comprehension ability. The new framework would offer a brief video showing children playing in an orchestra, to help demonstrate what a violin is. And when a student starts reading the passage, pop-up boxes would provide information about the meanings of certain words and phrases.
For example, a sentence in the story says that when the main character told her brothers she’d signed up for the talent show after taking only three violin lessons, the brothers “nearly fell out of a tree.” The student has to decide what that means—are the brothers angry, or happy, or are they making fun of their sister? If the student clicks on the phrase “talent show,” a pop-up box explains that it’s “a show in which different people perform a special skill.”
At the outset of the symposium, the board heard from cognitive psychologist Daniel Willingham, whose basic point (politely expressed) was that this whole endeavor made no sense. Lots of evidence shows that it’s impossible to separate reading comprehension from background knowledge, he said. Readers are constantly using their knowledge to fill in gaps in the information provided by the author. In fact, he said, filling in those gaps essentially is reading comprehension. (To get a better understanding of that point, check out this whimsical ten-minute video Willingham created.)
That pop-up box may seem helpful, but “talent show” means different things depending on the context. In the story, it’s used to mean an opportunity for people to show off how skilled they are—and the implication is that a girl who has taken only three violin lessons doesn’t belong in one. But Willingham posited a sentence saying that a Hollywood producer stumbled upon a rising star at a “talent show.” There the implication would be that a talent show is for relatively unskilled amateurs, and thus not a place where you’re likely to find a potential star.
In providing the definition of a word or phrase, Willingham argued, the test designer is judging which meaning is relevant rather than leaving that to the test-taker—and it’s a judgment that’s central to comprehension. If decisions like that are eliminated from the test, Willingham said, “I think we would be fooling ourselves about how well some kids read.”
With one exception, the other experts at the symposium made points that ran on a track parallel to Willingham’s, never quite engaging with his argument. They agreed that background knowledge plays a role in comprehension, but they saw it as something to be mitigated so that true comprehension skill can be assessed. In addition to pop-up boxes and other “support features,” the mitigation strategies included:
· Making the topics of passages diverse, to avoid giving any one group of students an unfair advantage. For example, a story about snow—a phenomenon that students in colder parts of the country are familiar with—would be balanced by one about the ocean. Representatives of international testing organizations—PIRLS and PISA—spoke about avoiding topics that favor students in any one country.
· Reviewing test items for “bias,” with a particular eye to culture, ethnicity, geography, and gender.
· Ensuring that all the information students need to answer the questions is contained in the test passage itself.
· Providing extra information, whether by pop-ups or more traditional means, that is “not tested.”
But if you accept the idea that it’s impossible to separate background knowledge from comprehension, there are flaws that undermine all these efforts. Test designers can’t always anticipate which words or phrases some students won’t know, regardless of where they live or what gender they are. Nor can designers ensure that all the necessary information is in the passage, because some students may lack the background knowledge to understand the terms it uses. And information in side explanations is almost inevitably being “tested,” even if no question asks about it directly. In the “talent show” example, students aren’t asked what a talent show is. But if they don’t have that information, they could choose the wrong answer to the question about the brothers’ reaction as a result.
The most problematic kind of inequity in background knowledge was barely mentioned at the symposium. It’s not between students from different geographic regions or of different genders. It’s between students whose families have different levels of education—and usually, different levels of income. Kids with more highly educated parents are able to acquire at home the kind of knowledge that helps them on reading tests. (Nor did anyone at the symposium mention another problem that disproportionately affects students from lower-income families: Tests like the NAEP assume that students can read words—that is, decode them using phonics. But even at higher grade levels, many have difficulty because of ineffective instruction.)
None of the standard efforts to mitigate the effect of background knowledge has managed to narrow this gap, which has held steady or widened in recent years in the U.S. and in many other countries. And, as was revealed at the symposium, a trial of the pop-up boxes and other innovations during the 2019 NAEP showed that they were correlated with higher overall scores but did nothing to narrow the gap between high and low scorers.
The only presenter at the symposium whose remarks rhymed with Willingham’s was Jenna Chiasson, representing the Louisiana Department of Education. She described a new kind of reading test the state is piloting. Unlike the NAEP, the international tests, and every other state’s test, the experimental Louisiana test uses passages related to the curriculum. Some are from books that students have actually read in class, while others are from books on related topics. In other words, the curriculum provides the background knowledge students need for the test. Louisiana can do that because 75% of the state’s schools are using the English curriculum created by the state.
Chiasson said that even students who aren’t reading at grade level “have a higher level of comfort” and “feel empowered” when they see test passages on familiar topics from familiar books. While the data is still preliminary, it indicates that students are more engaged with the new tests and that students of lower socioeconomic status are performing better than they do on traditional tests.
The NAEP board can’t try something like that. The U.S. has no national curriculum. And according to Gina Cervetti, a reading researcher who represented the drafters of the new framework at the symposium, NAEP is prohibited from grounding its tests in a particular curriculum. In any event, Cervetti said, NAEP has tests in content areas like civics and U.S. history that are designed to assess knowledge. While differences in background knowledge will inevitably affect scores, the reading test is intended primarily to measure comprehension.
True. But is that possible? Perhaps the NAEP reading framework should be revised to include the idea that it’s testing “a diverse array of content knowledge,” as suggested by board member Martin West, a Harvard education professor. That wouldn’t solve the equity problem, but at least it would provide a more accurate idea of what the problem actually is.
This post originally appeared on Forbes.com.