Yes, National Tests Are Useful—But Not in “Reading”
The tests prop up an education regime that tries to test and teach reading comprehension in isolation from content—which can’t be done.
Few Americans know much about the standardized tests collectively known as the National Assessment of Educational Progress. A new book argues for NAEP’s importance but overlooks the ways in which its emphasis on “reading” tests contributes to a lack of progress.
When parents and teachers think of standardized tests, what generally comes to mind are the annual state-mandated reading and math tests of the past 20 years. But NAEP, also called “the Nation’s Report Card,” has been around in some form since 1969, periodically testing representative samples of students at certain age and grade levels throughout the country in various subjects. Every two years, when its reading and math scores are released, NAEP does get a bit of attention, but otherwise it flies under the radar.
A major reason is that unlike state tests, NAEP has no direct consequences. Schools and teachers don’t get evaluated based on the results, and students don’t get held back. While that makes the tests less noticeable, it also makes them more reliable: there’s no incentive to game them.
In his recent book Assessing the Nation’s Report Card, Chester E. Finn, Jr., tracks the twists and turns in NAEP’s history and evaluates the prospects for its future. A longtime education policy wonk who served as the first chair of the National Assessment Governing Board—the entity that oversees NAEP—Finn seems to know everyone and everything NAEP-related, and he makes the intricacies of the story surprisingly engaging.
The saga that emerges is one of early pressure to make NAEP data more useful and—once that pressure prevailed—increasing controversy, even as the tests have maintained a low profile. The results were originally communicated only as hard-to-interpret scale scores, like “263,” but around 1990 they also began to be reported in terms of “achievement levels”: advanced, proficient, basic, and below basic. At roughly the same time, the data became available for individual states—and later, some districts—rather than just for the nation as a whole.
The combination of those two developments—plus the post-2002 congressional requirement that states participate in biennial reading and math tests—turned NAEP into something of a federal watchdog over states’ own test results. In 2005, for example, Tennessee and North Carolina both claimed that 88% of their eighth-graders were proficient readers, based on their state tests, but according to NAEP only a little more than 25% were.
The validity of these comparisons has been controversial. NAEP’s proficiency standards are more rigorous than those used by most states, and some charge they’re unrealistic. Finn counters that NAEP’s definition of proficiency is aspirational and that studies show it correlates closely with standards of college readiness. The debate over which definition of proficiency makes sense continues, emerging most recently in Virginia. But when it comes to our most pressing educational issues, that argument is largely beside the point. Large numbers of students aren’t even scoring “basic” on NAEP, and that group is disproportionately low-income, Black, and Hispanic.
Another source of controversy is the widespread tendency to draw conclusions from NAEP about what’s working or not. Finn points out, as have others, that because NAEP testing is far from a controlled experiment, the results can’t be used to establish that any particular reform or intervention caused a change in scores.
At the same time, though, Finn argues that the true value of NAEP is retrospective: it tracks data about education over time in the same way the National Weather Service tracks past meteorological events. That suggests that if NAEP documents a lack of progress or a decline in a tested subject over decades, that data could legitimately support the conclusion that the standard approach to teaching that subject hasn’t been working for many students.
In fact, four years ago the NAEP governing board itself convened a panel of experts who applied that kind of analysis to the area of reading. Reading scores had been stagnant since 1998, with only about a third of students scoring proficient, and the gap between students from low-income families and their more advantaged peers remained stubbornly wide—a situation that has only gotten worse. The reason, the experts said, is that the prevailing approach to teaching reading comprehension doesn’t line up with cognitive science. It focuses on largely illusory comprehension “skills,” like “finding the main idea” or “making inferences,” and has kids practice them using easy-to-read books on random topics.
Panelists explained that comprehension depends more on background knowledge than on abstract skill. But instead of focusing on subjects that could build the kind of knowledge kids need to read complex text—social studies, science, the arts—elementary schools have cut back on those subjects to make more time for comprehension “skills.” The result is that children from better-educated families generally score higher on reading tests—and achieve academically—because they’re able to pick up academic knowledge at home.
Unfortunately, the federal officials who attempt to explain NAEP reading results to the public seem unaware of that 2018 panel or its findings—even though its critique of reading comprehension instruction has become more widely known and accepted in recent years. Nor does Finn mention it.
You might argue that’s perfectly appropriate. After all, NAEP didn’t create the standard approach to reading comprehension; it merely documents the damage it’s done. But for many years, Finn headed an education think tank called the Thomas B. Fordham Institute, which has long been a leader in criticizing elementary literacy instruction focused narrowly on comprehension skills—and connecting it to low scores on reading tests. Why not mention that as an unintended consequence of a testing regime that puts so much emphasis on reading comprehension?
Beyond Finn’s personal connection to the issue, there’s the indirect but substantial influence that NAEP exerts. Finn himself describes NAEP as “a laboratory or test kitchen for the invention, design, and continuing evolution” of standardized tests in general. And the widespread assumption that reading tests like NAEP’s assess free-floating comprehension skills is a major reason schools focus on those skills at the expense of building academic knowledge.
An internal controversy over NAEP illustrates the inherent contradiction between Finn’s two positions—recognizing that knowledge is the key factor in reading comprehension, on the one hand, and continuing to advocate for “reading comprehension” tests, on the other. Last year, one faction on NAEP’s governing board proposed providing test-takers with more definitions of words in the reading passages, ostensibly so that students’ true comprehension ability could be measured more accurately. Finn and others argued, reasonably, that such a move would undermine the tests’ validity, since knowledge of vocabulary is in large part what the tests are actually measuring. But if you call the assessments “reading comprehension” tests—tests that purport to isolate and measure that supposed abstract ability—then it makes sense to provide kids with whatever definitions they need to demonstrate what they can comprehend.
I’m not arguing that NAEP should be abolished. As Finn notes, it provides valuable information about trends in student achievement and highlights disparities between demographic groups. But as I’ve suggested before, if NAEP were to secure congressional permission to stop giving its biennial “reading” tests and instead focus on subjects like history and geography—which are assessed far less frequently, and where scores are even lower—it could send a powerful signal about what’s actually important to reading comprehension.
Failing that unlikely move, here are a few other things NAEP could do to help bring clarity to the situation:
Test students’ foundational reading skills separately from “comprehension,” so that we have a more accurate picture of the obstacles facing students who score below basic. NAEP purports to be testing comprehension, but a recent study showed that most fourth-graders in the below-basic category struggled with word decoding and oral reading fluency. If students don’t have those skills, no amount of work on reading comprehension will turn them into proficient readers.
On reading tests, assess students’ content knowledge of topics in social studies and science and correlate those scores with their comprehension scores on passages relating to those topics. My prediction: the correlation would be quite high, suggesting that what’s really being tested is knowledge rather than comprehension.
Find an accurate way to measure students’ socioeconomic status, so that scores aren’t routinely reported only in terms of race and ethnicity, obscuring the basic reason most students score low on reading comprehension tests—i.e., that they lack academic knowledge and vocabulary. That has to do primarily with parental levels of education, but socioeconomic status is a pretty good proxy for that.
At the very least, NAEP officials could explain what “reading” tests are actually assessing rather than making vague comments about how we need more time on reading “skills.” In a response after this post was originally published, Finn dismissed that idea as “a small, somewhat nebulous suggestion.” He added, “Sure, it would be good if everyone who presents or interprets those results would start with a dozen cautions and limits. But c’mon, folks, we’re living in a world of soundbites, headlines, and tweets.” I respectfully disagree, and I think Finn underestimates the interest of journalists, educators, and much of the public in the reasons for continuing low reading scores and widening gaps.
Finn’s book itself includes an example of the importance of background knowledge to comprehension. In his concluding remarks, he offers an extended metaphor about possible NAEP reforms based on the story of Goldilocks—involving “porridge” that is too hot, too cold, or just right. To ensure readers have the information to make the necessary inferences, Finn summarizes the story and provides a link to the complete version in a footnote.
But NAEP itself has helped create an environment where, apparently, you can’t just assume that every English-speaking reader is familiar with the story of Goldilocks. Rather than building kids’ knowledge of fairy tales and other common sources of metaphors, elementary schools have been too busy engaging in fruitless attempts to teach kids the supposedly abstract skill of “making inferences.”
This post originally appeared on Forbes.com in a slightly different form.