How to Improve the NAEP Reading Test
Can we make the national tests more useful, or at least less harmful?
Faithful readers of this Substack will know that I’m not a fan of the NAEP reading test, which is given every couple of years to a representative sample of American fourth- and eighth-graders.
Like state reading tests, the NAEP (that stands for National Assessment of Educational Progress) purports to measure reading comprehension skills in the abstract. As I and others have argued, that’s not possible. Students’ ability to comprehend text is inseparable from their background knowledge, vocabulary, and familiarity with complex syntax.
Unlike scores on state reading tests, NAEP scores aren’t tied to individual schools or teachers—or, in most cases, school districts. So the test doesn’t exert direct pressure on schools or teachers to try to boost scores. But it has a strong indirect influence on what goes on in American classrooms.
As I’ve noted, the most recent scores, which were dismal, elicited a lot of commentary about the “science of reading,” interpreted to mean “phonics”—despite the fact that the NAEP purports to test reading comprehension, not decoding ability.
But even when viewed as a measure of comprehension, the NAEP often leads to the assumption that low-scorers just need more practice with the skills it purports to measure—finding the main idea of a text, making inferences, etc. In fact, it’s far more likely that students score low because they lack the knowledge that would enable them to understand the passages on the test, which are on a bunch of random topics that don’t relate to anything they might have learned in school.
“The Bell of Atri”
Take a look at the sample passages and questions available on the NAEP website. One passage, from the 2022 fourth-grade test, is called “The Bell of Atri.” It’s a fable, set in “the Abruzzi mountains in Italy” at some unspecified time in the distant past.
The story includes quite a few words and concepts that I suspect many American kids would find bewildering: a bell in a marketplace whose rope is “fraying,” its replacement by “an unusually long grapevine,” a horse that pulls on the vine to soothe “his poor parched lips.” The questions purport to measure students’ general ability to, for example, “recognize text-based generalization.” But if you’re confused by the vocabulary and syntax, how do you demonstrate your ability to do that?
Fourth graders could acquire the vocabulary that would enable them to understand this story. But in our current system—which ascribes students’ lack of understanding to inadequate practice on comprehension skills and limits them to simple texts they can theoretically understand on their own—few do.
To be clear, I’m not entirely against the NAEP. It provides a relatively reliable barometer of where American students generally stand, which is especially necessary at a time when an increasing number of states are fiddling with their own tests to create the illusion of improvement. The problem is that the NAEP reading test obscures the real reasons for the stagnation and decline in student achievement, along with the expanding gap between high- and low-scorers.
The Constraints of the “NAEP Law”
One solution would be for NAEP to just drop its reading tests and focus instead on its tests in the content areas: U.S. history, civics, science. Those are given relatively rarely, and the scores are generally even worse than those in reading. If they got more attention, maybe schools would realize they need to focus more on content.
The problem is, the “NAEP Law,” enacted by Congress, requires the U.S. Department of Education (DOE) to conduct assessments in “reading and mathematics” at least once every two years, in grades 4 and 8. No getting around that, unless Congress amends the law—an unlikely eventuality.
Or at least, that’s what I would have said a few days ago. Even if President Trump carries out his plan to eliminate the DOE, the NAEP might well survive in its present form. The National Center for Education Statistics (NCES), which administers the test, existed long before the DOE was created in 1979. In fact, it dates back to 1867. And the NAEP was first given a decade before the DOE came into being.
But the Trump administration has just announced cuts to the DOE totaling over $900 million, according to the New York Times, “apparently aimed at hobbling the department’s research arm, the Institute of Education Sciences.” The IES is responsible for, among other things, overseeing the NAEP. (Responsibility for the NAEP is spread over a confusing alphabet soup of entities, including NAGB, the National Assessment Governing Board, which sets policy.)
So far, the cuts are directed at researchers and contractors—and could seriously impede some important research—but not at the NAEP. Maybe the tests will be spared because of the NAEP Law, but in the current environment, there are no guarantees.
Still, assuming the NAEP is here to stay, at least for the foreseeable future, can we make it more useful despite the legislative strictures? Here are a few suggestions:
Separate out the different components of reading. Kids might score low on a reading test for a number of reasons: Maybe they can’t decode the words, or can’t decode them with enough automaticity to enable them to understand the text. Even if their decoding is fine, maybe they lack the background knowledge and vocabulary assumed by the text. Maybe they’re unfamiliar with the complex syntax of written language. Many kids undoubtedly experience a combination of these difficulties. (And of course, those who struggle with decoding should also be provided with access to knowledge, and vice versa.)
Right now, though, NAEP scores are a black box that just tells us many kids don’t “read” well. Maybe it’s too much to expect a national test to uncover all the possible reasons, but we could at least test decoding separately from “comprehension.” A NAEP oral reading test given in 2018 revealed that the lowest-scoring fourth graders can’t actually decode very well. Is that true of low-scoring eighth graders as well? The NAEP doesn’t tell us.
Rebecca Kockler, founder of Magpie Literacy, points to the need to regularly screen older students’ foundational literacy skills—not just phonics abilities, but also the advanced skills needed to understand more complex text. “Most sixth graders do not need more phonics instruction,” Kockler wrote in an email, “but they might very well need explicit instruction in syllabification, grammar, or spelling.”
The NAEP tests only a representative sample of students and therefore can’t be used for screening purposes. But if it were revamped to ferret out some discrete components of reading skills—rather than just “comprehension”—the test could provide better insight into the causes of low scores at higher grade levels.
I’m tempted to point out that the NAEP Law itself doesn’t require a reading comprehension test—just a “reading” test. Researchers Alan Kamhi and Hugh Catts have suggested that we could make significant progress in education by redefining the word “reading” to mean decoding. That interpretation would allow the IES to test just students’ decoding ability without any change in the language of the statute. But I’m not optimistic anyone in a position of authority will buy this argument.
Ground the reading passages in commonly taught content. If the NAEP, like other reading tests, amounts to what cognitive psychologist Daniel Willingham has called a “knowledge test in disguise,” why not level the playing field by choosing topics that most kids are likely to have acquired knowledge about in school? Why choose random topics in an elusive quest to measure comprehension ability in the abstract?
NAEP aficionado Chester Finn has argued that such a reform would require a national curriculum—a definite no-no, not to mention a no-go, in our highly localized education system. But if that’s true, how come there are NAEP tests in U.S. history, civics, and science? No one is complaining that they’re enacting a national curriculum. It’s just assumed they’re reflecting the content that schools across the country are teaching.
It’s true that state academic content standards vary, especially in the area of social studies. But as a report from the American Institutes for Research has documented, there’s a fair amount of common ground. In the lower elementary grades, standards tend to cover topics like citizenship, community, and local government. Okay, fourth grade often focuses on state history and geography. But U.S. history is often taught in fifth and eighth grade. (I’m skipping over sixth and seventh grades, where standards often focus on additional state history and on “world history,” a term so broad as to be meaningless.)
That would give NAEP test designers a place to start—and state science standards are likely to be even more similar, given that 71 percent of U.S. students live in states where the standards are influenced by the same framework. Not only would a focus on common topics level the playing field, it would also give teachers a much-needed incentive to teach social studies and science rather than abstract comprehension skills.
Finn also argued that this kind of reform of the NAEP reading test is unnecessary because if a state did push hard for a “knowledge-rich, statewide curriculum from kindergarten onward,” its NAEP scores would rise and other states would follow its example.
But it can take years for changes in curriculum and instruction to result in improved test scores—especially at the state level, where reforms may be implemented unevenly across districts. Meanwhile, kids will continue to suffer academically. And no state actually has a statewide knowledge-building curriculum, although Louisiana comes the closest.
Plus, even if test scores rise, there’s no guarantee that observers will accurately diagnose the reasons. This year, Louisiana was the one state where fourth-grade scores were up, but the state’s success was narrowly attributed to “phonics.”
Provide scores based on both socioeconomic status and race or ethnicity. As I’ve argued before, the emphasis on race and ethnicity in reporting NAEP scores can lead people to think the test-score gaps are all about race. Racial gaps are undeniable, but the driving factor is more likely to be socioeconomic status, and particularly level of parental education. Children whose families are more highly educated are more likely to absorb at home the academic knowledge and vocabulary that enables reading comprehension and learning in general.
It’s a promising sign that the NAEP is now using a more robust measure of socioeconomic status—not just the unreliable metric of eligibility for free or reduced-price school meals, but also factors like the number of books in a student’s home. Maybe that will lead to a greater focus on the socioeconomic gap.
But to really understand who is being left behind and why, it would help to have data that has been sliced and diced in more precise ways. What about a Black or Hispanic student whose parents don’t earn much but have PhDs? How does that kid do as compared to a white kid whose parents have a high school education or less?
If at least some of these changes could be put into effect, I think the NAEP could provide us with some truly valuable information—and possibly lead to the educational progress the test is supposed to be assessing.
Update, 2.13.25: Sources cited in a report on The 74 said that the NAEP will continue to be administered, but “contracts to analyze the data and report it publicly were canceled and will be offered to new bidders.”
Update, 2.17.25: New material about the need to test older students’ decoding skills was added to this post, based on communications with Rebecca Kockler.