New Study Suggests Standardized Reading Tests Miss a Lot of Learning

It can take years to acquire enough new vocabulary to yield an increase in measures of general reading comprehension.

Feb 27, 2023

A new study suggests that standardized reading tests often fail to reflect what students have actually learned—and mislead educators about what to teach.

An increasing number of schools are shifting to a new kind of elementary curriculum—one that focuses on rich content, including social studies and science topics, rather than reading comprehension “skills” like finding the main idea of a text. Teachers who switch to a content-rich curriculum are often amazed by the academic knowledge their young students acquire and the sophisticated vocabulary they spout.

But skeptics argue there’s little evidence content-rich curricula boost reading comprehension. Sure, they say, kids may acquire knowledge of the topics covered in the curriculum, but does that enable them to understand texts on other topics?

“If Mrs. Smith’s third grade spends a year studying wombats,” reading researcher Tim Shanahan has written, “the kids may be superstars when reading on that topic, but what about other texts? Wombat knowledge isn’t likely to improve comprehension of texts on the U.S. Civil War, 2020 elections, or relativity.”

It’s true that kind of “transfer” should be the ultimate test of whether a knowledge-building approach boosts comprehension. But we’ve been trying to use that kind of test to measure incremental academic progress, and it’s led us down a self-defeating path.

Standardized reading tests rely on passages on random topics that aren’t tied to anything kids have learned in school. In fact, they’re designed to avoid those topics, because the idea is that the tests measure reading comprehension skills—not knowledge. When kids don’t do well on those tests, the typical diagnosis is that they need more work on the skills the tests purport to measure.

But if students haven’t acquired the background knowledge and vocabulary to understand the passages at least at a superficial level, they never get a chance to demonstrate their “skill” at finding the main idea, or making inferences, or whatever the test questions require. That’s why cognitive scientist Daniel Willingham has called the tests “knowledge tests in disguise.” Students who are able to acquire a lot of general academic knowledge—often those with highly educated parents—typically get the highest scores on reading comprehension tests.

The theory behind a knowledge-building approach to literacy is that learning about a series of specific topics in an engaging way—perhaps including wombats—will eventually provide students with the critical mass of academic knowledge and vocabulary needed to understand texts on a wide range of topics. But it’s not easy to get evidence to support the theory because that process can take years. And most research studies last just a few weeks.

How much time does it take for “transfer” to occur?

Still, there is some evidence that combining literacy instruction with subjects like social studies and science can boost standardized test scores in the early grades after only a year or less. And a recent longer-term study sheds light on just how much time it takes for kids to apply the knowledge acquired through such an approach to the kinds of passages used on standardized tests.

The study, led by James S. Kim of the Harvard Graduate School of Education, involved a curriculum he and his team created called MORE (Model of Reading Engagement). In the study, first- and second-graders got a science curriculum designed to gradually build their “schemas”—mental frameworks that enable people to understand the information they’re taking in. For example, when we see a Great Dane and a chihuahua and recognize they’re both dogs, we’re relying on our schema for “dogs.”

The theory behind the study was that helping students develop schemas for a series of specific topics would enable them to create more general schemas, which would support their comprehension of concepts they hadn’t learned about. Another way of putting this is that students would be storing information in long-term memory that would free up their capacity for taking in new information related to what they’d already absorbed.

In first grade, the question addressed by the curriculum was “How do animals survive in their habitat?” In second grade, it was “How do paleontologists study dinosaurs?” The general schema these units were designed to lead up to was “scientific study of the natural world.”

Teachers using MORE read aloud texts on the topics to build children’s background knowledge and guided them through activities like creating concept maps to help them organize and retain information and vocabulary. Students also learned the roots of words like “paleontology” (“paleo” meaning old and “ology” meaning the study of something) and engaged in collaborative research that involved reading, writing, and group discussions.

The study was a randomized controlled trial involving about 3,000 students at 30 schools—the kind of study that enables scientists to conclude that the new thing they tried (the “treatment”) actually caused the results they found. Most students were Black or Hispanic and from low- or middle-income families.

To measure how well students were able to transfer their knowledge to less familiar topics at the end of second grade, the researchers created a test with three kinds of reading passages:

• A “near-transfer” passage about paleontologists using fossil evidence to research ammonites, drawing on what children had learned about the study of dinosaurs and their extinction.
• A “mid-transfer” passage about archaeologists studying human fossils to research life in ancient Pompeii.
• A “far-transfer” passage about genealogists studying people’s ancestors.

When compared to a control group of similar students who had not gotten the MORE curriculum, the treatment group clearly did better on both the near- and mid-transfer passages—and low-performing students benefited just as much as high-performing ones. But researchers found no significant difference between the two groups on the far-transfer passage.

In other words, after two years, kids had acquired a lot of knowledge about how scientists study the world, and they were able to transfer it to topics that were somewhat different from those they had studied. But there was a limit to how far that transfer could go.

Standardized reading tests don’t measure progress well

Standardized reading tests are all about far transfer. In fact, the far-transfer passage used in the study is nearer to what students had actually learned than the typical test passage is likely to be: it concerned scientists (genealogists, a word related to paleontologists) who study the past (albeit ancestors rather than dinosaurs).

In an email, Kim told me that one implication of his study is that a “spiral” curriculum like MORE—which revisits similar concepts in varied contexts—takes more than two years to show far transfer. He has new data coming out in a few months that, he says, shows “stronger evidence of far transfer after the third year of implementation.” Another implication, he said, is that we need to view transfer along “a continuum from near to far.”

Some caveats are in order. One is that MORE isn’t currently available to schools that would like to try it. About half a dozen other knowledge-building curricula are widely available, but Kim felt none of them connected simpler to more complex topics across grade levels in the way most likely to result in measurable transfer.

But the READS Lab, a research initiative headed by Kim, will soon be making MORE available to select school districts willing to adopt strategies designed to ensure the curriculum reaches all students, including those who might benefit the most. According to Ethan Scherer, director of the READS Lab, the hope is to offer it directly to teachers and schools eventually. In the near future, the curriculum will include social studies units as well as science.

Another important caveat has to do with the typical school schedule. MORE is designed to be implemented during the time set aside for social studies or science. But the average amount of time officially allocated for each of those subjects in elementary school is only about half an hour a day, and most schools don’t actually spend even that much time on them, instead prioritizing reading and math, the “tested” subjects. MORE’s second-grade curriculum consists of 45 science lessons of 40 minutes each, designed to be taught over 10 weeks. Teachers in the study who used MORE reported spending far more time on social studies and science than those in the control group: over an hour more per week on each subject. (Scherer says that in the future, the time required for MORE lessons will be reduced to 30 minutes each over a period of six weeks, with three weeks for science and three for social studies.)

The fact that elementary schools spend so much time on reading—a reported average of two hours a day and probably often much more in practice—has led other developers of content-rich curricula to present them as literacy curricula that incorporate topics in social studies and science. But MORE is being marketed as a science (and eventually social studies) curriculum that incorporates literacy. Scherer says the training the READS Lab will offer districts will “clearly communicate” the importance of spending time on science and social studies.

The MORE study should prompt policymakers, administrators, and educators to ask some fundamental questions: Does it make sense to measure progress using tests that focus on supposed comprehension skills, knowing they may fail to capture actual learning? Even more crucially, should we continue to attach high stakes to tests that mislead educators into thinking knowledge is irrelevant to comprehension?

In most states, schools are using a variety of curricula, all of which cover different bodies of knowledge—if they cover any at all. So it may not be possible to create state reading tests with near- and mid-transfer items. But the MORE study suggests states should at least ground test passages in content specified in their social studies and science standards. That would presumably encourage educators to focus on those topics and help level the playing field for standardized tests.

And for those educators who have made the switch to a knowledge-building curriculum, the MORE study should provide reassurance that they’re on the right track—even if they’re not yet seeing results on standardized reading tests.

This post originally appeared on Forbes.com.

Harriett Janetos

Here's the conclusion to "Is reading comprehension an actual thing?" by The Reading Ape.

The research appears to suggest the following ‘best bets’ for schools:

• Reading comprehension is not ‘the next stage in reading’. The artificial divide between ‘learning to read’ and ‘reading to learn’ is a misapplication of Chall’s (1983) stages of reading development. Comprehension develops in tandem with reading acquisition and is aligned with listening comprehension and developing schema.

• Reading comprehension is not a skill and should only be taught as such as an exam technique.

• Prior knowledge is fundamental to reading comprehension and crucial for inference. Knowledge comes in many forms. For a pupil to be able to understand a text they must be furnished with sufficient knowledge to be able to read it with a top down, global perspective.

• This relevant knowledge must be built explicitly…

• …by teachers.

• Knowledge of complex language and syntactic structures is essential to reading comprehension. Regular exposure to these structures is essential through experience with high-quality texts above instructional level. Students will require expert, explicit scaffolding to negotiate these texts…

• …from teachers.

• Reading comprehension is also a factor of vocabulary. Vocabulary is a function of greater knowledge. Vocabulary grows more quickly through explicit teaching. Students’ vocabulary appears to grow more quickly in classes where the environment is rich in language and teachers have a wide vocabulary. This is particularly important in early schooling and in areas of higher social and economic need.


E Blount

Feb 28, 2023

Great article. Just throwing this out as food for thought, not a deeply considered/researched point... If in an imaginary world we had a list of 20 topics that might be on a standardized test, it’s true that some schools might be cramming wombat facts and Aztec civilization dates in preparation for a test, and of course there would be egregious cases along with many normal, lovely classrooms. There is an element of “well, these (typical standardized) tests can’t be stressful because there is nothing to study for” that would immediately be challenged if there were an actual list of topics. I don’t personally believe that they’re not stressful, but there is a “do your best, just have a good breakfast” fantasy around some tests. But I do wonder, if suddenly many classes/schools populated with underprivileged kids did well on the test, maybe even better than a tony suburb, whether people would actually hate those results. There is an element of the test having to show the wealth gap, or the results would be unsatisfying for some. That said, I would very much prefer that my kids memorize some marsupial facts rather than spend weeks on amorphous “find the main idea” exercises, which all too often seem like “read the test maker’s mind and guess in which ways they are trying to be tricky.”


Minding the Gap
