Is Testing the Next Frontier in Ed Reform?
A short-lived experiment in Louisiana suggests the power of tying reading tests to content that students have been taught.
Image generated by AI (as you can probably tell)
Eight years ago, Louisiana’s top education official announced a bold experiment: a radically different kind of state reading test, one that would assess students’ learning based on what they had actually been taught.
What’s so bold and radical about that, I hear you say? (Or some of you, anyway—perhaps those who are new to this Substack.) Don’t states routinely test students on what they’ve been taught?
Well, yes and no. Standardized reading tests aim to measure students’ abilities to do things like find the main idea of a text or make an inference about what a word means in a particular context. In most schools, reading or ELA instruction focuses primarily on those kinds of skills. So in that sense, students are being tested on what they’ve been taught.
The problem, as then-state superintendent of Louisiana John White explained in a 2018 opinion piece, is that those skills don’t transfer from one context to another. To be able to make an inference, for example, you need a certain threshold of relevant background knowledge. Lots of kids lack enough background knowledge to make sense of the passages on reading tests, which are designed to avoid topics that might be covered in the school curriculum. A student might be asked to find the main idea of a passage on, say, rugby, when she’s never even heard of the sport.1
Kids from less highly educated families are particularly likely to lack the background knowledge they need for the tests. And the problem becomes most apparent at higher grade levels, when the test passages assume the reader possesses increasing amounts of knowledge and vocabulary.
In his 2018 op-ed, White announced that Louisiana had just submitted a proposal to the federal Department of Education to develop an “Innovative Assessment Pilot” under the recently passed ESSA legislation.2
“Rather than administering separate social studies and English tests at the end of the year,” White wrote, “Louisiana schools participating in the pilot will teach short social studies and English curriculum units in tandem over the course of the year, pausing briefly after each unit to assess students’ reading, writing and content knowledge. Students, teachers and parents will know the knowledge and books covered on the tests well in advance. Knowledge of the world and of specific books will be measured as a co-equal to students’ literacy skills. And teachers would have good reason to focus on the hard and inspiring lessons of history and books.”
The tests, to be administered three times a year at the middle school level, wouldn’t just assess students on texts they had read—referred to as “hot” reads. They would also introduce “warm” reads, other texts related thematically to those in the curriculum. If students read The Giver, for example, they would get some questions about that text and some about passages from other works of dystopian fiction. Through a mix of multiple-choice and essay questions, students would be asked to make connections across texts.
The US DOE did approve Louisiana’s Innovative Assessment, or “IA.” And those of us who advocate for recognizing knowledge as a prime component of reading comprehension eagerly awaited the results of the experiment.
What Happened to the IA?
It’s been a long wait. White stepped down as superintendent in 2020, after eight years in the position. The IA experiment continued, but Covid disrupted testing for a couple of school years. And White’s successor as superintendent seemed to lose interest in the project.
Louisiana adopted new social studies standards in 2022, which—White told me—required separate reading and social studies tests rather than the combined test that was envisioned in the state’s proposal. That reduced the appeal of the IA for school districts. At its peak, only 28 percent of districts in the state were participating. In 2024, the state quietly discontinued the experiment—so quietly that if you ask Google when it was discontinued, it will tell you that it’s still going, even though I have been assured by people involved in the IA that that is not the case.3
So there isn’t a lot of data to work with. The IA was fully operational for only two school years, 2021-22 and 2022-23. For the first of those years, when it covered just grade seven, only about 4,700 students participated. The second year, when it covered grades six through eight, it reached 17,000 students.
It’s frustrating that the experiment wasn’t given more of a chance to succeed. But the data we do have are promising—particularly with regard to changing teacher practice.
Student Performance
Let’s start, though, with the preliminary data on student performance. One of the state’s partners in the effort, NWEA, released a white paper in 2021 finding that “many students” reported feeling less anxious while taking the IA as compared to the regular state test, known as the LEAP. The white paper also showed that students were generally more engaged when taking the IA. And Louisiana’s 2023 annual report on the IA to the U.S. Department of Education reported that 66 percent of students who took the IA preferred it to the LEAP, with the same proportion saying they felt confident or very confident in answering the questions.
What about narrowing the gap between students who are economically disadvantaged and those who are more affluent? That gap remained substantial on the IA, but the NWEA report found that it was significantly smaller than on the LEAP. “The new test design may be ‘leveling the playing field’ by providing students a more equitable opportunity to show what they know,” the report suggested.
More Focus on the Meaning of Texts
The IA was also the subject of a 2022 PhD dissertation, written by none other than John White himself. White focused not on students but on teachers in a subset of districts using the IA in the 2019-20 and 2020-21 school years, relying on surveys and interviews.
White, now CEO of Great Minds—a curriculum publisher whose products include the knowledge-building literacy curriculum Wit & Wisdom—turned up one significant finding: Teachers who used the IA were more likely to focus their instruction on the meaning of a whole text rather than on isolated comprehension skills.
White and others have found that even in districts using content-rich, knowledge-building curricula, teachers often continue to focus on isolated skills. Standardized “benchmark” tests, given throughout the school year, appear to be a major factor; they send educators the message that struggling students need more work on skills—when in fact, the problem may be a lack of background knowledge. The result, according to a recent rigorous study from SRI, is that instruction is “superficial” rather than “robust.”
In districts participating in Louisiana’s IA experiment, on the other hand, teachers trusted that the IA’s interim tests were reliable measures of their students’ progress. Those tests guided them to focus primarily on content rather than skills.
In the interviews White conducted, teachers made it clear that testing was the major influence shaping their instruction. “The most important thing is whatever test we’re giving,” one teacher told him, “because it is how I’m judged as a teacher. It’s how your students are compared to other students. It’s how your school is compared to other schools.”
Even though teachers were using the same curriculum as before—Louisiana’s state-created ELA Guidebooks 2.0, which is content-rich—they reported that the IA changed their approach. “We used to devote time to test prep, and we would just do practice LEAP tests,” one teacher said. “We don’t do that anymore. We devote our time to diving into the unit and making sure that students have a strong understanding, as much background knowledge as we can possibly give them.”
Little Change in Teacher Beliefs
While the IA did prompt teachers to change how they taught, White found no strong evidence that it changed their underlying beliefs about the goals of comprehension instruction. He told me that didn’t surprise him, given the brevity of the experiment and the many influences on teachers’ beliefs. A change in a “procedural dictate,” which is typical in a large system, is likely to produce only “procedural responses,” he said.
Research by assessment expert Thomas Guskey supports that observation. Guskey found that changes in teachers’ beliefs generally occur only after they try out a new classroom practice—and see that it produces student success. But if the system guides them to measure that success by increases in supposed skills, they may not notice or attach much importance to increases in students’ knowledge. A test that rewards increased knowledge and skills could change teacher beliefs eventually, but unfortunately Louisiana’s IA didn’t last long enough to reveal whether it would have had that effect.
Even if the IA had lasted longer and produced more dramatic results, the exact model couldn’t have been replicated in other states. Louisiana was able to engage in its experiment only because the vast majority of its schools use the same literacy curriculum, ELA Guidebooks. The state’s 2023 report to the US DOE estimated that 80 percent of Louisiana classrooms were using it. In other states, districts use a wide variety of curricula, covering different texts and topics. There would be no common content in which to ground a test like the IA.4
What Other States Could Do
There is, however, another possibility. State literacy standards rarely specify content—they just list comprehension skills—but all states have social studies and science standards that do include specific topics. Any state could, theoretically, ground the passages on its state reading test in the content of those standards, giving teachers an incentive to teach those subjects and helping to level the playing field for students.
White is skeptical that will happen. While he believes the education reform movement’s “gravest mistake is to be agnostic on the substance of what kids learn,” he says there are practical obstacles to tying tests to specific content. State officials in charge of curriculum have no control over the design of the tests. That authority belongs to psychometricians, whose primary concerns are that tests be “reliable,” aligned to relevant state standards, and comparable within and between states.
That perspective, White says, is narrow. “There’s not great curiosity, in that technocratic worldview, of getting under the messy hood of, well, was it worth it?” he says. “Did kids learn the content that people need to be productive?”
Changing state reading tests to align with content standards would, White says, “take a level of vision, coordination and time.” Officials would need to not only engineer a new kind of test and pilot it but also convince others—including the state board of education and the legislature—that “it’s worth moving off the old reliable model and onto the new one.” It would also cost money. There was no federal funding for the IA, and Louisiana had to raise the funds in part through philanthropic support.
Steep as the obstacles are, the potential rewards are huge, and I can only hope that at least one state will take on the challenge. White says the federal government could help by establishing a pool of funding that states could draw on for research and development in the area of assessment.
He added that the current Republican administration might be more open to experimentation with testing than a Democratic one. In the past, White says, Democratic administrations have seen proposed changes in testing more as a threat to civil rights than as a means of improving outcomes for students in historically disadvantaged groups.5
“What gets tested gets taught,” according to a timeworn but clearly evidence-based adage. If we continue to test illusory skills, that’s what teachers will continue to focus on, to the continued detriment of many students. What those students need, if they’re going to have a chance to succeed, is another state education leader who is willing to try a bold and radical experiment—and whose efforts aren’t swept away by political winds before they’ve had a chance to take root.
This example is based on a real anecdote I heard from a school leader.
ESSA, which stands for Every Student Succeeds Act, replaced the federal No Child Left Behind legislation in 2015. One of the goals of the legislation was to reduce the emphasis on standardized tests.
The state has never announced that it terminated the IA pilot, let alone explained why. But others I’ve spoken with who were connected with the IA effort have cited several factors. The US DOE required that after five years, the innovative test be made available to all schools in the state, but the IA could only be used by districts using one of the two curricula it was tied to, ELA Guidebooks or Wit & Wisdom. That left about 15 percent of schools unable to use the IA. There were also considerations of cost and comparability to the regular state test. But it appears that politics also played a role: after White left office, the Democratic state administration was replaced by a Republican one.
Before the IA experiment was discontinued, the Louisiana Department of Education had created a second version of the test that was tied to the content in the second most widely used curriculum in the state, Wit & Wisdom. It was also considering ways to reshape the test so that it was more curriculum-agnostic.
The concern over civil rights is rooted in the fact that without standardized tests, students and their families can be lulled into a false belief that they’re progressing. Teachers’ assessments of their own students are often subjective and unreliable. When England switched from standardized exams to teacher-graded tests during the pandemic, for example, grade inflation skyrocketed. But changing the tests to guide teachers away from wasting time on isolated skills instruction is quite different from eliminating them.



Wonderful piece, Natalie — I really appreciated the specificity of the example. Seeing how The Giver would be taught to all students, revisited for analysis, and then paired with thematically related dystopian texts made the assessment model feel much more concrete.
After years of assessing students mostly on cold passages, my own classroom assessments eventually evolved in a similar direction: close reading anchored in the most crucial moments of a shared text, aligned to the kinds of thinking the standards require, and extended through related passages (e.g., pairing Balzac and the Little Seamstress with Red Scarf Girl) where core concepts like dystopia or propaganda had to be explored in a fresh context.
One question this raises for me is about structure as distinct from content. Modern ELA assessments have a very specific register, question architecture, and style that’s often backward-aligned to college entrance exams. Many HQIM seem to step away from those structures in favor of short constructed responses and meaning-centered tasks. In a vision like Louisiana’s IA, I’m curious how much of our current assessment structure would remain, and how much would need to change alongside the content.
As your piece makes clear, if you change what gets tested, you change the game. The open question for me is whether we’re prepared to rethink not just what reading assessments are about, but the forms they take and the incentives they therefore encode in our classrooms.
Thanks for opening up this conversation.
Indeed testing does change teacher practice. What we need to do is study the practices of teachers of reading who teach READING FOR MEANING FROM ONE PROJECTED TEXT. What exactly do they do in the classroom when they teach for TEXT MEANING that engages students and engenders correct answers? What types of questions and writing assignments do they use to test comprehension? How do they assist students in applying what they have learned from texts they have read TOGETHER? Answers to these questions should provide the framework for testing reading comprehension. The question then is what content should be tested . . . my thoughts on that open up a whole other can of worms though.