How to Make the Dream of Education Equity (or Most of It) a Reality
Studies on the effects of tutoring, whether by humans or computers, point to ways to improve regular classroom instruction.
It sounds like a beautiful dream: What if we could enable almost any student to learn as well as the top achievers? Or what if we could at least enable all students, regardless of where they start, to learn at the same pace?
Two sets of data, one from many decades ago and one more recent, suggest that this dream—or something approximating it—could be turned into reality.
The dataset from many decades ago comes from someone whose name is familiar to many teachers: Benjamin Bloom, of Bloom’s Taxonomy (or, more accurately, from a couple of his graduate students). In 1984, he relied on their experimental studies to write an essay called “The 2 Sigma Problem,” which has been much cited in the years since.
The studies that Bloom wrote about found that, compared to traditional classroom instruction, tutoring had outsize effects on learning (two “sigmas,” or 2.0 standard deviations) over the course of just three weeks. A “sigma” here is an effect size: the difference between the average scores of two groups, divided by the standard deviation of the scores, which gives researchers a common yardstick for comparing the effects of different interventions.
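For readers who want the formula, the conventional calculation (the standard “Cohen’s d” definition, which the essay itself doesn’t spell out) is:

$$ d = \frac{\bar{x}_{\text{tutored}} - \bar{x}_{\text{control}}}{s} $$

where $s$ is the (pooled) standard deviation of the scores. A two-sigma result, then, means the tutored group’s average landed two full standard deviations above the control group’s.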
How massive is 2.0 standard deviations? Education researcher Matthew Kraft has argued that in the education context, an effect size of between 0.05 and 0.20 should be considered “medium,” and anything above 0.20—one tenth the size of the effect Bloom was talking about—should be considered “large.” Kraft pointed out that one in four randomized controlled trials he surveyed had effect sizes that were zero or negative. (The studies Bloom relied on were randomized controlled trials involving students in grades 4, 5, and 8.) Plus, a lot of studies finding zero or negative effects never get published; why bother to publish something that shows an intervention didn’t work? The bottom line is, as Kraft notes with dry understatement, “raising academic achievement is difficult.”
No wonder Bloom’s two-sigma studies have drawn so much attention over the years—including from tutoring entrepreneur Sal Khan and personalized learning enthusiast Mark Zuckerberg. If we could replicate that effect on a large scale, it would bring the average student to the 98th percentile of the achievement distribution!
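A quick arithmetic check on that percentile claim, assuming achievement scores are roughly normally distributed:

$$ \Phi(2.0) \approx 0.977 $$

That is, a student lifted two standard deviations above the mean would outscore about 97.7 percent of the original distribution, which rounds to the 98th percentile.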
One problem, of course, is that it’s prohibitively expensive to hire a tutor for every average or struggling student, or even one for every two or three of them. This was the two-sigma “problem” that Bloom alluded to in the title of his essay: how can the massive benefits of tutoring possibly be scaled up? Both Khan and Zuckerberg have argued that the answer is to have computers, maybe powered by artificial intelligence, serve as tutors instead of humans.
Students All Learn at the Same Rate
Enter the more recent study, which the Hechinger Report’s Jill Barshay wrote about in November 2023. Researchers at Carnegie Mellon analyzed the learning rates of about 7,000 students, ranging from late elementary school through college. To do this, they had participants use instructional software or play educational games on computers. They also divided up the information to be learned into units they called “knowledge components.”
What they found astonished them, just as Bloom and his grad students had been astonished by their findings on the effects of tutoring: All participants in the study learned at essentially the same rate. No matter where they started, it took about seven “learning opportunities” for each of them to master a typical knowledge component.
Of course, the participants did start at different places. Some of them came into the study with more relevant background knowledge than others. Students in the lower half of the distribution initially scored only 55% correct on a pre-test of the subject matter, on average, while those in the top half averaged 75%. That means that those who started out ahead reached “mastery” levels—defined as 80% correct—with fewer learning opportunities. Still, those who started out ahead were learning about the same amount with each opportunity as those who started out behind. That raises the possibility, the researchers conclude, that—under “favorable conditions”—“anyone can learn anything they want.”
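Some back-of-the-envelope arithmetic (a simplification that assumes a constant percentage-point gain per opportunity, which is not a figure reported in the article) shows why equal learning rates still leave a gap in time-to-mastery:

$$ n = \frac{80 - s}{g} $$

where $n$ is the number of learning opportunities needed, $s$ is the starting score, and $g$ is the gain per opportunity. With the same $g$ for everyone, a student starting at 55% needs $(80-55)/(80-75) = 5$ times as many opportunities to reach the 80% mastery bar as one starting at 75%.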
All of this may sound too good to be true—and in some ways it is. Writing in Education Next, Paul T. Von Hippel recently pointed out a number of reasons why the two-sigma effect hasn’t been replicated. For one thing, students were learning about topics they started out knowing nothing about—probability, for the younger students, and cartography for the older ones. To assess the effects of tutoring, they were tested just on those subjects, not on a standardized test that covered a range of subjects. Both of those characteristics, Von Hippel says, make it more likely that a study will yield large effect sizes.
How to Improve Outcomes Even Without Tutoring
But some of the other reasons Von Hippel cites for the massive effect size suggest ways to significantly improve outcomes for students even without tutoring—as does the Carnegie Mellon study. Von Hippel points out that the students who got tutoring also got a lot more testing. At the end of each unit, they took a quiz. And if they scored below 80% (in one study) or 90% (in the other), they got feedback showing them what they got wrong and why. Then they took another quiz—and if they still didn’t meet the bar for success, they repeated the process. Von Hippel estimates that this extra testing and feedback explains about half of the two-sigma effect.
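To make that quiz-feedback-requiz cycle concrete, here is a toy simulation (the function names, the quiz model, and the learning increment are invented for illustration; only the 80% threshold comes from the studies):

```python
import random

MASTERY_THRESHOLD = 0.80   # one of Bloom's studies used 80%, the other 90%
ITEMS_PER_QUIZ = 10

def give_quiz(knowledge: float) -> tuple[float, int]:
    """Simulate an end-of-unit quiz: each item is answered correctly
    with probability equal to the student's current knowledge level."""
    correct = sum(random.random() < knowledge for _ in range(ITEMS_PER_QUIZ))
    return correct / ITEMS_PER_QUIZ, ITEMS_PER_QUIZ - correct

def corrective_feedback(knowledge: float, missed: int) -> float:
    """Simulate targeted feedback: reviewing each missed item nudges
    knowledge upward by an invented increment."""
    return min(1.0, knowledge + 0.03 * missed)

def mastery_cycle(knowledge: float) -> int:
    """Quiz, give corrective feedback, and re-quiz until the student
    clears the mastery bar; return the number of quiz attempts used."""
    attempts = 1
    score, missed = give_quiz(knowledge)
    while score < MASTERY_THRESHOLD:
        knowledge = corrective_feedback(knowledge, missed)
        score, missed = give_quiz(knowledge)
        attempts += 1
    return attempts

print(mastery_cycle(knowledge=0.55))   # a student starting around 55% correct
```

The details are made up; the point is the loop itself: assess, give targeted feedback on what was missed, and reassess until the bar is cleared.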
For anyone who is familiar with cognitive science and its application to education, this may not be surprising. There’s a phenomenon that used to be called the “testing effect” and—because of negative associations with testing—is now more commonly known as “retrieval practice.” Whatever you call it, the idea is that testing is a powerful way to reinforce knowledge. The more you practice retrieving an item of information from long-term memory, the more likely you are to be able to retrieve that item when you need it.
Prompt, targeted feedback—not just an “X” next to a wrong answer—is another effective teaching strategy backed by cognitive science. The students in the tutoring group clearly benefited from both of these practices.
In addition, Von Hippel notes, the tutors got training that instructors for the other groups in the study didn’t receive. That training guided them to do things like summarize frequently, take a step-by-step approach to instruction, provide sufficient examples to illustrate each new concept, and encourage students’ active participation by asking questions that called for extended responses or alternative answers. All of these are teaching strategies backed by cognitive science.
Similarly, the “favorable conditions” in the Carnegie Mellon study align with findings from cognitive science. The computer “tutors” administered frequent tests, provided immediate feedback on errors, gave further instruction or provided an exemplar response as needed, and gave learners repeated opportunities to master various tasks.
Why Computers Probably Aren’t the Answer
So could computer tutors at last enable us to achieve the “two-sigma” effect at scale and solve Bloom’s “problem”? Do learners who start out behind just need more technologically enhanced tutoring to catch up?
Probably not, unfortunately. As the Carnegie Mellon researchers point out, learners have to be willing to engage in the kind of repeated practice that formed the basis of the study. It might be possible to encourage kids to do that for a while by adding enough bells and whistles and games. But without the human element, that’s unlikely to be enough. Students are more likely to work hard to please a teacher than to please a computer.
Besides, if each student is working individually at a computer, a crucial element of learning gets lost: interaction with other students. The same is largely true of one-to-one tutoring: at least there’s interaction with a human tutor, but ideally students will also bounce ideas off their peers. Part of the learning experience is acquiring the ability to engage in dialogue with others about common content—and disagree with them in a civil manner.
Beyond that, the Carnegie Mellon study looked at the acquisition of information in only a few fields: math, science, and language learning. The researchers noted that the rate of learning varied more for language, perhaps because in that area memory capacity is more important, and memory capacity varies from individual to individual. That may well be true for other subjects too, not to mention that some areas call for more subjective kinds of learning. Would the same results hold for, say, teaching literary or historical analysis, which isn’t just a matter of remembering discrete bits of information?
It may be that you need a human being for that. And indeed, Bloom wrote in his essay that in the tutoring studies, the two-sigma effect held not just for retaining information but also for “higher mental processes” like analysis.
It should be noted that the tutoring most students receive—the kind that many have pinned their hopes on to compensate for pandemic-related learning loss—probably bears little or no resemblance to the kind involved in either Bloom’s studies or the Carnegie Mellon study. Few tutors, and few computer programs, have been trained in strategies backed by cognitive science. Perhaps they could be. But maybe it would make even more sense to train regular classroom teachers in those strategies.
Why Not Train Classroom Teachers in Cognitive Science?
Bloom’s studies provide a basis for believing that kind of training could go a long way, even without tutoring. There were three groups in his studies. In addition to the tutoring group, there was a control group that got “conventional” classroom instruction. A third group got “mastery learning,” which provided the same kind of tests and corrective feedback that the tutoring group got. The effect size for the mastery learning group, as compared to the control group, was about one sigma—half as large as the effect for the tutoring group, but still way more than what most education interventions achieve.
But wait, there’s more: the classroom teachers in the mastery group, according to Von Hippel, didn’t get the additional training in science-backed pedagogical techniques that the tutors got. What if they had? How much greater would the effect size for the mastery group have been?
Unfortunately, most teacher training falls far short of what Bloom’s tutors got. Prospective teachers rarely learn about instructional strategies grounded in cognitive science. In fact, much of what they learn contradicts evidence from cognitive science—and the instructional materials they’re expected to use often do as well.
Rather than stressing the importance of enabling students to retain information, instructors at schools of education are likely to dismiss information as something kids can “just Google on their phones.” Especially in the area of literacy, instructional materials generally give short shrift to content and focus on supposed abstract skills like “making inferences.” Where there is content, far too little attention is paid to the need for clear, explicit instruction in manageable chunks of it, followed by plenty of prompt feedback and guided practice.
If we were to provide classroom teachers with materials and training that align with cognitive science, we might not achieve the two-sigma effect or enable all learners to proceed at the same pace. But, as evidence from the few schools that have tried that approach indicates, we would almost certainly enable many struggling students to achieve at levels far above what they’re able to reach in our current system.
"Would the same results hold for, say, teaching literary or historical analysis, which isn’t just a matter of remembering discrete bits of information? It may be that you need a human being for that." Yes. You absolutely need a human being for that. 😳