Critiques of Instructional Technology
Free-Lance Journalist/Author (Random House)
Copyright
Attribution-NonCommercial-No Derivative Works 3.0
This chapter is licensed under a different Creative Commons (CC) license than the remainder of the Foundations of Instructional Technology e-book. Please disregard the CC license at the very bottom of this page: it is embedded in the template, and MediaWiki does not allow us to remove it for a single page. The content of this chapter is available under Attribution-NonCommercial-No Derivative Works 3.0. Again, the CC license at the very bottom of this webpage is not the one under which this chapter is licensed.
This chapter is a reprint of "The Research Game: Faith and Testing in Las Vegas" from Todd Oppenheimer's book The Flickering Mind, where it appears as Chapter Nine. Hillary Leigh, a doctoral student in the Instructional Technology program at Wayne State University, provides the following summary of the eight chapters that precede this chapter and the material that follows it.
Chapter one begins with a brief review of the past promise and subsequent failure of technologies such as film, radio, and television. This historical account introduces the American education system's infatuation with the promise of technology in general, and the computer in particular. In his discussion of the rise of the personal computer, the development of a "digital divide" between the rich and the poor, and the political factors involved in effectively integrating computers into schools, Oppenheimer suggests a cycle of expectation, adoption, failure, and innovation. Chapter two provides an illustration of the harsh reality of the digital divide. Oppenheimer explores the use of computers within schools located in Harlem; this demonstration reveals that computers in schools may resolve some problems while creating others, such as an increased need for technical support (including access to the internet), out-of-control classrooms, and questionable types and quality of interaction between students, teachers, and the software. In chapter three, the discussion turns to students in a rural West Virginia high school and examines the ways in which computers transform information-gathering, as well as the trend toward distance education. Noting a few benefits but relying on research, Oppenheimer emphasizes that distance education courses appear to be successful in very narrowly defined circumstances.
Oppenheimer departs from the theme established to this point in chapter four, examining how computers function in a school and district with plentiful financial resources. Discovering similar issues with bureaucracy (e.g., network downtime, maintenance requests) and student unruliness, he conjectures that students are simply not psychologically ready to use such sophisticated tools, and argues instead for simplification and a "back to basics" approach. Chapter five takes educational reform one step further, questioning whether wholesale reform of the education system via technology is possible and worthwhile. This question is answered, in part, by an analysis of New Tech High in Napa, California, which illustrates common problems despite a high level of technological integration, financial resources, and administrative support: cursory knowledge of subject matter and the difficulties of learning (and teaching) in a project-based environment. Chapter six closes the section on the false promise of technology by arguing that one of the fundamental assumptions underlying the emphasis on technology in education is that there is (and will be) significant need for technology-based professionals, yet these hopeful predictions rarely come to pass.
In chapter seven, Oppenheimer begins his examination of the hidden troubles of technology in education, namely the purported “bulldozing” of the imagination through the elimination of machine shop and art classes. However, he tempers his criticism of the latter by pointing out that those who talk about the value of art education are essentially making the same argument as proponents of technology. Chapter eight provides harsh criticism of problems inherent in industry partnerships, emphasizing notable corruption related to telecommunications contracting and the for-profit Edison project, and leading Oppenheimer to propose a critical appraisal of the evidence supporting the relationship between technology and achievement.
After the chapter published here (an exposé of one large but problematic courseware maker, which illustrates weaknesses in education research), Oppenheimer concludes the book with a critique of teacher training, then a section on "smarter paths." This final section of the book comprises three case studies of innovative, successful schools; some use technology (but only moderately), some avoid it almost entirely.
To view the companion website for The Flickering Mind: Saving Education from the False Promise of Technology, visit http://www.flickeringmind.net.
During the woeful debates that have come to characterize Americans’ relationship with their schools, one of the most consistent subjects of hand-wringing is our failure to properly teach children how to read. Impassioned promises to break the nation out of this pattern are much of what helped President George W. Bush win the White House, and his initiatives on reading soon led his domestic agenda. Chastened by the schools’ history of failure with loose education fads, Bush coupled his efforts with some academic tough love: Before schools adopted new programs in reading instruction, or any other curricular domain, the programs had to be scientifically proven to be worthwhile. The president was so adamant on this point that his signature education package, the "No Child Left Behind" Act, repeated the words scientific or scientifically 115 times, and the word research 245 times.
As auspicious as the president’s priorities seem, they have created a new dilemma: How are schools, and the public at large, supposed to evaluate the complicated claims of scientific research? Hundreds upon hundreds of studies of one kind or another have been conducted over the years to see what effect, if any, technology has on student achievement. Experts in the private sector and the academic world have built entire careers around this question; some have even sought hyper-objective relationships with the research, by devoting their energy to studies of the studies – a rarefied family of science called meta-analysis. Taken together, the body of research is wildly varied, and seemingly inconclusive. As an example, one of the largest and most frequently mentioned reports – a 1991 meta-analysis of 254 studies by James Kulik – was cited by the Clinton administration when it launched the first large-scale federal campaign to get computers into schools in December 1994. Kulik reported that this survey – one of many he has done before and has conducted since – showed that computers helped students learn 30 percent faster than they do when receiving traditional instruction. Kulik’s studies haven’t held up to scrutiny particularly well. One such examination of his surveys was conducted in 2000 by Jeffrey T. Fouts, a professor of education at Seattle Pacific University and the lead evaluator for the Bill and Melinda Gates Foundation. Fouts found that Kulik’s surveys suffer from a bothersome meta-analytic habit: they continually recycle the same old pool of research, and simply add new material as it comes in. In Kulik’s case, the ongoing drift in his conclusions has been further dirtied by the fact that most of the studies in his original pool lacked standard scientific controls.
In the face of complications like these, promoters of the dominant competing points of view – those who think technology works in school and those who think it’s a failure – have had every opportunity to shop for facts that bolster their side.
To most people involved in education today, disputes of this sort have a perfect solution: standardized test scores. These simple statistics are wonderfully understandable, ostensibly objective, and easily tallied. They would seem to be the ideal tool to cut through polarized debates about "why Johnnie can’t read," ending decades of costly educational missteps. Bush, too, has heartily embraced standardized tests, thereby accelerating their popularity. As it turns out, this three-pronged drive – in reading, in research, and in standardized tests – has introduced the schools to a whole new set of vagaries, some of which are far more consequential than those of the past. From all indications, the education world is not terribly well prepared for the challenge.
One of the best ways to look at the trouble caused by these developments is to begin at the ground level, with the teachers and their attitudes toward the task of teaching youngsters to read. One unusually intense illustration of the teacher’s world view played out for three spring days in Nevada, at Las Vegas’ MGM Grand Casino and Conference Center. The occasion was the second annual national conference of Renaissance Learning, Inc., a Wisconsin-based firm that, in the early years of the 21st century, was one of the largest and most profitable publicly traded companies among those devoted solely to educational software. It was also far and away the number one manufacturer and seller of software aimed at improving reading skills. At the time of the Vegas conference, Renaissance’s customers numbered 55,000 schools, from nursery school through high school. Roughly 50,000 of those were public schools, more than half the nation’s total. By the company’s estimate, this meant that, since its founding in 1986, Renaissance Learning had shaped 300,000 teachers and 20 million students with its products. To achieve that level of success, Renaissance Learning has built a large, aggressive, multi-faceted organization based in large part on prodigious amounts of research, which, according to the company, definitively prove that its products powerfully stimulate learning. The history of this company therefore offers an unusually radiant portrait of the way the fields of research, software manufacturing, and education play together. Some of those relationships, it turns out, are rather questionable – and quite damaging to both students and their schools.
Renaissance Learning’s product line comprises a handful of remarkably simple software packages designed to help a teacher test and track students’ progress in math and, most of all, in reading – a scholastic domain that has stubbornly refused to show much progress over the years. Not surprisingly, with the level of market penetration that Renaissance Learning has achieved, the publishing world has begun to take notice. Virtually all the big firms that supply classrooms with their texts – Harcourt Brace, Houghton Mifflin, and Macmillan/McGraw-Hill, among others – now send schools brochures and special catalogues promoting books that are tied to the Renaissance program. Some even bundle the program free of charge with textbooks and other materials that schools buy.
Somewhat coincidentally, Renaissance Learning’s work is now center stage in the country’s political discussions – at least when the public’s attention has been on education. This is because the company’s emphasis on reading and on a systemized assessment of reading progress dovetails nicely with the latest trends out of Washington. It has not hurt that education itself had, until the 2001 terrorist attacks, become the nation’s hottest sociological obsession. Chief among the people thusly obsessed is President George W. Bush, who sees standardized measures of achievement in reading, and other subjects, as the key to his education plan’s ultimate goal: school accountability. None of this has been lost on the computer industry. By 2002, software makers were happily producing new programs and supplementing old ones that could keep teachers fed with evaluation data all day long.
By 8:15 a.m. on the Renaissance conference’s opening day, the MGM Grand Arena, which has been a venue for everything from Rolling Stones concerts to heavyweight title boxing matches and professional bull-riding contests, was packed with close to 6,000 teachers and school administrators. Vegas, of course, is the town built on synthetic themes, and the MGM Grand is abundantly in step. Once the largest hotel in the world, the MGM Grand’s theme is "the city of entertainment." It offers, under one air-conditioned roof, a 172,000-square-foot casino, 15 theme restaurants, and more than 5,000 hotel rooms. (Running throughout the complex’s 115 acres is a special, supplementary theme of grandness. Among other sites, there’s "The Grand Pool Complex," a Japanese restaurant called the "Grand Wok," and "The Forever Grand Wedding Chapel." Phone conversations with hotel staffers typically close with MGM’s customized good wishes: "Have a grand day.")
Renaissance Learning designed its conference to fit right in. As the audience assembled, a pop choir from a local performing arts high school warmed them up with a classic Vegas song-and-dance show to the tune of Carly Simon’s "Let the River Run." Renaissance directors had chosen this number because of one line, which they turned into the conference slogan – "Dreamers, Wake the Nation" – and which was projected, along with the show, onto four oversized screens at the front of the arena. Before a word had been spoken about teaching or software, swarms of educators were on their feet shouting and clapping and ready to go.
A few minutes later, as the performers waved their good-byes, Loy Ball, a senior Renaissance Learning consultant from Tennessee, who served as conference emcee, took to the podium. Smiling and shaking his head at the auspicious beginning, he reminded the audience of the previous year’s successful first annual conference, titled "Take the Next Step," which was held in the capital of his home state. "Many of you left Nashville ready to take that next step," Ball said. Now, "We hope you are ready to wake the nation to a Renaissance education." This was met with whoops and widespread applause. "We believe," Ball told the crowd, "that with technology, we have found the future of education." With these products, "I saw kids succeed who had never had success before." (More whoops.) In the coming days, he said, "You will see why Reading Renaissance is the most effective comprehensive school improvement model in the country." Ball closed by promising them "the best three days of professional development I think you’re ever going to experience in 2001. And I say 2001 because we’re going to do it again next year, and we’re going to get even better."
Ball then introduced the opening keynote speaker, Christopher Paul Curtis, a former auto plant worker turned successful children’s book author, who delighted the crowd with tales of his hard-luck past and hilarious readings from one of his homespun stories. Most of the teachers in attendance were quite familiar with Curtis’s work. His books are among the nearly 50,000 that were treated in Renaissance Learning’s database, which is the raw ore for "Accelerated Reader," the company’s flagship product.
Accelerated Reader is the company’s oldest and most successful piece of software. Ball calls it "the most successful and widely used educational program of all time." It’s also the company’s simplest. The program is built around computer disks filled with short quizzes about books, with one quiz keyed to each book. The books treated on these disks are those the company deems educational and within the normal range of abilities for a particular grade, kindergarten through high school. When students pass a book’s quiz, they receive a certain number of points. Passing these tests is purposely made relatively easy (it can often be done with as little as 60 percent of the answers right), to put a sense of achievement within reach of those who’ve long been treated as average, or even poor, performers. Once students have accumulated sufficient points, they gain the right to some prizes, which vary at each school. Some schools have offered candy and toys, fancy treats like color TVs and, in one case, even a car. (This happened in Dallardsville, Texas, where the principal of Big Sandy High School bought a used truck to give away in a drawing. The more Accelerated Reader points the students earned, the more ballots they could fill out for the drawing.) Some schools just stick with the academic, preferring to give out pencils, books, or psychic honors, such as recognition at school assemblies or lunch with the principal.
One of the beauties of the program is that it doesn’t require much technology. In a pinch, a single classroom computer could handle the job. The computer doesn’t even have to be terribly up to date. (In schools that are low-tech, teachers from different rooms have even shared a computer.) As students finish reading their books, they simply pull up a chair for what’s usually a five- to ten-minute, multiple-choice test. By focusing on work done away from the computer, Accelerated Reader actually stands apart from the typical educational software product. In fact, Terry Paul, chairman and co-founder of Renaissance Learning, counts himself as something of an iconoclast about educational technology. "The research has proven, with a few exceptions, that computers are not a great teacher," he told me during a conference luncheon. Their real value, Paul believes, is in helping teachers manage their growing piles of student performance data. The challenge of handling that task is fast becoming educational technology’s new frontier; one firm predicted that spending on school technology would rise dramatically to a record $15 billion in 2002, as administrators rushed to meet the new president’s expectations on school accountability. Terry Paul is well-positioned for this shift. Assessment, classroom management, and record keeping are what the Renaissance system is about.
In other respects, Renaissance Learning is very much like traditional educational software of yore. It piggy-backs on tried and true approaches to learning (in this case, basic book reading). It offers itself as a conveniently automated, comprehensive system – an attractive prospect for any school. And, like its competitors, it requires a commitment – not only of faith and time but also of money. A bare-bones approach to the company’s reading program could be had at the time of the conference for as little as $499 (this buys four disks, with each disk holding quizzes on up to 50 books). Bigger packages could be had, of 20 disks, for $1,499. For $2,999, a school could buy a "Super Kit," comprising 20 disks and some reading-evaluation software. But the company doesn’t consider this product-only approach terribly effective. Its trainers usually suggest that schools invest in the company’s comprehensive "Renaissance" program. This includes the full Renaissance product line, along with training and consulting. For a small school of, say, 600 students, the price for that is approximately $135,000 over two to three years. For an entire district of roughly 15,000 students, it’s $3 million.
The point, of course, is what a school gets in return. As with most educational software, the power of Renaissance Learning’s products derives from the many other things the computer manages to leverage. For poor and generally failing schools, those other things quickly add up, in the company’s view, to a new culture of success. The most dramatic portrait of this kind of turn-around was delivered, as the conference climax, by Don Peek, then executive vice president of the School Renaissance Institute, the company’s private think tank. Days later, he was named the institute’s president, a promotion that would come as no surprise to the teachers and principals who laughed and cried over and over as they listened to his tale.
Nothing Short of a Miracle
The crowd’s heart-strings were vibrating before Peek even began. He was introduced by Judi Paul, Terry Paul’s wife and Renaissance Learning’s co-founder, who spoke in the MGM Grand Arena via video, in yet another oversized broadcast on the arena’s gigantic projection screens. Judi couldn’t be there in person, she explained, because her daughter had just had triplets (shown on screen), and she was needed at home. Nonetheless, Judi told the crowd of educators, "I want to personally thank each of you, because you [she then pointed firmly at the camera] are the heroes." Judi proceeded to relay the story, well-known to any Renaissance regular, of the Pauls’ son, Alex. "I want to take a few moments to talk about dreams," she said. As the story goes, Alex had shown so little interest in reading as a child that Judi was moved to come up with a game to motivate him. On their kitchen table, and later in their basement, she devised a paper-and-pencil version of what was to become Accelerated Reader, and Alex soon became a real reader. Today, he’s a graduate of law school, and was clerking at the time of the conference for a state Supreme Court judge.
After Judi’s warm-up, Peek walked on stage and took his position front and center. No podium. No video projections. It was just Peek and a microphone. A compact, middle-aged man with a wide smile, Peek stood there with his arms hung away from his sides, like a wrestler ready for a match. Which in some ways he was. He had come, he said, to tell what he called "the Pittsburg story." That’s Pittsburg as in Pittsburg, Texas. Peek is Texan through and through. Curiously, a good number of Renaissance presenters are, and after a few days at a Renaissance conference one begins to see why. A knack for wrapping a pitch in a good story seems to run in Texan blood; its disarming charm is much of what once put Ross Perot within striking distance of the White House. Peek is so good at it that he can hit the same folksy marks every time he tells this story, which he has done hundreds of times. In fact, some of the exact quotes in this retelling come from the video that the Renaissance company sells of Peek’s first big speech, packaged with a companion pitch from a Texas colleague, delivered during the company’s inaugural national conference the year before.
The Pittsburg story began, Peek said, 28 years earlier, when he was a young teacher at Pittsburg Middle School – a place deep in Northeast Texas that Peek describes as being "dirt poor," where the only industry in the area is chickens. "We have millions and millions of chickens." Economic and academic difficulties are so severe in Pittsburg that during Peek’s time there, the federal government gave 50 percent of the students lunch for free or at a reduced cost, and 60 percent received additional financial help under the federal Title One program. Having grown up in Pittsburg, Peek had attended this middle school as a youngster, and was filled with anticipation at the opportunity to return as its teacher. "Ahm ’on tell you something," he said. "That first day of school, they wheeled in those textbooks – ratty ole pink-lookin’ textbook that looked vaguely familiar to me. You guessed it. Our school board in their infinite wisdom had turned down the last two adoptions. I was going to teach world geography from the same textbook from which I had been taught. One child had a special treat. He had my book."
About six weeks later, Peek made a discovery. "All those wonderful discussions I wanted to have, I couldn’t have. Because those kids couldn’t read." Many, he said, were reading two to three grade levels below their age; some were four to five years behind. "By mid-term, this young, naïve teacher was down at the principal’s office knockin’ on that door," asking the principal to make him the Title One reading teacher the following year. "Folks, you may find this hard to believe, but there was not a long line behind me for that job."
After working for seven years to build reading skills at the middle school, Peek was named assistant principal, then high school counselor, whereupon he started noticing that the kids coming down to his office for being in trouble – or because they wanted to drop out – were the same ones he’d seen with reading problems in middle school. Eventually appointed middle school principal, Peek decided to find out why teachers were having such difficulty teaching kids to read. To reassure the crowd that he wasn’t pointing fingers at elementary school teachers’ failures, he reminded them that his wife taught first grade at the time. "You start pointing fingers about first-grade reading problems in my house, that’s a quick way to go to bed without your supper." What it’s really about, he said, is "taking kids where they are, and movin’ ’em forward as fast as they can go."
Despite Peek’s efforts, the middle school chronically struggled with state tests. In 1991, less than half the school passed the reading exam on the state’s notorious standardized test, the Texas Assessment of Academic Skills (commonly known as TAAS); even fewer (43 percent) passed in math. Shaken, Peek called a faculty meeting. The core problem, everyone agreed, was weakness in reading; it even affected math scores. In the arithmetic sections of the test, the math teacher pointed out, "every problem was a stated problem" – that is, framed verbally. If you can’t read, you can’t do the math. This dilemma feeds into one of Renaissance Learning’s operating maxims – that reading controls everything. Terry Paul has gone so far as to calculate that "70 to 80 percent of academic performance is predicted by reading ability." In less statistical terms, President George W. Bush makes a similar point when he makes such a priority of reading instruction.
"That day," Peek told the audience, "we decided that reading was the largest problem we had. And our focus, our money, our time and attention was going to go on reading." Peek essentially declared war. He ordered his faculty to go to any workshop on reading they could find, to bring him any book or magazine article on the subject, to visit any school that might offer lessons. His own explorations brought him to a 1983 study, by John Goodlad of the University of Washington, that surveyed schools to measure what portion of the school day a typical public school student devotes to reading. By Peek’s recollection, the statistics were as follows: six percent in elementary school, three percent in junior high, and two percent in high school. For middle school, Peek noted, "that translates to eight minutes a day." Like almost everyone in the audience, Peek couldn’t believe these figures. So he embarked on an observation at his own school, and found the count remarkably accurate.
There’s a simple point to these findings, Peek explained: "You cannot talk the skill of reading into a student. You cannot talk the skill of anything into anyone." Any decent educator realizes, he said, that "we must teach them short bursts of skills and then we must do what? Practice. Practice. And more practice." To underline the point, Peek asked the audience to compare those seven minutes of reading to the two hours students typically spend at athletic practice, which can be even worse during football season. "I don’t know what it’s like where you come from," he said, "but in Texas, if you lose last Friday night’s game, that two hours can stretch to five in a heartbeat."
Eventually, Peek said, he came across a magazine ad for a program called Accelerated Reader, which promised to dramatically raise school reading scores. All the program asked was that schools set aside a chunk of time each day for students to read freely in one of the books on the company’s list. Skeptical as he was about an ad, he noticed that the program was relatively inexpensive, so he ordered it. Right away, he realized it offered new methods of "accountability," and therein lay "a true gold mine." Peek immediately telephoned the company and asked for the president. ("I am not a slow mover," Peek said. "I’m a whole hog or none man, myself.") After crying poorhouse for a while, and bemoaning the sorry state of his school’s test scores, Peek asked Terry Paul to make Pittsburg Middle School one of the company’s first test sites. If Paul would throw in some extra software, Peek would provide him with detailed information on student performance. Terry Paul lives on data, so they had a deal.
There was a catch, however. Peek’s teachers had to faithfully follow the Renaissance routine. But they apparently did so. They carved out 60 minutes each day for students to silently practice independent reading (this is the company’s mantra). They fit students to books that suited their abilities, had skilled students tutor those who were struggling, and persuaded local businesses to donate a small collection of prizes. Library circulation quickly doubled. Four months later, Peek said, his students posted a full year’s growth in reading skills, as measured by a Stanford University diagnostic test. By year’s end, the passing rate on the TAAS reading exam had risen from 49 percent to 65 percent. This brought in $49,000 – the state’s bonus to the school for a job well done. The faculty elected to invest the money in staff training, but there was a decent chunk left over for some real prizes, which Peek had five students round up in a marathon shopping spree at a Wal-Mart superstore. When they returned to campus, it became clear to all that a new day had arrived. "Folks, I don’t know what Wal-Mart shopping bags do to kids in your part of the country," Peek said, "but it can whup Northeast Texas kids into a frenzy in a heartbeat." To make his new priority crystal clear, Peek cleaned out the school trophy showcase ("There were only third and fourth place trophies in there, anyway") and filled it with Wal-Mart booty. One morning, he saw a little boy talking to the librarian about a basketball in the cabinet. "Please don’t sell that basketball," the student apparently pleaded, "’cuz I’m readin’ as fast as I can."
The next year, Peek said, students’ scores on the Stanford test jumped even more dramatically, rising 2.23 grade levels. Peek asked the audience members how they’d feel if their students’ skills rose more than four grade levels in two years. "Would that make a difference in your job? Would that make a difference in those kids’ lives? You better know it would." Peek knew, however, that Texans don’t care about tests from Stanford. So when the district superintendent called to tell him his school’s TAAS scores were in, Peek told him to stop right there; he hopped in his truck, rushed across town, and opened the box "with trembling hands." The results: 90 percent of his students had passed their reading exams. "I guarantee you, we were pumped." And things only got better. In 2001, sophomores at Pittsburg High School, who were also using the Renaissance program by then, posted 98 percent passing rates in reading and 100 percent in math.
News of the school’s success spread, bringing visitors, Peek said, from 150 different Texas schools. "Folks," he noted, "you don’t come to Pittsburg, Texas, by accident." What pleased him most, however, was what the new program seemed to do for the school’s most poverty-stricken students. After starting with what he calculated to be a 35 percent performance gap between the advantaged and the disadvantaged, by 1999, he said, that figure had shrunk to less than eight percent. To Peek, the message is quite plain, and it’s one that has echoed in the speeches of President George W. Bush. "It doesn’t matter what your skin color is in Pittsburg, Texas," Peek said. "You’re going to read, and you’re going to learn. And we expect it from every one of our children."
Peek closed by telling the story of his own son, Nick, a seventh grader at the time who Peek said was a perfectly capable reader but just didn’t like books. Intrigued with the prizes, the boy started reading some easy books, enjoyed them, and continued checking out more – at least during the school year. After Nick’s graduation, Peek noticed over the summer that his son had gone to Wal-Mart to buy six or seven books. "There were no points. There were no prizes. There was no Accelerated Reader. He was reading out of pure enjoyment." Peek calls the whole experience – with his son, with his school, and with the rest of the Pittsburg district, which soon adopted the Renaissance program – "nothing short of a miracle." The audience seemed to agree, judging by their long and heavy applause.
While most schools’ stories aren’t as dramatic as Peek’s, similar tales, with much the same tone of sin and salvation, abounded in Vegas. Numerous teachers had reports that the program had doubled their schools’ library circulation. One, from the Peggy Heller Elementary School, in Merced, California, told me that when reading hour is over in her class, students now beg for more time, and get frustrated when the computers fail and they can’t take their reading tests. Another teacher, from Mesa, Arizona, said the software lets her track progress individually, something "that’s almost impossible with this many kids" but that "most teachers have tried to do on their own for years."
Dozens of additional testimonials are sprinkled throughout Renaissance marketing material. A rural Kentucky school reportedly rose from the bottom of the state’s barrel in 1994, when only 5 percent of its students met state reading standards, to near the top, with 70 percent now passing. In a fifth-grade class in Georgia, reading abilities apparently rose 2.2 grade levels in seven months, and test scores jumped 30 percent. At a school in Oxnard, California, the library collection was said to grow from 2,000 to 10,000 books in two years, and students reportedly have been asking to give up recess to do their Renaissance work. The school (Our Lady of Guadalupe) now has a waiting list, the leaflet says, "for the first time in many years." To round out the picture, quotes from satisfied customers are highlighted in company catalogues. "When I introduced Accelerated Reader, it was like magic!" says an elementary school teacher from Niagara Falls, New York. "Students had their noses in books everywhere I turned." In another catalogue, the superintendent in McKinney, Texas, says, "No other school-improvement program or process provides me with the ability to ensure district-wide accountability and improvement." A principal in Memphis, Tennessee, adds, "In 33 years as an educator, I have never encountered a program that could transform a school the way Renaissance has transformed ours."
One would think accounts like these would provide more than enough proof that a company’s programs improve student achievement. But they don’t for Terry Paul.
Paul is constantly searching for more proof, more data, more information. He’s so obsessed with these issues that at the time of the conference he was writing a book on the application of information theory in the schools (specifically, how technology can generate "information feedback loops" to the teacher). "You’ve got to be able to prove this stuff works!" Paul told me at one point. In some ways, Paul believes he already has, in spades. "I’ve got more data on reading behavior and math behavior than anybody in the world," he said. That’s a big statement, but anyone perusing Renaissance Learning’s literature might be inclined to believe him. One of Paul’s proudest documents is a 60-page booklet entitled "Research Summary." Put out by the company’s School Renaissance Institute, it consists of a series of brief write-ups on approximately 75 different studies – "field reports" from different schools around the country, "white papers" from the institute itself, and evaluations done by outside experts.
The studies are full of charts and numbers and dozens of the esoteric terms that spew from the field of scientific research. There are discussions of statistical significance, statistical correlation, sample size, "pre-test" and "post-test" data, standard deviations, multiple linear regression, virtually everything that plagues a student’s sleep when she’s taking a college statistics course. There are also signs of a curious phenomenon found throughout the field of education research – and many fields of scientific research as well. It’s a kind of intellectual disconnect that seems to have become part of the research game. Researchers will frequently tell you, with a kind of accepting calm, that most education research is horribly flawed – full of limitations, biases, or odd influential factors that researchers did not see, or did not acknowledge. In the next breath, most of these same researchers will tell you why their research is solid. Terry Paul is no exception.
"The Accelerated Reader is a reading researcher’s dream," says one of his studies. "For the first time, all the major elements required to measure reading practice (quantity, level, and score) have been reduced to the simple statistic of reading points." Billed by Renaissance in 1994 as "the largest study ever of literature-based reading," the study (called the 1992 National Reading Study and Theory of Reading Practice) looked at 4,498 students from 64 schools across the U.S. It claimed "statistically significant correlations" proving that this program "can more than double the growth in students’ reading ability."  An update and expansion of the study a year later offered additional evidence of what Paul called the "Reading Fallout Theory," his notion that new reading skills lead to new skills in other subjects – in this case, mathematics. The study produced a number that Paul often invokes: 68 percent of math gains, the study found, can be traced to increased reading skills.  The company says that by 1996, it had distributed more than 400,000 copies of these studies to educators across the U.S.
Later studies piled on even more research procedures, and arrived at equally noteworthy conclusions. One of Paul’s favorites is a massive project – one he now calls "the largest study ever done on whether technology makes a difference in schools." Delivered at a 1996 National Reading Research Center Conference in Atlanta, Georgia, the study, conducted by the Renaissance Institute, was called "The Impact of Accelerated Reader on Overall Academic Achievement and School Attendance." This study looked at 6,149 schools in Don Peek’s home state of Texas, chosen partly because Accelerated Reader (or AR) is so widely used there (at the time, the study said, more than 40 percent of Texas schools had bought the program). To draw a comparison, the study chose two kinds of schools – those that had bought AR (2,500 fell into that camp), and those that hadn’t (about 3,500 fit this category). The study found "statistically significant evidence that schools which owned AR performed better than non-AR schools on virtually all subject tests, including reading, math, science, and social studies." 
The company was so pleased with the results in its Texas study that, in its marketing materials, it went on to say that "schools that purchased Accelerated Reader show students improved" in a number of important new ways. Not only did students do well in traditional, rote standardized tests, but they also improved in what’s become the latest trend in state examination. These tests aim to judge such things as "critical thinking" and creative skills, partly by giving students "performance-based" tests – generally a series of essay questions or multi-faceted tasks. Challenges like these, the theory goes, students can execute only by invoking both factual knowledge and analytical savvy.
Interestingly, although Paul has co-authored or helped design many of the large-scale studies of his company’s products, he has no special training in statistical research. His degrees are in economics, business, and law. "I’m just a numbers guy," he told me one afternoon, as we walked toward a conference session he was hosting in Las Vegas, entitled, not surprisingly, Research Symposium. "I’m just a quantitative person," he said. "Either you’re a ’quant’ or you’re not." Paul’s session did feel like a researcher’s dream. Half a dozen specialists, most with university affiliations, delivered presentations for Paul that were stuffed with scientific analysis. Most were accompanied by a barrage of quantitative slides full of statistics, charts, and carefully worded but noticeably bold claims about the power of Renaissance’s products.
Those claims have helped the company seek a respected position within the education reform movement. During one conversation in Vegas, Stuart Udell, then president of the School Renaissance Institute, told me that the company considers its record to be superior to the dozen or so school reform models that have become national names. Among these are the $500-million "New American Schools" movement, launched by publishing magnate Walter Annenberg; the oddly scripted but seemingly effective Success for All reading program, designed by Robert Slavin of Johns Hopkins University; and the highly traditional "Core Knowledge" schools, founded on the back-to-basics principles of the University of Virginia’s E.D. Hirsch.
Schools across the country have happily contributed to this image by dramatically reorganizing some or all of their academic routine around Renaissance concepts. To encourage this, the company has set up two prizes for its adult customers. One is essentially a free consulting service, which helps schools interested in Renaissance apply for grants that will fund the program – an onerous process that many schools can’t manage on their own. The other is a system to reward teachers, schools, and entire districts with special certifications for having achieved "model" status. That accomplishment (which brings schools numerous freebies, a Renaissance press release, and showers of recognition at the national conference) is so coveted that teachers in Vegas were on the edge of their chairs when it was time to announce new model inductees. Some became teary-eyed when they weren’t chosen. Education organizations, children’s software reviews, and business publications have added to all of this buzz, treating the company over the years to more than a half dozen different product awards and commendations.
The Emperor’s Clothes
There is another way to look at these accomplishments. When one begins rummaging around in Renaissance Learning’s closet, it becomes clear that the company’s clothing is not as carefully woven as it looks out in the bright lights. Some of its fabrics start to unravel rather badly, in fact, once they get tangled up in the ongoing, national story about literacy. At a certain point, tears open around the gains Renaissance schools seem to make on test scores. Other holes lead to questions about the company itself, and how it functions in the marketplace. The trouble begins, however, with the basics – the nature and quality of Renaissance Learning’s scientific research.
In the late 1990s, in yet another wave of public panic about student competence, building up the reading abilities of America’s youngsters became the ultimate political priority. By the time George W. Bush was running for president in 2000, and soon after in the early days of his administration, concerns about student literacy had risen to the top of Bush’s education agenda. That April, a report was released by a group called the National Reading Panel, a collection of experts whose study of literacy had been formally ordered by the U.S. Congress.
Unfortunately at the time, the nation (or, more precisely, the nation’s army of news editors) was a little distracted. The study came out at the peak of media frenzy about Cuban refugee Elian Gonzalez, a fact that tended to bury news reports on an issue like reading. But the literacy report had some after-life. Congress was sufficiently pleased with it that, in 2001, it gave $15 million to the National Institute for Literacy, just to publicize its findings, and $5 billion to the U.S. Department of Education to carry out its recommendations.
The study did its part to earn these rewards. Its mission, which was conducted by a 14-member panel (composed mostly of educators, along with one physicist and one parent) was to find the best, solidly proven way to stimulate reading abilities. To do this, the panelists spent two years combing through the scientific literature on reading – specifically, 100,000 studies published since 1966, and 15,000 others published previously. By early 2000, the panel had boiled these studies down to approximately 400 that met gold standards of scientific research – that is, done as controlled experiments, or close to it (a.k.a., "quasi-experimental"), with results published in a refereed journal.
With this exemplary pool, the panelists started looking for solid instructional possibilities. In the process, they were required to take one more step: to hold their meetings in public. This was partly to avoid the troubles that had beset other high-level, fix-it commissions that tried to go about their work quietly. The public vetting also lent the panel’s final recommendations some extra credibility, since, by then, it had heard from parents, other researchers, and an array of educators whose job it is to carry out academic ideals in the less than ideal real world.
When the panel finally issued its conclusions, it offered both old news and, for outfits like Renaissance Learning, some stunning revelations. In essence, the report said that while certain practices were more effective than others were, it was clear that no one method of teaching reading could really carry the day. The most effective approach of all, the panelists concluded, was a mixture of practices: exercises that help beginning readers distinguish sounds, and then associate letters and words with those sounds; time spent reading aloud to children, and having them read aloud as well; adult coaching and discussion; and, most important, a variety of activities that build comprehension and the capacity for literary analysis.
Buried in the middle of the panel’s report was a section innocently titled "Encouraging Students to Read More." The idea sounds elementary – a predictable garnish on a report that was aiming to be comprehensive. However, the findings in this section were not so easily digested. Over the years, the panel found 92 different studies that had looked at the effects of simply getting students to read more literature of their own choosing, what educators call "free reading." As with every other issue under the commission’s purview, the panelists first winnowed the pile to those studies that met scientific standards. The tally came to 14, two of which concerned Accelerated Reader.
The panel’s evaluation of these studies was not pretty. Basically, its report said, the studies were so handicapped by design flaws that it was impossible to tell what was going on. Students may well improve their reading scores while using AR, but it’s equally likely that something other than AR may be causing the rise. The possibilities include everything from new books to extra hours of study, an atmosphere of higher standards, increased involvement by parents, or nothing more than a re-energized staff of teachers. In fact, judging from other teachers’ stories in Pittsburg, Texas, some of these other changes were the main causes of rising test scores in Don Peek’s school. "Even if you buy a bad program, if the staff is committed to it, it will work," says Joel Hodes, the principal of Schurz Elementary, on a Nevada Indian reservation and a participant at Renaissance’s Las Vegas gathering. The National Reading Panel members apparently had similar impressions. "For the most part," their report said, "these studies found no gains in reading due to encouraging students to read more. It is unclear whether this was the result of deficiencies in the instructional procedures themselves or to the weakness and limitations evident in the study designs."
In one example, the panelists looked at a 1994 Accelerated Reader study that examined two North Carolina schools, comparing 50 9th-graders in a school that used AR with 50 9th-graders in a school that didn’t.  The researchers tracked these students for five years; this gave their results some longitudinal flavor, a welcome icing on any researcher’s cake. They measured these students from third through eighth grade, according to their performances on the standardized California Achievement Test. But the national panelists discovered that, strangely, the Renaissance researchers calculated their numbers by subtracting scores on a third-grade test from scores on the ninth-grade test, an exam so much more advanced that it’s almost an entirely different test. A separate evaluation of this study, which the panel did not see, found further errors. According to this review, the two groups of students, at least in the ninth grade, had drastically different levels of access to books – five to six hours a week in the AR school, versus two to three hours in the "control" school.  Even without this insight, the Renaissance study was such a case of comparing apples to oranges that the national reviewers could do nothing but declare its statistics invalid.
The panel’s findings in the second AR study were even stranger. This study was conducted in 1999 by three researchers, one of whom was a Scotsman named Keith J. Topping, a paid Renaissance consultant and one of its most heavily promoted academics.  Of all the AR studies, this was the only one (at least in the peer-reviewed literature) that attempted to follow a basic rule of experimental research: You take one group that’s engaged in the activity you’re trying to study (in this case, AR) and compare it with another group that is similar and is in similar circumstances but that isn’t involved in the activity under review. In other words, while the first group is feverishly practicing with its exciting new toy, the students in the second, "control," group should not be catatonically sitting at their desks as a teacher drones on about the waning of the Middle Ages. To be fair, the control group should get a promising new toy too, just a different one.
This is what Topping did – sort of. According to Topping’s summary of his research, he looked at one Scottish class that was using AR, one class that wasn’t, and a third class that was using "an alternative intensive method." The problem is that the class that was using AR was also practicing some other reading instruction – coincidentally, some of the very same alternatives that the American reading panelists reported to be effective. "If you have AR combined with something that works, showing that it still works doesn’t tell me much," Tim Shanahan, professor and director of the Center for Literacy at the University of Illinois, in Chicago, and a member of the National Reading Panel, told me. "That’s like saying, if you’re sick, I’ll give you some penicillin and pray for you. And in five to 10 days you’ll be better. That doesn’t prove much about prayer." In sum, Shanahan said, "The two AR studies were just dismal. They couldn’t have possibly answered the questions being asked."
Not surprisingly, Terry Paul has not taken too kindly to the NRP report. "Every study that’s ever been done has significant flaws," he told me. "I could go through the accepted NRP studies and find flaws." Paul’s deeper complaint is that the NRP misunderstood AR, seeing it as a reading program when in his view it is more of a management tool, which teachers use to supervise reading. To Shanahan, that distinction is flimsy, especially since Renaissance so aggressively markets AR as a reading program. Either way, the NRP’s dour assessment of AR, and other programs like it, has not been fun for Paul to live with. "The NRP sits there like a big ugly frog," Paul said in an e-mail to me as recently as mid-2003.
But what about the basic premise here – the old adage that practice makes perfect? The idea seems so obvious that at the Las Vegas conference, Renaissance presenters constantly drew laughs and applause by citing instances where schools, and high-minded experts like those on the reading panel, seem to blindly ignore it. Can this maxim actually be wrong? Are we really supposed to believe that students’ reading ability won’t improve when they do more reading?
Shanahan, who sits on the board of the International Reading Association, chairs the reading committee for the U.S. Department of Education’s high-profile National Assessment of Educational Progress, and has helped design a number of state testing programs in reading, gets questions like these from audiences all the time. To answer, he sometimes plays a game. He first asks for a show of hands from people who play the piano. He then asks whether they think they’d play better if they devoted more time to practicing; virtually all vigorously nod "yes." He then asks those who don’t play piano if they’d improve with more practice; virtually everyone laughs. The point is obvious: Without at least some basic skills – and, ideally, some complement of more advanced skills, too – practice is meaningless. "The assumption that just reading a book and taking tests about it will improve reading ability is a poor assumption," Shanahan argues. In Accelerated Reader’s case, "It’s probably too much practice, too much unguided practice." But the primary concern, Shanahan said, is that "it’s stealing instruction time."
Misconceptions of this sort are much of what has hurt the nation’s academic performance, but not quite in the fashion that’s popularly believed. While President Bush and other politicians publicly fret about little children who can’t read, international scores show that young readers in America out-perform everyone except Finnish children. The fall-off hits later, in fourth through eighth grades, and it hits hard. By eighth grade, American students lag behind a dozen different countries in reading. In presentations, Shanahan has made this clear to the Bush Administration. Nonetheless, in 2001, Bush proceeded to devote $5 billion to boosting reading skills among the relatively un-needy: children in kindergarten through third grade. Campaigns about suffering 11-year-olds on skateboards, it seems, don’t quite tug the political heartstrings the way doe-eyed six-year-olds do.
Whatever age level is in question, the panelists’ report concluded that top-of-the-line reading instruction is not complicated; it’s not even terribly new. It does take some time. And, most important, Shanahan pointed out, none of the panel’s suggestions require schools to buy fancy or expensive products, low-tech or high-tech. "If I were teaching the class," Shanahan said, "and was asked if I could do more interesting things to teach a book than just have them read it, I’d like to think I could."
Those things, according to the panel’s recommendations, fall into a simple sequence. For unskilled readers – many of whom can often be found as late as junior high school – the panel strongly recommended a complex of exercises that build facility with sounds and word recognition. Some educators who were displeased with the National Reading Panel’s conclusions complained that the panel was perpetuating the long-standing “reading wars” between “phonics” loyalists and defenders of “whole language.” (For those new to this odd and unnecessary feud, whole language shuns nitty-gritty work on “phonics,” or individual sounds, in favor of a more “natural” approach – regular exposure to whole words and whole stories.) Some of the panel’s critics, including at least one of the panel’s own members, later went so far as to argue that the panel was taking the phonics crowd’s side. But the report itself indicates otherwise. It goes to great lengths to say that exercises with natural text have their place, when combined with phonics and a number of other linguistic exercises. If anything, reading comprehension, not phonics, was the panel’s main obsession, since more than half the studies in its final pool were about comprehension. The panel’s other big emphasis was on the nettlesome challenge of teaching comprehension, an issue that becomes increasingly important, and sometimes increasingly elusive, as poorly educated readers advance in age.
Smart ways to teach comprehension can get somewhat involved; they can also be a lot of fun. Some entail intensive discussions about a work’s characters and themes, or how the writer manages to set a particular tone or mood. If the text is non-fiction, discussion should lean to the analytical, building skills in how to evaluate facts, and how to reconcile conflicting accounts of historical events. Other pedagogies have shown great effects with non-traditional exercises in comprehension – for example, by having students compose loose diagrams that lay out a story’s dramatic structure, or by having young children act out a story. One of the most effective ways to deepen a reader’s sense of a literature’s meaning is as old-fashioned as it gets: basic writing exercises. These include not only standard book reports that summarize what’s been read but also creative essays that invite students to focus on one point of fascination, or to use what they’ve read as a launching point for their own stories. Curiously, Renaissance Learning says, as a selling point for AR, that the program makes book reports unnecessary.
This illustrates Shanahan’s central worry – that Accelerated Reader leads students, as well as teachers, to ignore these basic elements of literary nutrition. "I’d like to believe a teacher could ask a harder question, or a more interesting question, about a book than what’s on their tests," Shanahan observes. The possibilities, he points out, are numerous: "Unpack the plot. See how conflict works in the story. Gauge the mood, and how the writer sets it." AR not only overlooks these issues, it sanctions the neglect of more basic skills as well. "What if a child is having trouble with word recognition?" Shanahan asks. "They’re on their own for that. Vocabulary? They’re on their own. Comprehension? They’re still on their own. It’s oversell. It’s hype."
Shanahan is in a unique position to know. Shortly before Terry Paul headed off for Las Vegas, he hired Shanahan to conduct a separate, private review of Renaissance Learning’s primary pieces of reading research. After Paul returned to Wisconsin, he was greeted with some sobering news: "They seem to take the same old handful of studies and keep repeating them with larger and larger samples," Shanahan told me, summarizing what he’d reported to Paul. As a result, "the mistakes they make initially continue to be the same ones you see on down the line."
Some of those mistakes are pretty serious. Consider Renaissance Learning’s bedrock studies, particularly the 1992 national study on the "Theory of Reading Practice." This study, which was designed and written up by Terry Paul himself, goes to considerable lengths to make very definitive claims. In the study’s Introduction, for example, Paul writes, "Does reading practice cause reading growth? Surprisingly, reading researchers have never conclusively answered this fundamental question." Pointing to his own study, Paul says, "Now, though, the results of the National Reading Study are in. We know the answer. Literature-based reading does cause reading growth, and dramatically so."
To find these dramatic answers, Paul fudges his figures a little, which takes a moment’s patience with statistics to understand. In one case, for instance, Paul cites reading growth of "2.13 grade" levels in a single year through the use of AR. To anchor this claim, he points to the "correlation for the least-squares regression." This, he notes, is "significant," and therefore "associated with dramatic growth in reading ability." The problem is that he based this huge growth on the notion that students had gained 100 points. But none of the students did. Most got well below 50, and the low achievers got as low as 14. How could Paul come up with such a calculation? He used a hypothetical multiplier, then based his conclusions on the expanded result.
When Cathleen Kennedy, a computer science professor at the College of San Mateo, California, and a researcher at UC Berkeley’s Evaluation & Assessment Research Center, reviewed Paul’s claims on this count, then did her own calculations of the data, she was stunned. "This is not an honest picture of what this program is doing," she told me. "It’s a typical dog-and-pony show used on administrators who don’t know about statistics." One of the things that galls research experts most is Paul’s tendency to play fast and loose with statistical terms, particularly his frequent claim to have found "highly significant correlations" when there are none. Actually, the regression statistics in this study turned out to be flat, with associations that Kennedy called “extremely weak” – that is, indicating very little evidence of any cause and effect. “It’s not measuring what they say they’re measuring,” Kennedy said, about not only the 1992 study but also the 1993 follow-up study, which further attempted to link reading improvement to gains in math ability. At best, Kennedy said, the studies show what statisticians would call “statistically significant, weak correlations.” The point here is that statements of “statistical significance” aren’t what they seem. They’re not, as many would assume, signs of a large change or effect; they’re only an indication of a gargantuan sample base, within which a tiny change has been observed. Obviously, the issues here quickly descend into the technicalities of statistics-speak, but those technicalities matter. As federal authorities seek proof that education reforms are based in solid ground, more and more enterprises are invoking the terms of science to lend credibility to their products. To their customers in the schools, or anyone whose work involves taking academic research a step further, these esoteric terms are valuable guideposts. They help people distinguish the solid from the flimsy, the promising from the useless. 
Wherever such distinctions are blurred, there lies fertile ground for hype and deception.
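The gap between "statistically significant" and "strong" can be made concrete with a little arithmetic. The sketch below is not drawn from Renaissance's data or from Kennedy's calculations. It takes a hypothetical correlation of 0.05 (an assumed figure, chosen only for illustration) and tests it at two sample sizes: 50 students, and the 4,498 students reported for the 1992 study. The function names are invented for this example, and the p-value relies on a normal approximation, which is a simplification.

```python
import math

# Hypothetical correlation, chosen only for illustration.
# It is NOT a figure taken from any Renaissance study.
HYPOTHETICAL_R = 0.05


def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic for testing whether a Pearson correlation r differs from zero."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def two_sided_p_value(t: float) -> float:
    """Two-sided p-value from a normal approximation (a simplification)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def summarize(r: float, n: int) -> dict:
    """Collect the significance test and the share of variance explained."""
    t = correlation_t_statistic(r, n)
    return {
        "n": n,
        "t": t,
        "p": two_sided_p_value(t),
        "variance_explained": r * r,  # r squared: how much of the variation is accounted for
    }


# 50 is an arbitrary small sample; 4498 is the sample size cited for the 1992 study.
for sample_size in (50, 4498):
    s = summarize(HYPOTHETICAL_R, sample_size)
    print(
        f"n={s['n']:>5}  t={s['t']:.2f}  p={s['p']:.4f}  "
        f"variance explained={s['variance_explained']:.2%}"
    )
```

Under these assumptions, the same weak correlation fails a conventional 0.05 significance test with 50 students and passes it comfortably with 4,498, while in both cases it accounts for only about a quarter of one percent of the variation in scores. The sample size changes the verdict on significance; it does nothing to the strength of the relationship.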
One question that is persistently blurred in Paul’s claims is whether other factors that have little or nothing to do with AR might be driving the action in the company’s studies. It’s a common question in research on any education program. And it came up repeatedly in my conversations about AR with Kennedy, Shanahan, and other research experts familiar with Renaissance Learning’s work.
When a school buys a program like AR, quite often that purchase simply becomes a way of organizing the staff’s recommitment to teaching. (This seems to be what happened in Don Peek’s school, judging by accounts from several of his teachers.) And sometimes, the product is but one of many new literacy initiatives being undertaken. This may be one explanation for the superior performance by AR schools in the study that Terry Paul singled out as a favorite – the company’s massive 1996 survey of more than 6,000 schools in Texas. The study makes much of the fact that it "cross-referenced" AR information with additional data, some from the state education agency and some from a leading private data firm. But virtually the only thing the researchers knew about these schools was this: one group had bought the AR software; the comparison group hadn’t. Whether any of these schools also expanded teaching hours, bought new books, or instituted any number of other effective changes was never examined. Nor was the very likely possibility that the AR schools were simply wealthier, and thus more likely to have the funds for fancy programs, and a population of more confident students. Despite all these complications, the final difference between the two groups is actually quite slight. Only four percent of the AR schools performed better than the non-AR schools, a figure not much outside a margin of error. 
These and other Renaissance studies are further compromised by a lack of controls for all sorts of tricks that students commonly rely on to master AR tests. Those tricks include resorting to shortcuts, such as skimming or reading Cliff Notes, or old-fashioned cheating. (By many teachers’ accounts, cheating is surprisingly prevalent with AR. Renaissance claims to prevent cheating by scrambling the order of its test answers. But the questions, and their answers, don’t change. And, since the computer disks contain only five or 10 questions on each book, numerous schools report that students find it easy to crib the answers and pass them around. At many schools, students sit down for their tests with the book at their side, or with a friend who has read it, and quickly rack up a passing score.) Because of these complications, Kennedy argues, "This is not a measurement of reading practice. It’s a measurement of reading performance."
When I asked Terry Paul about these criticisms, he dismissed them all. As an example, the original study – the 1992 report that Cathleen Kennedy criticized for misusing statistical measures of validity, and then compounding the problem with a multiplier – Paul regards as an innocent, informal piece of work. While the multiplier technique might have been buried, Paul says, “It’s not something I didn’t disclose.” As for the experts’ other concerns about the company’s approach to research, Paul questions the experts’ criteria. “For an academic journal,” he said, “you have to qualify everything up the ying yang, talk about the fact that of course this doesn’t prove causation, list the questions raised which deserved to be researched, etc., etc., etc.”
"I am skeptical of small-scale control studies in social science," he says. "The selection bias, the Hawthorne effect, the difference between teachers kind of overwhelms small studies. Big databases are the only way to filter out some of this noise." Some researchers may agree with Paul, but most apparently don’t. Relying on big, uncontrolled databases, Shanahan says, "is like trying to read the entrails of a goat." Federal authorities seem to agree. That’s why the Elementary and Secondary Education Act, as passed by Congress in 2002, required school initiatives to be founded on "scientifically based research" – in other words, on controlled studies.
The tempting conclusion to draw from this story is that Renaissance Learning, Inc., is an anomaly – a company that generates unusually irresponsible research. In actuality, the kind of material the firm generates, and the claims it makes about that material, are more common than one would think. In 1998, for example, Educational Technology Review reviewed 834 articles from leading research journals in educational technology, published from 1991 to 1996. The journal found "only 12 percent… of the work is of an empirical and objective nature." Upon further inspection, the journal concluded that "approximately five percent… is conducted using formal methods such as control groups with comparative learning outcomes" – that is, a second group whose characteristics and options are truly equivalent to the group that’s armed with computers. As Edward Miller, the former editor of the Harvard Education Letter, once put it, "The research is set up in a way to find benefits that aren’t really there. It’s so flawed, it shouldn’t even be called research. Essentially, it’s just worthless."
Considering the sorry state of the art in this field, Shanahan didn’t consider the limitations in Renaissance Learning’s research to be terribly serious. Many companies, he pointed out, do no research of any kind on the effectiveness of their products. "Renaissance Learning deserves some credit," he said, "just for putting themselves on the line." In fact, the gaps and odd twists in Renaissance research are so common that, Shanahan surmised, the company could probably get some of its studies published in middle- or low-level research journals – if it were willing to slightly pull back on its claims.
As Renaissance’s compendium of research accumulated over the years, they have done just that – to an extent. Part of the reason is that Paul has arranged to have the later studies conducted by independent academics. But important weaknesses have persisted, and these are of a kind that pervades the wide swath of research on education products and programs. As a result, many private companies that work with schools have applauded Bush’s higher scientific standards. In response, however, a good number of these firms simply pasted the government’s terminology on their past research materials, to prove to customers that they are solid academic citizens. Renaissance Learning has been no exception. "Renaissance is supported by the highest-quality research, as defined by the federal government—it meets the five criteria of scientifically based research and involves control groups," Terry’s wife, Judi, wrote in a March, 2003, letter to educators. Similar comments prominently appear on the company’s web site. To support her statement, Judi listed 39 of the studies done over the years on various Renaissance products. While the tone in some of the more recent studies’ claims is more cautious, numerous reports still suggest or flatly state that Renaissance programs have led to dramatic gains in achievement measures. And, while the growth numbers may be real, there is no proof that Renaissance programs were the cause. In fact, the studies continue to leave open every possibility that other factors could have driven the schools’ accomplishments; in some cases, the gains appear to be nothing more than normal student growth. An interesting example of the more recent studies on Renaissance’s reading products is the work of Jay Samuels. A professor of both educational psychology and curriculum and instruction at the University of Minnesota, in Minneapolis, Samuels served with Shanahan as a member of the National Reading Panel. 
But he emerged with a more positive view than Shanahan did of instruction through what reading experts call “sustained silent reading”—essentially, Accelerated Reader’s routine minus the company’s short quizzes. In March 2002, Samuels wrote a forceful letter for Renaissance pointing to a very mixed study that, he argued, firmly proved AR’s value. To make his case in the letter, which Renaissance publicly distributed, Samuels listed his credentials in some detail. Missing, however, was any mention of the fact that Samuels was paid by Renaissance to conduct his own AR studies (these turned out to have their own methodological weaknesses), or that he serves on the company’s board of directors.
The Creative Research Lab
The story behind Terry Paul’s approach to academic research is as curious as the research itself. In the early 1990s, before Terry Paul formally joined Renaissance (which was actually founded by his wife), Paul was serving as president of Best Power Technology, a company that manufactured backup power systems. Best Power was a family-owned company and had long been run by Terry’s mother and his brother Steven. Paul’s tenure there was brief, but it yielded some creative lessons; it also produced the model for today’s School Renaissance Institute, the company’s research subsidiary, where nearly a fifth of its one thousand employees work.
Best Power was founded in 1977, in Necedah, Wisconsin, a tiny town of less than 1,000 people in the state’s desolate heartland. Marguerite Paul, Terry’s mother, was drawn to this community for a very specific reason. Since the 1950s, Necedah had become the gathering place for a sub-sect of devout Catholics, who were followers of Mary Ann Van Hoof, a woman who had claimed to have visions of the Virgin Mary (among other saints and angels), private knowledge of coming waves of global devastation, and stigmata to prove it. Adding to Van Hoof’s appeal were her regular sufferings, endured on the Fridays of Advent and Lent. During these times, Van Hoof would take to her bed and, by some followers’ eyewitness accounts, endure the physical blows that Christ suffered (sometimes with outstretched, rigid arms, as if in crucifixion), and intermittently receive heavenly messages, which are now recorded in six written volumes. Today, the most visible remains of Van Hoof’s teachings are a shrine that her followers started in her honor, in the belief that, when devastation did come, anyone who followed her Christian teachings would be spared. The shrine – a pastoral garden and still unfinished concrete basilica – houses roughly a dozen life-sized statues, which include various apostles, George Washington and Abraham Lincoln, and Jesus Christ in a state of bruised and bloody anguish. The shrine, which was built by Van Hoof’s followers, is Queen of the Holy Rosary, Mediatrix of Peace, Mediatrix Between God and Man. It is being sustained and completed by a local organization called For My God and My Country, Inc. Marguerite also saw a business opportunity in Necedah, which regularly drew tens of thousands of visitors during Van Hoof’s vigils. Anyone struggling to survive when the rest of the world was crumbling would need independent power supplies, and Marguerite decided to manufacture them. 
(They were also drawn to this area, according to former employees, because Necedah, which sat in the state’s poorest county, would be a fertile seed-bed of cheap labor.) Before long, the company was selling thousands of power inverters and transformers to survivalists, private companies, and other customers across the nation.
Best Power grew slowly through its early years, aided at the time by alternative-energy tax breaks created by former President Jimmy Carter. Finally, in 1980, the company had become large enough that Marguerite and Steven decided they needed some extra management muscle, and brought in Terry. (Terry’s father, Willard, while technically a founding officer of the company, was relatively absent. And Steven was busy with other interests, one of which was the development of a perpetual-motion machine.) Terry immediately saw two new business possibilities. One was to capitalize on the nation’s newly developing high-technology industry by supplying back-up power sources to computer users. The other was to build an impressive base of research about energy usage, and back-up energy needs, which could be promoted and legitimized by a separate think tank. That operation was called the National Power Laboratory, and its success (its data was soon being quoted by competitors and The New York Times) fed Terry’s later visions in education.
By 1992, Best Power was pulling in close to $100 million in sales, an achievement that inspired the family to take the company public. But Terry was wary of the family’s weaknesses as a management team. His solution was to convince his mother (and the company’s board), that it was time for her to retire. Marguerite, it seemed, was not in a retiring mood. Convinced that Terry was trying to steal the company from Steven, she persuaded the board to fire Terry instead. Terry responded by suing his mother and several company board members for dismissing him without cause. This spawned a long and strange court fight, which was finally resolved in a settlement to Terry and company stockholders worth almost $12 million. In making his case, Terry claimed, among other things, that the family had issued stock improperly, misused the company plane, and made unauthorized loans to corporate officers. He specifically focused on his mother, charging that she’d taken out questionable loans worth $700,000. Before long, the discovery process got brutal enough that there was talk of getting a formal evaluation of Marguerite’s mental health. According to former company principals, Marguerite had occasionally shipped gold bullion in and out of the Best Power building. (Some thought this was one reason why Best Power maintained a corps of armed security guards.) During outgoing shipments, the gold was occasionally mailed to a small town in Spain, where it was donated to a church headed by a priest whom Marguerite believed to be the true Pope. However, Terry soon found a way to settle the dispute without getting into public questions of mental stability. He knew that, for years, Marguerite had kept employees in the shipping department busy with something other than gold bullion: before mailing out power transformers, they imbedded them with a Christian medallion. (Company rumors have it that this was ultimately noticed by outsiders when some of the transformers started smoking. 
The suspicion was that this was either the medallions’ masking tape wrapping burning up, or the medallions themselves coming loose and shorting out wires.) In the end, the possibility that the existence of the medallions might be disclosed – producing not only embarrassment but also a potentially large product recall – helped Marguerite see the wisdom in agreeing to a quick and generous resolution. In early 1993, the case was settled for $11.8 million to company stockholders. For his part, Terry received $515,000 to liquidate his stock options, $360,000 not to compete with the company, and up to $250,000 in legal fees. (Sources: “Firm’s Ownership Squabble Resolved,” The Capital Times, June 15, 1993; and author interviews.) While the suit dragged on, Terry joined his wife as co-chairman of Renaissance Learning, which was, and still is, based an hour’s drive from Necedah, in the town of Wisconsin Rapids, doing business at the time under the name Advantage Learning Systems, Inc.
Pleased with his success with the National Power Laboratory, Terry quickly looked for similar opportunities in the education market after his arrival at Advantage Learning. He began by setting up a new, separate operation in Madison, not far from the University of Wisconsin, which did business under the name "The Institute for Academic Excellence." The institute immediately adopted an academic patina, which was polished by its scholarly address: "455 Science Drive, University Research Park." In reality, there was never any tie to the university. In fact, the manner in which the institute went about its research would have left many university researchers astounded.
But it wouldn’t astonish them all. In recent years, news reports have occasionally popped up indicating that scientific research, even at respected universities, has become increasingly biased toward the commercial enterprises that fund it. Sometimes the service being delivered to commercial interests is made plain; sometimes it’s carefully hidden. In either case, statistical concoctions that look true enough to fly through the public radar tend to share common rules. Because the Renaissance Institute turns out so many of these studies, and because those studies aim so high, their distortions are particularly revealing.
One of the first things the institute did, after Terry Paul’s debut studies in 1992 and 1993, was create what looked like an objective framework for gauging the ideal level of reading challenge that any student should face. To do this, Paul went prospecting for ideas in the annals of mainstream academic literature, an approach that would become a common practice of his over the years. Eventually, Paul came across the writings of the Russian psychologist Lev Vygotsky. To many experts in the worlds of education, psychology, and child development, Vygotsky is one of the great luminaries; he’s the man who, more than any other, devised a set of theories that trumped those of Jean Piaget, the even more famous 20th-Century psychologist and definer of the stages of child development. Vygotsky’s idea, in part, was that children were entirely capable of pushing the envelope on Piaget’s rigid demarcations of academic ability – if they were properly guided. As Vygotsky’s work progressed, he became so fascinated with the outer limits of children’s capabilities that he gave them a formal name: "the zone of proximal development." For Paul, that zone looked like a gold mine.
Paul realized that he could become, in a sense, the zone’s modern master. If he could find a meaningful way to define reading challenges, and then show teachers how to push students to the pleasurable edge of their comfort zone, but not over it, he’d have a killer product. Vygotsky helped him do that. As one of Paul’s reports stated, "The point between unchallenging and frustratingly difficult text, the point at which maximum growth occurs, is the zone of proximal development or ZPD."  Over the years, the concept has become the engine in the Renaissance reading program. "ZPD! ZPD!" Don Peek screamed out to one conference session in Las Vegas, in a typical exhortation. "If you’re having trouble motivating your students, you’re not using the zone." After some experimentation, Paul defined this zone as any score on Accelerated Reader quizzes that fell between 85 and 92 percent correct. Scores above 92 percent mean a student is reading books that are too easy; anything below 85 percent means the books are too hard. All a teacher needed to do, therefore, was watch the numbers. Simple enough, right?
Not to Terry Paul. As with most of the institute’s material, he wanted a reassuring blanket of data behind this theory. So, in 1998, Paul put together a study of approximately 80,000 students who used Accelerated Reader in Tennessee. The study was supposed to find the exact "ZPD" that would produce the biggest boost in reading ability. To conduct his inquiry, Paul contracted with William Sanders, a professor at the time at the University of Tennessee, at Knoxville, and a highly regarded innovator in the nettlesome problem of tracking performance, over a period of years, by both students and their teachers. Paul of course hoped that such a mother lode of data – Renaissance scores from thousands of AR users, cross-referenced with Sanders’ detailed achievement records – would generate some powerful statistical arguments. Once the study was done, Sanders found that most students were not showing gains with the program of any great significance. More important, the older the students were, the less the program tended to be of any help. In retrospect, Sanders saw plenty of indications that the AR program could be useful – as one tool among many to teach reading. But it seemed problematic when viewed as a necessity, or when used as a stand-alone solution. "There were plenty of highly effective teachers who weren’t using the tool," Sanders told me. "And there were some highly effective teachers who were." While the AR teachers showed a slight edge over the non-AR teachers, the problem, Sanders said, is "You don’t know if those teachers were more effective to start with."
There might have been a reason the numbers didn’t cooperate with Paul’s expectations. Vygotsky, it turns out, had something quite different in mind regarding any ideal "zone" for reading. At the crux of Vygotsky’s work was an intriguing discovery, and the theory for which he is most known: pushing the limits of a youngster’s learning zone made sense, Vygotsky realized, only when his or her efforts are robustly supported with social interaction – with teachers or skillful friends. Challenges pursued independently were another matter entirely; if anything, they framed the bottom of Vygotsky’s zone. As Vygotsky himself put it, "… the zone of proximal development… is the distance between the actual development level as determined by independent problem solving and the level of potential development as determined through problem solving under adult guidance or in collaboration with more capable peers."  In other words, take an eight-year-old girl who is given lots of books to read at increasingly advanced levels. If she then gets to read or talk about the books with a teacher or parent, and is also paired with smart playmates to paint or build a make-believe world around the concepts she’s reading about, before long she should be reading like a 10-year-old. That, at least, is Vygotsky’s concept. It was not about silently poring through advanced books all by herself, and then getting her accomplishments evaluated by a computer program.
Terry Paul saw no reason to worry about these details. In his view, books ought to be tutor enough in themselves. If students are encouraged to pick challenging enough material, they’ll be "pulled into their ZPD," as several Renaissance presenters put it. The limitations in that statement are why many experts in reading and academic research, including some who have worked for Renaissance Learning, Inc., have trouble with Paul’s connections. "You see how quickly this gets fuzzy," Cathy Upham, one former Renaissance employee, told me. "What Terry Paul does is he takes the most faddish theorist and creates a frame where it looks like this simple little test of his fits this complex learning theory." In doing so, Upham said, Paul "gains sexiness in an educational theorist way – by yoking what he’s doing to a known learning theorist." The problem, Upham argues, is that Paul employs "none of the controls" necessary for these assertions. "To do that," she said, "you need all sorts of validity tests. And he does zero of that."
Upham, who has advanced degrees in rhetoric and writing, and a Ph.D., ironically enough, in Renaissance Studies, was hired by Renaissance Learning in 1997. Her job was to help the company’s sales to high schools, a market that has not embraced Renaissance products as enthusiastically as the early grades have. To give the high school foray some muscle, Upham was supposed to design something called Accelerated Literature, poised to be one of the company’s most sophisticated software packages. The product was supposed to test "higher order thinking skills," an elusive but seductive target that educators aptly refer to by its acronym: "HOTS." In Upham’s view, HOTS has become "the big buzz-word, although nobody knows what it is. And nobody defines it, least of all a commercial company."
Accelerated Literature’s commercial life didn’t go terribly well – for reasons that say something about the prospects for any sophisticated educational software. In 1998, before the product ever launched, the company dropped the original concept, Upham and others recall, for two reasons. First, a true "HOTS" product would require sophisticated content, which could be created only by people erudite in high school subjects such as history, mythology, science – the list could go on and on. But Upham got the sense Paul didn’t want to spend what it took to hire people with those skills. Second, once the product was distributed, it did not have great prospects for spawning the ongoing profit stream that AR disks generate. "Once schools buy it," Upham said, "they wouldn’t need much more." (Terry Paul differs with Upham’s version of events, saying it was simply more efficient to fold HOTS efforts into their existing products.)
Before long, Upham left the company, thoroughly disillusioned. At the time of my interview, she was employed by Wisconsin’s Department of Public Instruction, to help the state improve its reading and writing tests. Her comments here are not official viewpoints but her own personal opinions. They are quoted at length partly because they are widely shared by a number of former employees who were in senior positions at either the Renaissance Institute or the mother Renaissance company. Most of those former employees are reluctant to be identified, however, because they are fearful of the aggressive stance the Pauls have always taken with their critics.
Despite former employees’ fears, an open discussion of the company’s operations can still be had, through the views of independent contractors who have occasionally helped the Pauls do their work over the years. Many of these contractors are credentialed professionals, armed with Ph.D.s in rarefied sciences like psychometrics, which is the statistical art of designing valid psychological measurements. These people are not so reluctant to discuss their experiences.
One such expert is Michael Beck, a well-regarded psychometrician, and the founding president of Beck Evaluation and Testing Associates, catchingly known, for high-tech enthusiasts, as BETA. Terry Paul hired Beck to develop a program the company called its Standardized Test for the Assessment of Reading, commonly referred to by its own evocative acronym, STAR. This program (which sells, in a basic package for 200 students, for $1,499) was to become the company’s bedrock diagnostic test. In quizzing students on a specific piece of text, it is supposed to let teachers set students’ reading levels, and identify those students who need additional help. In that respect, it’s the one Renaissance product that, ideally, would prompt teachers to provide what reading expert Tim Shanahan was crying for: individual assistance in specific areas of weakness, such as sound recognition, vocabulary, or comprehension.
Beck began the work as any researcher would – gathering lots of data, and establishing "norms" (these are basically midpoints on a national bell curve, drawn from average classroom situations, which become guideposts of where the average student should score). When it came time to convert all this information into a final product, Beck ran into unusual obstacles. Paul, he said, "became a little more inventive than was called for." A typical conversation, Beck remembers, would run as follows: "He’d say, ’How do you like this?’ And I’d say, ’Well, it doesn’t have anything to do with what we did.’ Then Terry would say, ’We’re going to put it in there, anyway.’" Part of the reason for Terry Paul’s frustration may be that he thought statisticians didn’t crunch information properly. Beck recalls Paul frequently telling him that he had found "a better fit for the data." Part of the reason also may have been bias. "Terry would go looking for data to support his ideas," Beck remembers. Plenty of people do that, he acknowledges. But, he noted, "We don’t then call it research. We call it a belief system."
Paul’s evaluation of Beck’s criticisms is much the same as his assessment of Shanahan—that neither of these people understood his company’s products. He credits Beck for having the courage to create a test that had little precedent. But Beck’s methods, he said, weren’t based in "scientific item response theory. He didn’t understand item response theory." Beck, who has spent 30 years working with procedures that use item response theory, is baffled by Paul’s conclusions. "I don’t recall ever having a discussion with Terry or any of his staff concerning the advisability of using these procedures," Beck told me.
Thinking back on her tenure, Upham said she came to see the Renaissance Institute as something other than what it appears. "It’s a pseudo-independent research firm that really functions as a marketing tool," Upham said. "And people are being snowed to think it is a real research institute. The stuff he does with statistics is just nonsense. He plays with it. And that’s appalling. There’s this thin veil of research over what’s purely a marketing product." (By late 2002, Terry Paul also started rethinking the Renaissance Institute, at least in part. “People were getting confused about it,” he told me. So Paul dropped the effort to maintain the operation as a separate subsidiary and folded it into the umbrella firm, where it became simply another company division. But it retained its emphasis on educational research.) What worries Upham most is that "the educational community is not savvy enough to scrutinize this stuff." It’s a concern shared by many former employees. "The education community expects people to be just like they are, that they’re there because they want to help kids," one former Institute senior employee told me. "If someone comes in with a motive to make money, you’ve got the most gullible population in the world just eating out of your hands."
Is This Book Worth Much?
One evening, long after Tim Shanahan had wrapped up his work on the National Reading Panel, he visited with the parents of a Chicago-area elementary school, in a district for which he’s done some consulting. When it was time to entertain questions, the first ones were about Accelerated Reader, even though Shanahan hadn’t even mentioned the program. It turned out that this school (Kimball Hill) had begun using Accelerated Reader, and parents had become concerned that their children would only read books that were part of the program. The students’ reasoning was obvious: if they read other books, they wouldn’t get any AR points. That meant no special kudos, and no prizes.
These troubles extend far beyond Chicago. All across the country, librarians have complained in Internet discussion forums that, while AR has increased their circulation numbers, the quality of students’ reading hasn’t always followed. Important, well-reviewed books, both classics and new releases that librarians have gone to considerable trouble and expense to acquire, tend to be ignored once library shelves are full of AR books offering points and prizes. The narrowness of the students’ focus is exacerbated by the fact that teachers often use AR points for class grades. Several librarians noticed that students had trouble managing anything other than the most basic comments about books they’d read through AR. One librarian in an Iowa elementary school told me that if she asks any complicated questions about the books, the students will draw a blank, and often admit that they skimmed the material just enough to take the quiz. In one school, teachers have prohibited classroom discussion of AR books. AR test questions tend to be so simple that teachers have been afraid that, after hearing a little conversation, many students would pass an AR test on the book without having read it. Even librarians who are generally supportive of the program are concerned. "With dwindling funds, I’m finding it more difficult to provide a balanced collection," wrote Mary Givins, a teacher and librarian at Roberts Elementary School, in Tucson, Arizona. "Easy fiction and fiction shelves are crammed, and I’m having to replace and repair constantly. I could book-talk until I’m blue in the face and a lot of kids won’t touch a book unless they can ’take a test’ on it."
Complaints of this sort, along with persistent questions about cheating, even came up at Renaissance’s gathering of the faithful in Las Vegas. Amid the applause and celebrations, more than a few teachers dared to voice complaints. The most fervent protestations came from high school teachers, who were frustrated by the program’s simplistic design. Some found themselves with a rare opportunity to do their own informal, controlled studies. When students arrived as freshmen, some had come from lower grades that used AR, some from grades that didn’t. The differences between the two were often telling. In one session, someone asked how much "carry-over" there was – in other words, did the gusto for reading last, once points and prizes were no longer in the picture? "I see a big drop, to be honest," said one high school teacher, prompting a round of nods. A cursory independent study published several years later confirmed the teachers’ hunches. (The researchers came to this conclusion by surveying 1,771 seventh graders from 10 schools, where some had used AR in fifth grade and some had not. When they were asked to identify books they’d read, students from non-AR schools actually identified more books than the AR students did.)
In Las Vegas, the grumbles sometimes became prevalent enough that a few started challenging the very basis of the Renaissance program. "Is Renaissance Learning company aware of how weak the STAR test is, and how elementary the training seems to be for high schools?" asked Sam Hack, a high school teacher from Missouri. "And are they doing anything about that?" The reply from the Renaissance presenter – a teacher herself – was rather curious. She said she wasn’t using STAR much for skill diagnosis, and instead relied on the basic AR program for that information. Some talked about how unrealistic the complete Renaissance program seemed to be for any school that’s not up for altering its entire routine. Later, in a private conversation, several teachers said they were uncomfortable committing to something promoted solely by a commercial interest. "I’d be a lot happier if it was endorsed by the state, or some education organization I trusted," said Annette Halpern, a high school teacher from Santa Paula, California. "Everything here feels like it’s about selling."
To reading professionals, complaints like these are serious enough that a few have begun to take some action. In Texas, one of the company’s biggest markets, two professors became so concerned about Accelerated Reader that they embarked on an intensive study of the program.  The researchers – Jo Worthy, an associate professor of education at the University of Texas at Austin, and three of her doctoral students – spent several years looking at how seven different fourth grade classes were using the program at two elementary schools. They worked from the ground up, starting with the students, and what they found wasn’t encouraging.
Some students said their teachers wouldn’t allow them to pick books outside their AR "reading levels." Some had become turned off to reading, because they didn’t like Renaissance’s book selection. Some were motivated for a while, but eventually stopped reading once they’d gone through the AR books that were available on their level. Ultimately, the researchers found, these constraints discouraged students from exploring subjects that are often more appealing and challenging. For many students, particularly boys, these tend to be works of non-fiction – books about satellites or snakes, for example, or elementary histories of Africa, or America’s early days. Titles of this sort aren’t organized for the consumer market the way children’s fiction is. While librarians may have the time and knowledge to overcome such obstacles, it’s quite another matter for a commercial firm to pull that off.
Over the years, Renaissance Learning has tried, with some success, to broaden its selections. Yet odd snafus have remained. For example, when challenging non-fiction books have been included on AR lists, they’ve sometimes offered fewer points than easier works of fiction do. This fact is not lost on students. "It breaks my heart," wrote Julie Criser, a media specialist at Blair Elementary School in Wilmington, N.C., "when I hear a child ask ’Is this book worth much?’" 
As it turns out, there’s a second element that determines the worth of AR books, and simultaneously limits students’ reading choices. This involves the odd scheme used by Renaissance Learning – and other mass-market reading programs – to determine a book’s difficulty level.
For purposes of uniformity, almost everyone involved with reading in schools – publishers, state text adoption committees, librarians, and teachers – chooses books according to standardized measures called "readability formulas." While these formulas come in roughly half a dozen varieties (depending on the company setting the formulas), all rate the difficulty of a text by looking for the same basic factors: sentence and word length, vocabulary choice, and a few added signs of complexity. In the best of circumstances, these automated ratings offer a skewed picture. A simple fiction story, for example, that’s full of long sentences or a few long or obscure words, is likely to be rated as more difficult than it is; conversely, a science or history book that’s full of complex ideas, but written very simply, may get an undeservedly low rating.
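To make the mechanics concrete, here is a minimal sketch of one widely used readability formula, the Flesch-Kincaid grade level, which scores a text from average sentence length and average syllables per word. This is an illustration only: the chapter does not say which formulas Renaissance Learning or other firms used, the syllable counter is a crude vowel-group heuristic, and both sample passages are invented for the example.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def grade_level(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


# Hypothetical sample: a simple story told in one long sentence.
simple_story = ("The dog ran up the hill and down the hill and over the wall and "
                "back to the barn where the old cat sat and looked at him.")

# Hypothetical sample: harder ideas written in short, plain sentences.
dense_science = "Genes hold a code. Cells read it. One flaw can harm a cell."

story_score = grade_level(simple_story)
science_score = grade_level(dense_science)
print(f"story: {story_score:.1f}  science: {science_score:.1f}")
```

Because the formula sees only sentence and word length, the one-sentence story scores several grade levels above the science passage, which is the kind of skew described above.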
Knowing this, Renaissance Learning, like many companies, has occasionally tried to add nuance to these formulas. But, once again, Renaissance Learning has introduced artistry to the process that other companies never dared to attempt. An indication occurred in the late 1990s, when Terry Paul went searching for a new, improved formula, which would be linked to Renaissance’s STAR test. Over the following years, Paul bounced from one testing and evaluation house to another, apparently convinced that he knew a better way to do their business. He finally ended up at TASA, the Brewster, New York, testing outfit that stands for Touchstone Applied Science Associates. The contract called for an ambitious new hybrid – a formula that would be both sophisticated and easy to use, with rating levels that simply told teachers whether a book was, say, an easy fourth-grade book or a hard one. To accomplish this, the TASA staff hoped there would be some equally ambitious new research.
Within a week of starting work on Paul’s new formula, Stephen Ivens, then a TASA vice president, walked off the project. "I knew no one was serious about doing the work," Ivens recalled. As a former director of research and development at the College Board, with a specialty in reading assessment, Ivens had some opinions about how that work should be done. "When someone says ’We’ll have the results in three months to announce at IRA’ [an International Reading Association conference], you know no serious research can be done." Sure enough, there wasn’t much new research. Renaissance did embark on a massive cataloguing effort, Paul says, going through 30,000 books to establish a database of some 30 million words. This ostensibly let evaluators see how frequently difficult or easy words occurred in certain texts, which could be rated accordingly. The company then tested the results against Renaissance performance data. In Ivens’ view, this was simply recycling the company’s old numbers. "They didn’t collect any new data. No one was interested, for example, in the difference between how expository and fictional texts were written." In the end, Ivens said, TASA and Renaissance Learning came up with something that’s "more of a vocabulary test than a reading test. And it’s awfully short. It’s just marketing jive. It sounded good but it didn’t do anything."
The Technologies of Testing
As the new federal emphasis on testing and accountability has taken root, an increasing number of educational software companies have begun to make claims very similar to Paul’s. The numbers behind those claims would seem to be real facts. So what’s not to love?
In the test-score world, truth turns out to be as elusive as it is in readability evaluations. And it is far more consequential. In the administration’s drive to test students across the country to the same "high standards," and to bring "accountability" to schools that fail to do so, it appears as though we’re finally getting down to academic business. In some ways, we are. After decades of literature demonstrating that elementary and high school academics have, to some extent, become a loose and fad-ridden enterprise, a little rigor and tangible measure of progress is certainly in order.  To its credit, the testing industry has tried to deliver just that, making every effort to improve the sophistication and nuance in these crude rituals of scholastic life. Overall, however, their success has been remarkably spotty. And it does not seem to be getting helped much by technology.
In the opening years of the 21st century, standardized tests began, more than ever before, to define academic life. Before long, testing was the dominant reality in almost every classroom, its implacable boss.  Test scores draw their power, first, from the realm of politics, which worships numbers and harvests them any way it can. This in turn hands power to a highly mechanized private test-manufacturing industry, where errors and scholastic limitations are not only rampant, they’ve also been kept largely hidden from public view.
One of the more detailed accounts of this phenomenon was provided in the spring of 2001, when The New York Times published a lengthy, two-part series on the companies that create and evaluate standardized tests. Horror stories abounded from employees paid $9 an hour who were asked to score tests in subjects they knew nothing about, and often in a rush. "We are actually told to stop getting too involved or thinking too long about the score – to just score it on our first impressions," said Artur Golczewski, a former scorer at NCS Pearson, the nation’s leading scoring company, which handled 300 million standardized tests in 2000. As might be expected, employees occasionally discover that they have scored a particular item wrong, creating hundreds of errors. "There was never the suggestion that we go back and change the ones already scored," said Renee Brochu, another scorer. Apparently, evaluators are also sometimes told to manipulate the scores. "One day you see an essay that is a 3," Golczewski said, "and the next day those are to be 2’s because they say we need more 2’s." Company executives have disputed these stories. Yet when the executives found errors, in numbers substantial enough to alter a school district’s performance, their correction measures were rather curious. In Tennessee, CTB McGraw-Hill randomly changed the test scores to fit a Tennessee official’s estimate of where the numbers should land.
In New York City, CTB sat on scoring errors for months, by which time nearly 9,000 students had spent the summer in remedial classes when they should have been on vacation. These errors should not have surprised anyone, especially in New York. Years earlier, in 1980, a shrewd New York student discovered that the preferred answer on a PSAT question was in fact incorrect, which produced front-page headlines announcing the following: "Youth Outwits Merit Exam, Raising 240,000 Scores." As more people began to re-examine the exams, errors were soon found on the PSAT, the SAT, the LSAT, and some Graduate Record Examinations. Now, two decades later, the testing world’s potential for both good and evil is being heated up even more by a rash of computerized evaluation products like Accelerated Reader.
In the spring of 1994, the Harvard Educational Review published a series of articles about academic testing, one of which was on its long and problematic history. The article was written by George Madaus, a professor of education and public policy at Boston College, and the former director of its Center for the Study of Testing, Evaluation, and Educational Policy. Madaus looked specifically at testing as a technology, and its effects on equity in education. Much of his article reads as though it had been written today, because the issues in testing have changed so little over the years.
To Madaus, the system of testing has long functioned as a technology, even before the age of the computer. This is not only because of the technical gear schools need to administer tests – the paper and pencils, the various sorts of scoring and rating systems. It’s also because, in his view, technology and testing affect people in similar ways. In both cases, Madaus finds, people tend to be seduced to follow novelties, and to forget old, worthwhile values. Both insidiously mask complications, and make them feel irrelevant. "Technology leads a double life," Madaus wrote. "One life conforms to the intentions of policy-makers; the second contradicts them, proceeding behind their backs to produce unanticipated uses and consequences… Although the benefits of technology are enormous, technology simultaneously creates problems, opens new ways to make big mistakes, alters institutions in unanticipated negative ways, and impacts negatively on certain populations." The same patterns occur, Madaus believes, in the world of testing.
The system of intellectual testing on a large scale began with Alfred Binet’s invention, in 1905, of what evolved into the IQ test. Interestingly, Binet had a very different purpose in mind than the mass evaluation system that the IQ test became. His fellow Parisians were simply looking for a quick way to identify students unlikely to succeed in "normal" classes, and who therefore needed special instruction. Yet the appeal of a dominant, objective measure was too strong to resist. In the years since, Madaus wrote, Binet’s technology has been used to "misclassify and label people through most of this century." Those classification biases, Madaus argued, have particularly hurt minorities and the poor.
Modern testing technology, in Madaus’s view, has only worsened the problem. "Inequity associated with a technology may be difficult to detect," Madaus wrote, "since most technologies are based on highly technical, arcane underpinnings." As a result, "most Americans usually do not inquire whether the design of a test or any other technology might produce a set of consequences or inequities along with its professed advantages." The reason, he said, is that "All those who benefited from testing, such as test makers, policymakers, and a host of different test users" have become testing’s "maintenance constituency." And that constituency has "covered up, evaded or ignored their dependence on this technology, as well as the fallibility, vulnerabilities, and failures of testing."
In the early days of the 21st century, the vulnerabilities and failures that Madaus described seven years earlier were more alive than ever.
At almost every public school today, a common scene in a teacher’s routine is regular prep sessions for students’ standardized tests. Actually, it’s more common to hear about these practice sessions, but not to see them. When testing time approaches, public schools generally go into emergency mode: No field trips, no art, music or drama, no special programs, and no visitors. Even the usual curriculum is tabled – for weeks at a time. Meanwhile, students do virtually nothing but prepare for their tests. They get drilled in multiple-choice problems in reading, math, and social studies. They go over banks of historical facts. And no wonder. These exams are the sort that educators call "high-stakes tests" – the term for a test that is the overriding criterion for a student’s advancement, and for a school’s ability to stay in business. Nothing else counts.
It wasn’t always this way. In the past, if a girl tested poorly but racked up good grades, if she had been a consistent participant in class discussion, or if the teacher considered her capable and motivated, she could still advance from one grade to the next, and even graduate. Today, those considerations are often of little consequence. They’re seen as signs of softness, and bundled up with the new public distaste for "social promotion," the old practice of sending children on to the next grade level as they age, regardless of their performance. (Interestingly, despite the odor of low expectations in social promotion, the custom may not be as irresponsible as it seems. A good many credible studies, including recent surveys of high school drop-outs, have found that holding students back a grade or two can be more damaging than sending them on before they’ve learned their lessons. It all depends, of course, on how low-performing students are treated; the promotion, or lack thereof, is secondary.)
At a certain point, as many readers of the news have noticed, a number of schools imitated former First Lady Nancy Reagan. They just said "No." By the spring of 2001, from the working class town of Harwich, Mass. to the upper class, test-savvy communities of Marin County, Calif., and Scarsdale, New York, both students and teachers had begun boycotting standardized, high-stakes tests. In Fairport, a middle-class suburb of Rochester, New York, after parents boycotted the high-stakes state Regents exam, the school superintendent made plans to issue an alternative local diploma. In doing so, the superintendent enlisted the help of both university leaders and local businesses, in the belief that they could come up with better standards than the state had. (The head of the local employers’ group was especially concerned that with so much time spent on testing drills, students were doing fewer projects and apprenticeships that "inspire the better thinking, reading and math abilities that businesses need.")
But the winds of the moment are always strong. For the most part, the testing protests have been framed as permissive quibbling from soft-headed liberals, whose children can’t hack cold competition. (As an indication, in mid-2003, two years after proposing an alternative high school diploma, Fairport’s superintendent was still working on the idea.) Among the handful of different state exams that have consistently dominated definitions of achievement, one of the most prominent is the Texas Assessment of Academic Skills (TAAS) – the central plot device in Don Peek’s story about his old middle school in Pittsburg, Texas.
The real story behind Peek’s tale – and the truth about Texas tests – is instructive on a number of fronts, which roll out in an intriguing sequence. Bush obviously has drawn much of his national vision for schools from his experience in his home state. Coincidentally, Texas has long been Renaissance Learning’s primary market. Most of the reports about the power of Renaissance products therefore come from Texas, and from Texas achievement data. This also makes Texas one of the early leaders in taking a vigorous approach to both challenges – standardized testing and computerized methods of preparing for tests. The nexus of those pursuits is thus likely to influence public assumptions about computerized educational products everywhere. If the data here has problems, so does Terry Paul. And so do schools across the country, as Bush’s vision becomes an American reality.
During Don Peek’s story, there is a moment in the middle where he explains the history behind TAAS. He slides by it pretty fast, but he drops just enough detail to start a skeptic wondering. Before the days of TAAS, Peek noted, Texas students had to take other standardized exams, which were more basic tests designed to evaluate minimum competency in the three R’s. Each test reigned for about five years, before being replaced by a new, improved version. In each case, Peek recalled, the students and teachers had trouble in the beginning, but within a few years, they’d figured out how to get their scores up. Then, in 1990, along came TAAS, a real challenge, Peek told his crowd, "an upper level thinking test." The next thing the audience knew, Peek was off into gripping anecdotes, showing how seriously his teachers took the school’s failure on TAAS the first year, and what they did to improve matters the next year. Unnoticed, of course, was the repeat of their old pattern. The school was simply figuring out how to get its scores to rise on the new test. During those years, a number of Texas school districts that had never heard of Accelerated Reader posted similar gains.
That climb, statewide, was soon contributing to the glow around the presidential campaign of then Gov. George W. Bush. Like others who had sought the White House from a governor’s seat, Bush managed to plant a fertile seed during his campaign regarding his role in a state "miracle." The concept had a nice circularity to it. In 1988, Michael Dukakis, then governor of Massachusetts, threatened Bush’s father’s run for the White House with a "Massachusetts Miracle." Dukakis’ miracle was a state economic revival in the midst of national hard times; the Texas governor’s would be a state education revival in the midst of national dismay about school quality. As evidence, Bush invoked TAAS scores as proof that achievement was rising, and state drop-out records to prove that more students were staying in school.
The media quickly took the bait. "Accountability Narrows Racial Gap in Texas," crooned USA Today, in an editorial in March of 2000, which went on to describe "Texas-size school success."  Even The Boston Globe, presumably chastened by Dukakis’s doubtful miracle, picked up on Bush’s theme. "Embarrassed into success," announced a front-page headline in June of 1999. "Texas school experience may hold lessons for Massachusetts."  The academic world was somewhat more skeptical, though. Before long, a chorus of critics rose to debunk Bush’s miracle.
In assembling their attacks, the critics were aided by a multitude of research supposedly proving the Texas story to be more myth than miracle. A good portion of this information was generated in the course of a lawsuit against the state, brought in 1999 by a veterans group, which claimed that TAAS discriminated against Hispanic and African-American students. "An education system in which 30 percent of students overall (and 40 percent of minorities) do not even graduate from high school is one to be deplored rather than applauded," wrote Walt Haney, a professor of education at Boston College. Haney, one of the expert witnesses hired to fight the state, put together two lengthy studies that, by his measure, showed academic achievement in Texas had grievously declined through the 1990s. His studies were full of dramatic numbers – comparing ostensibly bogus TAAS scores with negative results from other state assessments, the SAT, the highly regarded National Assessment of Educational Progress (NAEP), drop-out figures that were some of the country’s worst, and survey data that quoted dozens of disgruntled teachers.
Haney’s studies read like a slam-dunk for the plaintiffs; yet the judge ruled in favor of the state. As it turned out, some of the bad news Haney found in Texas was no different elsewhere in the country – and not especially attributable to TAAS; other negative trends were true but a partial picture, slightly exaggerated, or caused by changing procedures that were fixed by the time of the trial. Some remain subject to differing opinions about which tests really count, since Texas high-school students get up to eight tries at passing TAAS for graduation. Nonetheless, while ruling in favor of Texas, the judge acknowledged that its testing system was not nearly as effective as state officials would have everyone believe. 
One benefit of this filtering process is a relatively clean pool of information on what can and cannot be claimed about the power of standardized tests like TAAS. It also clarifies claims about any exercises connected to standardized tests, including computerized products like Accelerated Reader. So what can be known about the rise in Texas test scores? And what can be said about the value of those gains?
In winning its lawsuit, Texas emerged with several pieces of impregnable evidence that Texas students – particularly its minorities – have been more than holding their own against students in other states.  And TAAS seems to deserve some credit for these gains. In combination with the state’s school accountability system, the test specifically targeted low-income and minority groups, set the passing bar within their grasp, then moved it up slowly, "like a magnet," said a report by the Education Trust, pulling them into steadily higher levels of performance. This was the system’s "genius," said Uri Treisman, director of the Dana Center at the University of Texas at Austin.  Somewhat similar trends occurred in other regions of the country, as one state after another fixated on measurable academic standards, and struggled to raise them.
But when it comes to news on other fronts, standardized tests have had quite another story to tell. TAAS is a particularly graphic example. Pressures to excel on this exam have been so intense that they’ve lent new meaning to the term "teaching to the test," the phantom pressure that hangs over more and more American classrooms. This pressure has created a whole new industry dedicated to nothing other than preparation for these tests. In Texas, there are "TAAS camps," instructional videos for teachers, cram booklets, and tutorial software such as "HeartBeeps for TAAS," which, by mid-2000, an estimated 1,000 schools had purchased at $4,200 a copy. In many schools, class-work was largely given over to test preparation from New Year’s through April.
Texas officials happily defend how seriously their schools take preparation for TAAS. That, after all, is the point of the test, and of the whole accountability system. William Mehrens, an education professor at Michigan State University and one of the testing industry’s most regarded experts, puts it this way: As long as a state’s curriculum and its test are closely matched, and both are sound, as he believes the Texas system is, then even if schools "teach to the test," students should come away with some real knowledge. Mehrens would be expected to defend TAAS, since he is generally a proponent of modern testing systems, and served as an expert witness for the state in the TAAS case. In fairness, a few diligent school districts have managed to stay true to Mehrens’ vision. An example is Mt. Vernon, a heavily black suburb of New York City.
In 1999, New York started using a new, "English language arts" exam, which asked fourth-graders to, among other things, chart the chronology of a story, understand the imagery of a poem, and write an essay using both works; another section of the test asked students to take notes while they listened to a story, and to then write a second essay that would prove they had understood the narrative. When roughly two-thirds of the district’s fourth graders failed this test (and a similar portion of eighth-graders failed their version as well), Ronald Ross, a forceful new superintendent, tried some academic tough love. After winning a 10 percent increase in the school budget, he hired a reading specialist and forced principals to start getting directly involved in the classroom. Teachers, meanwhile, were told to explicitly teach to the new test. So what did their students do? They started reading 30 minutes each night, writing in every subject, getting drilled in the difference between an essay that would indicate "mastery" on the new test, versus one that would be considered merely "proficient." They learned a graphic method of taking notes – and they took lots of sample tests. In other words, Ross created a shrewd, two-prong attack: Everyone practiced the tests, and, as he put it, "we said, ’What are the broad areas that this test looks at?’" Just judging from the numbers, Ross’ approach worked: the next year, half of Mt. Vernon’s fourth-graders passed the English exam, and the following year three-fourths did – a performance that trumped many wealthier districts in the state. One school’s pass rate jumped from 13 up to 82. Perhaps equally important, Mt. Vernon’s students (and teachers) seemed to be enjoying the work. 
The problem, however, is that Mt. Vernon’s experience is an exception, for reasons that aren’t likely to go away anytime soon. The problem begins with the fact that the U.S. has never had national academic standards, and, despite appearances, Bush’s "No Child Left Behind" law did nothing to change that. (While each state does have to show "adequate yearly progress" toward improved academic achievement, especially among the disadvantaged, each state was given the right to set its own standards, and to meet them in its own way.) This naturally allowed quite a variety of academic environments across the 50 states. When each state then goes shopping for tests on its standards, the nation’s handful of testing companies suddenly have to deliver up to 50 different products, which requires an amount of time and expense far beyond what the system has been able to provide. And the situation has not shown much sign of improvement. "The states are making matters worse," says Robert L. Linn, a professor of education at the University of Colorado, Boulder, and a nationally known testing expert. "They want to test later and later. They want the results sooner. And they all want the tests customized to their particular needs." As proof of how unrealistic these expectations are, Linn points out that results on NAEP, the respected national exam, are delivered more than a year later than any state test; NAEP also costs more than 10 times as much to administer. "The testing industry is stretched way too thin," Linn says. And Bush’s policies, he believes, "will stretch it even thinner." The net result, unfortunately, is quite a mess. As recently as 2002, while the curricula in various states were increasingly asking students to master broad areas of knowledge and sophisticated analytical skills, state tests were largely stuck quizzing students on their capacity for rote memory.
What everyone continually fails to realize, says Theodore Sizer, a leader in high school reform and the founder of the Coalition of Essential Schools, is that "Tests tend to test how one individual performs on that kind of test. It’s like taking a temperature in a hospital. It’s one important index, but it’s only one. We’re judging kids on the basis of their temperatures." And by all indications, there are plenty of ways to jigger the thermometer readings.
One is an old classroom trick: just cheat. In Texas, three Houston teachers and an administrator were forced to resign at one point after they secretly corrected test answers; in 1999, Austin’s school district was actually indicted for tampering with test documents.  Statewide, school officials added another dodge. Like many states, Texas has habitually slotted low-scoring students into special education and other programs, where they’re either exempt from the tests, or are excused from having their scores included in accountability measures. That of course boosts indications of achievement by everyone else. Over the years, Texas has begun cracking down on this sleight of hand. Nonetheless, in 2000, the state was still excluding many more minorities than whites from TAAS exams. With low-income students overall, 14 percent were being kept out of TAAS, as compared with only 6.6 percent for whites. 
Inside Texas classrooms, teachers came up with other innovations. In a study for the Harvard Civil Rights project, two university professors from Texas concluded that efforts to prepare for the exam were diminishing the quality of schoolwork – a phenomenon that Linn has noticed in many areas of the country. This was particularly prevalent among minorities, the Texas professors said; while they were being treated to test-preparation drills, white, middle-class students were getting involved in activities such as creative writing projects, science labs, and problem-solving approaches to mathematics.  One particularly gross example was a largely Hispanic high school in Houston. Despite having virtually no library budget, the school spent $18,000 – almost its entire instructional budget – for commercial test-preparation materials that replaced teachers’ lessons. Across the state, the researchers found, students were learning how to look for words linked to the right answer, instead of laboriously reading and thinking about a passage of text. "In many classrooms," Margaret Immel, a Rice University reading expert, told The Washington Post, "the joy and magic of reading is being replaced by drudgery."  Scores may well rise, the Texas professors noted, but "high school teachers report that many of their students are unable to use those same skills for actual reading. They are not able to make meaning of literature, nor to connect reading assignments to other parts of the course such as discussion or writing." By some indications, student preparation for college isn’t showing much improvement, either. A group of Houston high school seniors who passed TAAS were soon shocked when they did poorly on their college boards.  At the University of Texas, at Austin, admissions officers say that despite rises in TAAS scores, they’ve seen no improvement in the skills of their applicants. 
It may not be surprising, then, that despite Texas’s obsession with measurable achievement, some out-of-state measures of its progress – especially in reading – aren’t quite so glowing. On NAEP scores, for example, the state has never moved beyond average showings in reading. This raises questions not just about the quality of TAAS, but also, by extension, about programs the state uses to teach reading, such as Accelerated Reader. Don Peek bases much of his Pittsburg story on the notion that TAAS is an "upper level thinking test." Yet the Education Trust, which compiled its report for The Business Roundtable (presumably an organization that was looking for good news in George W. Bush’s home state), came to the opposite conclusion. "No one, including Texas education officials," the Trust reported, "would argue that the current TAAS tests do a very good job of assessing sophisticated kinds of knowledge and skills." The exams, it said, "are weighted toward less-challenging subject matter and have fairly low-level achievement benchmarks."
As proof, the Trust cited a survey by Education Week which found that Texas relies unusually heavily on multiple-choice questions, making it one of 12 states that include no essay or even short-answer questions on subjects other than reading and writing. Another study by the Education Trust, in 1999, found that the 10th-grade exam (the passing of which is a pre-requisite for graduation) included "far fewer items from higher-level math topics than did tests in Kentucky, Massachusetts and New York." Testing standards are so low that in early 2001, when Texas announced that the test would be revised in 2003, it issued a warning. If schools don’t act soon to improve instruction, the state said, three out of five students will fail it. And the testing game was reset once again. As an indication of the legacy being left on the testing front by President George W. Bush, in 2002, Tony Sanchez, the Democratic candidate for governor in Bush’s home state of Texas, was talking about increasing student testing to the point where it could be done every day. "We have the technology to do this," Sanchez said during one campaign speech.
Testing the Numbers
At this point the questions become obvious. Here we have a standardized test in one state with several unique strengths, and some significant weaknesses, particularly in reading. We also have students in some schools beginning to perform well on that test after extensively using reading products made by Renaissance Learning, Inc. What are the connections? And what broader story do they tell about testing and classroom technology, and about our resulting concepts of learning?
Part of the story is told by the history of STAR, Renaissance’s computerized "diagnostic" program. The background on this little program matters because STAR is structured to be "computer adaptive" – one of the new frontiers in testing technology. In this form of examination (which has been used for years in the Graduate Record Examination, but has yet to filter into the younger grades), as students answer each question, the computer automatically adjusts the difficulty of the next question that comes up. (Questions get harder when students answer correctly, and easier when they answer wrong.) This innovation’s great appeal is that it generates assessments in a fraction of the time of a traditional exam. STAR, for instance, consists of no more than several dozen brief questions, and can be finished in under 10 minutes. On the positive side, this turns a computer-adaptive test into a kind of scholastic smart bomb: by continually pushing at students’ limits, it gradually zeros in on the edge of their abilities, exposing strengths and weaknesses in greater detail than the typical test. Students at the low end, for instance, suddenly "have a chance to show you what they do know instead of just what they don’t know," says Linn, the testing expert from the University of Colorado. The reverse then happens for top students, who are used to acing tests aimed at the whole class and now have to face rounds of questions they can’t answer. Unfortunately, this also means that students in the great, gray middle are protected from these intellectual bombardments. Since their tests will bounce up and down around the middle of the scale, assessment of average students with computer-adaptive tests has been much less exact. For all their failings, standardized exams avoid these peculiarities – first, by offering everyone a much larger number of questions; and, second, by making the questions the same for everyone.
All of which leaves testing experts feeling simultaneously intrigued by computer-adaptive testing’s untapped potential, and nervous about its hidden side-effects.
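For readers who want to see the adjust-as-you-go mechanism spelled out, here is a minimal sketch in Python. It is not STAR's or the GRE's actual procedure, which this chapter does not describe; operational adaptive tests generally rest on more elaborate statistical models. The function name, the 1-to-100 difficulty scale, the starting level, and the shrinking step size are all assumptions made for illustration. The only rule taken from the description above is that questions get harder after a right answer and easier after a wrong one.

```python
def run_adaptive_test(answers_correctly, num_questions=25,
                      min_level=1, max_level=100,
                      start_level=50, start_step=16):
    """Simulate a toy computer-adaptive test (hypothetical scheme).

    answers_correctly: a function taking a difficulty level and returning
    True if the student answers a question at that level correctly.
    Returns the final difficulty level and the list of (level, correct) pairs.
    """
    level = start_level
    step = start_step
    history = []
    for _ in range(num_questions):
        correct = answers_correctly(level)
        history.append((level, correct))
        if correct:
            # Right answer: the next question is harder.
            level = min(max_level, level + step)
        else:
            # Wrong answer: the next question is easier.
            level = max(min_level, level - step)
        # Assumed refinement: shrink the jump so the test homes in on a level.
        step = max(1, step // 2)
    return level, history


# Usage: an idealized student who answers correctly up to difficulty 70.
final_level, history = run_adaptive_test(lambda lvl: lvl <= 70)
print(final_level, len(history))
```

With an idealized student like this one, the sketch settles close to the student's limit within a few questions. Real students guess, slip, and tire, which is one reason a short adaptive test can be noisy for any individual.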
Some of those side-effects can be glimpsed in the rest of STAR’s story. Before contracting with Beck to make STAR, Paul approached TASA, the testing and evaluation firm that has contracted for the New York State board of regents exams. Steve Ivens, a researcher and former TASA vice president, remembers the negotiations breaking down "because we wouldn’t make it short enough for him." STAR’s brevity soon spawned additional troubles. To satisfy the program’s promise that it will assess individual skills, Renaissance gave the program the capacity to generate nine specific, "diagnostic" reports. These comment on everything from whether a beginning reader has mastered sound and word recognition to whether an advanced reader can use indexes and glossaries, preview chapters before reading, or take notes while studying. But the assessment is something of a fiction. All it knows is whether the students did well or poorly on the STAR test. Based on that, it guesses at where the problems may lie – guiding teachers with canned bits of evaluation from the computer’s database. For example, one such report says "These scores indicate that Kim likely reads many different types of literature for pleasure… He or she is able to read critically and uses reading skills to solve problems in different subjects." There is no evidence, though, that Kim can do anything of the kind. The assessment is then followed by equally general, canned bits of advice. ("Practice evaluating and making judgments about texts." "Acquire a working vocabulary of literary terms." And so on.)
At its heart, judging from the views of outsiders who have worked on the program, STAR was meant to be only a general indication of how a whole group is doing – and ideally a younger crowd, where the variety in skills is narrower. But, as is often the case with testing technologies, many teachers – at Renaissance’s encouragement – have begun using STAR as a more definitive measure, and in all grade levels. Some have even used it to place students in special classes – targeting them, sometimes wrongly, for either remedial work or "gifted and talented" programs. The test, says Michael Beck, was never designed for such consequential decisions.
That argument – that no single test should determine a student’s fate, let alone an entire school’s – has long been the testing experts’ mantra. Yet, year after year, schools across the country, and now national and state policymakers, ignore the experts’ advice. The issue here is more than a difference of opinion. Quantitative testing has always been a slippery puzzle. Whenever testing administrators get a handle on one problem, such as the complications of computer-adaptivity, other problems pop up somewhere else. Suddenly, smart bombing becomes a giant whack-a-mole game. Consider the situation in another corner of the Renaissance program.
When Tim Shanahan, the University of Illinois reading expert, was reviewing the studies on AR, he got a glimpse at some of why Peek’s old middle school students, and many others around Texas, had been able to raise their TAAS scores. After noticing how closely Renaissance quizzes mirror standardized tests, it occurred to him, he said, "that students are just getting a lot of testing practice." The deeper problem here is that standardized tests are never as standardized as they seem; they always contain variances–caused by the way questions are asked, by the way teachers taught the material, or by confusion in test directions. When students relentlessly practice testing’s routines, they gradually learn how to keep all these variables at bay. As Shanahan put it, "they get all those variance options lined up on one side of the equation. It’s sort of a fake way of raising your scores."
If Shanahan’s critique is on target, these troubles play out with radiance in Renaissance’s other main initiative – a relatively new and complex product called Accelerated Math. While the math program’s aim is, like AR’s, to stimulate work done away from the computer, it depends far more on computer technology than does its literary sister. Its format is also akin to the way an increasing number of other companies, and school districts, are using computers to prepare students for standardized tests.
In essence, Accelerated Math is a system for generating math quizzes, and for tracking each student’s work on those quizzes. The product is loaded with handy functions, one of which is a scrambling procedure made possible by the program’s algorithms. On any given test, therefore, each student gets slightly different problems. This lets teachers turn students loose to help and learn from each other – "collaborative learning," in education-speak – without risk that they’ll steal their neighbors’ answers. As students pick away at their work, they go through very much the same motions that are required on standardized tests. Not only are most problems framed as multiple-choice questions (with the familiar four options), but their answers are also recorded on a narrow, computer-ready card. For each question, students fill in a tiny bubble, just as they do on the SAT and other exams. (Renaissance programmers even buy old standardized tests, which they use as a guide when creating questions.) For schools that prefer to test students with essay questions – the latest trend in testing design – the program offers what are called "extended response" quizzes, where students must show and explain their figuring. The program’s complexity does require more gear – a computerized scanner, and a top-of-the-line printer, since teachers typically generate hundreds of pages of quizzes each day. When everything’s working properly, though, teaching can become very easy.
That ease is the math program’s huge selling point, which Ann Lubas, one of Renaissance Learning’s lead presenters, made crystal clear in Las Vegas. In a packed afternoon session on Accelerated Math, Lubas told the crowd that when she first discovered the product as a teacher, she suddenly realized, "I didn’t have tests to grade. I didn’t have tests to take home and write. I’m in love with this program!" Many of her customers are also in love with the way Accelerated Math imitates standardized tests, since most test-preparation programs don’t bother to go that far. "Our math scores have really come up," an elementary school teacher from Merced, California, told me. A main reason, she said, is that "the terminology for the SAT 9 is getting reinforced."
The reinforcements don’t stop there. Accelerated Math also includes a database of computerized charts that pinpoint, say, which fourth-grade students are struggling with long division, or which high school trigonometry students still don’t understand sines and cosines. The charts even include little red prompts to tell the teacher which students have been getting wrong answers long enough that it’s time to intervene. It’s hard to fault a support system that seems so dynamic, so multi-faceted, and so individualized. But the classroom reality does not quite align with the picture painted by test scores.
The first reality check involves the functionality and cost of high-end testing tools like these. In Las Vegas, when I stopped by a Renaissance booth for a demonstration of Accelerated Math, over the course of an hour or so, teachers continually came up to complain about one seemingly intractable problem after another. One teacher couldn’t move student records from grade to grade. Another said her school had bought the program when it was first released, and that it was always crashing. The program demonstrator explained that she had bought the pilot program – a "quick and dirty" version that was full of bugs, which the company had since abandoned. The teacher didn’t find this terribly reassuring. "We paid a lot for it, I think, about $1,500," she said. "I was very frustrated." Other schools have paid more. According to a fall, 2000, catalogue, a basic "Starter Kit," which includes a scanner, a year’s technical support, and material for one grade, cost $1,899. A "Super Kit," which adds the program’s diagnostic test but still covers only one grade, sold for $3,299.
What teachers probably don’t see is that Accelerated Math’s diagnostic program employs the same kind of guesswork that the reading program does – a weakness that plagues a good bit of educational software. For example, when Sarah, a fifth-grader, scores in the mid-range, the diagnostic program ("STAR Math") reports the following: "These scores indicate that Sarah has a firm grasp of whole number concepts and operations. She has a basic understanding of fractions and decimals, but she needs to keep working on fraction and decimal concepts and operations." Sarah may not have learned or even practiced any of those functions. It’s another canned evaluation – based on a national sample of fifth graders, and its assessment of their skills.
When asked about this, Terry Paul pointed out that teachers are entirely free to work students through detailed understandings of what each math procedure means. If anything, Paul argues, the product encourages that kind of individual attention. "I don’t think kids can get through the objectives if the teacher is not personalizing, and having those conversations. A lot of teachers have difficulty transitioning from traditional lecture to individual attention. You’ve got to have a system that allows that without chaos." As logical as Paul’s argument sounds, it leads to the second reality check with Accelerated Math – the quality of schoolwork it inspires. Some former Renaissance staffers recall being astonished at the classroom scenes they witnessed when they visited schools to help them work with the program. "I just shuddered," one former employee told me. "Teachers had totally abandoned the text, totally abandoned lectures. It turned a math class into a bunch of monkeys working problems." One teacher apparently concluded that the program did such a terrific job that she gave up lecturing and boxed up her textbooks.
But what’s wrong with encouraging students to work problems? If a program’s skill assessments have their limitations, wouldn’t the specificity and relentlessness of the assignments make up for it? The distinctions here are perhaps best explained by Judah Schwartz, the former co-director of the Center for Educational Technology at Harvard University, and a long-time specialist in the teaching of mathematics.
In Schwartz’s view, all classroom instruction methods fall into two main camps: those that strive to teach skills, and those that aim for understanding. While the two faculties are obviously related, they are not the same, and they breed very different teaching methods. (As proof, the decades-long war over traditional and progressive education is little more than an endless re-staging of this battle, with the traditionalists fighting for skills to take priority, and the progressives fighting for understanding.) This feud has become heatedly delineated in the "math wars" and "reading wars" that have been causing curricular turmoil in California, Massachusetts, Texas, and several other states. Like many educators who have stayed out of these skirmishes, Schwartz believes both skills and understanding are critical, and that neither one has the right to take precedence over the other. Numerous educational reform initiatives – progressive, traditional, and many in between – have faltered because they forget this obvious fact, and become far too comfortable leaning on only one leg.
This, Schwartz believes, is Renaissance Learning’s handicap. "Accelerated Math may be a good way to develop skills," he told me. "But I don’t think it develops understanding." More to the point, Schwartz says, "you wouldn’t know with this program whether understanding was weak or strong." Terry Paul, of course, disagrees, arguing that students’ success on advanced placement tests, and later in college math classes proves that the program is powerful. "The AP [Advanced Placement] exam is designed by professors of mathematics," Paul told me. "Most people think students who can do that have a pretty good understanding of math. On any practical method of defining understanding, it seems to work."
Things may not be quite that simple. Mike Russell, a senior research associate at Boston College’s Center for the Study of Testing, Evaluation and Educational Policy, has been surveying the way technology and testing interact in schools across the country. He found a number of teachers who are delighted with Accelerated Math, and other programs like it. Those teachers tend to be pleased that they have more time to work individually with students, and that the computer drills can be directly aligned to the states’ growing phalanx of academic standards – and thus to state tests. "When a teacher reaches that part of the curriculum standards, they can just go to the software," Russell says. But the students’ intellectual experience in these classes has been another matter. Accelerated Math, Russell concluded, "seems to be good for kids that need a lot of math work. If they’re good at math, it’s not of much use." Accomplished students, it seems, are ready to experiment and range widely, in a fashion that mechanized drills – no matter how much teacher attention goes with them – can’t accommodate. This is why test-prep programs have trouble shedding their curse of narrow-minded monotony. One of the most dramatic illustrations of testing’s weakness occurred in the final days of 2002. This of course was a hot moment for high-stakes tests, a time when more than half the states in the country were vigorously using them. Several days after Christmas, a massive study of academic testing, the largest ever thus far, reported the following: Student performance in test-intensive states may well rise on state tests. But when performance was considered on the big national exams – the Scholastic Aptitude Test, ACT (an SAT competitor), NAEP, and AP exams – most of these states slipped in comparison with the national average.
To make matters worse, the study also found rising dropout rates in test-intensive states, coupled with a jump in enrollment in programs offering equivalency diplomas. Apparently, this was not only because students are intimidated by test pressure. The researchers also found signs that administrators, in a panic about raising test scores, occasionally encourage failing students to drop out. 
If anything, the canned responses in programs like Accelerated Math aggravate troubles of this sort. They can mislead teachers into believing that their students have developed a profound understanding of division or trigonometric equations when nothing of the kind has occurred. This can happen even when students get the right answers. As an illustration, a former math teacher told me about a game he used to play with high school students. "I would ask them what the sine of an angle is. They could all give me numerical answers, but no one could tell me what it means." He’s found the same patterns of ignorance in advanced math classes. "If you asked college calculus students to explain, in English, what the derivative of an equation is, most couldn’t do it. But if you put an equation on the board and asked them the derivative, they’d do fine."
But can these kinds of shortcomings be blamed on a computer program? Isn’t it up to the teacher to use the program any way they want – poorly or intelligently? In reality, those choices can be a mirage, another e-lusion. When tools are powerful, as Accelerated Math and many of its computerized cousins are, busy teachers tend to defer to their definitions of what work needs to be done and what work doesn’t; the teacher who put away her textbooks is a perfect example. "The tool is exceedingly seductive," Schwartz said. "People begin to forget about the rest of the teaching job, because the tool makes it possible. The machine screams for so much attention that nobody pays attention to anything else."
Standardized testing may have its failures, but it remains the boss. Considering the faith that millions of people continue to put into this mechanism’s data, it is important to look at what, in the aggregate, computer products of this type have done for test scores. Here again, Renaissance’s track record provides a fitting set-piece.
With more than half the nation’s public schools as customers, Renaissance Learning now has a hefty presence in American education, albeit mostly with reading instruction. Yet national reading skills during the company’s period of robust growth seem to have remained stubbornly flat. Judging from NAEP scores, between 1992 and 2000, the scores of top students rose slightly, while those of low-performing students fell more significantly. Interestingly, the gap between top and bottom students widened among all racial and ethnic groups. Specifically, two-thirds of the students tested fell below the level that the federal government considers proficient, and 37 percent couldn’t meet even the standard of basic knowledge. (This involves the ability to get beyond simple words and phrases and to draw conclusions from what one has read.) Tennessee offers a particularly dramatic snapshot of this contradiction. Renaissance’s penetration with Accelerated Reader grew here during the 1990s from 20 percent of the state’s schools to nearly 70 percent. During that time, reading scores in the state generally declined.
In the context of trends like these, why do teachers keep thinking that tools like Accelerated Reader and Accelerated Math will save them? Among other things, technology’s evaluation tools seem to continually get better. Once President G.W. Bush’s program took hold, that trend began picking up real speed, which could be observed by any administrator who sat in a school district’s buyer’s chair. One person who found himself in that position is Tim Shanahan, the director of the Center for Literacy at the University of Illinois. In late 2001, Shanahan took a leave of absence from the university to direct the reading program for the Chicago public schools. In the following months, Shanahan was barraged by vendors selling new software products, complete with "diagnostic" programs supposedly capable of compiling profiles of each student’s strengths and weaknesses. Like the Renaissance STAR products, this software breaks down test data to show performance on specific fronts – reading comprehension, word recognition, fluency, and so forth. Shanahan turned down almost every one of them. "It’s just a repackaging of test scores," he said. "They aren’t as personalized as they seem." The reason is that the categorization of skills is so arbitrary, and the number of questions in each category so limited, that the data isn’t reliable. The core data here – the base tests – are only designed to tell whether someone got an answer right or wrong, not why. Packaging that information into a "diagnostic" assessment, Shanahan concluded, "can be very inaccurate. You’re telling schools to prescribe instruction based on a flip of the coin, essentially." Nonetheless, sales of this genre of software have been plenty robust. "The stuff looks so good," Shanahan says. "It looks like you’re getting a lot more information, and with four-color charts. Schools believe if they teach to these patterns, they’ll raise achievement on standardized tests. The evidence of that is not really there."
The testing industry is fully aware of complications like these. They also know that no matter what they do to make tests a fair and sophisticated measure of real learning, someone will always find a short cut. So the industry toils on, tirelessly trying to stay one step ahead of the Terry Pauls of the world. The challenge here is plenty familiar. The promise of strict accountability and testing are old lovers in the education world. They have been featured partners during numerous back-to-basics reforms over the years – first in the 1890s, followed by the 1950s, the 1980s, and now again in the early years of the 21st century. Interestingly, each of those movements occurred during a particularly conservative phase of America’s political history.  This is not to slight those periods or their initiatives; the radical 1960s gave birth to their own excesses in experimental education. It should merely be remembered that any severe swing in the nation’s political mood produces fervent education visions. And those visions tend to be passing phases rather than enduring absolutes.
Back in 1994, when George Madaus wrote his critique of testing technology for Harvard Educational Review, he offered a personal anecdote that has proved to have a long echo. In 1992, Madaus led a study of standardized math and science tests that were supposed to be state-of-the-art. In so doing, he discovered that the tests were "overwhelmingly measuring basic computational and processing skills" – not the "complex problem-solving skills presently demanded by the math and science communities." It sounded just like the TAAS story, which suggests a strange pattern: Almost every time a new test arrives, it’s touted as a true measure, at last, of the sophisticated skills everyone has been crying for. Within a few years, when schools start figuring out how to ace it, everyone realizes the test is far too simple after all. And the cycle of utopia and disappointment starts all over again.
If schools are so familiar with this merry-go-round, why do they keep falling for its allure? One reason seems to be that, as on any circus ride, new technologies are beguiling toys. More than almost any other social enterprise – more than science, more than art, more than literature, more even than industry itself – technology runs on the glow of novelty. The most successful technology developers know this; when building a new product they hope will have legs, they usually couple its novelty with two additional attractions: greater ease and greater speed. This combination (it’s new, it’s easy, it’s fast) is a powerful triumvirate. To the modern mind, it equals progress. And that, to Madaus, is precisely the problem, especially when technology mixes with testing.
Like many old-line educators, Madaus gets nervous when something comes along that proposes to lighten the schools’ burdens. Those temptations, he argues, sideline alternative ways of doing things that might have been more time-consuming but were also more valuable. "A danger of high-stakes testing programs, as with many technologies," he writes, "is that they depreciate certain ends by making other ends more attainable and, hence, more attractive." It’s an old pattern. Everyone, particularly adventurous Westerners, wants to believe that a better technological solution always lies just over the next horizon. And this, Madaus fears, "will blind policymakers and the public to the reality that we cannot test, examine or assess our way out of our educational problems."  William F. Goodling, a former Pennsylvania Congressman and chair of the House committee on education and the workforce, once put it this way: "If testing is the answer to our educational problems, it would have solved them a long time ago." 
The ultimate point here is painfully obvious: Learning, it seems, can be only partly measured quantitatively. It’s an enterprise, rather, that is deeply psychological, frequently emotional, and thus inescapably subjective. To ignore this fact, to force millions of teachers and students to turn all we have learned about the mysteries of the mind and the human soul into a narrow numbers game is an insult to science, and an abrogation of social progress. This is not to suggest that there shouldn’t be state or even national academic standards. But there have to be better ways to hold a school, or a state, to this standard of "accountability." One way would be to put some real money and oversight into testing. But given the government’s long history of seeking the cheap and easy way out in America’s classrooms, that seems unlikely. Another idea is to follow the testing experts’ long-ignored advice – that is, to use a variety of measures, rather than a single, year-end test, to rule on the future of a student, a school, or an entire district. One of those measures might allow more room for human judgment. This may seem impossibly subjective; if nothing else, it too may cost more money, since it probably requires hiring more smart adults in the schools. But if we are going to evaluate a young person’s progress, it only stands to reason that the wisdom and experience of an older person, who knows a youngster outside the test room, should carry some weight.
If we avoid this challenge and continue seeking refuge in quick and easy numbers, the political world’s much hoped for higher standards are likely to remain very elusive. Bush’s "No Child Left Behind Act" almost guaranteed this fact, signaled by its title. Leaving no one behind means making sure everyone passes the finish line. And everyone can’t accomplish that, at least in a timely fashion, unless the finish line is dropped back. In 2002, that is precisely what an assortment of states started doing with their numerical standards, as they realized they couldn’t meet Bush’s requirements.  In the education world, this is known as the Lake Wobegon game – named after the mythical town in Garrison Keillor’s public radio show, where "all the children are above average."
Going For The Gold
Within the educational software world, a good deal of excitement about any company’s products, from both students and teachers, is sparked by their seductive extras – "bells and whistles," as educators call them. Some companies, like the makers of HyperStudio or Reader Rabbit, use entertaining treats, such as musical ditties, multimedia cartoons, and other beguiling visuals, which are gradually handed out along the way. (HyperStudio enthusiasts take great delight in their multimedia excesses; at company conferences, a favorite line among HyperStudio loyalists is "Let’s Overdo It!") Renaissance Learning avoids this sort of visual chaos, choosing instead to focus on a single but powerful whistle: its system of points and prizes. Whatever the scheme, all these temptations put the educational software industry squarely in the middle of yet another hot academic debate: Does it help or hurt to lure students into doing academic work with rewards?
The use of rewards in school is nothing new. For decades, teachers have been routinely handing out candy or commendations or special privileges to students who do exemplary work. A handful of other modern reading programs, both high-tech and low-tech, draw on this tradition as well. Because Renaissance’s system of rewards is so overt, it has become something of a lightning rod in this debate, which has served to illuminate the age-old question of how to properly motivate young minds.
Company trainers are all too familiar with the complaints about Renaissance’s points and glittering prizes, and they usually try to deflate them before they even arise. During Don Peek’s Pittsburg story, for example, he paused at one point to explain his school’s rationale for the rewards. "Our kids were poor!" he told his audience. "Those points were the only money they had in their pockets. But ’ahm ’on tell you, after several years, we didn’t really need all of that stuff. It was just a jump-start, ’cuz we had to get ’em excited about it. Then they’re going to find those favorite authors, and those favorite genres. And they’re going to grow on their own intrinsically."
Peek hit the issue on the nose; he even framed it the way the experts do. The central question here is whether it’s possible to boost learning purely through "intrinsic motivation" (that is, by getting students to do something for its own sake, for its own intrinsic pleasures). Or do students need to start with "extrinsic" motivators? (Pizza parties, hall passes, multimedia cartoons, shoot-em-up characters, the list can go on and on.) In today’s high-stimulus world – crowded as it is with commercial temptations on every corner, every channel, every web-site – it seems that Renaissance Learning knows what has to be done to hold a youngster’s attention.
Once again, though, the data doesn’t quite play along. Among the many critics of rewards, the most noted is Alfie Kohn, author of the book "Punished by Rewards: The Trouble with Gold Stars, Incentive Plans, A’s, Praise, and Other Bribes." Kohn expects a lot from us; one could even say that his vision is idealistic. Grades, prizes, praise – to Kohn, they’re all modern implements of behaviorism, as defined and popularized in the 1960s by the pre-eminent lab rat scientist, B.F. Skinner. Kohn’s main argument is that rewards of any type train children to behave like pets: They learn to do tricks for treats. (This is true, he argues, in the adult world as well, with incentive plans being the prime culprits.) "Do this and you’ll get that," Kohn writes.  When a teacher, parent, or boss sets up this dynamic, Kohn believes, an underling’s engagement with the activity is immediately bifurcated. Are students reading because they like the story, or because they like the pats on the back and the free basketballs? It’s hard to tell.
Although Kohn’s criticisms may sound extreme, they’re based on an impressive bit of examination. Some involves little more than rhetorical logic, but a fair portion derives from scientific studies. These are not the private, selectively presented kind of research that has characterized Renaissance literature; for the most part, Kohn’s sources are carefully controlled studies, vetted over the years by a succession of neutral experts. And, in a different literature survey, where studies of Accelerated Reader and other reading "incentive" programs were reviewed, the conclusions have been the same. As its author, Jeff McQuillan, reported, "…there is no clear causal relationship in any of the studies conducted so far between the use of rewards and an improvement in reading attitudes, achievement, or habits." In fact, in some of these studies, students given rewards for their work did worse than an equivalent group of students doing the same activity without rewards.
The most important question is how students do once rewards are out of the picture. "If the jump-start is to work," McQuillan wrote, "we should know if the engine is still running a few miles down the road." Apparently, it wasn’t. In general, students lost interest in reading, forgot what they’d read, or drifted to books that were easy. "The internal motivation that is supposed to ’kick in’ doesn’t," McQuillan concluded. "Instead, the external rewards in effect short-circuit the motor. Rewarding children to read may therefore lead to less reading in the long run, not more." 
As might be expected, the issue is not quite this simple. In fairness to Terry Paul, there are a few important distinctions that McQuillan, Kohn, and many others in the anti-reward crowd tend to gloss over. Mike Milone, a private researcher who has consulted for Renaissance Learning (and who trained as a "behaviorist") asserts that what matters are the specific kinds of rewards people get. To many of us, all of life is driven by rewards, in an endless array of ethical shapes and sizes. They include everything from a mother’s love and smiles, to short-term treats and privileges, to enduring pleasures such as success and the respect of our peers. Renaissance’s points and prizes, Milone acknowledges, are "the lowest kind of rewards there is." The teacher’s challenge, therefore, is to show students how to climb life’s ladder to the morally meaningful.
Unfortunately, the software industry’s ladders may be missing a few rungs. In Renaissance’s case, the main reason that academic studies show performance falling off once rewards disappear, Milone argues, is that teachers "have failed to help students make the transition to higher level rewards." The research actually backs Milone up. It also indicates that, while this could change, that feat has generally remained beyond the skill of the average teacher.  It also seems to remain beyond the skill, or the interest, of the average software producer.
Consider the young life of another math software program, one that has stood quite tall in the family of "educational games." The program is a small, $20 adventure package called "The Logical Journey of the Zoombinis." Like all good games, this one has proven substantive enough to often hold the interest of young and old alike. (Children have been so captivated by it that they’ve created their own Zoombini stories; some have even invented a Zoombini language.) Created in the mid-1990s by a Cambridge, Massachusetts, non-profit called TERC, the game bills itself as dedicated to "the art of mathematical play." The central plot revolves around a band of tiny people who have lost their homeland, and who must overcome a series of death-defying obstacles to find a new home. Since each Zoombini has slightly different physical characteristics, the fun – and the intellectual challenge – is in figuring out what it takes to navigate each adventure (specifically, which combinations of Zoombinis, and which sets of choices at each challenge, will get everyone through). Infantile as this may sound, the game allows for 625 possible combinations, a tiny sample of which will quickly stump most adults. These combinations, it should be noted, are structured by the rules that govern such high-minded mathematical procedures as database analysis, algebra, base-5 numbers, algorithms, and mathematical objects such as vectors. The solutions also change each time the game is played. The whole experience acquaints youngsters with important principles of logic – how to look for patterns, for instance, how to reason and organize evidence, and how to systematically test one’s hunches. While the Zoombinis’ journey has its share of cartoon-like effects, it quickly becomes clear that the main rewards in this adventure are relatively intrinsic – that is, they lie in solving the game’s logical puzzles.
As an indication of how education fads can ruin a good thing, when the Zoombini game was updated, the software producers put a little assessment system on the back end. The idea was to help teachers track student performance, in keeping with Bush’s emphasis on accountability. This meant that youngsters were told, for instance, that their skills at "mudball wall" were unsatisfactory. "It’s ridiculous," says Gary Stager, the adjunct professor of education at Pepperdine University and a partner in techno-rebellion with Seymour Papert of MIT. "That’s like being graded in chess." Many educators hardly blinked at the Zoombini update, though, because so many educational software products have been adding equivalent functions, to their general delight.
All is not bleak, however. While computers have their limitations, they are supremely capable of tracking aspects of what their users are doing, potentially in some detail. It was only a matter of time, therefore, before an enterprising scientist discovered ways to blend this power with academic work, yielding a truly potent diagnostic program. Apparently, that is exactly what several researchers in cognitive science at Carnegie Mellon University did. After 12 years of experimental research, much of which included studying patterns of the brain, they developed a program called, aptly enough, "Cognitive Tutor." It too focuses on math skills (specifically algebra and geometry). While students work at the program, the computer creates a "cognitive model" of what they’re doing – based on choices such as how long they take to solve certain problems, which questions they get right or wrong, how often they ask for help and on what topics. Like a good live tutor, the product does not simply correct students when it intervenes but asks them questions about their work, and does so with increasing specificity. This enables the program to assign additional problems, targeted with some accuracy to topics that need attention.
Despite Cognitive Tutor’s power, its creators realized that the work it fostered is still low-level, when compared with what a teacher can do. So they turned the program into an entire math curriculum, restricting the computer work to 40 percent of class time. During the other 60 percent, students interact with the teacher, who guides discussion about the math problems. Those problems are not narrow, multiple choice questions of the sort that speckle Accelerated Math and other standard classroom exercises; instead, students are given just a few relatively complex mathematical dilemmas, posed in text form, which they then work out in pairs or in teams. "The students are doing the mathematics, not the teacher," says Bill Hadley, who spent 28 years as a teacher in Pittsburgh schools before becoming president of Carnegie Learning, the company that makes Cognitive Tutor. Like many courseware makers, Hadley believes he has definitive statistical proof that his program improves achievement. "We are the poster-child for data," Hadley told me, sounding very much like Terry Paul. All boasts aside, Carnegie Learning has been more careful with its research than its competitors have. Most examinations of Cognitive Tutor’s effects are robust, "controlled" studies of truly equivalent groups. And virtually all show gains that are not only dramatic, but that have held on for years. By all indications, this program, by forcing live instruction to blend with pinpoint technology, builds understanding as well as basic skills.
Unfortunately, it is also relatively alone. Like all maturing industries, the courseware business has been going through a shake-out, which has of course been sped by the recent downturn in the economy, and in the technology sector specifically. Research and development budgets have thus been slashed, which is why superficial add-ons like the Zoombini assessment have been more the norm than the intense diagnostic work of a Cognitive Tutor. Interestingly, in the early years of the new century, the brightest business opportunities in courseware lay in the higher education market. Any economic downturn brings a rise in the number of out-of-work people returning to school. And demand for back-office administrative services and student record keeping systems is most intense at the collegiate level anyway; as terrorism worries accumulated, the pressures on those systems only grew, following new reporting requirements on international students. A fall, 2002, report by Gerard Klauer Mattison, one of a handful of investment firms that closely track the education market, offered an enlightening snapshot of the way industry looks at computerized learning. The company noted that while education spending, per student, has grown more than 130 percent since 1971, the nation’s primary achievement measure during that time – the NAEP scores – “have shown virtually no improvement.” Yet GKM was still “bullish.” After a financially flat period in 2002 across the field, from kindergarten through adult training courses, companies that do business with schools should, it said, start enjoying an annual growth rate of 6.8 percent. This would raise industry revenues from $102 billion in 2001 to $142 billion in 2006. The firm also expected the economy to rebound as early as 2003, and total education spending to pick up soon thereafter, reaching $1.08 trillion in 2006 – double what it was in 1993.
Along the way, GKM said, the “greatest industry risk is government regulation.” This particularly applied to for-profit education companies, such as the Edison Project, which “may be subject to decision-making that is based more on politics than on business fundamentals.” Offsetting that danger, GKM said, are several “regulatory improvements.” Chief among those are President Bush’s “No Child Left Behind” law, which created incentives for new assessment mechanisms; and a June, 2002, U.S. Supreme Court decision upholding the use of vouchers in Cleveland, Ohio. “As private managers demonstrate superior results,” GKM said, “we believe more local and state governments will choose to outsource their schools.” [Source: “Back to School 2002,” a report by Gerard, Klauer, Mattison, an investment firm based in New York and Los Angeles, issued Sept. 10, 2002, pp. 2, 4, & 21.]
But the K-12 market was quickly finding its second wind, helped in no small part by President Bush’s emphasis on testing. The Princeton Review, Score, the Sylvan Learning Systems, Kaplan – all these old test prep outfits were creating new computer services, both online and off-line, to help students find quick ways to meet the new standards. One general development in the courseware market seems to have been a shift from products that might replace teachers, or some of their functions, to those that simply make their lives easier. "The smart companies realized the teachers would then support them instead of opposing them," says Jeffrey Silber, an education market analyst with Gerard Klauer Mattison. Renaissance Learning clearly learned this lesson a long time ago. That insight, however, has also contributed to the company’s main business problem. Renaissance has traditionally sold its wares mostly by exciting teachers, then depending on them to sell their bosses back home in the principals’ offices. By 2002, with school budgets getting tight, district officials were pulling in on the spending reins. This meant Renaissance now had to build a sales campaign that could reach district administrators. This was no small challenge, because Renaissance’s competitors had already been doing business with these officials for years.
The Inside Story
From all accounts, the founders of Renaissance Learning, Inc., have known about the limitations of their materials – or at least their conflicts with prevailing wisdom – for a long time. Over the years, various senior employees say they have repeatedly urged the Pauls to pull back on the claims they make about what their products can do for schools. Some have begged the Pauls to acknowledge what the research really shows: that once other influential factors are properly accounted for, the program (if properly used) does provide some boost – but only a marginal one. The Pauls consistently resisted these pleas. Sometimes they seemed to disagree with them; sometimes they seemed to fear them. In at least one case, they acknowledged that few schools would shell out the sort of cash and time commitment the product requires for only a marginal boost.
While internal questions floated around company headquarters, criticism of the program occasionally surfaced on the outside through the media. Whenever this occurred, the company immediately went on the offensive, launching an aggressive public relations campaign to quell doubts about the program before they had a chance to catch on. Perhaps the most dramatic demonstration of the company’s armored strategy occurred, interestingly enough, in its prime stomping grounds – the state of Texas.
In the fall of 1996, the School Library Journal published an article entitled "Hold the Applause! Do Accelerated Reader and Electronic Bookshelf Send the Right Message?" The article was written by Betty Carter, an associate professor in the School of Library and Information Studies at Texas Woman’s University in Denton. Carter, now a full professor, drew her arguments largely from her own study of several AR-using grade schools and librarians. In addition, one of her graduate students had surveyed fifth-grade AR books in non-fiction and found that 89 percent had never been reviewed by a reputable publication. And the 11 percent that had were not always reviewed positively. Carter’s article did not mince words. It criticized both programs, which function similarly, for their limited choice of material; for their emphasis on testing (which sends the message, she said, that "there is only one way to read a book"); for their drain on school budgets; and for their system of points and prizes.
Within weeks of the publication of Carter’s article, Renaissance Learning was in full siege. Judi and Terry Paul sent a four-page, footnoted, response to the library journal (a shortened version of which the journal published as a letter to the editor), and copied it, with an urgent cover letter, to the company’s entire staff. "We need to make our case directly," said company president Mike Baum, in a cover letter, "to people who have read Carter and need to know the facts."
Things soon got worse. In April of 1997, the Ft. Worth Star-Telegram published a front-page article on AR, headlined "Reading Rewards: Program’s Popularity Overrides Criticism". The story was a relatively balanced account, which quoted Carter leveling a few of her earlier criticisms and further complaining about the prevalence of cheating. Judi Paul got in a few zingers, too. She called Carter "a racist white woman with middle- and upper-class views." Paul’s point, an essential Renaissance theory, is that underprivileged children, particularly minorities, don’t generally grow up in households where reading and learning are practiced and valued, so they need extra treats to get started. But Carter got the last word. Paul’s comment, Carter said, "implies a paternalism that learning to read for poor minority children is something different." After the story’s publication, the Pauls went on the offensive again. First, Judi Paul wrote a letter to the editor of the Star-Telegram and sent a six-page response directly to librarians and other educators throughout Texas. “When I used the word ‘racist,’” she told the newspaper, “I did not mean that the critic herself was a racist.” Her point, she said, was that these criticisms “have a racist effect.” And, she added, “Accelerated Reader does not dispense rewards. Educators do.” Later, when Scholastic released a competing product called Reading Counts, Terry Paul distributed an internal memorandum that accused Scholastic of using “marketing spin” and “pseudo-scientific gloss” to sell a product that was “rotten at the core.” [Sources: Fort Worth Star-Telegram letter to the editor, May 9, 1997, by Judith Paul, Chairwoman, Advantage Learning Systems, Inc.; internal communication, from Terry Paul, addressed to “ALS and Institute Employees” dated March 23, 1999.]
Scorched-earth assaults naturally leave burning embers. Carter, who was active in the civil rights movement in the 1960s, still smolders about being called a racist. In the end, what seems to burn Carter most is that automated programs like Accelerated Reader dumb down the entire library experience. In her view, they take librarians’ hard-earned skills – their knowledge of literature, their knack for matching books to youngsters’ developing interests and individual reading abilities – and shove them to the side. She also believes they compromise a library’s central principle of freedom.
She has a point. Computerized programs of this genre claim to expand a youngster’s options, but in reality they do the opposite. In Renaissance Learning’s case, when schools invest in the company’s reading software, what they’re really buying is a succession of contorted constrictions. First come shelves full of books that have been stratified by an inaccurate set of "readability levels." Next come the automated quizzes and the point system, which simplify the subtleties and variety in this material even further. Finally come the students, whose choices and sense of their own capacities have been narrowed yet again by quasi-fictionalized "zones of proximal development." Not only are all these delineations confining; not only are they costly; they’re also, to Carter, unnecessary. When students pick a book, she said, "if they can’t read it, they just don’t."
Buried in the middle of the academic questions about Accelerated Reader and Accelerated Math is a very basic one, given the concerns at issue in this book. Why is a computer needed with these programs at all? At least with reading, it’s certainly the consensus among the experts that the heart and soul of literacy happens in the books themselves, in the mental exercises young readers do with a story, and in their interpretative interactions with teachers, parents and peers. So what does this machine really contribute to the endeavor?
"Let’s find something to do with technology that really adds something," says Tim Shanahan of the University of Chicago. "What programs like Accelerated Reader are doing is on the edge of using technology. Teachers have to do a bunch of new things to make Accelerated Reader work. They should be doing stuff to teach reading." It is perhaps worth noting what the National Reading Panel concluded about computerized programs in general, including those that try to put technology to more active use. As a whole, the panel said, the genre seems somewhat promising. It found indications that some high-end software products (such as those featured earlier in the "Apple Classrooms of Tomorrow") might help students gain facility with word and sound recognition and overall vocabulary. Computers might also stimulate comprehension, the panel surmised, by offering extra avenues for understanding, such as multimedia displays and hypertext links to the Internet; and for creating a powerful tool to practice writing, which is still considered one of the best ways for a reader to build his or her own lasting sense of meaning. The panel also acknowledged the computer’s capacity to boost motivation. Nonetheless, the panel warned that the studies on computer use in reading weren’t extensive or solid enough for any definitive recommendation. Regarding the machine’s power as a motivator, the panel’s report cautioned that "this effect may diminish as computers become ever more common."
The equivocation in these conclusions, as we have seen, enjoys a long and persistent history.  It may be useful, therefore, to conclude with one particularly graphic illustration of the confusing picture that high-tech schools can present.
Of the dozen or so big school reform models that have come into vogue in recent years, only one has been dedicated to improvement through technology: the "Co-nect Schools," whose flagship, the ALL School, was featured in this book’s Introduction. Founded in 1992 by the Educational Technologies Group at BBN Corporation, by 2001, Co-nect had enlisted 58 schools in eight states. And, as with most reform models, its literature listed a handful of schools and school districts showing gains made in standardized test scores after signing on to its program. But the Co-nect promoters naturally wanted something a little firmer than test scores, which can of course be increased by many things. So in 1993, it commissioned Michael Russell, a senior research associate and specialist in technology at Boston College’s Center for the Study of Testing, Evaluation, and Educational Policy. Russell spent the next five years studying more than 25 different Co-nect schools in Florida, Texas, Tennessee, Ohio, and Maryland. He controlled for how many years of technology experience each school had, and even compared Co-nect schools with other schools of similar demographics. "I couldn’t find anything," Russell told me. "There was absolutely nothing to say." But what about Co-nect’s reports of rising test scores? "Some schools did show some rises, and some were big ones," Russell said. "Others showed nothing, and others were down. When we averaged everything together, it came out to zero." Later, in 2001, a three-year study by the RAND Corporation came to similar conclusions. Almost half of the Co-nect schools it studied had made no gains at all in math when compared with neighboring schools, and more than half had made no progress in reading. 
During Renaissance Learning’s Las Vegas gathering, I spent a little time, as one normally does at conferences, wandering the exhibit hall. The hall was filled with demonstration stands for various Renaissance products, and rows of other booths where companies that have attached themselves to Renaissance Learning could sell their wares. Housed in a cavernous room, the exhibit sat just across from a temporary "Renaissance store" where the company sold its own products, at a now-only discount rate. As I strolled past tables of T-shirts, buttons, children’s books, and various prizes, I stopped occasionally to talk to some teachers. One was Gloria Rankin, an elegantly dressed fifth-grade teacher from Memphis, Tennessee, who had just finished visiting a booth that sold dozens of goodies – basketballs and soccer balls, CD players, skateboarding knickknacks, walkie-talkies, even a lava lamp – all conveniently marked with their assorted point values. When asked what she thought of the whole Renaissance program, Rankin paused. "I’m disappointed," she said, "that we can’t teach students to read without a lot of gimmicks."
Rankin came to education from the private sector, where she had helped manage an entertainment promotion company and then worked as an administrator at a private college. She started teaching in 1995, whereupon she was promptly initiated into the realities of education theory. Like many large, urban school districts, Memphis was burdened with a student population that was largely ethnic and poor. In the 1990s, it also had a superintendent, Naomi House, who was an avid believer in school reform. In her eight years with the district, House thrashed her way through an endless series of school reform initiatives, which were drawn from the famous New American Schools project and were eventually imposed on all 160 schools in the district. Apparently, House’s game plan failed. After six years of reform efforts, and $12 million in expense, achievement in Memphis hadn’t budged; in many schools, it actually declined.  The last of those reform efforts was the famously dogmatic reading drill program, "Success for All." Despite this low-tech program’s test-score accomplishments in other districts, Memphis teachers soon renamed it "Stress for All, Success for None." In 2000, when a new superintendent arrived, he was decidedly underwhelmed by Success for All and dumped the program, along with the rest of the district’s campaign for whole-school reform. But the appetite for novelty is hard to kill. A few schools were now considering yet another reading package called "Soar to Success," sold by Houghton-Mifflin; others were testing out Accelerated Reader. To facilitate these explorations, the district had sent a small brigade of administrators to Vegas. For some classroom perspective, it also sent one librarian and Rankin, who was now strolling the MGM Grand to figure out what this latest round of reform might entail.
"Why do people in education keep re-inventing the wheel?" Rankin asked me. "It is the biggest time waster, and money waster. When are we going to stop doing that?" And now, she said, comes George W. Bush, whose "Leave No Child Behind" program in her mind is one more new wheel. Rankin had no quibble with Bush’s desire to hold schools, and specifically teachers, rigorously accountable for students’ progress. "But do people realize that a lot of kids aren’t prepared to come to school?" Curiously absent from Bush’s supposedly comprehensive campaign, Rankin noticed, is "any mention, or responsibility, given to the parents."
I told Rankin that I thought she’d hit on something; that it’s odd the way everyone always points at schools and teachers but is afraid to point any fingers at parents. "It’s not about finger-pointing," Rankin said. "It’s about training." Her argument was that today more than ever, families, particularly poor families, are in disarray, and in need of intensive support and guidance on how to handle the basic job of parenting. "What part should they play to prepare their child for school?" she asked. That is a big question, one far removed from the automated topics at issue that week in Las Vegas.
At the end of the Renaissance gathering, as conference-goers bid their good-byes, they were reminded of, and warmly invited to, the company’s next annual conference, to be held in San Antonio, Texas, in early 2002. Renaissance staff members were clearly excited about the conference. They were already heavily broadcasting its theme: "Let the Dream Continue."
- "Accountability Systems: Implications of Requirements of the No Child Left Behind Act of 2001," by Robert L. Linn, Eva L. Baker, and Damian W. Betebenner, Educational Researcher, Vol. 31, No. 6, August/September, 2002, p. 3; "Galileo’s Dilemma: The Illusion of Scientific Certainty in Educational Research," by Douglas B. Reeves, Education Week, May 8, 2002, p. 44
- The volume of research looking for links between technology use and academic achievement is so large that a full listing would overwhelm these pages. To cite but a few examples, in 1972, J.F. Vinsonhaler & R.K. Bass published an article in Educational Technology (issue 12, pp. 29-32) entitled “A Summary of ten major studies on CAI [computer based instruction] drill and practice.” Fifteen years and many studies later, Henry J. Becker, one of the leading researchers in this field, presented a paper at the 1987 annual meeting of the American Educational Research Association in Washington, D.C., entitled “The impact of computer use on children’s learning: What research has shown and what it has not.” Since Becker’s presentation, there’s been a record-breaking abundance of these studies throughout the 1990s, which are dealt with later in this chapter.
- “Effectiveness of Computer Based Instruction: An Updated Analysis,” by Chen-Li Kulik and James A. Kulik, Computers in Human Behavior, 7, 1991, pp. 75-94; “Research on Computers and Education: Past, Present, and Future,” a report to the Bill & Melinda Gates Foundation by Jeffrey T. Fouts, February, 2000, pp. 6-9 and throughout.
- “Pennsylvania tests essay-grading software,” by Cara Branigan, eSchool News, Jan., 2001, p. 1; “NRC Panel: Rethink, Revamp Testing,” by Lynn Olson, Education Week, April 11, 2001; “School spending to soar on test-prep and assessment,” by Cara Branigan, eSchool News, Oct., 2001, p. 12 (in this article, Eduventures.com projects technology spending in schools to be near $15 billion in 2002); “Business Intelligence: Insights From the Data Pile,” by Leslie Berger, The New York Times, January 13, 2002.
- “The 1992 National Reading Study and Theory of Reading Practice,” 1992, The Institute for Academic Excellence, Madison, WI.
- “National Study of Literature-Based Reading: How Literature-Based Reading Improves Both Reading and Math Ability,” 1993, The Institute for Academic Excellence, Madison, WI.
- “The Impact of the Accelerated Reader on Overall Academic Achievement and School Attendance,” a paper given at the National Reading Conference, Literacy and Technology for the 21st Century, Atlanta, Georgia, October 4, 1996
- “Report of the National Reading Panel: Teaching Children To Read: An Evidence-Based Assessment of the Scientific Research Literature on Reading and Its Implications for Reading Instruction,” National Institute of Child Health and Human Development, April 13, 2000, as commissioned by a 1997 Congressional directive.
- “Reading Achievement: Effects of Computerized Reading Management and Enrichment,” by Janie Peak and Mark W. Dewalt, ERS-Spectrum, Vol. 12, No. 1, Winter, 1994
- “The effects of incentives on reading,” by Jeff McQuillan, California State University, Fullerton, Reading Research and Instruction, Winter 1997, 36 (2), pp. 114-120.
- “Computerized Self-Assessment of Reading Comprehension with the Accelerated Reader: Action Research,” by Stacy R. Vollands, Keith J. Topping, and Ryka M. Evans, Reading & Writing Quarterly, vol. 15, 1999, pp. 197-211.
- “The learning effectiveness of technology: A call for further research,” by T.H. Jones and R. Paolucci, Educational Technology Review, Spring/Summer, 1998, pp. 10-14
- “The Kept University,” by Eyal Press and Jennifer Washburn, The Atlantic Monthly, March, 2000. In May of that year, the Associated Press reported that Dr. Marcia Angell, editor of The New England Journal of Medicine, recently “wrote a withering critique of the research system, saying science was being compromised by the growing influence of industry money.” (“Harvard Keeps Strict Rules On Outside Research Work,” The New York Times, May 27, 2000).
- “Critical Thinking and Literature-Based Reading,” a Report from the Institute for Academic Excellence, Nov., 1997
- “Mind in Society: The Development of Higher Psychological Processes,” by L.S. Vygotsky, Harvard University Press, 1978, p. 86.
- Internet archives for ‘Accelerated Reader’ in LM_Net, a discussion forum for librarians: http://ericir.syr.edu/Virtual/Listserv_Archives/LM_NET.shtml; “Accelerated Reader: What are the lasting effects on the habits of middle school students exposed to Accelerated Reader in elementary grades?” by Linda M. Pavonetti, Kathryn M. Brimmer, and James F. Cipielewski, Journal of Adolescent & Adult Literacy, published by the International Reading Association, December 2002
- Ibid., Pavonetti, Brimmer, and Cipielewski; “That Book Isn’t on My Level: Moving Beyond Text Difficulty in Personalizing Reading Choices,” by Jo Worthy and Misty Sailors, University of Texas, Austin. Published in The New Advocate, Summer, 2001, based on a paper presented at the National Reading Conference in Scottsdale, Ariz., December, 2000.
- Op. Cit., LM_Net. Of all the literature taking public schools to task for their failings, the most heavily promoted is the 1983 report, “A Nation At Risk.” In the years since the report’s publication, a solid body of opposing literature has proved that the alarms it sounded were quite hyperbolic. (See, for example, The Manufactured Crisis, by David Berliner, i.d. TKTK.) Hyperbole aside, a good portion of this report, and many others, have had solid points to make. As noted elsewhere, one of the best arguments on this score, narrow as it may be, is Left Back: A Century of Failed School Reform, by Diane Ravitch, Simon & Schuster, 2000
- “Schools Taking Tougher Stance With Standards,” by Tamar Lewin, The New York Times, September 6, 1999, page A1; “Academic Standards Eased as Fear of Failure Spreads,” by Jacques Steinberg, The New York Times, December 3, 1999, p. A1; “Soccer Moms vs. Standardized Tests,” by Charles J. Sykes, Op-Ed column, The New York Times, December 6, 1999, p. A29.
- “Right Answer, Wrong Score: Test Flaws Take Toll,” by Diana B. Henriques & Jacques Steinberg; “When a Test Fails the Schools, Careers and Reputations Suffer,” by Jacques Steinberg & Diana B. Henriques, The New York Times, May 20 & May 21, respectively, p. 1.
- “Testing Reasoning and Reasoning About Testing,” by Walt Haney, Review of Educational Research, Winter, 1984, Vol. 54, No. 4, pp. 627-628
- “A Technological and Historical Consideration of Equity Issues Associated with Proposals to Change the Nation’s Testing Policy,” Harvard Educational Review, Spring, 1994, pp. 76-95.
- Cite TK, social promotion info; “The Growing Revolt Against the Testers,” a column by Richard Rothstein, The New York Times, May 30, 2001
- “Dukakis’ State Miracle: Long on Jawboning,” by James Risen, The Los Angeles Times, August 7, 1988, p. 1; “Dukakis’ Miracle Losing Luster,” by Nicholas M. Horrock, The Chicago Tribune, September 4, 1988, p. 1.
- USA Today, March 14, 2000, p. 14A
- Boston Globe, June 10, 1999, page 1A
- “The Myth of the Texas Miracle in Education,” by Walt Haney, Education Policy Analysis Archives, Vol. 8, No. 41, August 19, 2000; “Revisiting the Myth of the Texas Miracle in Education: Lessons about Dropout Research and Dropout Prevention,” by Walt Haney, Lynch School of Education, Boston College, a March, 2001 revision of a paper prepared for a conference sponsored by Achieve and the Harvard Civil Rights Project, January 13, 2001, Cambridge, Mass.
- During the months of litigation prompted by the TAAS case, in 1999 and 2000, a considerable amount of literature was generated that attempted to sort out the claims and counter-claims about the fairness and effectiveness of Texas’s standardized testing system. One of the more exhaustive reviews of the story, albeit one weighted toward the state’s arguments, was compiled by S. E. Phillips, who had served as one of the state’s expert witnesses. The compilation became a package of nine articles published in the academic journal “Applied Measurement in Education,” Vol. 13, No. 4, 2000. One of those articles (“GI Forum v. Texas Education Agency: Psychometric Evidence,” pp. 343-386), written by Phillips, summarizes each side’s main arguments and the court’s findings on each count.
- In 1996 and 1998 NAEP scores, Texas fourth- and eighth-graders placed ahead of almost every other state in math and writing. According to the Education Trust’s report (“Real Results, Remaining Challenges: The Story of Texas Education Reform,” a report commissioned by The Business Roundtable, April, 2001), minorities did particularly well, trumping most or all of their peers across the nation. Texas black students led the pack, scoring even better than white students in seven states in the NAEP writing test. There has been good news to report in international measures as well. In the 1999 Third International Math and Science Study (TIMSS), Texas scores were only average in science, but they shined in math. Despite having the most low-income and minority students of all 13 participating states, according to the Education Trust, Texas essentially led them all on math scores. (On one math measure, Michigan topped Texas by one point among the participating states. However, on two other measures – those scoring internationally in the top 10 percent, and those in the top quartile – Texas students led the field by several percentage points.)
- “Real Results, Remaining Challenges: The Story of Texas Education Reform,” a report by the Education Trust, commissioned by The Business Roundtable, April, 2001.
- “Bush’s ‘Texas miracle’ in schools? TAAS tells,” editorial, The Sacramento Bee, March 15, 2000
- “The Test Mess,” by James Traub, The New York Times Magazine, April 7, 2002, pp. 46-78, esp. 49-50
- “States Teeter When Balancing Standards With Tests,” by Richard Rothstein, The New York Times, May 1, 2002, p. A21
- “’Texas Miracle’ Doubted: An Education ‘Miracle’ or Mirage?” by John Mintz, The Washington Post, April 21, 2000, page A1
- Annual TAAS Participation Reports, Texas Education Agency, http://www.tea.state.tx.us/perfreport/aeis
- “The Harmful Impact of the TAAS System of Testing in Texas: Beneath the Accountability Rhetoric,” by Linda McNeil, Rice University, Houston; and Angela Valenzuela, University of Texas, Austin, Harvard Civil Rights Project, 2000
- “Rigorous School Tests Grow, But Big Study Doubts Value,” by Greg Winter, The New York Times, December 28, 2002, p. 1. The article reports on a study financed by the National Education Association and conducted by Arizona State University’s Educational Policy Research Unit and its Center on Education Policy. As might have been expected, soon after the study’s release, it was criticized by another research team, from Stanford University, which released its own findings that test pressure was actually improving achievement. See “Researchers Debate Impact of Tests,” by Debra Viadero, Education Week, February 5, 2003, pp. 1 & 12.
- The National Assessment of Educational Progress, 1992-2000; “Gap Between Best and Worst Widens on U.S. Reading Test,” by Kate Zernike, The New York Times, April 7, 2001, p. 1
- According to Tennessee testing data, from 1996 to 2001, students in most grades from three through eight fell in relation to national norms. Specifically, third-graders went from the 59th percentile to the 51st, fourth-graders from the 58th to the 52nd, fifth-graders from the 53rd to the 55th, sixth-graders from the 51st to the 52nd, seventh-graders from the 54th to the 52nd, and eighth-graders from the 56th to the 54th. (http://www.state.tn.us/education/mstat.htm)
- “Learning from Past Efforts to Reform the High School,” by Thomas James and David Tyack, Phi Delta Kappan, February, 1983, p. 406
- “Consequences of Assessment: What is the Evidence?” by William A. Mehrens, Education Policy Analysis Archives, July 14, 1998, Vol. 6, No. 13, p. 1
- “The Test Mess,” by James Traub, The New York Times Magazine, April 7, 2002, pp. 46-60; The Los Angeles Times, Sept. TKTKT; “How U.S. Punishes States With Higher Standards,” by Richard Rothstein, The New York Times, Sept. 18, 2002
- “Punished by Rewards: The Trouble with Gold Stars, Incentive Plans, A’s, Praise, and Other Bribes,” by Alfie Kohn, 1993, Houghton Mifflin
- “The effects of incentives on reading,” by Jeff McQuillan, California State University, Fullerton, Reading Research and Instruction, Winter 1997, 36 (2), pp. 111-125
- As with many areas of educational research, there are mounds of studies on this topic. One of the most consolidated presentations of the main arguments on both sides was published in the Review of Educational Research, Spring, 1996, Vol. 66, No. 1. The issue included four articles by eight leading writers and researchers on the subject of rewards. On the critical side were Mark R. Lepper, Mark Keavney, and Michael Drake; Richard M. Ryan and Edward L. Deci; and Alfie Kohn. Defending rewards were Judy Cameron and W. David Pierce. In 1999, Deci, Koestner, and Ryan responded with an exhaustive re-examination in Psychological Bulletin that criticized rewards once again. (See “A Meta-Analytic Review of Experiments Examining the Effects of Extrinsic Rewards on Intrinsic Motivation,” 1999, Vol. 125, No. 6, pp. 627-668.) Judgments obviously can be subjective on this topic. But in terms of depth, detail, and scientific rigor, it certainly looks like the critics of rewards have carried the day.
- A number of studies have looked persuasively at the basis, and the effects, of the Cognitive Tutor program. Among the most tangible are: “Intelligent Tutoring Goes to School in the Big City,” by K. R. Koedinger, J. R. Anderson, W. H. Hadley, & M. A. Mark, International Journal of Artificial Intelligence in Education, vol. 8, 1997, pp. 30-43; “An Effective Metacognitive Strategy: Learning by Doing and Explaining with a Computer-Based Cognitive Tutor,” by V. A. W. M. M. Aleven and K. R. Koedinger, Cognitive Science, vol. 26, 2002, pp. 147-179
- “Hold the Applause! Do Accelerated Reader and Electronic Bookshelf Send the Right Message?” by Betty Carter, School Library Journal, pp. 22-25
- “Reading Rewards: Program’s Popularity Overrides Criticism,” by Jennifer Packer, Fort Worth Star-Telegram, April 21, 1997, p. 1.
- For those who enjoy tracking studies of technology’s effects on achievement, several additional examinations may be of interest. One, reported in late 2002, was conducted by two University of Chicago economists, who reviewed the results of the e-rate program in California between 1998 and 2000. The researchers found that despite the massive infusion of Internet services that the program had brought to schools (particularly poor schools), those Internet services had done nothing for test scores in math, reading, and science. A five-year study of Internet use in one district, given the pseudonym “Waterford,” came to similar conclusions. (See “Narrowing the Digital Divide,” BusinessWeek, December 9, 2002, p. 28; Bringing the Internet to School: Lessons from an Urban District, by Janet Ward Schofield and Ann Locke Davidson, Jossey-Bass, 2002.) Yet another study, released in early 2003, was conducted by James A. Kulik, the University of Michigan researcher and well-known meta-analyst of educational technology studies. As with his previous reviews, Kulik concluded that simple computerized tutorial programs were more effective than more experimental programs such as simulations. But the gains he found were minimal in some cases, and inconsistent in others. Furthermore, as with Kulik’s previous work, this review failed to account for poor research designs in the sample of studies he was examining. And, since the drilling programs and tutorials that Kulik favored are less and less in use, Kulik’s findings were dismissed by some technology experts as being out of date. (See “Study probes technology’s effect on math and science,” by Cara Branigan, eSchool News, February, 2003, p. 14; and “More grizzly than I can bear,” a column by Gregg Downey, publisher of eSchool News, February, 2003, p. 6.) An earlier study by the Educational Testing Service came to the opposite conclusion. Published in 1998, this study, which has been widely circulated by technology advocates, found that the standard “drill-and-practice” computer programs were actually having a negative effect on achievement in elementary grades, while simulations, spreadsheets, math learning games, and other innovative programs were stimulating achievement. (See “Does It Compute? The Relationship Between Educational Technology and Student Achievement in Mathematics,” by Harold Wenglinsky, Educational Testing Service, 1998.) Unfortunately, this study also failed to account for the most obvious possibility: that the comparisons were conducted without a level playing field. That is, students using innovative computer programs were being compared to students who weren’t doing anything that was innovative, with or without computers.
- “Implementation and Performance in New American Schools: Three Years Into Scale-Up,” by Mark Berends, Sheila Nataraj Kirby, Scott Naftel, and Christopher McKelvey, a study by The RAND Corporation, 2001, pp. 79-133
- “Unrequited Promise,” by Jeffrey Mirel, Education Next, Summer, 2002, pp. 70-72
About the Author
Todd Oppenheimer has written for such publications as The New York Times, The Washington Post, The Atlantic Monthly, Columbia Journalism Review, and Newsweek, where he was associate editor of the magazine’s digital media division. He has won numerous awards for his writing and investigative reporting, has been featured on radio and television shows such as ABC’s “Nightline,” and in 1998 was named San Francisco’s School Volunteer of the Year. The Flickering Mind is based on “The Computer Delusion,” a cover story Oppenheimer wrote for The Atlantic’s July 1997 issue, which won the year’s National Magazine Award for public interest reporting.
APA Citation: Oppenheimer, T. (2007). Critiques of instructional technology. In M. K. Barbour & M. Orey (Eds.), The Foundations of Instructional Technology. Retrieved <insert date>, from http://projects.coe.uga.edu/itFoundations/