We found reports and research about individuals and groups of individuals from across the nation whose lives have been tragically and often permanently affected by high-stakes testing. We found hundreds of instances of adults who were cheating, including many instances of administrators who “pushed” children out of school, costing thousands of students the opportunity to receive a high school diploma. We also found administrators and school boards that had drastically narrowed the curriculum, and who forced test-preparation programs on teachers and students, taking scarce time away from genuine instruction. We found teacher morale plummeting, causing many to leave the profession.
Supporters of high-stakes testing might dismiss these anecdotal reports as idiosyncratic or too infrequent to matter. But all of these problems could have been foretold. A little-known but powerful social science law known as Campbell’s law explains the etiology of the problems we document. Ignorance of this law endangers the health of our schools and erodes the commitment of those who work in them.
Campbell’s law was formulated in 1975 by the late Donald T. Campbell, a respected social psychologist, evaluator, methodologist, and philosopher of science. Campbell’s law stipulates that “the more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”
Testing experts George Madaus and Marguerite Clarke agree with Campbell, noting that whenever you have high stakes attached to some indicator of performance, you have a corrupted measurement system. The higher the stakes, the more uncertain are the conclusions you can draw from the measures you have. Put another way, the higher the stakes, the more likely it is that the construct being measured has somehow been changed. High stakes, therefore, lead inexorably to invalidity.
Evidence of Campbell’s law is everywhere. In business, if stock market price is the indicator and incentives such as big bonuses are given for short-term stock gains, then a system has been created to encourage poor or even counterproductive management practices, as well as outright fraud. In medicine, malpractice suits are an indicator of the quality of health care received and determine the reputations of physicians. The high stakes attached to the threat of malpractice suits thus contribute to the spiraling costs of health care, as physicians order unnecessary tests and interventions. At the same time, financial incentives reward those who spend less time with patients, eroding the quality of care. Examples of corruption, cheating, gaming the system, taking shortcuts, and so forth are found wherever high stakes are attached to performance in athletics, academia, politics, government agencies, and the military.
High-stakes testing is exactly the kind of practice Campbell warned us about (see sidebar "Campbell’s Law in Action").
Campbell’s Law in Action
Our literature search has turned up endless examples showing how high-stakes testing corrupts education. A smattering of anecdotes:
Narrowing the curriculum: A 2004 report from the Education Policy Analysis Archives quotes one Colorado teacher as saying, “We only teach to the test even at second grade, and have stopped teaching science and social studies. We don’t have assemblies, take field trips, or have musical productions at grade levels. … Our second graders have no recess except 20 minutes at lunch.”
Pushing students out: Martin B., 16, came home one day and asked his mother if he should quit school. His English teacher had told the students, “Don’t you know … that you will all fail the AIMS [state high school exit exam]?”
Cheating: Fifth graders in one Texas elementary school performed in the top 10 percent on the state reading exam. The next year, as sixth graders entering a new middle school, they scored in the bottom 10 percent. Teachers in the elementary school admitted that cheating on the exam was standard operating procedure.
Misreporting scores: In 2004, the Wall Street Journal reported on an Ohio sixth grader who attended a school for the gifted but whose test scores were credited to the neighborhood school he did not attend. The logic: If no “credit” was given to neighborhood schools, they would never identify students as gifted, for fear of losing high-scoring students to gifted programs.
Undermining teaching practice: A dedicated eighth-grade math teacher told us that one year, when his students’ test scores were high, he was asked to lead “remedial” workshops for less successful colleagues. The following year, his class had more special needs students and English-language learners. Despite his best efforts, the scores were not as good, and the principal requested that he attend the same workshops he once taught so he could “improve” his teaching.
Serious, life-altering decisions that affect teachers, administrators, and students are made on the basis of testing. Tests determine who is promoted and who is retained; who will receive a high school degree and who will not. Test scores can determine if a school will be reconstituted and whether there will be job losses or cash bonuses for teachers and administrators. Under these conditions, we must worry that the process that is being monitored by these test scores, the quality of our children’s education, is also becoming corrupted and distorted, rendering the test scores themselves meaningless.
It is a legitimate request for the citizenry who have designed and paid for schools to want external measures of how those schools, teachers, and students are doing. However, there are many forms of evaluation that, separately or in combination, can avoid the pitfalls associated with high-stakes tests. A more effective system of assessment could combine low-stakes tests with some or all of the following:
Formative assessments. Most tests in the United States are assessments of learning. The tests are designed to tell us what and how much students know at any one point in time. By contrast, formative assessment is assessment for learning, used to improve teaching and learning. It often entails a range of activities embedded into the curriculum. Tests and other classroom activities (classroom discussion, projects, homework) are specifically designed to provide feedback to teachers and students regarding what they know, what they don’t know, and where they might go next.
An independent inspectorate. Australia, England, Holland, Germany, Sweden, and a few other countries have a school inspectorate devoted to visiting schools and providing feedback on their performance. To evaluate whether a school is performing satisfactorily means, first and foremost, having inspectors watch teachers teach. Inspectors make judgments about the depth and breadth of the curriculum, its conformity to national or state standards, and the competency of teachers to implement it in an exemplary manner. They also check to see if improperly certified teachers are employed at the school, and may hold focus groups to determine community satisfaction. Inspectors visit with students to evaluate whether their motivational needs are being met and assess the school’s plans for staff development.
End-of-course examinations. Yet another alternative to high-stakes testing is to build a low-stakes accountability system that involves teachers at the district level in making the tests themselves. Imagine local teachers meeting and working on understanding the subject-matter standards, sharing designs and teaching tips for the classroom teaching of the standards, sharing course syllabi, and making decisions about text selections. Imagine also that teachers are paid for these activities, for picking the cut scores to determine student proficiency, and for scoring the tests. Having teachers score tests in groups is a great way to stimulate discussion of curriculum content and student capabilities. Several states have taken steps to implement these types of end-of-course evaluation systems.
Performance tests. Performance tests are student projects or portfolios of student work that are presented for evaluation by a panel of judges. The judges are asked to determine whether a student has mastered a sufficient body of knowledge to be considered competent. The format places the teacher in the role of mentor, coach, and advisor rather than judge, and teachers invariably work hard to prepare students to do well. This is a democratic form of accountability, since the public is invited in to see what has been learned. New York’s Central Park East School, the Coalition of Essential Schools, and International Baccalaureate programs use performance tests.
Value-added assessment. More and more educators and politicians are pushing for value-added assessment, which looks at the achievement of individual students and schools over time and perhaps—if the statistics ever are refined enough—can pinpoint the effects of particular teachers. Although value-added models of growth still need to be refined, they appear promising. However, if achievement-growth reports become high-stakes, as now occurs with the NCLB test scores used throughout the nation, then value-added models of assessment will suffer the same problems as the current accountability tests.
We believe that the costs associated with high-stakes testing are simply not worth it. Campbell’s law informs us that high-stakes testing of the type associated with NCLB can never be used successfully in our schools. Despite the sheer number of examples showing negative effects, however, many people still believe high-stakes testing is a viable way to improve education. They defy a perfectly valid social science principle at their peril.
Sharon L. Nichols is an assistant professor at the University of Texas at San Antonio. David C. Berliner is the Regents’ Professor of Education at Arizona State University in Tempe. This article is adapted from their book Collateral Damage: How High-Stakes Testing Corrupts America’s Schools (Harvard Education Press, 2007).