Volume 23, Number 2
High-Stakes Testing and the Corruption of America's Schools
Since the fall of 2003, after the No Child Left Behind Act (NCLB) required high-stakes testing in all 50 states, we have systematically scoured news outlets and scholarly journals for accounts of the impact of high-stakes testing. We have amassed a significant collection of evidence highlighting the distortion, corruption, and collateral damage that occur when high-stakes tests become commonplace in our public schools.
We found reports and research about individuals and groups of individuals from across the nation whose lives have been tragically and often permanently affected by high-stakes testing. We found hundreds of instances of adults who were cheating, including many instances of administrators who “pushed” children out of school, costing thousands of students the opportunity to receive a high school diploma. We also found administrators and school boards that had drastically narrowed the curriculum and forced test-preparation programs on teachers and students, taking scarce time away from genuine instruction. We found teacher morale plummeting, causing many to leave the profession.
Supporters of high-stakes testing might dismiss these anecdotal reports as idiosyncratic or too infrequent to matter. But all of these problems could have been foretold. A little-known but powerful social science law known as Campbell’s law explains the etiology of the problems we document. Ignorance of this law endangers the health of our schools and erodes the commitment of those who work in them.
Campbell’s law was formulated in 1975 by the late Donald T. Campbell, a respected social psychologist, evaluator, methodologist, and philosopher of science. Campbell’s law stipulates that “the more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”
Testing experts George Madaus and Marguerite Clarke agree with Campbell, noting that whenever you have high stakes attached to some indicator of performance, you have a corrupted measurement system. The higher the stakes, the more uncertain are the conclusions you can draw from the measures you have. Put another way, the higher the stakes, the more likely it is that the construct being measured has somehow been changed. High stakes, therefore, lead inexorably to invalidity.
Evidence of Campbell’s law is everywhere. In business, if stock market price is the indicator and incentives such as big bonuses are given for short-term stock gains, then a system has been created to encourage poor or even counterproductive management practices, as well as outright fraud. In medicine, malpractice suits are an indicator of the quality of health care received and determine the reputations of physicians. The threat of a malpractice suit therefore carries high stakes and contributes to the spiraling costs of health care, as physicians prescribe unnecessary tests and interventions. At the same time, financial incentives reward those who spend less time with patients, eroding the quality of care. Examples of corruption, cheating, gaming the system, taking shortcuts, and so forth are found wherever high stakes are attached to performance: in athletics, academia, politics, government agencies, and the military.
High-stakes testing is exactly the kind of practice Campbell warned us about (see sidebar "Campbell’s Law in Action"). Serious, life-altering decisions that affect teachers, administrators, and students are made on the basis of testing. Tests determine who is promoted and who is retained; who will receive a high school degree and who will not. Test scores can determine if a school will be reconstituted and whether there will be job losses or cash bonuses for teachers and administrators. Under these conditions, we must worry that the process that is being monitored by these test scores—the quality of our children’s education—is also becoming corrupted and distorted, rendering the test scores themselves meaningless.
Alternatives to High-Stakes Testing
It is legitimate for the citizenry, who have designed and paid for schools, to want external measures of how those schools, teachers, and students are doing. However, there are many forms of evaluation that, separately or in combination, can avoid the pitfalls associated with high-stakes tests. A more effective system of assessment could combine low-stakes tests with some or all of the following:
Formative assessment
Most tests in the United States are assessments of learning. The tests are designed to tell us what and how much students know at any one point in time. By contrast, formative assessment is assessment for learning, used to improve teaching and learning. Formative assessments often entail a range of activities embedded into the curriculum. Tests and other classroom activities (classroom discussion, projects, homework) are specifically designed to provide feedback to teachers and students regarding what they know, what they don’t know, and where they might go next.
An independent inspectorate
Australia, England, Holland, Germany, Sweden, and a few other countries have a school inspectorate devoted to visiting schools and providing feedback on their performance. Evaluating whether a school is performing satisfactorily means, first and foremost, having inspectors watch teachers teach. Inspectors make judgments about the depth and breadth of the curriculum, its conformity to national or state standards, and the competency of teachers to implement it in an exemplary manner. They also check to see if improperly certified teachers are employed at the school, and may hold focus groups to determine community satisfaction. Inspectors visit with students to evaluate whether their motivational needs are being met and assess the school’s plans for staff development.
Yet another alternative to high-stakes testing is to build a low-stakes accountability system that involves teachers at the district level in making the tests themselves. Imagine local teachers meeting and working on understanding the subject-matter standards, sharing designs and teaching tips for the classroom teaching of the standards, sharing course syllabi, and making decisions about text selections. Imagine also that teachers are paid for these activities, for picking the cut scores to determine student proficiency, and for scoring the tests. Having teachers score tests in groups is a great way to stimulate discussion of curriculum content and student capabilities. Several states have taken steps to implement these types of end-of-course evaluation systems.
Performance tests
Performance tests are student projects or portfolios of student work that are presented for evaluation by a panel of judges. The judges are asked to determine whether a student has mastered a sufficient body of knowledge to be considered competent. The format places the teacher in the role of mentor, coach, and advisor rather than judge, and teachers invariably work hard to prepare students to do well. This is a democratic form of accountability, since the public is invited in to see what has been learned. New York’s Central Park East School, the Coalition of Essential Schools, and International Baccalaureate programs use performance tests.
Value-added assessment
More and more educators and politicians are pushing for value-added assessment, which looks at the achievement of individual students and schools over time and perhaps, if the statistics are ever refined enough, can pinpoint the effects of particular teachers. Although value-added models of growth still need to be refined, they appear promising. However, if achievement-growth reports become high-stakes, as now occurs with the NCLB test scores used throughout the nation, then value-added models of assessment will suffer the same problems as the current accountability tests.
We believe that the costs associated with high-stakes testing are simply not worth it. Campbell’s law informs us that high-stakes testing of the type associated with NCLB can never be used successfully in our schools. Despite the sheer number of examples showing negative effects, however, many people still believe high-stakes testing is a viable way to improve education. They defy a perfectly valid social science principle, at their peril.
Sharon L. Nichols is an assistant professor at the University of Texas at San Antonio. David C. Berliner is the Regents’ Professor of Education at Arizona State University in Tempe. This article is adapted from their book Collateral Damage: How High-Stakes Testing Corrupts America’s Schools (Harvard Education Press, 2007).