Why More Standardized Tests Won’t Improve Education

By Pamela Grundy
September 2, 2011

American students are spending growing amounts of time preparing for and taking high-stakes standardized tests. The federal government requires students to take annual state tests in math, English, science and social studies. Some states and districts have gone even further, requiring standardized tests for every subject, including art, music, journalism and physical education. Test scores are used to bar students from moving from one grade to another, to determine teacher and administrator pay, and to label schools as “failing” – a step that often leads to closure.

This expansion is occurring even though high-stakes standardized testing has never been shown to improve student achievement or teacher performance, and even though the testing mandated by the federal No Child Left Behind legislation is widely considered to have undercut, rather than raised, national achievement levels.

The scholarly consensus on the limits of standardized testing is clear. For example, a comprehensive, nine-year study of testing and evaluation commissioned by the National Academy of Sciences recently concluded that “available evidence does not give strong support for the use of test-based incentives to improve education.” (1)

A second National Academy report questions the use of test scores to evaluate teachers, noting that such scores “have not yet been adequately studied for the purposes of evaluating teachers and principals,” and “face substantial practical barriers to being successfully deployed in a personnel system that is fair, reliable, and valid.” (2)

Given the lack of proven links between testing and achievement, as well as extensive evidence about the limitations and problems of high-stakes testing, Parents Across America opposes current efforts to expand the use of standardized tests. We recommend instead a significant reduction in the number of such tests, along with alternative means for evaluating teacher quality, school strength and student progress.

Limits of Standardized Test Scores

Limited Reliability: Many factors can influence standardized test scores, including variations in test makeup, whether a student “tests well,” language and cultural factors and how a student happens to feel on testing day. As a result, scholars agree that a single test score, or set of scores, does not reliably measure what students have actually learned in a particular class or at a particular school. (3) Efforts to manipulate scores to obtain other information, most notably the degree of “value” that individual teachers “add” to student performance, have also proved unreliable. (4)

Limited Scope: Standardized tests do not cover many skills that parents want their children to develop while at school, including teamwork, creativity, how to ask good questions, how to persist with difficult projects, and how to apply skills to real-world challenges. (5)

Limited Coverage of Higher-Level Skills: Even in their area of strength, curriculum content, most standardized tests leave out a great deal of material. According to the National Academy, the omitted material is generally “the portion of the curriculum that deals with higher levels of cognitive functioning and application of knowledge and skills.” (6)

Significant Improvements Unlikely: Criticism of standardized testing has led to much talk about “better assessments,” especially related to the Common Core Standards. But despite significant investments of time and money, such “better assessments” have yet to materialize. Efforts to create tests that measure creative problem-solving and higher-level thinking have generally resulted in far longer tests with even more limited reliability. (7)

Problems with Using Standardized Test Scores in High-Stakes Evaluations of Students, Teachers and Schools

Loss of Teaching Time: The logistics of administering high-stakes standardized tests, with the required proctors, makeup tests, and special accommodations, disrupt school routines, pull teachers out of classrooms and reduce time for teaching and learning. As the number of tests increases, so does the loss of teaching time. In many districts, teachers are also pulled out of classrooms to score the exams. (8)

Teaching to the Test: Studies show that the pressures of high-stakes testing have spread the practice of “teaching to the test” across the country. Teaching to the test involves a narrow focus on the specific skills and content covered by the test, as well as considerable time spent on practice tests and test-taking strategies. Such strategies have often produced significant gains on specific state tests. But when the same students are tested with different tests that cover similar subjects, such as the National Assessment of Educational Progress, most of these gains evaporate. (9)

Narrowing the Curriculum: No Child Left Behind, which judged schools solely on their students’ math and reading test scores, prompted schools across the nation to abandon science, social studies, art, music, physical education and other subjects in pursuit of high scores in the tested subjects. Even within tested subjects such as math and English, high-stakes testing often narrowed the focus to material that could be covered in a standardized test format. (10)

Undermining Creativity and Inventive Thinking: By their very nature, standardized tests do not reward creative thinking. A focus on test scores and other goal-based incentives stifles rather than encourages the creativity and inventiveness that are essential to a dynamic society and economy. (11)

Obscuring Real Achievement Gaps: Because standardized tests do not generally cover higher-level skills, they do not effectively measure the “achievement gap” between racial, economic or other groups. For example, imagine a low-performing school that focuses on the limited range of material covered by standardized tests at the expense of higher-level skills. Higher scores might make it appear that those students were “closing the gap” with students at higher-performing schools who spent much more time on higher-level skills. But even as the gap in lower-level skills shrank, the gap in higher-level skills would be growing. (12)

Stress on Students: High-stakes testing places tremendous stress on students as well as schools, often undermining students’ self-confidence and love of learning.

Cheating: As testing stakes rise, so does the pressure to cheat. As test-based accountability has spread, cheating scandals have become a regular feature of national education news. School systems are thus being forced to spend growing amounts of time and money providing test security and investigating improprieties. (13)

Loss of Talented Teachers: The move toward using test scores as a key component of evaluations and “merit pay” schemes may well push many talented teachers out of the profession. (14)

High Financial Costs: As a result of budget cuts, states and districts around the country are being forced to lay off teachers, lose valuable programs, and raise class sizes. At the same time, they are spending an ever-increasing portion of their resources on tests. Even larger costs are expected when the new assessments aligned with the Common Core are created. Testing companies are some of the most profitable and fastest growing corporations in the nation.

Damage to Individuals and Communities: The limits of standardized tests mean that testing experts regularly warn school systems not to make important decisions based on test scores alone. However, many systems continue to follow that practice. Test scores are being used with increasing frequency to rate teachers, deny students promotion from one grade to another, and justify closing “low-performing” schools, despite a lack of evidence that such strategies have any educational value. The results have unfairly harmed individuals and communities across the country.

Problems with the “Multiple Measures” Strategy in Teacher Evaluations

Using “Multiple Measures” Does Not Reduce Testing: Combining standardized test scores with other kinds of information in teacher evaluation systems – known as the “multiple measures” strategy – does nothing to reduce the disruption testing brings to school routines and student learning. If standardized test scores form an integral part of an evaluation system, as many states and systems are now requiring, students will have to take the same number of tests, and will likely experience the same amount of teaching to the test, regardless of whether test scores play a large or small role in the overall evaluation.

Mathematical Intimidation: Our society tends to value “hard” data – data based on numerical measurements – over “soft” data – data based on observation and other methods – even though both kinds of data have limitations. (15) If standardized test scores form a substantial part of a school’s or a teacher’s evaluation, there is likely to be considerable pressure to align other measures with the test score data. For example, how many principals would be willing to give a stellar assessment of the capabilities of a teacher with a mediocre test score rating? In fact, in New York City, principals have been ordered to deny tenure to teachers with an “average” rating on their student test scores, without taking any other factors into consideration.

Alternatives

Richer Evaluation Measures: Students, teachers and schools should be evaluated by a rich set of measures that does not require extensive standardized testing and that covers multiple kinds of skills and the full range of teaching and learning. (16)

Judicious Use of Test Scores: Used judiciously, data from relatively infrequent, low-stakes standardized tests has some value as a snapshot of student abilities that can diagnose areas of strength and areas that need improvement.

Portfolio Assessments: Teacher and student portfolios that compile a variety of work – assignments, tests, projects, etc. – provide a far richer portrait of teaching and accomplishment than is possible with standardized testing. (17)

Peer Reviews: Some districts have effectively turned to peer reviews for teacher evaluation. Montgomery County, Maryland, for example, has received widespread attention for its “Peer Assistance and Review” plan, in which principals and “consulting teachers” evaluate teachers through observation, work closely with those in need of support or improvement, and make firing decisions when necessary. Unfortunately, the US Department of Education has ruled that the district has to stop using this system because the system does not link its ratings of teachers to student test scores – a condition of Maryland’s federal “Race to the Top” grant. (18)

Conclusion

Continuing to expand high-stakes standardized testing will prove a boon for companies that provide testing materials, data analysis and physical equipment. Online assessments, such as those currently planned for the new Common Core standards, will prove especially profitable, as they will require school systems to invest billions of dollars in equipment and services.

Children, on the other hand, will suffer. As more than a decade of experience attests, the drawbacks of testing far outweigh any benefits derived from the “data” it provides.

Parents Across America is especially concerned about the use of federal power in the nationwide push to expand standardized testing. Federal policy has been a major force behind the growth of testing, most notably through the 2001 reauthorization of the Elementary and Secondary Education Act (also known as No Child Left Behind). The Obama Administration’s “Race to the Top” program, as well as its “Blueprint” for revising ESEA, threatens to force systems to further expand high-stakes testing, to the detriment of children and schools across the country. (19)

We call on Congress to resist the pressure to expand such testing, and on advocates throughout the country to inform their own communities about the dangers of this approach to improving education.

Notes

1. Michael Hout and Stuart W. Elliott (eds.), Incentives and Test-Based Accountability in Education (Washington, D.C.: National Academies Press, 2011), 72. http://www.nap.edu/catalog.php?record_id=12521.

2. Board on Testing and Assessment; National Research Council, “Letter Report to the U.S. Department of Education on the Race to the Top Fund,” 5 October 2009, 8. http://www.nap.edu/catalog/12780.html.

3. Daniel Koretz, Measuring Up: What Educational Testing Really Tells Us (Cambridge, Mass.: Harvard University Press, 2008), esp. 43-45; 11-42; 260-66.

4. Sean P. Corcoran, “Can Teachers Be Evaluated by Their Students’ Test Scores? Should They Be?” Annenberg Institute for School Reform, 2010, http://www.annenberginstitute.org/pdf/valueAddedReport.pdf. Eva Baker et al., “Problems with the Use of Student Test Scores to Evaluate Teachers,” Economic Policy Institute Briefing Paper, August 29, 2009, http://epi.3cdn.net/b9667271ee6c154195_t9m6iij8k.pdf.

5. Anthony Cody, “Complex Thinking Is Not Tested and Won’t be Taught,” http://blogs.edweek.org/teachers/living-in-dialogue/2011/07/complex_thinking_is_not_tested.html.

6. Hout and Elliott, Incentives and Test-Based Accountability in Education, 30.

7. Koretz, Measuring Up, 59-65.

8. In Charlotte, North Carolina, for example, a newly instituted set of K-2 tests required between one and two hours of one-on-one testing for each child, requirements that absorbed enormous amounts of staff time. See: http://www.washingtonpost.com/blogs/answer-sheet/post/school-district-field-tests-52-yes-52-new-tests-on-kids/2011/04/20/AFFbGXFE_blog.html#pagebreak.

9. Koretz, Measuring Up, 235-59.

10. Diane Ravitch, The Death and Life of the Great American School System: How Testing and Choice Are Undermining Education (New York: Basic Books, 2010), 107-111.

11. For an account of creativity, entrepreneurship and standardized testing, see Yong Zhao, “Entrepreneurship and Creativity: Where Do They Come From and How Not to Destroy Them,” http://zhaolearning.com/2011/02/26/entrepreneurship-and-creativity-where-do-they-come-from-and-how-not-to-destroy-them/. See also Daniel Pink’s discussion of “the surprising science of motivation,” which examines ways that testing and performance incentives have been shown to stifle creativity: http://www.youtube.com/watch?v=rrkrvAUbU9Y.

12. Alfie Kohn, “How School Reform Damages Poor Children,” http://www.washingtonpost.com/blogs/answer-sheet/post/how-school-reform-damages-poor-children--kohn/2011/04/26/AFTTCbtE_blog.html.

13. One of the most prominent scandals unfolded in Atlanta, where a 10-month investigation found cheating at close to half of the city’s schools. For more details, see: http://www.ajc.com/news/volume-1-of-special-1000798.html.

14. Parents Across America, “Tying Teacher Salaries to Test Scores Doesn’t Work,” http://parentsacrossamerica.org/performancepay/.

15. John Ewing, “Mathematical Intimidation: Driven by the Data,” Notices of the American Mathematical Society, May 2011, http://www.ams.org/notices/201105/.

16. See Monty Neill, “A Better Way to Evaluate Students and Schools,” http://www.washingtonpost.com/blogs/answer-sheet/post/a-better-way-to-evaluate-students-and-schools-/2010/12/20/ABzjWuF_blog.html.

17. See, for example, the Work Sampling System developed at the University of Michigan, http://fairtest.org/work-sampling-system.

18. New York Times, 5 June 2011, http://www.nytimes.com/2011/06/06/education/06oneducation.html?_r=1.

19. See Pamela Grundy and Carol Sawyer, “The Push Behind a New Flurry of Testing,” http://www.newsobserver.com/2011/05/15/1196022/the-push-behind-a-new-flurry-of.html.