It's no wonder, then, that education reformers through the years have had concerns about standardized testing. From the 1920s on, the dominant form of standardized testing has been the norm- or criterion-referenced multiple-choice test. (Norm-referenced tests compare a student's score with those of a national sample; criterion-referenced tests measure students against a predetermined standard of achievement.) But do such tests accurately measure critical thinking and problem-solving skills? Do they assess children's performance fairly? Bolstered by new discoveries in cognitive psychology, education reformers and civil rights activists concurred in the 1990s that the answer is no. A campaign to reform assessment began.
In the new assessments envisioned by these reformers, student performance would be evaluated on the basis of authentic tasks in a real-life context. Rather than responding to a contrived prompt, students would write about issues that were meaningful to them. For instance, they might be asked to write a detailed "letter to a cousin" in England explaining the context and importance of the Lincoln-Douglas debates, thus engaging thoughtfully with real history. Students would also be invited to demonstrate their ability to think and to use knowledge. They might, for example, be asked to investigate the popular belief that the ratio of black to brown fur on woolly-bear caterpillars can predict the severity of the winter. After collecting observations, students would analyze the data and draw a scientifically defensible conclusion.
These performance tasks would be an important element of profound changes in education. Youngsters would study fewer things in greater depth rather than skim the surface and memorize disconnected facts and meaningless procedures. Teachers would serve more as guides than lecturers, adapting to the learning styles and backgrounds of different students. The traditional goal of all students reaching high standards of excellence would be complemented by the idea that not all students need reach identical high standards or take the same path to excellence.
These reforms, unfortunately, have not come to pass in most state testing programs. The "new" tests are little more than variations on the old standardized tests. Today, every state program includes one or more of the following components: criterion- and/or norm-referenced multiple-choice items, some open-ended short-answer questions, and a writing sample. The latter two are recent additions.
The longtime criticism of multiple-choice questions is that they are "non-authentic." After all, how many problems in the real world can be solved by choosing one of four brief answers? And while open-ended and short-answer questions appear progressive, on closer scrutiny they often prove little better than multiple choice. One promise of open-ended questions is that they can measure areas of student learning that multiple choice cannot assess: critical, divergent, complex, and creative thinking; synthesis; and evaluation. However, many open-ended questions on the new state tests are little more than multiple-choice questions with the answer options removed. For example, a Wisconsin state test for fourth-grade students requires test takers to "list two reasons why [sic] geese migrate each year," which asks simply for recall of memorized information.
There are, of course, exceptions. One question from the New Standards Reference Examination, which is administered in several states, describes a child who received three teddy bears when he was born and two more on each birthday. It then asks how old the child will be when he has 13 bears and how many bears he will have when he is 38 years old. Finally, it directs students to "show how you found the answers to both questions." This type of question reveals whether the student added repeatedly, devised a formula, made a table, or used some other means to find the answers. In other words, it exposes the thought processes the student used to solve the problem.
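To make the contrast between strategies concrete, here is a minimal, hypothetical Python sketch of two of the routes a student's written work might show: building a table by repeated addition, and using a formula. It assumes the child has 3 bears at age 0 and gains exactly 2 on every birthday, so the count at age n is 3 + 2n. The function names are illustrative only and are not part of the original test item.

```python
# Hypothetical sketch: two ways a student might solve the teddy-bear problem.
# Assumption: 3 bears at birth (age 0), plus 2 more on every birthday.

BEARS_AT_BIRTH = 3
BEARS_PER_BIRTHDAY = 2


def bears_table(max_age: int) -> list[tuple[int, int]]:
    """Strategy 1: build a table by repeated addition, one row per age."""
    rows = []
    bears = BEARS_AT_BIRTH
    for age in range(max_age + 1):
        if age > 0:
            bears += BEARS_PER_BIRTHDAY  # two more bears on each birthday
        rows.append((age, bears))
    return rows


def bears_at_age(age: int) -> int:
    """Strategy 2: a formula, bears = 3 + 2 * age."""
    return BEARS_AT_BIRTH + BEARS_PER_BIRTHDAY * age


def age_with_bears(bears: int) -> int:
    """Invert the formula: age = (bears - 3) / 2."""
    if bears < BEARS_AT_BIRTH or (bears - BEARS_AT_BIRTH) % BEARS_PER_BIRTHDAY:
        raise ValueError("this count is never reached under these assumptions")
    return (bears - BEARS_AT_BIRTH) // BEARS_PER_BIRTHDAY


if __name__ == "__main__":
    # The table route: read off the row where the count reaches 13.
    for age, bears in bears_table(6):
        print(f"age {age}: {bears} bears")
    # The formula route answers both questions directly.
    print("13 bears at age", age_with_bears(13))  # prints 5
    print("bears at age 38:", bears_at_age(38))  # prints 79
```

The table makes every step visible but becomes tedious by age 38, while the formula reaches that answer in one step; which route a student chooses is exactly what the "show how you found the answers" instruction brings to light.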
By and large, however, short-answer questions keep the focus on isolated fragments of learning, as do most multiple-choice items. Facts and procedures are, of course, part of knowledge. Problems arise, however, when knowing facts becomes the primary goal of learning. As tests increase in importance but remain fundamentally unchanged, I believe they will guarantee the continued narrowing of instruction. The damage to the education of children is widespread but most severely affects students from low-income communities, disproportionate numbers of whom are children of color. Studies done by FairTest and other researchers, such as George Madaus and his colleagues at Boston College (see Multimedia Resources), have found that testing is more extensive, and more apt to have high stakes, in states, districts, and schools with larger proportions of African-American students. "Teaching to the test," or basing instruction primarily around what will be tested, is more common in schools with higher percentages of black students. According to the Education Commission of the States, 20 states now hold schools accountable for test scores, and FairTest research indicates that by 2004, 25 states will require high school students to pass an exam to graduate.
Most state tests now include writing samples. But the typical state examination treats writing as a completely formulaic exercise. Such "essay" questions typically reduce verbal expression to the "five paragraph paper," penned in response to a generic prompt of the "What I Did on My Summer Vacation" variety. What gets evaluated is not writing but compliance with the state scoring guide. Preparing students to pass these standardized state writing tests takes time away from helping them learn to write for meaningful purposes.
What do I propose? Rather than scoring writing against a standardized format, educators would critically analyze each student's writing, an effort that is worthwhile when the student has something of value to express. Ernest Hemingway and Toni Morrison both won the Nobel Prize in Literature, though their styles are markedly different. Each child can and should be encouraged to write as an individual, even if not all will win prizes.
High-quality performance assessment geared to authentic curriculum and instruction remains sorely needed in our schools. States can base a school's accountability on students' daily performance rather than on tests alone, and some have already taken strong, positive steps. Vermont, for example, has implemented student portfolios that have been credited with improving instruction in both math and writing. If states are to use tests as one part of their accountability programs, then a test like the Maryland School Performance Assessment Program, which includes multidisciplinary tasks performed over two to three school periods in grades three, five, and eight, is preferable.
In our classrooms, all teachers can use comprehensive assessment strategies, such as strategic observation of students, real investigations and reports done by students in different subject areas, rubrics, and documentation of children's work through portfolios or learning records. Many teachers are already adept in these methods and use them to evaluate student growth and plan instruction. As indicators of student achievement, these methods are more accurate than tests.
All of us — educators, policy makers, and the public alike — must change our thinking about accountability. Independent reviewers, for instance, can examine collections of work from randomly selected students to evaluate quality. By looking at the work in light of the school's or district's curriculum, evaluators can fairly and accurately assess the standards to which students are held, the opportunities they are given to engage in authentic learning, and finally their achievement. Additionally, teams of outside educators and other qualified people (such as parents and professionals in various fields who have been trained in review procedures) could conduct comprehensive, on-site evaluations of school quality.
These approaches appear "messy": they lack the deceptive simplicity and false objectivity of a black-and-white number derived from a test. But educators know that real learning is messy, complex, and subjective. It follows that genuine assessment cannot be confined to what is simple to measure.
The failure of test-driven reform to improve schools, together with increasingly centralized control over curriculum and instruction that disempowers teachers, parents, students, and local communities, is producing a backlash. As educators, parents, and concerned citizens ponder alternatives to testing overkill, classroom-based performance assessments will again, I believe, claim serious attention.
The answer is not more tinkering with standardized tests. We must go far beyond that, to use authentic assessments that improve learning and teaching — and guarantee genuine accountability.
Multimedia Resources

The FairTest Examiner, a quarterly newsletter, is published by FairTest, located at 342 Broadway, Cambridge, MA 02139. FairTest is a public education and advocacy organization working on assessment reform. Find the newsletter and other materials about assessment at http://www.fairtest.org/
Testing Our Children: A Report Card on State Assessment Systems, by Monty Neill, Ed.D. (FairTest, 1997).
"The Influence of Testing on Teaching Math and Science in Grades 4–12," by G.F. Madaus et al. (Boston College Center for the Study of Testing, Evaluation, and Educational Policy, 1992).
The Merrow Report: Testing, Testing, Testing (PBS video). Explores the nature and uses of standardized tests in the public schools. To order, call 1-877-2MERROW or visit http://www.pbs.org/
Association for Supervision and Curriculum Development (ASCD), 1703 North Beauregard St., Alexandria, VA 22311; http://www.ascd.org/. Offers videos and printed material on designing performance assessments.
Monty Neill, Ed.D., is the executive director of the National Center for Fair & Open Testing (FairTest). He has been a teacher and an administrator in preschool, high school, and college.