How Should You Judge Your Teachers?
Vitriolic public debate, union resistance, and gut feelings. How to balance these factors to create a fair teacher-evaluation system.
Pensacola, Florida, school district administrator Karen “KK” Owen is on the front lines of the national debate over teacher assessment.
Florida lawmakers ordered sweeping changes to evaluation systems starting this fall, which made for a very busy summer as Owen and her fellow educators hammered together a framework that meets state requirements while still allowing school districts some flexibility.
Their goal: to create an evaluation system based on facts, not feelings.
“We can’t make comments like, ‘Good job!’ Or, ‘I really like the way you did this.’ Those are opinions,” says Owen, director of staff development for the Escambia County School District. Instead, evaluators have been trained to look for evidence—what the teacher says and does and what the students say and do.
Florida appears to be in the vanguard of the fight over teacher quality. The Sunshine State’s new law, opposed by many educators, mandates that performance be based 50 percent on test scores, ends tenure for new teachers, and ties performance to pay. Still, with the federal Race to the Top program requiring states to measure teacher effectiveness, and a number of other state legislatures passing or considering reforms, the issue of designing, or redesigning, evaluation systems is facing school districts across the country.
The big question is this: How do you effectively, and fairly, take the measure of a teacher?
First, says Charlotte Danielson, a Princeton, New Jersey–based educational consultant and expert on teacher quality and evaluation, administrators and other stakeholders need to get clarity and consensus on what good teaching is. “I’ve heard principals say, ‘I can’t really define it, but I know it when I see it,’” she says. “We can do a lot better than that.”
Danielson has created the Framework for Teaching (FFT), a research-based blueprint for instruction covering 22 teaching behaviors spread out over four domains—planning and preparation, classroom environment, instruction, and professional responsibilities. Two of the domains involve observation of, and reflection about, what’s going on in the classroom, while the other two look at behind-the-scenes components such as planning and communicating with families. Evaluators are trained to make evidence-based assessments, and those assessments are then compared to a standards-based rubric to arrive at a level of performance.
Unlike traditional systems, which tend to be passive—an administrator shows up in class with a checklist and later delivers her findings to the teacher—Danielson aims for a process in which teachers take an active role, and that results in professional conversations about what’s going well and what could be improved. Assessment is “not just about inspecting and getting rid of bad apples. That should never be the driver,” says Danielson. The goal is to design systems that help teachers get better—and allow administrators to confidently tell the public that teachers are performing well “and here’s how we know.”
To that end, Danielson has been working with San Francisco–based Teachscape and Educational Testing Service (ETS) to develop the Framework for Teaching Proficiency System, a multi-faceted tool that provides observer training, scoring practice, and a proficiency test to ensure the raters are reliable.
Escambia schools are using FFT, with some modifications to meet state requirements, and the district is launching an assessment and mentoring program called START (Successful Teachers Assisting Rising Teachers). Under Escambia’s START plan, the district’s best teachers are becoming full-time consultants, each responsible for a dozen novices. They, and not principals, will be evaluating the work of new teachers. The idea is to “lead our teachers down a road of teacher growth—more than just ‘Here’s your number, you scored a three,’ that’s it,” says Owen.
Escambia is also using Teachscape’s Reflect Video system, equipment that captures a 360-degree view of the classroom with accompanying software that allows administrators or colleagues to tag and comment on practices. The method, which incorporates FFT and was initially developed as part of the Bill and Melinda Gates Foundation’s Measures of Effective Teaching project, is intended to assist in pinpointing what works and what doesn’t in a classroom. A teacher can watch his own lesson as an aid to self-reflection, and the tape can be reviewed by others for feedback. One advantage to the system is that teachers can observe an archived class, perhaps on a teacher development day, without having to skip their own classes. Meanwhile, the 360-degree view means a teacher can review what various students were doing during the class—a digital rendering of that useful teaching skill known as “eyes in the back of your head.”
The Gates-funded project involved videotaping thousands of classes, scoring performance, and then correlating that with student achievement data from those classes to build a bigger picture of what constitutes proficient practice. This, in turn, creates an assessment tool that can be used to train people to become good evaluators.
“We’ve learned a lot by using videotape,” says Mark Atkinson, founder and chief strategist of Teachscape. Some tapes, he says, have shown that a teacher may be having an exciting engagement with one student who may be an eager learner—but she is missing the fact that many students have disengaged, perhaps because the lesson wasn’t designed to prompt enough participation.
For all its high-tech appeal, getting teachers to accept being videotaped can be a challenge. Atkinson notes that teachers do have the power to turn off the camera or not share the video, reducing some of the Big Brother effect.
At Escambia, administrators are requiring new teachers, many of whom may have used this method of feedback in college, to agree to the taping as part of their contract. Their hope is that veteran teachers who might be anxious about the system will become less so when they see it in action. “We want our veteran teachers to observe this occurring in novice teacher classrooms and realize that it’s not all that scary,” Owen says.
Atkinson sees the video recording as a level of support that most teachers never get, but would like. “In many cases, they’ve never had anybody look at their teaching and give them accurate, knowledgeable, reliable feedback.”
Evaluators as Collaborators
In the run-up to the school year, a goal at Escambia has been learning to leave bias at the classroom door.
The evaluation system has four categories, from “highly effective” to “unsatisfactory.” For teachers who land in the latter category, “something needs to be done immediately in [their] classroom,” Owen says. But for those who fall into the “needs improvement” area, the emphasis is on finding ways to improve the teacher’s practice.
“We want teachers to know that this evaluation system will not misrepresent their teaching since it is based on the collection of evidence. It is our hope that teachers will know that evaluators are their collaborators in this teacher-growth process and not ‘out to get them’ on their evaluation,” says Owen. “In the past, teacher evaluation was sometimes viewed with trepidation, but we want to have teachers open up and be very honest with themselves and their practice. In other words, let’s take away the ‘gotcha,’ and instead, let’s grow.”
Creating a teacher assessment system takes time, says Cindy Worner, principal at Scott Altman Primary School in Illinois’s Pekin Public School District. The district, which is also working within Danielson’s framework, spent almost a whole school year developing its system, meeting with teachers to decide what engaged learning looks like. Only then did they sit down and start generating the forms that would be used to assess that teaching.
“It’s a process,” says Worner. “It was important to take time up front.”
Running the Numbers
An example of the growing emphasis on student performance can be found in Delaware, which, along with Tennessee, was one of two winners in the first round of Race to the Top funding but still had to amend its assessment system to bolster the student performance component. Delaware has had a statewide educator evaluation system in place since the 1980s, with a revised system, based on Danielson’s Framework for Teaching, that it has been following since 2008, says Linda Rogers, the associate secretary for Delaware’s Department of Education.
Having that system firmly in place, along with three years’ experience using it, Rogers says, has helped with the latest development: revising a student growth component that, in part, represents an appropriate level of change in achievement data.
The student growth component, which is being rewritten to make it more rigorous, is divided into three parts. The first is a school-wide assessment measure such as adequate yearly progress (AYP), which applies to all teachers in the school. Second is a cohort measure based on proficiency goals. For example, if a school scores high in reading but needs improvement in math, then teachers set an individual goal built into that effort. A math teacher would focus on her students. A P.E. teacher, on the other hand, would identify a group of students who needed to improve math skills and work that into her curriculum. The third part of student improvement involves a non-state assessment measure such as the Iowa Tests of Basic Skills, or perhaps a district’s internal measure.
Part one counts for 30 percent of the student improvement score, part two for 20 percent, and part three for 50 percent.
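The weighting described above can be sketched as a simple calculation. (The function name, inputs, and 0–100 scale here are illustrative assumptions, not Delaware’s actual implementation.)

```python
def student_growth_score(schoolwide, cohort, external):
    """Combine the three student-growth parts into one composite.

    Each input is assumed to be a 0-100 score. The weights
    (30/20/50) follow the article's description of Delaware's
    three-part student growth component.
    """
    return 0.30 * schoolwide + 0.20 * cohort + 0.50 * external

# Example: a school-wide measure of 80, a cohort-goal measure of 70,
# and an external assessment of 90 yield a composite of 83.0.
print(student_growth_score(80, 70, 90))  # → 83.0
```

Note that the external, non-state assessment carries the most weight, so two teachers at the same school can still receive quite different composites.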
“Our goal is to be rigorous but comparable across teacher groups,” says Rogers.
In Florida, one issue for schools is that although evaluations are weighted 50 percent on student test scores such as the Florida Comprehensive Assessment Test (FCAT), those tests aren’t taken by K–2 or 12th-grade students. There also aren’t tests for subjects such as art and P.E. While they wait for tests to be developed, school officials have come up with ways to assign scores to teachers based on overall school performance.
The Problem With Scores
Scores are readily available and seem like objective, simple measures, says Patricia H. Hinchey, a professor of education at Penn State University. But they can be misleading, and they shouldn’t be relied upon too heavily when assessing teachers.
“Frankly, I believe test scores are incredibly harmful and stand directly in the way of real teacher effectiveness,” says Hinchey, who reviewed 275 articles and reports on teacher assessment for her research brief, “Getting Teacher Assessment Right,” published by the National Education Policy Center. “They undermine teacher professionalism and have produced a truly lamentable narrowing of curriculum in schools.”
Standardized test scores can vary widely from year to year, and they offer no information to help improve students’ performance. “We do need to improve teacher professionalism and effectiveness, but a punitive system that offers no specific information about what needs to change hinders, rather than promotes, improvement,” she says.
In designing a system, stakeholders need to decide what’s worth assessing and which tools can most reliably deliver crucial information. Also important are variables that affect student learning, many of them out of a teacher’s control. “If a child is going to bed hungry every night, it’s insane to think that a single classroom teacher in a single year can somehow magically bring that child’s learning in line with that of a child who has been taking violin lessons since the age of five,” says Hinchey.
She advocates a comprehensive assessment system made up of multiple components, including observation and reviews of student portfolios—a lot can be learned by looking at students’ progress between the beginning of the year and the end—as well as reviews of student surveys and teachers’ self-reports on classroom practices. As with any assessment, the more limited the input, the more limited the applicability of the results, says Hinchey.
Figuring out who the best and the worst teachers in a school are isn’t hard; administrators and students usually know exactly who those people are, says Hinchey. The key is paying attention to the middle group of teachers, who may not be perfect but would do better if they knew how.
Danielson doesn’t think test scores are irrelevant, but she sees the real question as “What counts as evidence, and how do you know that I’m the one who caused it?” For instance, a school’s reading scores may have risen for a variety of reasons, not necessarily due to the English teacher. But if a portfolio of student work shows marked improvement in comprehension and writing skills, then a picture begins to form.
In the Public Eye
The push to gauge, and improve upon, teacher performance takes place against a background of politics that can make it difficult for districts to avoid catering to public opinion when designing an evaluation program.
Hinchey believes, however, that a community is likely to accept “any system that’s well-designed and is clearly explained to them, as long as they believe it will remove true incompetents from the classroom, something it absolutely has to do, but fairly.” Outside of severely under-resourced communities, where staffing is generally a challenge, it’s common for residents to think the teachers in their own district are doing a good job, but feel “other districts have problems.” Moreover, she says, parents who were themselves subjected to standardized tests as students are often dubious about the tests’ accuracy as measures of student learning.
In some ways, the policy environment is helping, notes Danielson. “They are saying you must make accurate judgments. It’s not good enough when the children fail. If you’re going to get serious about teacher evaluations, then you really have to do it well because the stakes are so high.”
The good news, she adds, is that when principals and other evaluators learn how to make standards- and evidence-based assessments, everyone finds it valuable.
“Being observed and having a conversation about that teaching is powerful,” says Danielson.
Assessing Success: How to rate the raters.
You walk into a classroom and it is pin-drop quiet, every head down as students pore over their textbooks. A few doors down, you walk in and hear a babble of voices as students, split into groups, are talking to one another.
Which one looks like learning? It depends, say experts. Quiet doesn’t always equal quality, and there’s a difference between a class that’s loud because it’s out of control and one that’s loud because students are enthusiastically engaged in learning.
Being able to spot the difference, and then back up your conclusions with evidence, not emotion, is the hallmark of a good evaluator, the linchpin to a good teacher-assessment system.
“One of the big things we train for is understanding what student engagement is,” says educational consultant Charlotte Danielson. “There’s been such a long bias in favor of quiet and compliance. One of the things we know from the psychology literature is that’s not where the best learning happens. What does real learning look like? It’s about intellectual engagement.”
Unlike the old “drive-by” observation models where an administrator showed up with a checklist, a comprehensive teacher-assessment system requires evaluators who are trained to assess accurately, provide meaningful feedback, and engage teachers in productive conversations about their work, she adds.
Danielson has worked with video technology developed by Teachscape and ETS to design a certification system that trains and tests evaluators on their ability to judge lessons objectively and consistently—and back up their judgments with evidence.
Teaching is a complex skill, and judging that skill is no simple task, she says. “The training you do to be a good evaluator, or even a good mentor, that training itself is extremely powerful professional development for those who do it.”
In Delaware, outside evaluators audit the statewide assessment on an annual basis, not only conducting paper screenings but also interviewing administrators and teachers. Based on that, state education officials make whatever adjustments are necessary. “It is not a static process, but one that evolves depending on the feedback,” says Linda Rogers, associate secretary for Delaware’s Department of Education.
At Florida’s Escambia County School District, which is implementing a new teacher assessment plan this school year, evaluators trained over the summer and will likely face annual recertification through an online course developed by Teachscape, says Karen Owen, the district’s director of staff development.
Knowing that evaluators are being taught how to do the job is making teachers feel more comfortable with the system, Owen says.
“No longer will it be, ‘Oh, my gosh, I got Mr. Jones as my evaluator this year, he’s so tough!’ If we do our job right and the certification course works, people are going to all have to be doing it the same way.”