Measuring Good Teachers
It’s the million-dollar question. How do you fairly evaluate your most valuable asset—teachers?
When the eight-day Chicago teachers’ strike ended last fall, organized labor celebrated a big win, but not over salary raises. The new contract bumped teacher pay by about 18 percent—far short of the 30 percent increase the union was seeking.
Instead, teachers claimed a victory over Mayor Rahm Emanuel on an issue not directly tied to money. A major focus of the fight, it turned out, was evaluation—specifically, how much of a teacher’s evaluation should be based on student test scores.
Like most districts across the country, Chicago was required under state law to put a teacher evaluation system into place to qualify for Race to the Top funds. Emanuel and district leaders wanted test scores to count for as much as 45 percent of teacher evaluations. In the end, the new contract calls for test scores to account for 30 percent of teacher assessments.
“Should we use tests to judge teachers? Yes, it’s patently a good thing to do,” says W. James Popham, professor emeritus at UCLA’s Graduate School of Education. “But only the right tests.”
How to measure good teaching has been a third-rail question since 2009, when Race to the Top legislation offered federal funds to states that required “substantial” evaluation systems. Two years later, the incentives got bigger. Districts that formally evaluated teachers stood to earn waivers from some No Child Left Behind penalties.
Since 2011, as many as 40 states have installed systems that try to measure good teaching. Test scores are part of the picture. But which test scores should count and how much? And what about the other measures of a teacher’s worth?
Measuring Good Teaching
Just a few weeks after Chicago’s strike, members of Newark’s teachers union approved a contract that pointedly addressed the evaluation question. The new contract includes a system that assesses the district’s 5,000 teachers every year based on four measures, including student achievement, and it awards merit pay for teachers who rank as “highly effective.”
Like those in many districts, Newark's teacher evaluations used to be based on a single annual classroom observation by an administrator. The new system, now in its first year, uses four equally weighted core indicators of effective teaching: lesson design, rigor and inclusiveness, culture of achievement, and student progress toward mastery.
Newark’s approach is similar to systems that have been developed in other school districts, including Denver and New Haven. In each of those cities the evaluation process starts at the beginning of the school year, when teachers set goals for themselves. The goals can relate to classroom management or student achievement growth, or to other areas, including effectiveness with specific student populations or with certain content areas.
In Newark, every school has formed an evaluation committee that includes two administrators and a peer teacher. Teachers meet with this committee during goal setting, at the midpoint of the year, and again at the end of the year.
Most districts recommend at least two classroom observations for every teacher. In Newark, observers watch to see how teachers perform in the identified areas of effective teaching, looking for evidence that teachers are giving students the tools they need to meet the standards.
“We shifted our focus more toward Common Core indicators,” says assistant superintendent Mitch Center. “For example, there is an indicator called Precision in Evidence. How well are our teachers providing instruction about precision and providing evidence?”
The framework, Center says, is not a script but a rubric to help all teachers identify the elements necessary for student learning, and to help administrators go deeper during evaluations.
Observers are encouraged to pay attention to how teachers plan and execute a lesson over time, how well they maintain student focus, and how they build the overall arc of a lesson sequence.
“We are saying, ‘Don’t look at anything in isolation,’” says Center. “Everything is part of a broader story. We’re trying to help administrators be in classrooms more and for teachers to think about curriculum maps.”
As for the question of where student test scores fit into the equation, in Newark, the answer is nowhere and everywhere.
The idea of “mastery” is sprinkled throughout each of the district’s core effective-teaching indicators, but none is based solely on test scores. In addition, the district’s rubric doesn’t designate any specific assessments for tracking student achievement.
“We are looking for a demonstration of learning, and similarly over time, at how teachers are monitoring growth,” Center says. “But what does this mean? This is the challenge that every district and school and state department of education is grappling with. We have no clean, simple answer, but we’re wading in and trying to gauge.”
And the scrutiny is paying off for some teachers. “Highly effective” teachers are collecting merit pay, some of it coming from an unlikely source: Facebook founder Mark Zuckerberg gave the district $100 million to reward its effective teachers.
Evaluation Tied to Development
Some districts are quite specific about what percentage of a teacher’s evaluation should come from student assessments, even as the debate over which test scores to use continues.
Unlike New Jersey, Colorado passed a state law requiring that a full 50 percent of a teacher's evaluation be based on student test outcomes. (The other half is based on three factors: professionalism, student surveys, and observation.)
Three years ago, Denver launched a new district initiative, Leading Effective Academic Practice, to help develop the new evaluation system. One of LEAP's core mandates is to figure out which, if any, existing assessments can be used to measure teacher quality. A team of teachers and administrators meets every month to discuss the issue.
“We are trying to make sure that we have a good grasp on this so that it is not a mystical calculation that nobody really understands,” says Theress Pidick, director of teacher effectiveness for Denver Public Schools.
The emphasis on test scores presents another major challenge for the district: how to create fair evaluations for the 70 percent of district teachers who teach content areas or grade levels without standardized tests.
“We don’t want to crank out assessments for the sole purpose of evaluating teachers,” Pidick says. “We’re still trying to figure out the distinct component that can fall within that 50 percent—for example, teacher- and team-created assessments or school-wide measures that give an indication of how schools are doing.”
The second half of Denver’s evaluation system uses three factors identified by the Measures of Effective Teaching (MET) project, a study funded by the Bill & Melinda Gates Foundation that looked at seven districts (Denver among them).
Observation: Schools select peer teachers to observe their colleagues, matched by grade level and content area. All non-tenured teachers are observed twice a year by their peers and once by a school leader.
Professionalism: Teachers are evaluated for their practices outside the classroom, including levels of collaboration with colleagues.
Student surveys: Teachers of students in grades 3–12 are given a score based on student perception surveys given every spring. Questions are yes/no and extend to areas over which the teacher has little or no control, such as general school climate and safety. Part of the value of the surveys, says Pidick, is that data can be disaggregated by gender and ethnicity.
“A teacher may get a high score overall, but perhaps Latino girls might respond differently than white girls,” Pidick says. “The conversation then would be about how the teacher can be more culturally responsive.”
Denver’s program emphasizes professional development for its classroom instructors. Each indicator used to evaluate teacher performance is aligned with specific professional development materials, including exemplar videos and readings.
“This is about growth and providing information so teachers can get better at their craft,” says Pidick. “Our primary intention is to give data and support to assist practitioners in their growth so we have an effective teacher in each of our classrooms and our students benefit. We need to keep students first and foremost in mind.”
Every Teacher, Every Year
Many districts implementing more codified, systematic evaluation plans are doing so after years of very little evaluation. Before installing its new program, New Haven evaluated tenured teachers once every five years, using only classroom observation.
Like many districts, New Haven had been using Charlotte Danielson’s Framework for Teaching as its evaluation instrument, but in practice, evaluations were based on a snapshot of classroom performance.
“I could have gone into a classroom and looked once. It could have been a good day or a bad day. That’s not a true picture of what happens in a classroom,” says Michele Sherban-Kline, director of teacher evaluation and development for New Haven Public Schools. “It wasn’t that useful, and there was no rating associated with it. To see how [teachers] were doing, you’d have to read between the lines.”
Under its new system, launched in 2010, New Haven requires teachers to set professional goals at the beginning of the school year. For teachers of grades 4–8, one of these goals must be directly tied to standardized test outcomes. The district does not use value-added data, but looks mainly at previous growth for each grade level. So the goal could relate to a grade level’s vertical score in reading or math, looking at the data from year to year, with the aim of either maintaining or increasing growth by a certain number of points. Or the goal could look at bands of achievement, with the target to move a grade up from basic to proficient.
The system requires evaluators to observe teachers in the classroom at least once before the midyear meeting.
“Administrators wanted to get into the classrooms again,” says Sherban-Kline. “Because it wasn’t required, observation took a backseat. Now, principals are dedicating the time, delegating other responsibilities, so they can observe the teachers in their building.”
In New Haven and Newark, there are new evaluation systems for administrators, too. Sherban-Kline says that principals are evaluated in part on how well they coach their teachers.
“We are developing training for administrators that focuses on their responsibility for teacher growth and development,” Sherban-Kline says. “Everybody is held accountable for student learning growth.”