Point, Counterpoint — Teacher Eval Based on Test Scores

This comes to me courtesy of The Paper Graders, who got it from a colleague and posted it yesterday. Thanks, TPG!

Up and at ’em early this morning, both to get a start on my homework and because I wanted time to blog about this article, “Should Student Test Scores Be Used to Evaluate Teachers?”, which was published on the Wall Street Journal’s website in late June.

To begin with, three cheers for The Wall Street Journal for hearing out both sides, and a big thank-you to Thomas Kane and Linda Darling-Hammond for being willing to argue them. What I would really love to see are their reactions to each other’s ideas, since they were published here as two separate pieces. Still, this is a big step in the right direction: too many outlets publish only one side of this argument, and it’s usually the uncritically pro-test, pro-value-added side.

That said, you all probably know where I fall on this issue. Should student test scores be used to evaluate teachers? Absolutely not. Here are my reactions to each of their posts. I’ll start with Thomas Kane.

First, my gut reaction after reading one of his first sentences, which read:

Clear evidence for that conclusion comes from the Bill and Melinda Gates Foundation’s Measures of Effective Teaching project, which I lead.

…was: is there ANYTHING that the Gates Foundation DOESN’T fund related to high-stakes testing and value-added measures?! Diane Ravitch posted recently about the Gates Foundation, similarly baffled by the heavy hand it has in shaping so much education policy. I feel like I’m seeing the name “Gates” more and more often in places where it shouldn’t appear, and I don’t like it.

But now for his actual argument, which has some interesting, though perhaps weak, points. For example:

…despite some fluctuation from year to year, we have found that a teacher’s record of promoting achievement remains the strongest single predictor of the achievement gains of their future students. In such a ratings system, a teacher’s average may vary from year to year, but so do the batting averages of professional baseball players. In each case, the measure provides a glimpse (albeit imperfect) of future performance.

I’ve seen this cited over and over again: the teacher matters. We know this, of course. Teachers, according to many studies in recent years, can inspire students to achieve, can help students overcome obstacles related to their contexts, and can deliver instruction in innovative ways that meet the diverse needs of their students. What concerns me, however, is not only that he doesn’t give any of the numbers (“we have found”… okay, but found how, and how strongly does past performance actually predict future performance?), but also that he writes as though teachers’ averages really get treated as such, as averages that fluctuate. The reality right now is that teachers’ averages get treated as absolutes, not as “imperfect glimpses of future performance.” And Darling-Hammond points out just how imperfect these glimpses are:

…at best, teachers’ value-added ratings in one year predict only 25% of the variance in ratings in the next year, leaving 75% or more to be explained by factors such as who is assigned to a teacher’s class and what conditions he or she teaches under. The National Research Council and the Educational Testing Service, among other research organizations, have concluded that ratings of teacher effectiveness based on student test scores are too unreliable—and measure too many things other than the teacher—to be used to make high-stakes decisions… Unfortunately, federally imposed teacher-evaluation policies insist on using state tests that do not measure growth, are poor measures of higher-order thinking skills and penalize teachers of the neediest students.

First of all, way to go, Darling-Hammond, who is typically a qualitative researcher, for giving us the numbers that Kane conveniently left out. Second, she’s right. Things like socioeconomic status are strongly correlated with school funding (duh… anyone who has ever shopped for a home in a district with a “good” reputation, or a “bad” one, knows that), and both are correlated with test scores, which means that those “averages” Kane is so fond of above are exceedingly inaccurate measures of teacher “performance.” They’re just as likely to measure how often, on average, the teacher’s students got a good breakfast before they walked out their doors in the morning, or how often they’re distracted by the siblings they need to take care of while Mom works second shift to put food on the table. You can “control” for these variables, statistically speaking… but only partially, and only for the ones you can measure. A regression can adjust only for what’s in the data; everything unmeasured stays folded into the teacher’s number. They are always there. They are always part of the statistic.
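To make that concrete, here’s a quick toy simulation of my own (nothing from either article; the model and every number in it are my assumptions). It treats a teacher’s yearly value-added rating as a stable “true effect” plus noise from that year’s class (who got breakfast, who’s babysitting siblings, who happened to land in the room). Once the class-to-class noise is as large as the real differences between teachers, one year’s rating explains only about 25% of the next year’s variance, right in line with the figure Darling-Hammond cites:

```python
# Toy simulation (my own sketch, not from Kane's study or Darling-Hammond's
# sources). Assumed model: observed rating = stable teacher effect + noise
# from that year's class composition. All constants are illustrative.
import random
import statistics

random.seed(42)

N_TEACHERS = 10_000
TRUE_EFFECT_SD = 1.0  # spread of stable teacher quality (arbitrary units)
CLASS_NOISE_SD = 1.0  # class-composition noise, assumed as large as the
                      # teacher spread -- enough to reproduce the ~25% figure

def yearly_rating(true_effect):
    """One year's observed value-added rating: true effect plus class noise."""
    return true_effect + random.gauss(0, CLASS_NOISE_SD)

true_effects = [random.gauss(0, TRUE_EFFECT_SD) for _ in range(N_TEACHERS)]
year1 = [yearly_rating(t) for t in true_effects]
year2 = [yearly_rating(t) for t in true_effects]

r = statistics.correlation(year1, year2)  # requires Python 3.10+
print(f"year-to-year correlation: r = {r:.2f}")
print(f"variance in year 2 explained by year 1: r^2 = {r * r:.0%}")
```

The exact constants don’t matter; the point is that as soon as who walks into your classroom matters as much as how you teach, a single year’s “average” is half noise, and making high-stakes decisions on it is a coin flip dressed up as a statistic.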

I need to wrap this up, because homework calls. Suffice it to say that I’m glad the WSJ posted this, I’m glad they talked to both sides, and I think Kane’s and Darling-Hammond’s arguments are worth hearing side by side. This is the way we should have these conversations.

But if I’m being honest, I think we should have these conversations this way because they point out just how ridiculous our current policy moves are becoming. On the one hand, we have a name-dropping, overgeneralizing argument; on the other, a clear delineation of that argument’s flaws. Maybe I’m biased. Who am I kidding? I am biased. But I think the more of these dialogues we see, the more obvious it should become that the value-added, test-driven road we’re on is not the logical one.