Our friend and frequent commenter KrazyTA has analyzed the response of the VAM Gang (Chetty, Friedman, and Rockoff) to the American Statistical Association's pithy demolition of their famous and much-praised justification for VAM.
Here is his analysis:
I urge readers of this blog to read the recent response by Raj Chetty (Harvard University), John Friedman (Harvard University), and Jonah Rockoff (Columbia University) to a statement by the American Statistical Association (ASA) [2014] on VAM.
A PDF of their response (fewer than five printed pages) is available here:
Link: http://obs.rc.fas.harvard.edu/chetty/ASA_discussion.pdf
The last paragraph of their response to ASA’s point #7 (p. 4):
“The ASA appropriately warns that ‘ranking teachers by their VAM scores can have unintended consequences that reduce quality.’ In particular, it is possible that teachers may feel pressured to teach to the test or even cheat if they are evaluated based on VAMs. The empirical magnitude of this problem—and potential solutions if it turns out to be a serious concern—can only be assessed by studying the behavior of teachers in districts that have started to use VAMs.”
This is immediately followed by the final paragraph of their response, quoted in full (p. 4):
“In summary, our view is that many of the important concerns about VAM raised by the ASA have been addressed in recent experimental and quasi-experimental studies. Nevertheless, we caution that there are still at least two important concerns that remain in using VAM for the purposes of teacher evaluation. First, using VAM for high-stakes evaluation could lead to unproductive responses such as teaching to the test or cheating; to date, there is insufficient evidence to assess the importance of this concern. Second, other measures of teacher performance, such as principal evaluations, student ratings, or classroom observations, may ultimately prove to be better predictors of teachers’ long-term impacts on students than VAMs. While we have learned much about VAM through statistical research, further work is needed to understand how VAM estimates should (or should not) be combined with other metrics to identify and retain effective teachers.”
My initial reaction.
While they don't use the term "Campbell's Law" (IMHO, they are deliberately avoiding it), notice how they take the import and sweep of Campbell's astute observation (roughly: the more any quantitative social indicator is used for social decision-making, the more subject it is to corruption pressures and the more it distorts and corrupts the very processes it is meant to monitor) and reduce it to "responses such as teaching to the test or cheating," with the added proviso that "there is insufficient evidence to assess the importance of this concern." *Note that in his testimony during the Vergara trial (p. 547 of the transcript), Dr. Chetty casually dismissed this challenge to his VAM-based beliefs as "Campbell's Conjecture."*
Link: http://www.vergaratrial.com/storage/documents/2014.01.30_Rough_am_session.txt
This is critical. First, they reduce Campbell's Law to a statement about individual morality and ethics (of the employees, no less!) rather than something that involves whole institutions [e.g., the recent VA scandal or the Potemkin villages of the now-vanished Soviet Union] and is created, mandated, and enforced from the top down. Second, by doing so they avoid having to address the destructive effects of Management by the Numbers/Management by Objective/Management by Results, i.e., the very management philosophy of those who fund their "research" and lead the charterite/privatization charge. Third, they literally discard the already large body of evidence proving the accuracy and trustworthiness of Campbell's Law with respect to VAM [and its fuel/food, standardized test scores] by calling it "insufficient," while their own pronouncements, of course, even though they need "further work," are the current Gold Standard.
So it is hardly surprising that they are hot and heavy for heading off potential problems of data corruption by "studying the behavior of teachers in districts that have started to use VAMs," when what is needed is to independently study, monitor, and regulate the behavior of folks like administrators, school boards, heads of CMOs and charter owners/operators, the DOE, and those who employ people like Chetty, Friedman, and Rockoff. They're the ones who set the numerical goals/straitjackets that drive data corruption!
*While W. Edwards Deming would come in handy here, someone else thought along the same lines: "When a measure becomes a target, it ceases to be a good measure." [Goodhart's Law, named for economist Charles Goodhart; this popular wording is Marilyn Strathern's]*
The next point is a bit perplexing. Apparently they don't know how to use Google and Amazon to find (among many such works) Sharon L. Nichols and David C. Berliner, COLLATERAL DAMAGE: HOW HIGH-STAKES TESTING CORRUPTS AMERICA'S SCHOOLS (2010, third printing) or Phillip Harris, Bruce M. Smith, and Joan Harris, THE MYTHS OF STANDARDIZED TESTING: WHY THEY DON'T TELL YOU WHAT YOU THINK THEY DO (2011). Perhaps they permit themselves no newspapers, internet, or television either, which would explain why testing scandals such as those in Washington, DC; Houston, TX; and Atlanta, GA (to name just a few) escaped their notice completely. Also, the above authors and many others, like Audrey Amrein-Beardsley (see her recent RETHINKING VALUE-ADDED MODELS IN EDUCATION: CRITICAL PERSPECTIVES ON TESTS AND ASSESSMENT-BASED ACCOUNTABILITY, 2014), can be contacted by email. Is it too much to ask of those claiming to be researchers that they take the time and make the effort to, er, get the contact information they need to make sure their research is done properly?
In their response to ASA point #7, they quote the ASA: "Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of opportunities for quality improvement are found in the system-level conditions. Ranking teachers by their VAM scores can have unintended consequences that reduce quality" (p. 3). The trio start off as best they can by stating that "The ASA is correct in noting that the majority of variation in student test scores is 'attributable to factors outside of the teacher's control,' and that this 'is not saying that teachers have little effect on students.'" Wait! You can read the rest for yourselves, but here is the fly in the ointment (or the elephant in the room): in a debate, when you concede the most critical point, you lose the argument.
Since Chetty/Friedman/Rockoff didn't dispute the 1% to 14% assertion, I would be awfully interested in knowing why they're ignoring the other 86% to 99%. Could it be that it poses intractable difficulties for their VAManiacal beliefs?
My very last point: Chetty/Friedman/Rockoff don't understand that, even under the most favorable circumstances, high-stakes standardized testing measures very little, is inherently imprecise, and is used for purposes so ill-suited to its few strengths that it needs to be junked. Strip every reference to "test scores" and the like out of the Chetty/Friedman/Rockoff response and, well, the whole thing falls apart. Those "vain and illusory" [thank you, Duane Swacker!] numbers/stats are the glue that holds VAM together, the fuel that keeps VAM moving ahead, the food that sustains its very existence.
The Golem of VAM reverts to its inert form when you remove the magic of Testolatry.
Perhaps they should have taken that class on ancient Greece rather than Bean Counting For $tudent $ucce$$:
“I have often repented of speaking, but never of holding my tongue.” [Xenocrates]
Or if you prefer another very old, very dead and very Greek guy:
“Words empty as the wind are best left unsaid.” [Homer]
Take your pick. Odds are you won’t go wrong. [a numbers/stats joke…]
😎
P.S. I leave it to readers of this blog to read the triad’s response and make their own judgments and comments.