Nov 23, 2009

I Nominate Harry

There has been a growing scandal centered around the University of East Anglia Climate Research Unit (CRU), an important center of research into the theory of human-influenced climate change, a.k.a. anthropogenic global warming (AGW). Most importantly, CRU maintains the Global Climate Dataset, the raw historical weather station data that forms the primary source for climate change estimates generated by the United Nations Intergovernmental Panel on Climate Change (IPCC).

On November 19, someone, either a hacker or (more probably) a disgruntled CRU insider, surreptitiously posted a 65-megabyte archive, titled, for "Freedom of Information." It features a selection of emails from among CRU's principal scientists dating back to 1996, along with much associated data, code, and documentation, all of which are now being widely scrutinized. Amidst denunciations that the files were illegally obtained, several of the scientists involved have corroborated that many are genuine, and there have as yet been no challenges to the authenticity of any. I believe it unlikely at this point that any of the material will turn out to have been fabricated.

So far, the emails have generated the most comment. Writing for Pajamas Media, Charlie Martin identifies three distinct scandals they reveal, and provides supporting links to specific files for each assertion:

1. The emails suggest the authors co-operated covertly to ensure that only papers favorable to CO2-forced AGW were published, and that editors and journals publishing contrary papers were punished. They also attempted to "discipline" scientists and journalists who published skeptical information.
   The emails evidence a remarkable hostility directed towards climate skeptics. In one case, the death of a skeptic is described as "cheering news."
2. The emails suggest that the authors manipulated and "massaged" the data to strengthen the case in favor of unprecedented CO2-forced AGW, and to suppress their own data if it called AGW into question.
   In particular, the emails show a good deal of concern over how to obscure a recent worldwide cooling trend, along with a period 1,000 years ago called the "medieval warm period" in which temperatures rose, with no influence from carbon dioxide, to the point that Greenland could support agriculture.
3. The emails suggest that the authors co-operated (perhaps the word is "conspired") to prevent data from being made available to other researchers through either data archiving requests or through the Freedom of Information Acts of both the U.S. and the UK.
   In particular, the Canadian statistician Steve McIntyre, who publishes the Climate Audit blog, has repeatedly tried to obtain data from CRU, and has repeatedly been turned down. McIntyre was instrumental in discrediting the "hockey stick" graph that purported to show a sudden upward spike in recent temperatures, a graph that featured prominently in Al Gore's documentary, An Inconvenient Truth.


No doubt, the revelations from the CRU files represent a major scientific scandal that should end some careers in disgrace and perhaps even generate a handful of criminal convictions. They should certainly prompt a thorough review of the current state and quality of climate science across the board. But it would be a mistake to focus too closely on the bad behavior of particular scientists. Even if they had behaved admirably in all other respects, it is now becoming apparent as well that the overall quality of the CRU data is itself quite poor. The likelihood that the debate over massively consequential policy proposals, such as the American "cap-and-trade" bill, might rest even in part on such poor data is nothing short of alarming.

The blogger Devil's Kitchen posted an extended summary of the contents of one of the files found in the archive's directory of non-email documents. The post mainly reproduces and summarizes the highly critical (and salty) comments of one Asimov, posted to tickerforum. The file in question is called HARRY_READ_ME.txt, a very long text file (15,000 lines, three quarters of a megabyte) that represents a log written by a CRU computer programmer named Ian Harris ("Harry") detailing the extraordinary efforts he went through from 2006 to 2009 to make sense of CRU's set of raw weather station data.

Mr. Ian "Harry" Harris, "data manipulator"

While the technical content of the file is often esoteric, what comes across very clearly is just how out of his depth Harry was in figuring out the "piles and piles of undocumented and inconsistent datasets," and how ad hoc his solutions were. At one point Harry declares flatly: "There is no uniform data integrity, it's just a catalogue of issues that continues to grow as they're found." Jumping to random points within this file, you will find many such red-flag statements. Not only can CRU researchers not provide others with the tools to reproduce their results, they cannot themselves reproduce them. Harry's unfortunate and highly questionable task appears to have been to produce Fortran code that takes raw weather station data and matches it to a set of already published results that were produced by some other undocumented process. The format of this varying input, such as the time and place of each sample value, is often a complete mystery. The text is extraordinary in the series of bald assumptions and seat-of-the-pants hacks it reveals. It's not too strong to say that some of CRU's global temperature records were simply made up.

But consider the context in which this data is used. Predictions of future climate change are the result of sophisticated computer models that represent a complex set of variables, ranging from the effects of cloud cover to ocean circulations. There is a great deal of debate over how best to represent these variables. Because the climate is a highly complex nonlinear system, these variables have to be calculated for relatively tiny areas across the Earth, with each set of results affecting surrounding areas dynamically as the model progresses. Given the complexity of the assumptions that go into making such calculations, consider what happens when what you should be able to assume is your most solid set of input data, the actual historical record of past temperature data worldwide, turns out to have been the product of questionable processes. As a result, it is no wonder that CRU has in the past refused to share its data.

For now, I nominate Harry as the most likely source of the leaked CRU documents. To research the unknown process whose results he was trying to match probably meant he had to review a wide range of past correspondence from the CRU server. If he has the patience to rummage through all this code, he certainly has the patience to do the same for ten years of emails. Reading his log file, you get the strong sense that he understands his own limitations in comprehending the data, and that the quality of his work suffers as a result. You get the strong sense that Harry has a conscience. But then, it could be anyone. I'm just guessing. You know, making it up as I go.


Asimov said...

I'm the Asimov from tickerforum. I want to apologize for my language. I was rather upset and, well, I'm not generally circumspect in expressing my emotions.

Anyway. My apologies for that, but not for what I posted. It's a travesty of the scientific method, and they need to BURN for this.

sierra said...

No need, Asimov; you were certainly provoked. Thanks for the analysis, and for slogging through that wretched README file.

PJP said...

"Harry" really and truly has my sympathy.

To be faced with that mess and stick with it for three years shows serious dedication to his job.

He almost certainly had people regularly on his back looking for results, and although some of the shortcuts, assumptions and "tricks" he used to obtain results are far from the best practices, he probably had no choice.

I certainly hope he doesn't become the scapegoat in this mess.