Saturday, July 24, 2010

A mistake with consequences?

A new paper in press in Journal of Climate by Jason Smerdon from the Lamont Doherty Earth Observatory and collaborators documents surprising, and somewhat inexplicable, errors in some previous pseudo-proxy studies by Mann and collaborators.

Background: The reconstruction of past climates is based on the analysis of biological or geochemical archives, like tree-rings, which may be sensitive to environmental conditions. These archives need to be somehow translated into variations in temperature or precipitation. This is mostly accomplished by statistical methods: in the simple case that the annual variations in tree-ring widths are correlated with variations of local summer temperatures, this correlation allows for a calibration of tree-ring widths in terms of mean growing-season temperature, which can then be extrapolated back into the past. Assuming the calibration remains valid for long-term climate variations as well, it can be used to reconstruct past temperatures. One of the questions that always remains open is whether or not these empirical methods also provide a reasonable answer at longer time scales. This is not guaranteed. The archives may have reacted differently to environmental changes in the past. Also, although a statistical method may perform well for rapid interannual variations, it may not be as good at longer timescales, such as decadal and centennial variability.
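As an illustration only, the calibration step described above can be sketched as a simple linear fit over the instrumental overlap period, which is then applied to older samples. All numbers, the noise level, and the variable names here are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 years of observed summer temperature anomalies (K)
# and tree-ring widths that track temperature plus non-climatic noise.
temp = rng.normal(0.0, 0.5, size=100)
rings = 1.2 * temp + rng.normal(0.0, 0.4, size=100)

# Calibration: least-squares fit of temperature on ring width over the
# overlap (instrumental) period.
slope, intercept = np.polyfit(rings, temp, deg=1)

# Reconstruction: apply the fitted relation to pre-instrumental ring widths,
# assuming the calibration also holds for long-term variations.
old_rings = rings[:30]  # stand-in for pre-instrumental samples
reconstructed = slope * old_rings + intercept
```

Whether the fitted relation really holds outside the calibration window is exactly the assumption that pseudo-proxy experiments try to test.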
In particular, the performance of statistical reconstruction methods is not easy to test at longer time scales, since the observational record is obviously short. This is the rationale of pseudo-proxy studies, which use artificial data generated in long climate simulations. In these simulations, the target variable to be reconstructed is known. If artificial proxies could also somehow be generated within the climate simulation, the reconstruction method could be applied to these pseudo-proxies and the result could be compared with the known target temperature. A pseudo-proxy can be approximately generated by taking the temperature simulated at one of the model grid-cells and distorting these data with random numbers, to emulate the imperfect relationship between real temperatures and real tree-ring widths.
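A minimal sketch of this pseudo-proxy construction, under simple assumptions (the simulated series is a toy stand-in, and the white-noise model and signal-to-noise ratio are just one common choice; published studies explore several):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for a 1000-year grid-cell temperature series from a
# long climate simulation (a red-noise-like random walk).
sim_temp = np.cumsum(rng.normal(0.0, 0.05, size=1000))

# Degrade the "perfect" grid-cell signal with white noise, scaled to a
# chosen signal-to-noise ratio, to mimic the imperfect proxy-temperature
# relationship.
snr = 0.5  # hypothetical value for this example
noise = rng.normal(0.0, sim_temp.std() / snr, size=sim_temp.size)
pseudo_proxy = sim_temp + noise
```

The reconstruction method is then fed `pseudo_proxy`, and its output is compared against the known `sim_temp` target.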
There are several pseudo-proxy studies around that have tested different climate reconstruction methods in this way. The conclusions of these studies sometimes diverge. One of the current controversies involves the Regularized Expectation Maximization (RegEM) method. Some groups (Christiansen et al, Smerdon et al) have found that this method leads to similar underestimations of past variations as found in earlier reconstruction methods, whereas other groups, including Mann and collaborators, here and here, find that the RegEM method performs well. The new paper by Smerdon et al has identified basic and surprising errors in the testing of the RegEM method by the Mann group. The errors are not difficult to understand and do not involve the implementation of the RegEM method itself: in one case, when interpolating the climate model data onto a different grid, the data were rotated around the Earth by 180 degrees, so that model data that should be located on the Greenwich Meridian were erroneously placed at 180 degrees longitude; in another case, the data in the Western Hemisphere were spatially smoothed while the data in the Eastern Hemisphere were not. These errors have consequences: the locations of the pseudo-proxies no longer matched the locations of real proxies; the spatial covariance of the temperature data was not correct; and when the authors thought they were testing the skill of the method to reconstruct the temperature in the ENSO region from proxies located in North America, they were actually testing the reconstruction of temperature in another region from proxies located somewhere else.
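The first of these errors is easy to picture. A toy example (grid resolution and field values invented for the illustration) of how a half-grid shift in longitude misplaces every data point:

```python
import numpy as np

# Toy global field on a 5-degree grid: 36 latitudes x 72 longitudes,
# with longitudes running 0..355E. Each cell stores its own longitude
# so the shift is easy to see.
lons = np.arange(0, 360, 5)
field = np.tile(lons, (36, 1)).astype(float)

# A 180-degree rotation corresponds to rolling the field by half the
# number of longitude points: data meant for Greenwich (0E) end up
# sitting at 180E, on the date line.
rotated = np.roll(field, lons.size // 2, axis=1)

greenwich = int(np.where(lons == 0)[0][0])
dateline = int(np.where(lons == 180)[0][0])
```

After the roll, the value found at the date line column is the one that belonged at Greenwich, so any pseudo-proxy "placed" at a real proxy's coordinates is actually sampling the antipodal side of the model world.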
Stuff happens, and errors like these can creep into any study by any group. There is no reason whatsoever to think that malice played any role here. It is, however, surprising that these errors went undetected for several years, affecting two key manuscripts that used the same data sets. My conclusion is that those stress-tests were really too weak to detect any possible low performance of the method.
The most recent climate reconstructions by Mann et al, published in Science in 2009, were conducted with the RegEM method, supported by the good skill displayed by RegEM in those previous stress-tests. I am unsure to what extent those reconstructions may now be compromised. Interesting food for thought for the authors of the next IPCC Report.

Update: Rutherford et al. have a response here, whose status I do not know (in preparation, accepted, in press...).


Belette said...

There is a response by Rutherford et al. that you may want to point to:

Also, are you sure you didn't mean "A Mistake with Repercussions"?

Anonymous said...

In the response Belette points to, Mann and the others make it very clear how annoyed they are that this information saw the light of day without their having a chance to put their own gloss on it:

"...we are puzzled as to why, given the minor impact the issues raised actually have, the matter wasn't dealt with in the format of a comment/reply. Alternatively, had Smerdon et al. taken the more collegial route of bringing the issue directly to our attention, we would have acknowledged their contribution in a prompt corrigendum. We feel it unfortunate that neither of these two alternative courses of action were taken."

This is funny, given Mann's usual response to any critical analysis of his work.

eduardo said...

Thank you for pointing to this response. I have now updated the weblog.

In their response, Rutherford et al. acknowledge both errors identified by Smerdon et al: the temperature fields were rotated 180 degrees and the spatial interpolation was wrong.

They also assert that the errors have been corrected in subsequent studies. And yet, Rutherford et al. continue to show the wrong NH mean temperature simulated by the ECHO-G model - compare Figure 1a in the manuscript by Rutherford and Figure 5b in Smerdon et al 2010. It is obvious that these errors have not been corrected.

PolyisTCOandbanned said...

I looked at the two figures but can't get the point. Could you spell it out explicitly?

Also, I think Rutherford et al. are changing procedure, TTLS vice Ridge. So that may be the reason (not sure if this is legit). Just everything is not apples-to-apples...

Rattus Norvegicus said...


The change in method was pointed out in a reply to a comment which made basically the same point (Smerdon and Kaplan, Journal of Climate, 2007).

Belette said...

They also assert that the errors have been corrected in subsequent studies. And yet, Rutherford et al. continue to show the wrong NH mean temperature simulated by the ECHO-G model - compare Figure 1a in the manuscript by Rutherford and Figure 5b in Smerdon et al 2010. It is obvious that these errors have not been corrected.

Fair point. I asked about this: the wrong figure was transcribed. Looking now, the PDF response has been updated to show the correct figure.

Both pix are in fig 1 of the Rutherford et al. reply to Smerdon; it looks like they transcribed the wrong one.

Anonymous said...

Hmmm... thinking about this: the Rutherford comment acknowledges problems, but then concludes pretty much across the board that none of these errors had an impact upon outcomes. Well, then why bother with the methods at all? If having data off by 180 degrees does not jack up your analysis, what will?

They also note no observable significant loss of variance (I am going by memory, having just read this). To me, the modeling looks like it does lose variance - peaks are not as peaked, valleys not as low. Visually/observably.

In prediction models, these modest limits can play out unfavorably. The models are "trained" on some set of predictors, and the temperature record is reconstructed - similar to, but less varied than, the comparator/benchmark. So the observed full range of inputs yields less variance than it should, to whatever slight degree.

If atmospheric carbon gets any weighting in the prediction and, looking forward, keeps that same weighting, then this will lend too much predictive weight to atmospheric carbon.

I have not read through the multiple articles involved yet, but it seems like the same phenomenon is involved here that was involved in the Mann 98 article - train a model including atmospheric carbon, then take the current increase in atmospheric carbon and weight it per the model - you are guaranteed to get predictions of unprecedentedly high temps.

Belette said...

if having data off by 180 degrees does not jack up your analysis, what will?

You've misunderstood what this is about. This is about the tests of synthetic data. If you put the real data in the wrong places you get the wrong answer.

eduardo said...

TCO and Belette

The difference is in the amplitude of the Northern Hemisphere mean temperature. In the (uncorrected) version of Rutherford et al, as well as in Mann 2005 and 2007, this time series displays larger amplitudes; for instance, the minimum in the millennium is almost -1.5 K, whereas the true amplitude is rather -1 K. This is not simply a minor point, because one of the most vocal allegations of Mann et al at that time was that the variability simulated in the ECHO-G simulations was too large - it turns out that, although it is indeed larger than in other simulations, Mann et al mistakenly made it 50% larger due to the wrong interpolation. As an aside, I think that the amplitude simulated by ECHO-G will in the end turn out to be quite correct - maybe a tick too large - as the model fits many local and regional reconstructions.

I was all the more surprised when I saw that this error had not been corrected even now, in Rutherford et al 2010, when it had been a point of public discussion long before.

Now to the perhaps most substantial point: there is a debate around the RegEM method, as I tried to explain in the weblog. To my knowledge, three (truly) independent groups have found that this method also leads to too small past variations (Smerdon et al, Christiansen et al and Riedwyl et al). Another group says that the method slightly underestimates past variations (Lee et al), and another group says that it is a good method (Mann et al). The RegEM method is difficult to implement; there are several variants (ridge regression, total least squares, truncated total least squares, and for all of them there exist hybrid versions in which the data sets are previously filtered into two different frequency bands). It also involves the somewhat subjective choice of a few internal parameters. In short, a quite complex method. If I am now told that, in the calculations by one of the groups, the input data have been mistakenly rotated around the Earth, and that the interpolation onto the global grid is not correct even before the implementation of the method has started... well, what am I going to believe?

Roddy said...

Eduardo, comment 9, beautifully expressed.

Anonymous said...

I took a look at the comment

I just wonder why their Figure 1 reconstructions stop at year 1900? Especially since they mention that the calibration period is 1900-1980...