Sunday, May 30, 2010

Guest post by Bo Christiansen: On temperature reconstructions, past climate variability, hockey sticks and hockey teams

Bo Christiansen from the Danish Meteorological Institute works actively on the development of statistical methods for climate reconstructions, a field of intense debate in the past few years. Hopefully we will enter a phase in which scientific debates remain ... well, in the scientific realm. Enjoy his post.

Anthropogenic emissions of greenhouse gases - in particular CO2 and methane - change the radiative properties of the atmosphere and thereby impose a tendency of heating at the surface of the Earth.  In the past the Earth's temperature has varied both due to external forcings, such as volcanic eruptions and changes in the sun, and due to internal variability in the climate system.  Much effort has been made in recent years to understand and project man-made climate change.  In this context the past climate is an important resource for climate science, as it provides us with valuable information about how the climate responds to forcings. It also provides a validation target for climate models, although paleoclimate modelling is still in its infancy.  It should be obvious that we need to understand past climate variability before we can confidently predict the future.

Fig 1. Pseudo-proxy experiments with seven different reconstruction methods. The black curve is the NH mean temperature, the target which we hope the reconstructions will catch. But this is not the case: all reconstructions underestimate the pre-industrial temperature level as well as the amplitude of the low-frequency variability. Note that the reconstructions are very good in the last 100 years, which have been used for calibration. The three panels differ in the strength of the variability of the target. From Christiansen et al. 2009.

Unfortunately, we do not have systematic instrumental measurements of the surface temperature much further back than the mid-19th century. Further back in time we must rely on proxy data. The climate proxies include tree rings, corals, lake and marine sediment cores, terrestrial bore-hole temperatures, and documentary archives. Common to all these sources is that they include a climate signal, but that this signal is polluted by noise (basically all non-climatic influences, such as fires, diseases etc.). From these different noisy proxies, information such as the global mean surface temperature is then extracted. A famous and pioneering example is the work by Mann et al. 1998, in which the mean NH temperature is relatively constant, with a weak decreasing trend from 1400-1900, followed by a sharp rise in industrial times - the so-called "hockey stick". There has been much debate about this reconstruction, and its robustness has been questioned (see e.g.). However, some other reconstructions have shown a similar shape, and this has encouraged some to talk about the 'hockey team' (e.g., here). This partial agreement between different reconstructions has also led to statements such as 'It is very likely that average Northern Hemisphere temperatures during the second half of the 20th century were higher than for any other 50-year period in the last 500 years' by the IPCC. That different reconstructions show a 'hockey stick' would increase its credibility, unless the different reconstructions all shared the same problems. We shall see below that this is unfortunately the case.

All proxies are infected with noise. To extract the climate signal - here the NH mean temperature - from a large set of noisy proxies, different mathematical methods have been used. They are all, however, based on variants of linear regression. The model is trained, or calibrated, by using the last period where we have access to both proxies and instrumental data.
This calibration period is typically the last 100 years.  When the model has been trained it is used to estimate the NH mean temperature in the past (the reconstruction period), where only the proxies are known.

To test such methods it is useful to apply them to long simulations from climate models. As in the real-world situation, we split the total period into a calibration period and a reconstruction period. But here we know the NH mean temperature also in the reconstruction period, which can therefore be compared with the reconstruction. The proxies are generated by adding noise to the local temperatures from the climate model. The model-based scheme described above is known as the 'pseudo-proxy' approach and can be used to evaluate a large number of aspects of the reconstruction methods: how the different methods compare, how sensitive they are to the number of proxies, etc.

Inspired by previous pseudo-proxy studies we decided to systematically study the skills of seven different reconstruction methods. We included both methods that directly reconstruct the NH mean temperature and methods that first reconstruct the geographically distributed temperatures. The method used by Mann et al. 1998 was included, as well as two versions of the RegEM method later used by this group. Perhaps surprisingly, the main conclusion was that all the reconstruction methods severely underestimate the amplitude of low-frequency variability and trends (Fig. 1). Many of the methods could reproduce the NH temperature in the calibration period in great detail but still failed to get the low-frequency variability in the reconstruction period right. We also found that all reconstruction methods have a large element of stochasticity; for different realizations of the noise or of the underlying temperature field, the reconstructions are different. We believe this might partly explain why some previous pseudo-proxy studies have reached different conclusions.
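As a rough illustration of the pseudo-proxy idea, here is a minimal sketch in Python. All numbers are invented and the regression is a deliberately simple stand-in for the multivariate methods discussed above: noisy pseudo-proxies are built from a toy "true" temperature series, the model is calibrated on the last 100 years, and the result is compared with the known target in the reconstruction period.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (invented numbers): 1000 "years" of a true NH mean series,
# 50 proxy sites whose local temperatures deviate from the NH mean,
# and strong non-climatic noise added to form the pseudo-proxies.
n_years, n_proxies = 1000, 50
true_nh = np.cumsum(rng.normal(0.0, 0.05, n_years))            # toy NH mean
local = true_nh[:, None] + rng.normal(0.0, 0.3, (n_years, n_proxies))
proxies = local + rng.normal(0.0, 2.0, (n_years, n_proxies))   # low SNR

cal = slice(n_years - 100, n_years)   # calibration period: last 100 years
rec = slice(0, n_years - 100)         # reconstruction period: the rest

# Calibrate: regress the known NH mean on the proxy average.
x = proxies.mean(axis=1)
beta, alpha = np.polyfit(x[cal], true_nh[cal], 1)
reconstruction = alpha + beta * x[rec]

# Since the target is known in the reconstruction period, the loss of
# variance can be measured directly.
print("target std:        ", true_nh[rec].std())
print("reconstruction std:", reconstruction.std())
```

Even this toy version shows the effect described above: the reconstruction tracks the calibration period well but underestimates the amplitude of the variability in the reconstruction period.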

It is important to note the two different kinds of errors, which are examples of what is known in statistics as ensemble bias and ensemble variance. While the variance may be minimized by taking the average over many reconstructions, the same is not true for the bias. Thus, all the reconstruction methods in our study gave biased estimates of the low-frequency variability. We now see the fallacy of the 'hockey team' reasoning mentioned above: if all reconstruction methods underestimate the low-frequency variability, then considering an ensemble of reconstructions will not be helpful.
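A toy numerical illustration of that distinction (all numbers invented): averaging an ensemble of reconstructions shrinks the scatter but leaves a shared bias untouched.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented numbers: the true past anomaly is -1.0 K, but every member of
# the ensemble recovers only half of it (a shared bias) plus random
# scatter (the ensemble variance).
true_value = -1.0
bias_factor = 0.5
ensemble = bias_factor * true_value + rng.normal(0.0, 0.2, 1000)

ensemble_mean = ensemble.mean()
print("scatter of single reconstructions:", ensemble.std())
print("ensemble mean:", ensemble_mean)
# The ensemble mean converges to -0.5 K, not to the true -1.0 K:
# averaging removes the variance but leaves the bias untouched.
```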
The question that arises now is whether the systematic underestimation of low-frequency variability can be avoided. Based on an idea by Anders Moberg and theoretical considerations, I formulated a new reconstruction method, LOC, which is based on simple regression between the proxies and the local temperatures to which the proxy is expected to respond. To avoid the loss of low-frequency variance it is important to use the proxy as the dependent variable and the temperature as the independent variable. When the local temperatures have been reconstructed, the NH mean is found by averaging. Pseudo-proxy studies (Fig. 2) confirm that the low-frequency variability is not underestimated with this method. However, the new reconstruction method will overestimate the amplitude of high-frequency variability. This is the price we must pay; we cannot totally remove the influence of the noise, but we can shift it from low to high frequencies. The influence of the noise on the high-frequency variability can be reduced by averaging over many independent proxies or by smoothing in time.
Fig. 2 Pseudo-proxy experiment showing the ability of the new method (LOC, blue curve) to reconstruct the low-frequency variability of the target (black curve). Here the target has been designed to include a past warm period. This past warm period is well reconstructed by the new method but not by the two versions of the RegEM method. From Christiansen 2010.
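The role of the regression direction can be illustrated with a minimal sketch (invented numbers, a single proxy; the actual LOC method works with many local series): regressing temperature on the proxy attenuates the variance, while regressing the proxy on temperature and then inverting preserves the amplitude at the cost of extra high-frequency noise.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy local series (invented): a temperature with clear low-frequency
# variability and a noisy proxy that responds linearly to it.
n = 2000
years = np.arange(n)
temp = np.sin(2 * np.pi * years / 500) + 0.1 * rng.normal(size=n)
proxy = temp + rng.normal(0.0, 1.0, n)     # proxy = temperature + noise

cal = slice(n - 200, n)                    # calibration period

# Inverse regression (temperature on proxy), as in many classical methods:
b_inv, a_inv = np.polyfit(proxy[cal], temp[cal], 1)
recon_inverse = a_inv + b_inv * proxy

# LOC-style direct regression: proxy on temperature, then inverted.
b_dir, a_dir = np.polyfit(temp[cal], proxy[cal], 1)
recon_loc = (proxy - a_dir) / b_dir

print("true std:   ", temp.std())
print("inverse std:", recon_inverse.std())  # variance biased low
print("LOC std:    ", recon_loc.std())      # amplitude kept, extra noise
```

In the LOC reconstruction the extra variance is high-frequency noise, which can then be reduced by averaging over many proxies or smoothing in time, as described above.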

I have applied the new reconstruction method, LOC, to a set of 14 decadally smoothed proxies which are relatively homogeneously distributed geographically over the extra-tropical NH. This compilation of proxies was used in the reconstruction by Hegerl et al. 2007.  The proxies cover the period 1505-1960, the calibration period is 1880-1960, and observed temperatures are from HadCRUT2v.  The result is shown in Fig. 3 together with eight previous reconstructions.  The new reconstruction has a much larger variability than the previous reconstructions and reports much colder past temperatures. Whereas previous reconstructions hardly reach temperatures below -0.6 K, the LOC reconstruction has a minimum of around -1.5 K.  Regarding the shape of the low-frequency variability, the new reconstruction agrees with the majority of the previous reconstructions on the relatively cold temperatures in the 17th century and in the middle of the 19th century, as well as on the relatively warm temperatures at the end of the 18th century. I consider these real-world results mainly as an illustration of the potential of the new method, as reconstructions based on decadally resolved proxies are not particularly robust due to the small number of degrees of freedom. Work is in progress to apply the new method to an annually resolved and more comprehensive proxy compilation.
Fig. 3 The new real-world reconstruction (thick black curve) shown together with some previous reconstructions. All reconstructions are decadally smoothed and centered to zero mean in the 1880-1960 period. From Christiansen 2010.

Where does all this lead us? It is very likely that the NH mean temperature has shown much larger past variability than captured by previous reconstructions. We cannot, from these reconstructions, conclude that the last 50-year period has been unique in the context of the last 500-1000 years. A larger variability in the past suggests a larger sensitivity of the climate system. The climate sensitivity is a measure of how much the surface temperature changes given a specified forcing. A larger climate sensitivity could mean that the estimates of future climate changes due to increased levels of greenhouse gases are underestimated.


Bishop Hill said...

Presumably if the climate sensitivity is even higher than previously thought then the forecasts published by the IPCC around the time of AR3 are even more wrong than we thought?

Poiuyt said...

"To avoid the loss of low-frequency variance it is important to use the proxy as the dependent variable and the temperature as the independent variable."

Could you explain this more, or give a reference?

Also, how does inexactness of the temperature measurements affect your analysis?

Unknown said...

"It is very likely that the NH mean temperature has shown much larger past variability than caught by previous reconstructions."

If you had independent evidence for that, it would help your case. Periods of cold are usually accompanied by famines, wars and major migrations. In the 17th century, there were enough literate people to record the effects.

Anonymous said...


I am gradually beginning to understand this scientific jargon a little. Would it not be possible to write an article like this one in German for once, please? And, for example, to explain exactly what terms such as "low-frequency" mean (in this context)?

One gets the impression that the various reconstructions cancel each other out.

As a layman, however, one rarely dares to ask concrete questions, because one does not want to make a fool of oneself, and unfortunately one is also very often treated condescendingly by some scientists, or would-be scientists.

I do not think that we laymen are too stupid to understand it if it is explained to us.

For years now I have been hoping to get information on these topics that is not riddled with exasperating accusations, insults, cheating and lies.

Even concrete questions about the medieval warm period or the hockey stick, or Breughel and ice skating in Holland, are usually met with crude insults, insinuations and defamation.

Since yours is the only blog I know of that perhaps really wants to provide unbiased information, I dare to ask this question here.

Thanks in advance, and all the best


Bayesian Empirimancer said...

While I would call this progress, it is important to note that the ubiquitous use of ad hoc statistical methods (like this one) has been to the detriment of scientific inquiry in general and climate science in particular. Until this practice ends I will always find these types of reconstructions unconvincing.

Specifically, it is shown here that this method works 'better', but there is no indication of why. I would like to know exactly what statistical properties of the proxies (and measured temperatures) make it work better. To my knowledge, only proper generative modeling coupled with Bayesian model comparison can address these kinds of questions. Using 'the proxy as the dependent variable and the temperature as the independent variable' has the flavour of this kind of model, but this is still a rather impoverished approach when compared to methods which are commonplace in modern statistical analysis and machine learning.

corinna said...

Very interesting post!

However, I cannot follow the conclusion in the last two sentences:

"The climate sensitivity is a measure how much the surface temperature changes given a specified forcing. A larger climate sensitivity could mean that the estimates of the future climate changes due to increased levels of green-house gases are underestimated."

The larger climate sensitivity could also mean that the natural climate variability is underestimated by the climate models. Attribution of the temperature increase to greenhouse gas forcing is not imperative.
The implications of these results for the climate projections could be understood only if this is used to test and validate the climate models, and to investigate the sensitivity of the climate models to different parameterisations.

Bo Christiansen said...

@Bayesian Empirimancer: The properties of the new method are explained in more detail in the submitted paper. A preprint is here:

For one-dimensional regression the bias and ensemble variance can be expressed analytically. It can easily be seen that choosing the proxy as the dependent variable will remove the bias.
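For readers who want the one-dimensional argument spelled out, here is a hedged sketch using standard errors-in-variables algebra (generic symbols, not copied from the paper). With a proxy model $p = a + bT + \varepsilon$, the slope of the inverse regression of $T$ on $p$ converges to

```latex
\hat\beta_{T\,\mathrm{on}\,p} \;\longrightarrow\;
\frac{b\,\sigma_T^2}{b^2\sigma_T^2 + \sigma_\varepsilon^2}
\;=\;
\frac{1}{b}\cdot\frac{1}{1 + \sigma_\varepsilon^2/(b^2\sigma_T^2)},
```

which is attenuated by the noise-to-signal ratio, so the reconstructed variance is biased low. The direct regression of $p$ on $T$ estimates $b$ itself without this attenuation, and inverting $\hat T = (p - \hat a)/\hat b$ therefore preserves the low-frequency amplitude, at the price of carrying the noise $\varepsilon/b$ into the reconstruction.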

Bayesian methods can be used to find probability distributions of the parameters of the model and of the reconstructions. These distributions will estimate the ensemble spread, but they will not address the effect of the underlying regression model. Choosing a wrong regression model will still give biased reconstructions even in the Bayesian framework.

@Poiuyt: See the reference above. The main source of noise comes from the fact that proxies include a lot of variability other than that coming from a variable climate. I don't think noise on the temperatures (used for calibration) is very important compared to the other sources of noise. There is also a more philosophical issue: if the temperatures in the recent period are noisy, should we then compare them with noise-free temperatures in the past?

@Toby: Historical records undoubtedly include important information about the local climate. But when the concern is the NH mean temperature, the situation is that local temperature variations often balance each other. When it is relatively cold in one place it is relatively warm in another place (as for the well-known North Atlantic Oscillation, NAO). Historical records will therefore have to be included in reconstructions in the same way as other climate proxies.

Bo Christiansen said...


Climate variability is a mixture of internal variability and forced variability. So, yes, the results from the new reconstruction could also mean that the natural climate variability is underestimated by the climate models.

It is worth noting that there often is a connection between the size of internal variability and the response to external forcings (the fluctuation dissipation theorem). So a larger past natural variability would also imply a larger climate sensitivity.

Georg Hoffmann said...

Thank you for this interesting posting.
Two questions:

What do you think about non-regressive methods (the analogue method by Joel Guiot, for example)?

How did you create the warm period at 1200 in the target model simulation in Fig 2?

Bayesian Empirimancer said...

Thank you for pointing me to your PDF. Equations always make things easier to decipher. For example, it's now clear that you are doing a variation on factor analysis.

More to the point, the methods of Bayesian inference don't just provide distributions over parameters and reconstructions. They have two other very important and relevant features.

(1) Bayesian modeling is explicit regarding assumptions about the relationships between various parameters and observables, as well as the prior assumptions. Ad hoc algorithms are not explicit, and while many ad hoc algorithms can be derived from an explicit set of assumptions, this is rarely done. Because Bayesian inference requires the construction of inference algorithms that are consistent with the model assumptions, this is not a concern there.

(2) Moreover, one can do proper model comparison and determine WHY one model is better than another. In the work under discussion here, the author seems to be arguing that a shared noise model, which results from an errors-in-variables model, outperforms an independent noise model. But the results are not certain. After all, there is no guarantee that the set of ad hoc inference algorithms applied to the data actually inverts those generative models correctly.

So why is this method better? What are the properties of the data (as opposed to the properties of the algorithm) which make it better? We have no idea.

Adam Gallon said...

All well and good having a new method, but if it's still relying on the "treemometers" and other such questionable proxies for past temperatures, then it's still going to be a "GIGO" product.

Bo Christiansen said...

@Georg Hoffmann
In analogue techniques the reconstructed temperature is a weighted mean of the observed temperature over a number of situations in the calibration period. The weights depend on how similar the proxies in the calibration period are to the proxies in the reconstruction period. I am not very familiar with such methods, but I would think that the reconstructions would be very constrained by the climate in the calibration period. This means that a climate that is not represented in the calibration period would be difficult to reconstruct, e.g. previous periods with warmer or colder temperatures than in the present period.
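To make that constraint concrete, here is a minimal sketch of an analogue-type estimate as I understand the idea (not Guiot's actual algorithm; all data, the Gaussian weighting, and the `scale` parameter are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

# Invented calibration data: proxy patterns and temperatures for 100
# calibration-period situations at 10 proxy sites.
n_cal, n_proxies = 100, 10
cal_proxies = rng.normal(size=(n_cal, n_proxies))
cal_temp = rng.normal(size=n_cal)

def analogue_reconstruct(past_proxies, cal_proxies, cal_temp, scale=1.0):
    # Weight each calibration situation by the similarity of its proxy
    # pattern to the past proxy pattern (Gaussian kernel in proxy space),
    # then return the weighted mean of the calibration temperatures.
    d2 = ((cal_proxies - past_proxies) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2.0 * scale ** 2))
    return (w * cal_temp).sum() / w.sum()

t_hat = analogue_reconstruct(rng.normal(size=n_proxies), cal_proxies, cal_temp)
# The estimate is a convex combination of calibration temperatures, so it
# can never leave their range - the constraint mentioned above.
print(cal_temp.min() <= t_hat <= cal_temp.max())
```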

I would be very cautious when using non-linear methods, as they are harder to interpret and even more vulnerable to over-fitting than linear methods.

The temperature fields with previous warm periods were generated as follows. First I performed an EOF analysis of the original temperature field from the climate model. I then removed a polynomial trend from each of the PCs. Then a phase-scrambling technique was used to generate surrogate PCs. These surrogate PCs have the same auto- and cross-covariance structure as the original detrended PCs. The surrogate PCs were then all multiplied by the same time-varying profile. Finally, the trends were added to the surrogate PCs, and they were recombined to give a new surrogate temperature field.
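For readers unfamiliar with phase scrambling, here is a minimal sketch for a single series (a sketch of the idea only; the procedure described above applies the same random phases jointly to all detrended PCs, which is what preserves their cross-covariances):

```python
import numpy as np

rng = np.random.default_rng(3)

def phase_scramble(x, rng):
    # Randomize the Fourier phases while keeping the amplitudes, so the
    # surrogate has the same power spectrum (and hence the same
    # autocovariance) as x, but is a different realization.
    n = len(x)
    spec = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(spec))
    phases[0] = 0.0                  # keep the DC component real
    if n % 2 == 0:
        phases[-1] = 0.0             # keep the Nyquist component real
    return np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n)

x = np.cumsum(rng.normal(size=1024))     # toy "principal component" series
y = phase_scramble(x, rng)

# Same variance (by Parseval's theorem), different realization.
print(np.var(x), np.var(y))
```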

@Adam Gallon: All reconstruction methods rely on proxies. That is all we have. But the pseudo-proxy experiments can show us the consequences of using noisy proxies. They can inform us about the error bars and possible biases. Based on pseudo-proxy experiments we can compare the different reconstruction methods.

@Bayesian Empirimancer
You probably know more about Bayesian statistics than I do. But if you fit a linear model to a banana-shaped distribution of points, then the Bayesian approach will still give you biased predictions. I am only aware of one attempt to use such methods for global-scale climate reconstructions: Tingley and Huybers, to appear in Journal of Climate (find it here

The new reconstruction method is better because it models the proxies as a function of the local temperatures. See Appendix A of my paper.

happyman said...

The AR4 models are not calibrated against the reconstructions. So their climate sensitivity is actually rather independent of the reconstructions. However, "the models" are qualitatively validated against NH temperature reconstructions (AR4 fig 6.13). Note that these models are not the same as those used for projections and could therefore have a completely different climate sensitivity. The validation figure, however, tries to show that our understanding of the climate system is adequate to model the past climate to within uncertainty (my personal reading of the figure).

However, other climate models have been compared to the NH reconstructions. See AR4 fig 6.13. It is clear that no models would be able to fit the LOC reconstruction. So, if the century-scale amplitude of the LOC reconstruction is correct, then there is clearly something which is not captured by the models. It could be that the model climate sensitivity is too low. It could also be that the forcing is wrong, or that internal variability is not well represented. When you go looking for the error it makes sense to look for the weakest link. I believe that the radiative forcing reconstructions are less uncertain than the proxy temperature reconstructions. Ask yourself what you believe to be the weakest link. Or, conversely, perhaps you should ask yourself which is the strongest link and use that as your best guess for how the world behaves.

Patrik said...

Very interesting.
Are you planning on moving further back in time with this new method - maybe as far back as ~1000 BP?
That would be very interesting to see.

Another question: how come your curve doesn't cover all of the instrumental period, up until ~2000?
Maybe that's covered in the post; in that case I've missed it. :)

Bo Christiansen said...

The reconstruction is based on decadally smoothed proxies from Hegerl et al. 2007. These proxies ended in 1960, probably in part because of the divergence problem concerning tree ring proxies.

Yes, I am working on reconstructions going back to 1000 BP. These reconstructions will be based on annually resolved proxies.

Georg Hoffmann said...

thank you for your answer. I just found the corresponding paper on your website.
Just one more question about your "artificially" constructed warm period. Your answer sounds to me as if the leading patterns of natural variability went into the "warm period" in more or less equal percentages. However, a warming has a specific pattern (at least in GCMs), that is, land/ocean and high/low latitudes. So is this pattern dominating in your warm period at 1200 as well?

Bo Christiansen said...

@Georg Hoffmann
The artificial warm period is constructed by scaling the climate modes as found by the EOF analysis. The modes that "explain" most of the variability in the original field are also those that will contribute most to the warming. I have tried different schemes - e.g. allowing the dominant mode to contribute proportionally more to the artificial warming - without much difference in the results.

PolyisTCOandbanned said...

I am neither a statistician nor a mathematician, but this approach of individual calibration of the proxies (against local temperatures) and then summing them (perhaps geographically weighted) seems obvious. It seems like it would be the simplest and first way to go after the problem, versus all the RegEM and teleconnections and geofields and the like.

boballab said...

Toby said: "It is very likely that the NH mean temperature has shown much larger past variability than caught by previous reconstructions."

If you had independent evidence for that, it would help your case. Periods of cold are usually accompanied by famines, wars and major migrations. In the 17th century, there were enough literate people to record the effects.

From 1618 to 1648 there was war, famine and disease all across Western Europe due to the Thirty Years' War.

At the time there were very few Europeans in North America, with the majority located in the Spanish holdings in Mexico and the Caribbean.