Thursday, August 19, 2010

McShane and Wyner on climate reconstruction methods

Some readers have expressed interest in a discussion of the recent paper on climate reconstruction methods, "A statistical analysis of multiple temperature proxies: are reconstructions of surface temperatures reliable?" by McShane and Wyner. It also has some indirect connection to our recent paper, "A noodle, hockey stick, and spaghetti plate: a perspective on high-resolution paleoclimatology" by Frank et al. (2010), so I will try to give my personal opinion here.

McShane and Wyner are two statisticians who aim to analyze the statistical methods used in proxy-based climate reconstructions, focusing mostly on the reconstructions by Mann and collaborators. In summary, the paper has two parts. In the first, the authors present a critique of what they think are the reconstruction methods used by 'climatologists'. This part is very weak. It seems that their knowledge of the papers by Mann et al. is only indirect, from what they may have read in blogs, and that they did not actually read the papers themselves. They test and criticize a statistical method which, to my knowledge, has not been used for climate reconstructions, and in contrast they barely mention the methods that are indeed used. If they had analyzed the latter methods, climatologists would have benefited much more from this study. The second part, in which they focus on the estimation of uncertainties, is somewhat better. They claim that the uncertainties are much larger than those included in the 'hockey stick reconstruction' and that the shaft of the hockey stick is rather an artifact of the method. These conclusions are, however, hardly new. The flatness of the shaft is actually only defended by Mann et al., and more recently in a much weaker fashion than 10 years ago.
The introduction already contains a terrible and unnecessary paragraph, full of errors:
For example, Antarctic ice cores contain ancient bubbles of air which can be dated quite accurately. The temperature of that air can be approximated by measuring the ratio of ions and isotopes of oxygen and hydrogen.

Well, past temperatures are reconstructed from the isotope ratio of water molecules in the ice, and not from air in the air bubbles. The air bubbles themselves cannot be dated accurately, since air can flow freely in the upper 50 or so meters of firn, and the bubbles are only sealed when ice is finally formed. Thus the time resolution of the age of the air bubbles is rather 70 years or so, depending on the site. The isotope ratio in the trapped air, for instance oxygen 18, is only a very indirect measure of global temperature and rather reflects the size of the biosphere through the fractionation that occurs in photosynthesis (Dole effect). It is not even a proxy for local temperature. Furthermore, the temperature of the air bubbles and of the ice layers is continuously changing, driven by the heat flow from the surface and from the rock. I am still wondering which 'isotopes of hydrogen' can be analyzed in trapped air (did they mean the hydrogen in the molecules of water vapor in the bubbles?). The authors probably confused the analysis of past CO2 concentrations in trapped air bubbles with the estimation of past temperature from the isotope ratio in ice.
This error is not relevant for the paper itself, and the paragraph is unnecessary, but it does tell me a few things: the authors did not consult any climatologist; they feel confident enough to write about things of which their knowledge is very superficial; and the editors did not find it necessary to have the manuscript reviewed by someone with some knowledge of proxies.

Further misunderstandings, this time about climate models:
one such source of evidence. The principal sources of evidence for the detection of global warming and in particular the attribution of it to anthropogenic factors come from basic science as well as General Circulation Models (GCMs) that have been fit to data accumulated during the instrumental period (IPCC, 2007).

Although climate models contain parameters that may be tuned, climate models are not really fit to observations. If that were the case, the models would all reproduce perfectly the observed global trend. We all know this is not the case, and that the spread is quite large.

Summarizing the previous work of McIntyre and McKitrick on the hockey stick, they write:
M&M observed that the original Mann et al. (1998) study (i) used only one principal component of the proxy record and (ii) calculated the principal components in a ”skew”-centered fashion such that they were centered by the mean of the proxy data over the instrumental period (instead of the more standard technique of centering by the mean of the entire data record)
This paragraph, and other similar paragraphs later on, tell me that the authors have not really read the original paper by Mann, Bradley and Hughes (1998). MBH never used 'only one principal component of the proxy record'. The authors, again, are probably confused by what they may have read in blogs. MBH did calculate the principal components of some regional subsets of proxy records in areas where the density of sites was very high, for instance tree-ring sites in the US Southwest. This was done as a way to come up with a regional index series representative of that area, instead of using all series from a relatively small area and thus over-representing this area in the global network. The issue of the un-centered calculation of principal components is already quite clear (the way in which MBH conducted the analysis is not correct). But other than that, the MBH reconstruction is not based on 'principal components of the proxy record'. It is based on the principal components of the observed temperature field. For the millennial reconstruction, MBH estimate that only one PC of the instrumental temperatures could be reconstructed. They never used 'only one principal component of the proxy record'; they did use only the first principal components of the US Southwest tree-ring network, but not of the 'proxy record'. For instance, for the first part of the millennial reconstruction, 1000-1400, MBH used an inverse regression method with 12 proxy indicators and one principal component of the temperature field. This point is so clear in the MBH paper that it really shows that McShane and Wyner did not actually read MBH98.

Further down in the paper, the authors go into the problems posed by the large number of proxy records and the short period available for calibration of the statistical models, 1850-1998. Again, they claim that Mann et al. used principal components to reduce the dimensionality of the covariates:
to achieve this. As mentioned above, early studies (Mann et al., 1998, 1999) used principal components analysis for this purpose. Alternatively, the number of proxies can be lowered through a threshold screening process (Mann et al., 2008)

Again, wrong. Correctly or incorrectly, this is not what MBH did. Although the proxy indicators are regionalized in areas with high proxy density through a regional principal components analysis, the way MBH deal with the risk of overfitting is by using inverse regression. This means that in their statistical model they write the proxy vector (around 100 proxies) as a linear function of the principal components of the temperature (about 8 principal components). This problem is always well-posed. MBH never conducted a principal components analysis of the global proxy network.
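
As a rough illustration of the inverse-regression idea described above, here is a toy sketch with synthetic data. It is not the MBH code: the array sizes, the noise level and all variable names are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_cal, n_rec = 100, 50      # calibration and reconstruction years (assumed)
n_pc, n_proxy = 2, 50       # temperature PCs and proxies (assumed)

# Synthetic "truth": temperature PCs for all years.
T = rng.standard_normal((n_cal + n_rec, n_pc))
# Hypothetical loadings linking PCs to proxies, plus non-climatic noise.
B_true = rng.standard_normal((n_pc, n_proxy))
P = T @ B_true + 0.5 * rng.standard_normal((n_cal + n_rec, n_proxy))

# Step 1 (calibration): regress each proxy on the temperature PCs, P = T B.
# Only n_pc coefficients are fitted per proxy, so the fit stays well-posed
# even when there are many more proxies than PCs.
B_hat, *_ = np.linalg.lstsq(T[:n_cal], P[:n_cal], rcond=None)

# Step 2 (reconstruction): for each earlier year, solve B_hat^T t = p for t.
T_rec, *_ = np.linalg.lstsq(B_hat.T, P[n_cal:].T, rcond=None)
T_rec = T_rec.T

corr = np.corrcoef(T_rec[:, 0], T[n_cal:, 0])[0, 1]
print(f"correlation of reconstructed vs. true PC1: {corr:.2f}")
```

The point of the sketch is only the direction of the regression: the proxies are the dependent variables, so adding more proxies adds equations rather than unknowns.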

Section 3.2 is the core of the first part of the paper. In this section, the authors seem to propose a linear regression model to reconstruct past temperatures in which the predictand is the Northern Hemisphere annual temperature and the predictors are the available proxy records (they use 1209 proxies). They then argue that this model leads to overfitting and that the number of predictors (proxies) has to be restricted. They propose the Lasso method to screen the set of proxies. They compare temperature hindcasts of the last 30 years obtained using real proxy records with those obtained with simpler benchmarks: imputing just the mean of the calibration period, assuming an autoregressive process for the mean temperature and extrapolating forward, or, finally, constructing synthetic proxy records mimicking some statistical characteristics of the real proxy records. They found that using the real proxy records does not significantly improve the hindcast compared with using synthetic proxies.
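
The kind of holdout test described above can be sketched as follows. This is a minimal toy version under stated assumptions, not the procedure of McShane and Wyner: the Lasso is a plain coordinate-descent implementation written for this example, the "temperature" is a random walk, the pseudo-proxies are pure white noise, and the penalty `alpha` is picked by hand.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=100):
    """Minimise (1/2n)||y - b0 - Xw||^2 + alpha*||w||_1 by coordinate descent."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean           # centring absorbs the intercept
    w = np.zeros(p)
    z = (Xc ** 2).sum(axis=0) / n             # per-column curvature
    r = yc.copy()                             # current residual yc - Xc @ w
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue
            rho = Xc[:, j] @ r / n + z[j] * w[j]
            w_new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / z[j]  # soft threshold
            r -= Xc[:, j] * (w_new - w[j])    # keep the residual up to date
            w[j] = w_new
    return w, y_mean - x_mean @ w

def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

rng = np.random.default_rng(1)
n_years, n_proxy, holdout = 150, 100, 30
temp = np.cumsum(rng.standard_normal(n_years)) * 0.1    # toy temperature series
proxies = rng.standard_normal((n_years, n_proxy))       # pure-noise pseudo-proxies

train = slice(0, n_years - holdout)
test = slice(n_years - holdout, n_years)                # last 30 "years" held out
w, b0 = lasso_cd(proxies[train], temp[train], alpha=0.05)

pred_lasso = b0 + proxies[test] @ w
pred_mean = np.full(holdout, temp[train].mean())        # in-sample-mean benchmark

print("holdout RMSE, Lasso on noise proxies:", rmse(pred_lasso, temp[test]))
print("holdout RMSE, in-sample mean:        ", rmse(pred_mean, temp[test]))
```

With a very large penalty every coefficient is shrunk to zero and the Lasso hindcast collapses to the in-sample mean, which is why that benchmark is a natural reference point for such a comparison.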

Well, this result may be interesting and is probably correct, but I doubt it is useful, since I am not aware of any reconstruction using this statistical regression model. The reconstruction methods for the Northern Hemisphere mean temperature that I am aware of are:
- CPS, in which only one free parameter is calibrated (the variance ratio between an all-proxy-mean index and the instrumental temperature).
- MBH, which, as indicated before, is an inverse multivariate regression method based on the principal components of the temperature field.
- RegEM, an iterative method originally employed to fill in data gaps in incomplete data sets.
- Principal components regression (actually, this method has only been used to reconstruct regional temperatures and not hemispheric means), a multivariate direct regression method in which the principal components of the target variable are written as a linear function of the proxy records.
- BARCAST, a Bayesian method to reconstruct the temperature field, mentioned later in this paper.
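
To make the first item in the list concrete, here is a minimal sketch of a composite-plus-scale (CPS) reconstruction. It assumes synthetic proxies that all share one signal; the function name `cps`, the array sizes and the calibration window are hypothetical choices for illustration only.

```python
import numpy as np

def cps(proxies, temp_cal, cal):
    """Composite-plus-scale: average standardised proxies, then rescale.

    proxies  : (n_years, n_proxy) array covering the full period
    temp_cal : instrumental temperature over the calibration years
    cal      : slice selecting the calibration years within `proxies`
    """
    # Standardise each proxy with its calibration-period mean and std.
    z = (proxies - proxies[cal].mean(axis=0)) / proxies[cal].std(axis=0)
    composite = z.mean(axis=1)                    # the all-proxy-mean index
    # The single calibrated parameter: ratio of standard deviations
    # (the square root of the variance ratio).
    scale = temp_cal.std() / composite[cal].std()
    return temp_cal.mean() + scale * (composite - composite[cal].mean())

rng = np.random.default_rng(3)
n_years, n_proxy = 200, 20
signal = np.cumsum(rng.standard_normal(n_years)) * 0.1        # toy temperature
proxies = signal[:, None] + rng.standard_normal((n_years, n_proxy))
cal = slice(150, 200)                                         # assumed calibration window

recon = cps(proxies, signal[cal], cal)
print("calibration std, target vs. reconstruction:",
      signal[cal].std(), recon[cal].std())
```

By construction the reconstruction matches the mean and variance of the instrumental series over the calibration window; nothing else is fitted.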

The closest situation to the model proposed by McShane and Wyner would be the regression at the core of the RegEM method. To regularize this regression, which indeed would involve too many predictors, several versions of the RegEM method have been proposed (truncated total least squares, ridge regression). McShane and Wyner just mention this in passing. So I am surprised that McShane and Wyner only test and analyze a method that is not actually used. A really useful contribution of this type of work would have been to analyze those methods that are actually used, which admittedly still present problems and uncertainties. For instance, the same test they propose for the Lasso method to reconstruct the Northern Hemisphere average could have been applied to RegEM to reconstruct the temperature field. This would have been something interesting. Other potential problems of the RegEM method are only briefly mentioned. For instance, RegEM requires that the missing data (the temperature to be reconstructed) be distributed at random within the data set. In the set-up of climate reconstructions this is clearly not the case, as the temperature values to be reconstructed are clustered at one end of the data set.

The authors unfortunately do not go into a deeper analysis. Questions of proxy selection, underestimation of past variability (the failure of their method to reproduce the trend in the last 30 years could perfectly well be due to this problem too), the role of non-climate noise in the proxies, and finally the tendency of almost all methods to produce spurious hockey sticks are all related to some degree. For instance, the presence of noise in the proxy records alone could, regardless of the statistical method used, lead to underestimation of past variations. A method based on some RMSE minimization would tend to produce reconstructions that revert to the long-term mean whenever the information in the proxies tends to be mutually incompatible, and would only produce the right amplitude of past variations if all proxies 'agree'. Two schools of thought part ways here. One, represented for instance by Mann, attempts to use all available proxy records and to design a statistical method that can somehow extract the signal from the noise. The paper by McShane and Wyner also fits within this school, trying to apply the Lasso method for proxy selection. It fails, according to the authors, but this may be due to characteristics of the Lasso method that render it inadequate, or perhaps to the impossibility of designing a statistical method that can successfully screen the proxy data. It is not clear what the real reason is.
The other idea, put forward in our recent paper, is that another way forward is to select good proxies, for which there is a priori a very good mechanistic understanding of the relationship between proxy and temperature. Here, the observed correlation between proxy and temperature is secondary, and proxies with good correlations would be rejected if the mechanistic understanding is absent or dubious. Once a good set of proxies is selected, with minimal amounts of noise, any method should be able to provide good reconstructions.
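
The variance loss caused by proxy noise under RMSE minimization can be seen in a very small toy case. The sketch below assumes a single proxy equal to the true temperature plus white noise of equal variance; it illustrates the mechanism only and is not taken from any of the papers discussed.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
temp = rng.standard_normal(n)                   # toy "true" temperature
proxy = temp + 1.0 * rng.standard_normal(n)     # signal plus noise (assumed SNR of 1)

# Direct least-squares regression of temperature on the proxy (minimises RMSE).
slope, intercept = np.polyfit(proxy, temp, 1)
recon = intercept + slope * proxy

r = np.corrcoef(proxy, temp)[0, 1]
ratio = recon.var() / temp.var()
print(f"r^2 = {r**2:.2f}, var(recon)/var(temp) = {ratio:.2f}")
```

For a least-squares fit the variance of the fitted series is the true variance multiplied by the squared correlation, so the noisier the proxy, the flatter the reconstruction, whatever the true amplitude was.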

The last part of the paper is related to the estimation of uncertainties, mostly by setting up a Bayesian method. As far as I could understand, their method is a simplified version of what Tingley and Huybers or Li et al have already put forward. The main difference, they note, is that these latter authors have only conducted methodological tests with pseudo-proxies, and not produced actual reconstructions. This is the part I most agree with, but their conclusions are hardly revolutionary. Already the NRC assessment on millennial reconstructions and other later papers indicate that the uncertainties are much larger than those included in the hockey stick and that the underestimation of past variability is ubiquitous.

Almost at the end of the paper they include a paragraph that has been misunderstood either by the authors themselves or by the blogosphere:
Climate scientists have greatly underestimated the uncertainty of proxy-based reconstructions and hence have been over-confident in their models

It is not clear which models the authors are referring to. If they mean 'statistical models' to reconstruct past temperature, I would agree with them. If they mean 'climate models', they are again dead wrong, since climate models and climate reconstructions are so far completely separate entities: climate models are not tuned to proxy-based reconstructions, and proxy-based reconstructions do not use any climate model output.

In summary, climate scientists have admittedly produced bad papers in the past by not consulting professional statisticians. The McShane and Wyner paper is an example of the reverse situation. What we need is an open and honest collaboration between both groups.


Georg Hoffmann said...

Excellent and clear review, Edu.

Check the last link to your paper. There is an error.

On another web site Martin Vermeer mentioned another possible problem which I found quite interesting.

Basically they calibrated on the mean Northern Hemisphere temp instead of on the leading EOF (i.e. the climate change pattern of warm land and warm high lats). This gives a strong bias to high-lat proxies, which is why their reconstruction looks a bit like Kaufman et al. 2009 for the Arctic.

What do you think?

Anonymous said...

Thank you very much.


bernie said...

Interesting review. It is a shame that someone intimately familiar with proxies was not involved in writing the paper. The link to your paper did not work for me.

eduardo said...

Sorry, I have fixed the link to the Frank et al. paper.

Anonymous said...

"They claim that the uncertainties are much larger than those included in the 'hockey stick reconstruction' and that the shaft of the hockey stick is rather an artifact of the method. These conclusions are, however, hardly new. The flatness of the shaft is actually only defended by Mann et al. and more recently in a much weaker fashion then 10 years ago."

So you agree with the point of the paper: Mann's proxies do not contain enough "signal" (temperature), therefore his model (they do not mean GCMs) of the flat stick has no meaning. Present temperature slopes are not necessarily unusual.

They used their own method to prove that Mann's model (his use of statistics) didn't do the job he claims it does.

You, like many others (the RealClimate team has started to back off too), have now decided to throw Mann under the bus. Everything else you said about statistics professors not knowing "climate science", whatever that is, was simply not necessary.


PS: it's McKitrick, not McKritik, although he is a critic of sorts.

PPS: Also see Jeff Id's latest treatise on the temp signal in the proxies: SNR is way too low.

Anonymous said...

I was one who requested an opinion on the McS&W2010 paper since I believe it is a step in the direction of science sustainability, which this blog espouses.
You provided it so, thank you.
I may have been harsh in my comments above, but I have followed the Mann escapades for several years now, and believe Mann is a mediocre scientist, who rode his hockeystick to fame and fortune.

I have not read your paper, but will do so now.
I enjoy this blog
and also like reading it in German.


PolyisTCOandbanned said...

1. Ed, thank you for the long note. I value your trained AND fair view of things. You have great background in both the history and the statistics. Especially on the statistics, I feel the need for those of us who are "Internet junkies of the drama" to ask an expert. But I know to come to you for something like this and to James Annan for hypothesis testing and trend standard deviations! ;)

2. I wrote a quick, surfacey review of the paper on The Policy Lass website.

Had a similar impression that the authors did not have a good handle on the history of disputes and the methods of Mann (a shame). I do have a little sympathy for them in that MBH especially is hard going (why not show a process flow diagram for instance?) That said, sometimes you have to suck it up and crawl through things. Also relying on Wegman was a bad idea. Not only is he polarizing, but he hasn't stuck in the field and the authors would learn more, better by looking at primary papers.

A couple things that I like in the paper--just general concepts, not necessarily the treatment:

A. I think of the problem as one of data-method "space". So, here I like their trials with various different types of noise and proxies. I dislike the practice of picking one style of noise, doing a test and then giving a result. For McI, the noise will be very harsh. For Mann, gentle. But they sorta bury the dependency of the result on the type of noise and just concentrate on showing the result (the same flaw comes in the MBH paper, but here with the issue of defining relevant variations and glossing over this as a crucial factor). A really curious, agnostic scientist would just catalog the space first, then perhaps make an argument over which noise is reasonable. The good thing is we can then find something to agree on in the sense of an if-then dependency (e.g. if the noise is this harsh (or this gentle!), then the result is bad (or good!)). But we can really survey the situation.

In this vein, I liked the Burger and Cubasch full factorial of Mannian method choices. But...well...we live in the computer era, so you can really think of it as a space of various method choices and various types of data shapes in terms of results we might get (whether good results of signal amplification or bad ones of noise cancelling or what have you). Also, your challenge to McI of what is a bad apple MATHEMATICALLY is relevant (and as I told him, badness and appleness are different concepts!)

B. I'm very stuck on this issue of what I called "wiggle matching" a while ago. I think it's what you refer to as degrees of freedom. In that vein, I like the idea of challenging the proxies with shorter holdout instrument blocks and all through the time. Just matching a trend or two gives me low confidence in a proxy's physical sensitivity, in its prospects for out-of-sample predictions. But with a "lot of wiggles", then I feel better. And really...climate DOES HAVE WIGGLES (look at the recent slowdown, the 1998 excursion, etc.) While those might vex us in debates with skeptics over "global warming stopping", they are to our benefit when wanting to validate proxies as sensitive. I'm kinda straining my brain, but I thought someone (Burger in COTPD, or perhaps you or Juckes) talked a long time ago about doing RE/CE looking at many periods (even non-contiguous selections).

With this in mind, though, I think we need to do LOCAL calibration and verification. In theory, proxies may teleconnect some, but by glossing over local to global, I think we lose a lot of the ability to "wiggle match" because of smoothing by averaging around the globe.

3. Overall, really agree with you, that it's a shame these guys did not do a bit more homework in order to make a real contribution.

PolyisTCOandbanned said...

Correction to para 2.A (first): should be "MMH", not "MBH". I'm referring to the key issue of definition of standard error on which much turns (but which key concern is buried and just the choice the authors advocate presented as the only natural way.)

PolyisTCOandbanned said...

I think the Frank paper really deserves its own thread.

It would be interesting to see some more tries at your approach. To start with, I would love to see a nice "compare and contrast" of proxies. I remain stuck on looking at Kim Cobb's graph of her coral versus local temp and how really nicely it wiggle-matched over long periods. Then looking at the Tiljander varves recently, I had the impression they were even worse than treeline ring width. Previous reviews by IPCC, etc. are too kind in just citing the different researchers, rather than digging into the issues of which proxies are better than others. I know eventually you would want to do that within classes (e.g. this dendro series is better than that one because of more careful collection or higher elevation or whatever), but to start with just a simple compare and contrast would be nice.

On the other hand, I actually love the idea of trying different approaches. I remain pretty skeptical of Mike's proxyhopper, super signal-mining methods. That said, if we did not have him, we would have to invent him! Someone needs to be trying these elaborate approaches...perhaps they can eventually be validated. Maybe if he got his publicity-seeking, uni-hopping ego-scientist bit under control...really embraced becoming the uberuberuberstats boi. Publishing in the most specialized journals, very long papers. Concentrating on method. NOT on result. Eschewing the "press release" of PNAS/Science/Nature papers.

Anonymous said...

you say -
'climate models are not tuned to proxy-based reconstructions'

then why do climate modellers defend to the end Mann et al?

Jeff said...

Nice job, it's very clear and from the parts I'm familiar with, correct.

Jeff said...

It actually made me a bit crazy that they left out any discussion of variance loss while citing your paper. Their method will suffer the same fate as any of the others.

I've got a new post up at CA, if you get a moment, I'm curious as to your take on it. I didn't expect the result I got.

Andrew said...

It is interesting you say that they do not infer temperature from the isotopes in air bubbles as that appears to be exactly what some scientists have done,

Abrupt Climate Change at the End of the Last Glacial Period Inferred from Trapped Air in Polar Ice
(Science, Volume 286, Number 5441, pp. 930-934, 1999)
- Jeffrey P. Severinghaus, Edward J. Brook

"Nitrogen and argon isotopes in trapped air in Greenland ice show that the Greenland Summit warmed 9 ± 3°C over a period of several decades"

Andrew said...

BTW, why is there no collaboration with Computer Scientists for the climate models as it appears the climate scientists I have spoken with do not seem to understand the limitations of computer systems let alone understand how they work in general?

Magnus said...

Don't miss this debunking of the paper:

John Mashey said...

Andrew @ 14

So who do you/did you talk to?
I used to be Chief Scientist @ Silicon Graphics in the 1990s, and
I'm a computer scientist who used to design supercomputers, used among other things to run climate models. I've spent a lot of time with people in high-performance computing in many disciplines and specifically spent many hours with folks from/at NCAR, GFDL, NASA.

Having given them all lectures about forthcoming computer designs, and answering many questions:
1) I think they had some idea how computers work.
2) They were all too aware of their limitations.
3) Oh, and these groups had some pretty good computer people, whether or not they had CS degrees.

Your claim does not jibe with my experience, so say more about your sample.

Georg Hoffmann said...

McShane and Wyner are referring to the isotopic composition of O and H:
"The temperature of that air can be approximated by measuring the ratio of ions and isotopes of oxygen and hydrogen."

No way to reconstruct temperatures from these two elements in air.
Severinghaus measured the isotopic composition of some inert gases along rapid T transitions (younger dryas and such). There are gravitational and temperature dependent effects fractionating the isotopes of N and Ar in the firn. So there is a signal corresponding to the length and intensity of the corresponding rapid T transitions. But it's not a full T reconstruction along the entire core.
This is done (as Eduardo explained) by O/H Isotopes in the ICE, ie in the solid phase, and it's not the Temp in the air but the Temp where condensation took place (ie the clouds).

ghost said...

nice review... or better: devastating review.

Hm, my question to last sentence: What we need is an open and honest collaboration between both groups.


I could imagine that statisticians may not earn a lot of laurels by applying their methods in climate science. So the motivation may not be high.

As solutions I could imagine different points:
1. in reviews of such papers: as Eduardo said: reviewers from the science and statistics fields. That is probably hard for a statistics journal, but for journals like Nature or Science or so it should be the standard. Same for the IPCC review process.

2. collaboration projects as start, maybe also funded by DFG or NSF or so. I think the need is clear now, many reports required that (the NAS report about MBH98, the CRU reports). I think, there is a start already.

3. in long term: better statistics courses for science students? Could this be improved?

4. visiting the neighbor building more often ;) Many universities have a math department.

Anonymous said...

You should be aware that the version you are reading is the first version submitted to the journal, not the final version. It is likely that some of the points you raise will be addressed in the final version.
The point about the 1 principal component is worded wrong - they should have said that the MBH result is dominated by the first PC, that's clear from the MBH graphs. I can't see any justification for your claim that they have got their info from blogs.

Your last point makes no sense at all. It is quite clear from the sentence that they are talking about the proxy reconstructions

Anonymous said...

"3. in long term: better statistics courses for science students? Could this be improved?"

hahaha: Is the Pope catholic?

Anonymous said...

" For instance, for the first part of the millennial reconstruction 1000-1400, MBH used an inverse regression method with 12 proxy indicators and one principal component of the temperature field. This point is so clear in the MBH paper that it really shows that McShane and Wyner actually did not read MBH98."

MBH98 only starts from 1400; they did no analysis in that paper of 1000-1400. Maybe you are thinking of MBH99?

You will note that I have refrained from any suggestion that making an error suggests you haven't read the paper.

Anonymous said...

"climate models and climate reconstructions are so far completely separate entities: climate models are not tuned to proxy-based reconstructions, and proxy-based reconstructions do not use any climate model output"

If I were to make a climate model, I would automatically "tune" it to produce at least some resemblance to the historical / accepted data. Wouldn't any model that failed to do that be automatically proved dysfunctional and never get any use? Fiddling with different component multipliers etc until the model resembles the past climate data is "training" the model...

Anonymous Fred said...

#Anonymous 1.53pm

"If I were to make a climate model, I would automatically "tune" it to produce at least some resemblance to the historical / accepted data"

Good job you aren't then. The point of climate models is to have a physics-based model of the system that can be *tested* against the changes that have actually occurred.

Anonymous said...

I figured it all out!!
1000 years ago, earth was overpopulated by animals and they produced a lot of methane. Apparently methane is a strong GHG. So it became warmer. Then man came and killed the animals. And then it became ice cold..

Obviously, without any test results from actual experiments, the above statement is just as illogical as to claim that CO2 causes modern warming.

John Mashey said...

ghost@ 18
The "climate scientists and statisticians don't talk to each other" meme really got started by the Wegman Report. It wasn't true then, it isn't true now, and the professionals know this. See comment @ Deltoid for a good example, the 2007 ASA meeting at NCAR.

Anonymous said...

@Eduardo Zorita

You explain: "Although climate models contain parameters that may be tuned, climate models are not really fit to observations. If that were the case, the models would all reproduce perfectly the observed global trend."

I have two questions:

We can often see comparisons between climate model outputs and the real temperature record.

Sometimes those graphics are nearly identical. That looks very suspicious to us laymen.

Here is one perfect example:

1) Why does it always only correlate afterwards?

2) Are models only tuned or is it true that the whole "knowledge input" of the climate models comes from real world observations?

Best regards

Hans von Storch said...

John Mashey/25 - if somebody claims a good interaction between professional statisticians and climate scientists, that is one thing; whether that is true is another. Wegman certainly had not really tried to interact with climate scientists. He mostly tried to use his authority among statisticians to impress the NRC and politicians.

In the series of "International Meetings on Statistical Climatology" we have over the years tried, with limited success, to bring the communities together; bringing "real" statisticians into the process did not often result in real successes (even though there were a number of successful imports), mainly because most found it difficult to understand the specifics of climate science (such as the inability to do experiments and the ubiquitous dependence across time and space). The NCAR effort was in the beginning mostly an effort by some Bayesians to bring their view onto the stage, without being able to provide good examples, such as data assimilation and parameterizations; this has improved somewhat. - Hans von Storch

Anonymous said...

climate models are not tuned to proxy-based reconstructions,

What? Then how do they tune them? Randomly? Why would they not use the proxies?

Anonymous said...

This comment has been removed by a blog administrator.

John Mashey said...

Dr von Storch:

I don't think we disagree.
Along the spectrum of:

1) Statisticians and climate scientists do not interact.

2) At least some do, some of the time, and fruitfully.

3) There are enough climate-knowledgeable statisticians to work with all the climate scientists.

I would never assert 3), as there are rarely enough, as I wrote in CCC, last few paragraphs of Appendix A.10.4.

I often quote Jim Berger's 2007 NCAR talk, p.19, which seems to have informed comments along these lines.

Similar problems happen in other disciplines, where more statistical expertise would be useful, but is structurally difficult to achieve. I was appalled to give computer performance/architecture lectures at two of the world's top universities and discover that only 50% of the (mostly grad student) audiences had ever had a stats course... That may be fixed by now, but intro statistics courses are often oriented to social sciences. Students looked at the syllabus, saw nothing that seemed relevant, and skipped them if they could. At least some places, either Engineering or Physical Sciences create their own statistics courses or work hard with the Statistics department to create more appropriate ones.

Meme 1) gets repeated over and over, and I simply assert 2) to say 1) is wrong, not that 3) is right!

Andrew said...

John, I know who you are but I have to disagree, as I do not see the computer science expertise existing in these groups. "Computer people" are not Computer Scientists, as you well know. I fail to see the proper collaboration in the computer climate modeling field with properly trained Computer Scientists and Engineers. If you are aware of which groups have properly qualified people on staff, I would be interested in knowing about them.

Specifically I spoke with various climate scientists (not going to name names) but they had no idea what they were talking about.

Andrew said...


I was commenting on Eduardo's statement,

"past temperatures are reconstructed from the isotope ratio of water molecules in the ice, and not from air in the air bubbles."

I was not commenting on the validity of which isotopes to use.

itisi69 said...

"Wegman certainly had not really tried to interact with climate scientists. He mostly tried to use his authority among statisticians to impress NRC and politicians."

Pot and kettle. McIntyre was never invited by *any* committee, and climatologists are the first to use their authority whenever someone outside their community tries to cast doubt on their theories/models: "he's just a statistician and has no clue about physics, climate, ... fill in yourself"

I object to your use of the words "use authority and impress". NAS, under oath, admitted Wegman used the correct methods; you call this use of authority, I call this expertise.

John Mashey said...

re: #31 Andrew
You seem to be claiming the non-existence of computer scientist/software engineers or of climate scientists with the relevant subset of CS/SE skills they need, at such sites.

I am simply claiming some exist. While it's been a few years, I'd certainly say NCAR, GFDL, various parts of NASA, LLNL to pick a few, either had some pretty good CS/SE folks, or scientists who could do well enough. I certainly had lots of discussions about computer architectures, memory organizations, how they interacted with algorithms, OS & compiler issues, problem partitioning, computation-vs-communication ratios, etc.

Skillsets are not distributed uniformly. Nevertheless, most HPC sites that do their own software tend to have at least some good CS/SE folks around. I've visited many HPC sites and I'd say the climate folks were certainly no worse than average, and might even have been a little better.

Of course, the SE practices were not akin to those of MSC NASTRAN or SAS or telephone switches, but then they shouldn't be. When I was helping teach the software engineering project management course at Bell Labs, one of the messages was to use CS/SE practices appropriate to the project, ranging from minimal to vast, to keep people from overdoing it. We saved our heavy-duty SE for things that needed it.

jgdes said...

Ok so it's common knowledge that Mann's methods were rotten and should not have been used by the IPCC to promote an "unprecedented warmth".

So.....why do these selfsame scientists still defend him, his methods and his discredited result so much? And why assume his new methods are any better than the last? It appears on closer inspection that in fact they are every bit as bad.

And do we need stats anyway? If you look at the first graphic here:
then you can clearly see, with Mark 1 eyeballs, that MBH98 was an absolute outlier that should not have been trusted, even less with a novel, unpublished and clearly unreviewed technique. Obviously current warmth is not unprecedented, or anywhere near it, worldwide.

That plot of course recalls the Soon paper that Hans Von Storch resigned over apparently because those well-known, dodgy statistical methods by the outlier Mann told him that the true result was a hockey stick and therefore any conflicting paper must be wrong. Hugely ironic that the reason given for that mass resignation was the publishing of "bad science". A little humility about that would be nice since it now appears there was far more intellectual integrity and rigour in Soon's paper than Mann and all his copycats.

Clearly though it is quite easy to overturn an old consensus. All you need to do is lay the blame for some problem on modernity and all those nice benefits we enjoy from an industrial society that actually pays for all this science in the first place....and incidentally makes it possible to be so free to be so green too.

Hans von Storch said...

Jgdes - the world is not black and white - so it is possible that the Soon and Baliunas-paper represented bad science and the MBH/Hockeystick was also not in order. S&B was not disqualified because of the hockey-stick but because of a string of other arguments, see EOS article.
Our stepping down at CLIMATE RESEARCH was because we had a case of bad editorial practice (failure of review), while there was no blame put on Soon and Baliunas, because submitting immature papers happens and is permitted. The same goes for MBH98: it was immature and should have been rejected by the review process. Seemingly this also applies to the article which this thread is about.

In those days I documented what happened. You may read it on my web-page.

ghost said...

If bad papers survive the review process, then it is a problem, but not a big problem. Sometimes bad papers even spark new research, or they will be forgotten. Not sure, but MBH98 was one of the earlier attempts to combine a lot of different proxies with fancy new methods. It "inspired" in many ways, I think. And it is still around. S&B was not important at all. It was just a political paper by think tank lobbyists. It's forgotten.

But coming back: the problem is not the paper but the icon-making of it and the fierce, not always fair fight for it because of this. It happens too often in climate science, erm, politics. However, I think, in real climate science it has already changed for the better since the TAR and the following fallout. The AR4 was much better, and Hans and Eduardo wrote already about the decay of the hockeystick in 2005?.

But, on the other hand, MBH got much support because of wrong accusations of fraud and conspiracy against Mann, Jones and co. Lies and filthy insinuations (also from "hero" McIntyre) against them did not help to make the discussion rational. I think without them the discussion would have been better and faster.

However, IMHO: science and science politics improved, maybe catalyzed by the emails. I have the feeling the "cold war" (did we have this comparison before? ;)) in climate science could be over. And I think: the McShane and Wyner paper is a cold war paper from yesterday.

eduardo said...

You explain: "Although climate models contain parameters that may be tuned, climate models are not really fit to observations. If that were the case, the models would all reproduce perfectly the observed global trend."

I have two questions:

We can often see comparisons between climate model outputs and the real temperature record.

'Sometimes those graphics are nearly identical. That looks very suspicious to us laymen.

Here is one perfect example: '

A more correct answer would require a whole weblog - perhaps in the next days. But to cut a long story short, climate models are very long and complex pieces of software, typically several hundred thousand lines of code. They indeed contain phenomenological parameters to represent processes that are not well resolved. For instance, in the cloud physics routines, one has to somehow describe the distribution of droplet size. As droplet formation and evaporation are not explicitly simulated, these parameters describe the 'bulk' behaviour. The correct value, if there is one, is not very well known. Actually a large part of the difference in the behaviour of climate models - why some models have larger sensitivity than others - stems from the choice of these parameters. It is, however, very difficult to 'tune' these parameters to obtain a prescribed history of the global temperature - the models are just too complex to achieve that. If you look at the corresponding chapter in the last IPCC report - chapter 8 - you will see that the deviations among climate models at regional scales are not small.

The agreement you see in the evolution of the global temperature is in part due to the choice of 'external forcing' to drive the model, basically how much cooling the tropospheric aerosols would have caused. In that choice there was indeed some tuning involved. But this tuning would allow you to close the gap between simulated and observed global temperatures, which is a very aggregated measure, but not really when you look at other aspects such as regional temperatures or amplitude of annual cycle, or precipitation, etc.

'Knowledge input' is a bit difficult to define. Models are based on the physical equations of fluid motion, phase changes, and so on. But due to the limited resolution and computing power, they have to be augmented with heuristic or semi-empirical parameters. But they are certainly not simple statistical models tuned to the observations. This is usual in many other modeling areas, engineering etc. The difficulty here is that it is not possible to conduct experiments to get a good handle on those parameters.

eduardo said...

'What? Then how do they tune them? Randomly? Why would they not use the proxies?'

Do you mean proxies as local temperature archives? It is impossible to tune a climate model to mimic a temperature history in a certain location on the Earth (see my previous comment). You could change the values of certain physical parameters, but this change would affect all grid cells, and you could change the simulated temperature only indirectly, affecting at the same time many other aspects of the simulation. Even if it were technically possible, one would need many simulations, trials and errors, over several hundred years. This is not feasible. As an illustration, we can simulate about 10 to 20 years of climate in one day. To simulate 1000 years we would need 1 or 2 months on a very powerful computer. There are very few simulations over the past millennium with 3-dimensional models. The ones we did a few years ago required one whole year.

richardtol said...

I do not think that your review does justice to the paper. Most of your commentary is about their literature review. Most of their paper is about a method for predicting temperatures from proxies.

You dismiss their method as not useful because it has not been used before. You may want to reconsider your logic there.

Lasso regression is an appropriate method for a case like this (p>>n).

Lasso is a statistical method developed by statisticians and published in a statistical journal. Pedigree is not everything, but it is something.

Their main conclusion, that proxies do not contain much information about temperatures, is well supported.

jgdes said...

Dr Von Storch
I need hardly point out that it was Mann and the rest of the team who wrote the EOS comment: ie The pot calling the kettle black. Much criticism even seemed to hang on just how you describe "climate". It seems there is a fashion to equate it to only temperature. Soon adroitly pointed out that he was looking at various climate proxy types to determine climate change in the more traditional sense, ie a change to long term weather. Surely if other climate proxies don't change much with temperature then it is an important finding.

This conflation of climate with temperature allows all sorts of nonsense and trickery, such as claiming weather is changing when it is only temperature that has changed. I rather wish everyone would just stick to "global warming" for temperature and keep climate change as long term weather change.

As it turns out, McIntyre has pointed out that the team are just as guilty of doing the same thing they accused S+B of doing, here:

Let's be honest though - the juxtaposed treatment of Mann and Soon tells us that you are far more likely to be criticised for being politically incorrect than just plain incorrect. Since Mann was the clear outlier and S+B merely represented the old consensus opinion, per Lamb and others, it is even more startling.

Hans von Storch said...

jgdes - The EOS article was indeed coauthored by Mann and others - this does not disqualify the arguments put forward. The EOS article demonstrated two things - first, there were serious questions which should have been raised during a professionally done review; second, for me most of the raised issues were valid. In those days, in 2003, I did not respond to complaints by Phil Jones (see e-mails), but only after the arguments were put forward in the EOS article (among others from Phil Jones), where I could read them and think about them.

My judgement of S+B is independent from my judgement about MBH (and vice versa) - which, on the other hand, we made rather explicit on other occasions. I take the liberty to judge the arguments, whether they are from Steve McIntyre or from Michael Mann.

Georg Hoffmann said...

"The agreement you see in the evolution of the global temperature is in part due to the choice of 'external forcing' to drive the model, basically how much cooling the tropospheric aerosols would have caused."

This is often said and I know that there is this Science (?) article discussing the IPCC results in this sense. However, whenever I asked one of the tuning guys of the different models, they claimed that they never modified forcing or sensitivity to aerosols. So is this an assumed "subconscious" tuning, or tuning at all, or even a documented tuning strategy (paper? pers.comm?)?

Anonymous said...

You claim that there was no communication with climate scientists on this piece.

Let me ask you, why need there be a discussion when they employ Mann's own data?
As stated above, climate scientists are not known for stats, and Mann 2008 already represents the most clear interpretation of the proxies.

Sure, we can set up a straw man such as air bubbles and isotope ratios. But what do we gain?

The stats look good people

eduardo said...

Hello Georg,
basically it is this paper by Kiehl. He found that in the IPCC simulations of the 20th century, the total anthropogenic forcing is negatively correlated with the model sensitivity. As the uncertainty in other anthropogenic forcings is small, it is concluded that the anthropogenic aerosol forcing used in each model compensates for differences in the model sensitivity.

GEOPHYSICAL RESEARCH LETTERS, VOL. 34, L22710, doi:10.1029/2007GL031383, 2007
Twentieth century climate model response and climate sensitivity
Jeffrey T. Kiehl
Climate forcing and climate sensitivity are two key factors in understanding Earth’s climate. There is considerable interest in decreasing our uncertainty in climate sensitivity. This study explores the role of these two factors in climate simulations of the 20th century. It is found that the total anthropogenic forcing for a wide range of climate models differs by a factor of two and that the total forcing is inversely correlated to climate sensitivity. Much of the uncertainty in total anthropogenic forcing derives from a threefold range of uncertainty in the aerosol forcing used in the simulations.
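
A toy calculation may help to see why such an inverse relation is what one would expect if models with different sensitivities all reproduce a similar 20th-century warming. This is only a sketch under strong assumptions that are not in Kiehl's paper: an equilibrium, linear response (warming = sensitivity x forcing / F_2X), no ocean heat uptake, and made-up sensitivities. The helper `forcing_needed` and all numbers are hypothetical.

```python
# Illustrative only: every number below is hypothetical, not taken from Kiehl (2007).
F_2X = 3.7  # W/m^2, commonly quoted forcing for a CO2 doubling (assumed here)


def forcing_needed(target_warming_k, sensitivity_k_per_doubling):
    """Total forcing (W/m^2) giving the target warming in a linear equilibrium model."""
    return target_warming_k * F_2X / sensitivity_k_per_doubling


target_warming = 0.7             # K, hypothetical warming every model should match
sensitivities = [2.0, 3.0, 4.5]  # K per CO2 doubling, hypothetical models
forcings = [forcing_needed(target_warming, s) for s in sensitivities]

for s, f in zip(sensitivities, forcings):
    print(f"sensitivity {s:.1f} K -> total forcing {f:.2f} W/m^2")
```

In this toy setting the product of sensitivity and total forcing is the same for every model, so a more sensitive model needs a smaller total forcing. With the other forcings fairly well constrained, the difference would have to come mostly from the aerosol term.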

eduardo said...

Dear Richard,

There are several things that I didn't like in the M&W paper and a few things I liked (more). What I didn't like was the following:

- Criticism of the MBH papers, but explaining incorrectly what MBH had done. I would have nothing against a critique of MBH (perhaps a bit outdated), or of any other paper for that matter, but in that case they should recount correctly what MBH did and explain why it was wrong. They criticize a method that was not MBH's.

- Their conclusion that proxies contain no useful information to reconstruct past climate. This is based on their application of the Lasso method to the real proxies and some pseudo-proxies. I don't think they can reach that conclusion. Is the Lasso method the best method of all? In which sense is the Lasso method better than the methods applied by climatologists? I would have welcomed a proof under controlled conditions - for instance using pseudoproxies from climate simulations - that the Lasso method is much better than many other methods. But they only state that the Lasso method seems adequate in conditions p>n. Well, I think this is too weak to conclude that the proxies contain no information. For instance, the Lasso method, as a method to screen the covariates, gives a strong weight to some covariates and almost no weight to others. Is this situation applicable to the proxies? Why is it not possible that all proxies contain a weak signal? Why is the Lasso method better than RegEM? I am not saying that it isn't, but why should the reader believe it is?

Also their statistical model in the first place is not really a natural model that climatologists have applied. In their model, one predictand, the northern hemisphere temperature, is assumed to be a linear function of the predictors - the proxies. I am not aware of any paper in climatology that assumes this model.
So, in this first half, they build up a straw man, which does not exist, just to tear it down. I would have preferred that they prove that this straw man is better than the 'real men' in the first place.

- They continuously refer to 'the methods of climatologists', but they actually mean MBH (or actually what they think is MBH). The MBH paper was published in 1999; that method was applied just once, never before and never later.
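
The screening behaviour of the Lasso mentioned above can be seen in the simplest textbook case. The sketch below assumes orthonormal predictors, for which the Lasso solution is just the least-squares coefficient of each predictor shrunk towards zero by the penalty (soft-thresholding). It is not the M&W set-up; the function names and all coefficient values are hypothetical.

```python
def soft_threshold(b, lam):
    """Shrink b towards zero by lam; anything with |b| <= lam becomes exactly 0."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def lasso_orthonormal(ols_coefs, lam):
    """Lasso coefficients for orthonormal predictors, given the OLS coefficients."""
    return [soft_threshold(b, lam) for b in ols_coefs]


penalty = 0.25  # hypothetical penalty strength

# Hypothetical case A: two strong predictors and four weak ones.
strong_and_weak = lasso_orthonormal([0.9, -0.8, 0.1, 0.1, -0.1, 0.1], penalty)

# Hypothetical case B: the same kind of signal spread weakly over all predictors.
equally_weak = lasso_orthonormal([0.2] * 6, penalty)

print(strong_and_weak)
print(equally_weak)
```

In case A the two strong predictors survive (shrunk) and the weak ones are set exactly to zero. In case B every predictor falls below the penalty and nothing survives, although together the predictors might still carry a signal. Whether real proxies resemble case A or case B is exactly the open question.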

richardtol said...

Just ignore the spin. They are trying to backcast temperature from proxies. Just ignore the disciplinary labels. Focus on what they did.

The Lasso method seeks the best-fit linear and parsimonious model. It is therefore an objective method to find the best proxies. They conclude that the best proxies are not very good, while worse proxies are worse.

You may not like the penalty in Lasso, but it is (a) estimated rather than calibrated and (b) consistent with information theory under normality.

You write you do not like linearity. If they had used linearizable models instead, I guess they would have found even less evidence that the proxies are any good at back-casting. That's easily tested. The algorithms are freeware, and the data can be downloaded.

eduardo said...


The conclusion that proxies do not contain information of past environmental conditions is quite dubious and actually inconsistent with many other lines of evidence. For instance, are we going to deny now the existence of the Little Ice Age? That would be the direct consequence of 'proxies are not useful to reconstruct past climate'. Here I see a glaring inconsistency in the reasoning of the so-called skeptics: on the one hand the climate is now recovering from the Little Ice Age, but on the other hand the proxies are not capable of delivering information about past climate. How do we know there was a Little Ice Age? Well, from proxies.

The obvious conclusion to me is that the proxies do contain information about past climate, but that the Lasso method within the M&W set-up is unable to pick it up. One illustration is the dendrochronology from Tornetrask in Northern Sweden, near the Arctic circle. The trees there are clearly limited by the temperature during the growing season - they are actually almost stunted trees. The correlation of tree-ring width or wood density of those trees with the local growing-season temperature during the 20th century is as high as 0.7. In contrast, the correlation with the global annual mean is probably negligible, and I am not surprised that any method interrelating both would not find any relationship. Each proxy has its own characteristics. The question in paleoclimatology is currently how to combine those local and seasonal pieces of information, which undoubtedly exist, into a global or large-scale picture, and for that we need good methods and a lot of expertise from many different fields.

jgdes said...

"Their conclusion that proxy contain no useful information to reconstruct past climate."

The missing word there is "global" between "past" and "climate". Many proxies are still very good for local climates.

But this conclusion was the widely held opinion of the paleo community before 1998. I believe even Mann and Briffa were quoted saying the same.

In those halcyon days, it was widely, and correctly, understood that you cannot compensate for sparse and patchy data by any statistical black box. This valid criticism surfaced again with Steig and Mann's Antarctic reconstruction where they just infilled huge tracts of land from a few compromised datapoints, mostly from the peninsula that was well known to be warming for entirely individual and localised reasons. Once again this poor work was given front page status, proving that little has actually been learned by the major journals or by "team" peer reviewers.

At some point we will presumably return to the harsh reality that the data is unfortunately not good enough to do a correct global reconstruction, even if such a metric is valid or useful in the first place.

richardtol said...

M&W find that proxies contain little information about the instrumental record. This conclusion is indeed limited to their method (Lasso). As Lasso is one of the more powerful methods around, I took the liberty to generalise their conclusion. I should not have.

The onus is now on people who do want to use proxies, to show that their methods are more powerful than Lasso.

A physics-based selection of proxies is less powerful than Lasso, as Lasso would consider that selection too.

Reference to local climate is a cop-out, as the task at hand is a reconstruction of continental and global climate. Note that M&W also did a Lasso on each of the grid-box temperatures.

Note that they also used principal components to reduce the dimensionality, and reached the same conclusion.

If you don't believe their results, your job is to show why their method is wrong.

Anonymous said...


Thank you very much again. I was waiting for more information before I wanted to answer. But maybe you're not able to do it because it would need a whole new weblog? ;-)

What bothered me most about the "tuning" of models was for instance what Mojib Latif said at a conference:

From Roger Pielke Jr.'s blog:

"Latif predicted that in the next few years a natural cooling trend would dominate over warming caused by humans."

"People will say this is global warming disappearing," he told more than 1500 of the world's top climate scientists gathering in Geneva at the UN's World Climate Conference.

"I am not one of the sceptics," insisted Mojib Latif of the Leibniz Institute of Marine Sciences at Kiel University, Germany. "However, we have to ask the nasty questions ourselves or other people will do it."

Why did this paper only appear after "the cooling" had happened?

You will always find one run in climate models explaining this or that behaviour of the climate system.

Telling us afterwards that the models were again right is (in this case) not a scientific approach.

If a cooling (or anti-warming) trend could happen now for the next 15 years, it could happen again after this time period.

By these means one could just as well explain every kind of warming or every kind of cooling.

So my question remains, if these exact fits between models and global temperatures have anything to do with real model "forecasts" that are not tuned afterwards, or if any climate pattern could be explained afterwards with the same model "forecasts"?

Let's say: As long as any kind of warming continues the models were right?


John Mashey said...

You'd written:
"The introduction already contains a terrible and unnecessary paragraph, full of errors:"
and you were certainly right...
but worse, it appears virtually certain MW derived that paragraph from the Wegman Report, specifically pp.10-11, then fabricated the citation to Bradley(1999), from which the various gaffes did *not* come.

This appears in various places in the discussion of Deep Climate on M&W, but summarized here:

1)“Artifacts” is a very odd word usage. People do not normally write of tree-ring growth patterns, coral growth, ice-core data this way. WR: p.10.

2) “Ions and isotopes of hydrogen and oxygen” uses the WR’s meaning-changed miscopy, as “ions” is not the same as Bradley’s “major ions.” WR p.11. Amusingly, the other discussions of ice cores in the WR get it more or less right (although perhaps because they are mostly cut-and-paste from Bradley).

3)"Speleotherms” is an uncommon misspelling of the standard “speleothems.” The WR miscopied, MW fixed it, but wrongly. WR p.11.

4) Finally, MW repeats the WR’s misspelling of Bradley’s book as “Quarternary” in place of “Quaternary.” WR p.53.

So, that's one oddity and 3 gaffes, all from the WR. Sadly, they did not pick up on the WR's invention of "phonological records" on p.11. [i.e., miscopying "phenological".]

5) However, they did add one unfamiliar to me, but then I'm no paleoclimatologist. MW says:
"Since reliable temperature records typically exist for only the last 150 years or fewer, paleoclimatologists use measurements from tree rings, ice sheets, *rocks*, and other natural phenomena to estimate past temperature."

Rocks? Do you know of any studies that use *rocks* as temperature proxies for anything in the last 1000 years?

eduardo said...

Dear John,

Probably they refer to borehole temperature profiles, see for instance here.

From the deviations from the linear geothermal gradient in boreholes one can estimate past temperatures. The resolution in time gets coarser back in time due to thermal diffusion, very roughly from 200 years resolution in the past millennium to 20 years resolution in the recent decades.
This type of 'proxy' is different from the others and interesting, because it does not need to be statistically calibrated. It represents in theory a direct measure of past temperature, blurred of course by other processes like lateral heat diffusion, changes in land cover, etc. So it really does not belong to the type of problems this paper was trying to assess.
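
A back-of-the-envelope sketch of why boreholes lose time resolution further back: in one-dimensional heat conduction, a surface temperature change that happened a time t ago has diffused to a depth of the order of sqrt(4 κ t). The diffusivity used below (κ = 1e-6 m²/s) is only an assumed typical order of magnitude for rock, the prefactor depends on convention, and the function name is hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def penetration_depth_m(years_ago, diffusivity_m2_s=1.0e-6):
    """Order-of-magnitude depth (m) reached by a surface signal after years_ago years."""
    return math.sqrt(4.0 * diffusivity_m2_s * years_ago * SECONDS_PER_YEAR)


for years in (20, 100, 500, 1000):
    print(f"{years:5d} years ago -> roughly {penetration_depth_m(years):4.0f} m")
```

Because the depth grows only with the square root of time, the signal of a whole millennium is compressed into a few hundred metres and progressively smeared out, whereas recent decades are still relatively sharp near the surface.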

eduardo said...

Two comments on the McShane and Wyner manuscript are now available online. One of them, by Martin Tingley, shows in a formal way that the Lasso method is not optimal and that the results obtained by McShane and Wyner are better explained in terms of these methodological limitations, rather than by a lack of climatic signals in the proxies. In particular, in situations in which the target signal is spread equally over a large number of predictors, the Lasso method is inferior to a simple averaging of all predictors. Martin's argument is along the lines of the views scattered over my previous comments.
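
The following is not Tingley's derivation, just an idealized illustration of the averaging argument. Assume every proxy is the same signal plus independent noise of equal variance. The correlation between the signal and the average of n such proxies is then 1 / sqrt(1 + noise_var / (n * signal_var)). The function name and the noise level are hypothetical.

```python
import math


def composite_correlation(n_proxies, noise_var, signal_var=1.0):
    """Expected correlation between the signal and the mean of n equally noisy proxies."""
    return 1.0 / math.sqrt(1.0 + noise_var / (n_proxies * signal_var))


noise_var = 25.0  # hypothetical: noise standard deviation five times the signal's

for n in (1, 5, 100):
    print(f"average of {n:3d} proxies: correlation {composite_correlation(n, noise_var):.2f}")
```

Under these assumptions a single proxy looks almost useless, while the average over all of them tracks the signal well. A method that retains only a handful of predictors cannot do better here than the small-n case, which is the flavour of the limitation described above.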

John Mashey said...

re: #53 Eduardo

Thanks, I'm familiar with boreholes, and at first thought that was what they meant, but given all the other science problems, I wondered.

Deep Climate found the "rocks" in Wikipedia, which starts:

"Paleoclimatology (also Palaeoclimatology) is the study of climate change taken on the scale of the entire history of Earth. It uses records from ice sheets, tree rings, sediment, corals, shells and rocks to determine the past state of the climate system on Earth."

McShane, Wyner:
"Paleoclimatology is the study of climate and climate change over the scale of the entire history of earth. A particular area of focus is temperature. Since reliable temperature records typically exist for only the last 150 years or fewer, paleoclimatologists use measurements from tree rings, ice sheets, rocks, and other natural phenomena to estimate past temperature."

Basically, the first few sentences of MW are an interleaved blend of ideas and words from the Wegman Report plus 2 Wikipedia pages, the other being this one.

See Deep Climate on MW, p.2, where DC integrates all this side-by-side, with the cyan/yellow highlighting to show identical text/ trivial changes.

John Mashey said...

Oops, and Strange Scholarship in the Wegman Report has an Appendix A.12 on McShane, Wyner, since it was in many ways a remake of the Wegman Report.

I mention your post p.97, and quote you pp.102-103. Useful, thanks.

Hank Roberts said...

Noting a typo in the opening post: