Friday, August 23, 2013

Hans von Storch and Eduardo Zorita: on our paper on stagnation and trends

We have recently uploaded a manuscript, coauthored by the two of us and two others, with the title 'Can climate models explain the recent stagnation in global warming', in which we compare the magnitude of the trends in global mean temperature recently observed - the trends over the last 10 years and over the last 15 years (1998-2012) - with the ensemble of trends simulated by the climate models participating in the Coupled Model Intercomparison Project phases CMIP3 and CMIP5. Recent trends as low as or lower than those observed in the HadCRUT4 data set, of merely 0.4 C/century, are reproduced by at most 2% of the scenario simulations. Two other analyses of the development of global mean temperature have also been considered, with a higher trend of 0.8 C/century by GISS and 0.4 C/century by NCDC - these other trends show up in the ensemble of scenario simulations in at most 4.7% and 0.6% of all cases, respectively. Obviously, there is some uncertainty in the trends, but our overall conclusion, that the present trends are at the margin of the distribution generated by the available A1B and RCP4.5 scenarios, is robust against this uncertainty.

To increase the size of the simulated ensemble of model-suggested trends, we analysed not only the recent simulated trends under the A1B and RCP4.5 scenarios, but also all n-year segments in the period up to 2060, in which the assumed external forcing increases linearly as in the emission scenarios A1B and RCP4.5. These scenarios describe changing emissions of greenhouse gases and aerosols, but do not describe changing solar activity, volcanic activity or any cosmic influences, since scenarios (or even predictions) of these factors for the next decades are very difficult to construct. We let n vary between 10 and 30 years.
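For readers who want to see the mechanics, the comparison described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used for the manuscript: the helper names (`ols_slope`, `segment_trends`, `fraction_at_or_below`) are hypothetical, the "ensemble" is synthetic noise around an arbitrary linear trend rather than CMIP output, and the segments are taken as non-overlapping.

```python
import random

def ols_slope(values):
    """Least-squares slope of values against time steps 0, 1, 2, ... (units per step)."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def segment_trends(series, n):
    """Slopes of all non-overlapping n-year segments of one annual series."""
    return [ols_slope(series[i:i + n]) for i in range(0, len(series) - n + 1, n)]

def fraction_at_or_below(trends, observed):
    """Share of simulated trends that are as low as or lower than the observed one."""
    return sum(t <= observed for t in trends) / len(trends)

# Synthetic stand-in for scenario runs (assumption): a linear warming of
# 0.02 C/year plus white noise. Real runs would be read from model output.
rng = random.Random(0)

def synthetic_run(years=60, trend=0.02, noise=0.1):
    return [trend * y + rng.gauss(0, noise) for y in range(years)]

ensemble = [synthetic_run() for _ in range(50)]
trends_15 = [s for run in ensemble for s in segment_trends(run, 15)]

observed = 0.004  # 0.4 C/century expressed in C/year
print(fraction_at_or_below(trends_15, observed))
```

Repeating the last step for n between 10 and 30 gives one fraction per segment length. With real model output the noise is not white, so the numbers from this toy ensemble say nothing about the actual result.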

If the slow trend derived from the GISS, HadCRUT4 or NCDC analysis were to continue for a total of 20 years, the trend would occur in at most 0.9% of all cases. Of course, this statement is conditional on the presently available set of scenario calculations (in CMIP3 and CMIP5). The title "Can climate models explain the recent stagnation in global warming" was likely misleading, as we did not examine climate models in general, but merely the output of contemporary climate models, subject to a specific class of scenarios, which best mimic the recent development. Maybe a title like "Is the ongoing warming consistent with the developments envisaged by scenario simulations exposed to realistic increases in GHG forcing" would have been more appropriate.

This manuscript was submitted to Nature, but it was not accepted for publication. Unfortunately, the reviews are subject to Nature's copyright rules, and we are not allowed to reproduce them here. The manuscript had been clicked on more than 3000 times by 22 August 2013. We want here to set straight some misinterpretations that may have arisen in the blogosphere, e.g. at Bishop Hill, and that may also have been present in the review process at Nature.

The main result is that climate models run under realistic scenarios (for the recent past) have some difficulty in simulating the observed trends of the last 15 years, and that they are not able to simulate a continuing trend of the observed magnitude for a total of 20 years or more. This main result does not imply that anthropogenic greenhouse gases have not been the most important cause of the warming observed during the second half of the 20th century. That greenhouse gases have been responsible for at least part, or even most, of the observed warming is not only based on the results of climate simulations, but can be derived from basic physical principles, and thus it is not really debated. It is important to stress that there is to date no realistic alternative explanation for the warming observed in the last 50 years. The effect of greenhouse gases is seen not only in the trend in global mean near-surface temperature, but has also been identified in the spatial pattern of the observed warming and in other variables, such as stratospheric temperature, sea-level pressure and others.

However, climate model projections are not perfect. They are in a constant state of revision and improvement. The comparison between simulations and observations, and the identification of any mismatches between both, is thus a very important, and probably unending, task in climate research. This manuscript should be viewed under this perspective. However, the basic features of man-made climate change have been robustly described by these models in the course of time, even if more detail has been added, and rates of changes have somewhat changed in the course of time.

To understand the present mismatch, we suggest four different explanations; none is pointing to a falsification of the concept that CO2 and other greenhouse gases exert a strong and likely dominant influence on the climate (statistics of weather). None represents a falsification of climate models. But all point to the need for further analysis and improvement of our tools - which are scenario simulations with climate models - for describing possible future developments.

One is an underestimation of the natural climate variability, which could be related to variations in the heat uptake by the ocean and/or to internal variations of the energy balance itself (such as cloud cover). Another possibility is that the climate sensitivity of models may be too large, but a longer period of mismatch would be required to ascertain this possibility, as 15-year trends are still strongly influenced by internal climate variations. A third possibility is that the set of external forcings prescribed in the CMIP5 simulations lacks a component of relevance. In particular, the CMIP ensembles assume a constant solar irradiance, due to the difficulties in predicting solar activity. However, solar irradiance displays a negative trend in the last 15 years, which could be part of the explanation of this mismatch. Finally, although the number of simulations that produce a trend as subdued as observed is small, it is still not zero. The last 15 years may have been an outlier, especially considering that the starting year, 1998, experienced a strong ENSO event and was therefore anomalously warm. Thus, further analyses are necessary and we intend to carry them forward.

At present, we cannot disentangle which of the different possible explanations is the best, and it may be a combination; but the conclusion is not that GHGs play a minor or no noteworthy role in ongoing and expected future climate change. One conclusion that we do draw is that the A1B and RCP4.5 scenarios, which are used in very many impact studies, suffer from some limitations.

Our paper does not represent a crisis in the understanding of the climate system, but a wake-up call that scenarios have to be prepared better, and that all impact studies should expect that details of future scenarios concerning speed of change and intensity of natural variability may be described quite differently.


Karl Kuhn said...

Dear Drs Zorita and von Storch,

thank you very much for your willingness to share and discuss your carefully worded conclusions with the audience of the Klimazwiebel. The following two sentences caught my attention, and I would like to pose two questions, myself being someone illiterate in climate science, but deeply involved in modelling of complex biophysical-economic simulation systems.

"That greenhouse gases have been responsible for, at least, part or even most of the observed warming, is not only based on the results of climate simulations, but can be derived from basic physical principles, and thus it is not really debated."

Is the water vapor feedback effect part of these 'basic physical principles', and can measurements of water vapor content in recent decades be replicated by the models you investigated?

"It is important to stress that there is to date no realistic alternative explanation for the warming observed in the last 50 years."

I infer from this that climate science and perhaps even climate simulation models CAN explain the warming period of the first half of the 20th century without invoking the greenhouse gas effect?

Thank you very much in advance!

Paul Matthews said...

The link for the paper should be

The paper seemed to be fairly clear in its abstract that the warming stagnation over the last 15 years "is no longer consistent with model projections even at the 2% confidence level".
But maybe the Bishop Hill headline "models falsified" was an exaggeration.

As I am sure you are aware, Nature rejects the vast majority of papers it receives.
Has the paper been submitted to another journal?

Shub said...

The strong position taken in the beginning of the article is negated by the discussion that follows it.

The various possibilities raised in the second half of the article are in play only owing to one reason: the lack of a match between model output and real temperatures. But, the strength in confidence and the quantum of attribution, of 20th century warming to CO2, derive in no small measure from the same models.

The model/s should do better than just scraping by narrowly. If tomorrow temperatures increase, will the models 'perform' better? Sure they will!

Verification in science comes only with prediction, i.e., getting the right answer for the right reason. Everything else is moot.

Ed Hawkins said...

Hans - did the reviewers discuss the choice of CMIP5 simulations? Having 21 members (out of 62) from the various flavours of the GISS GCM, which all have very low-amplitude variability, does not seem very representative?

Ed Hawkins

Shub said...

You mean, choosing models with increased amplitude would widen the range in predicted temperatures, thereby making the global average fall within chosen CIs?

Ed Hawkins said...

Shub - correct. The CMIP5 models have a large diversity in their simulated variability:

Picking 1/3 of your simulations from 4 (out of 42) models which are all at the very low end is not sampling the possibilities effectively.


Anonymous said...


is it possible that there is something wrong with the link to the paper? I arrive at a different page, about "Cold Plasma Inactivation". I guess there is a "9" missing in the url.

So here is the correct link:


MikeR said...
Lucia discusses the paper, and adds her own analyses. Michael Tobis comments.

My own question: How does the out-of-sample error for these models compare with the in-sample error from the "training period" of last century's temperatures? If it is much greater (which I don't know) doesn't that imply that there's a bigger problem - that the model design was essentially curve-fitting - than that the climate sensitivity is a little high?

As an example, many are saying now that the "missing heat" has gone into the deep ocean; that's why surface temperatures are too low for a decade or more. But what was happening for the last century? We have absolutely no data on the deep ocean temperatures from then, but is it the least bit believable that the effect started only since the year 2000? How did all those models correctly model surface temperatures for a century, without taking this obviously big effect into account? If so, how can anyone claim that the models are "just the physics" and are not tuned?

MikeR said...

Dr. Hawkins, I think the Lucia post and several around it discuss the climate variability of the models. She suggests that the variability of the models is _already_ too high; that is, that the real climate historically varies less than the models do. Obviously you can make the models fit any data by putting enough noise into them.

I'm not really addressing your point - it certainly makes sense to take a representative sampling of the models if one is studying them, but I thought it was relevant.

JamesG said...

You cannot call this science surely? That you cannot explain the last 15 years is mainly because you cannot explain at all what natural variation consists of or how powerful it is. What gave rise to the little ice age for example? If you predict something to rise parabolically and it goes the opposite way do you not at the very least deeply reconsider your predictions?

As this is so minor a trend it may even be natural and benign. Only the models convert innocuous warming into catastrophe and they are completely falsified.

Regardless of the 1 degree that is supposed to come from a doubling of CO2 by hugely simplistic 1D theory, the remainder was supposed to come from postulated positive feedbacks that just don't seem to be there. 1 degree is beneficial even according to the IPCC so the panic is about nothing.

Also the entire idea that man's contribution could be teased out from nature's comes from models that are falsified. They are not imperfect as some like to say, they are inadequate and unfit for the purpose of policy.

In short, that serious energy policy is based on this bag of speculative argumentation is an indictment of the entire field of climate science. You could be marching us off a cliff of high energy costs, low growth, poverty and starvation based on what amounts to mere pessimistic guesswork.

You might see it all as a curiosity, worthy of further funding ad infinitum. However, we in the real world see it as too much academic hubris resulting in massive waste of public and private funds in the middle of a deep recession.

Volker Doormann said...

"It is important to stress that there is to date no realistic alternative explanation for the warming observed in the last 50 years."

There are a number of temperature reconstructions available, covering time intervals from centuries to millennia, whose analysed spectra contain temperature frequencies and their power. Despite the unknown physical nature of the mechanism, this is a realistic alternative to the projection models for explaining the warming of the last 50 years, because the relevant temperature frequencies can be identified as astronomical functions. From the power of the frequencies it is possible to simulate the global temperature, except for the effects of the ocean impedances and the time delay, and except for the drops caused by volcanoes.

Adding the fast functions, which interfere most with the ENSO delays, it becomes clear that simulation from temperature frequencies is an alternative method of explaining the global warming. This was also the basis of the idea of M. Latif in 2008, when he carried temperature frequencies forward from the 1950s and correctly forecast a stagnation in temperature.

My experience with models of heat currents in flowing fluids was that they were strongly based on physics and on the physical properties of the fluids, in a geometric model at 1:1 scale.

I think this is an unalterable supposition for a physical model with no alternative.

In science there is always an alternative, or there may be one.


Shub said...

Ed (Hawkins)

You say:

"The CMIP5 models have a large diversity in their simulated variability:"

It is good that we are in agreement, though you may be missing a point here.

If enough models, each with sufficient inter-decadal amplitude of global temperature change are included (in the ensemble), the confidence intervals will necessarily widen. At one point, the composite of models will essentially turn non-falsifiable, i.e., produce a range of temperatures that real-world temperatures would never fall out of.

The test of predictability comes from real-world temperatures falling within confidence intervals for an ever-decreasing number of models.

If the range produced by models includes one model that is flatline, and another that shows a temperature rise with slope corresponding to 0.7 (say, for example), that set of models would be non-falsifiable.

Hans von Storch said...

The URL has been corrected now. Thanks for pointing to this error.

For those, who do not want to go through, the manuscript is also available from ResearchGate.

Mr. Meteoman said...

Why is a trend over 15 years so important? Usually we are told that statements about climate need 30 years of data, and that anything shorter is too strongly masked by noise?

Volker Doormann said...

"Why is a trend over 15 years so important? Usually we are told that statements about climate need 30 years of data, and that anything shorter is too strongly masked by noise?"

I think one has to distinguish here between definitions and physical processes. The term 'climate' derives from the properties of local weather over longer periods and its mean values, including by season. Local weather events, and extreme weather events too, are hard to attribute physically, but one can calculate a mean temperature trend from the past. And for climate, that past period is defined as 30 years. Independent of this definition, every scientist is of course free to examine shorter periods as well, especially when there are indications that the global temperature anomalies show periods, for example of 3 years and almost 7 months, that can be explained by the ocean currents on the rotating Earth driven by heat from the Sun.
If one wants to justify other delays physically, one has to show in which heat reservoir the heat is stored and what heat capacity that reservoir has. Greenhouse gas theory assumes that, out of the heat reservoir of the Earth with its atmosphere, the global temperature rises monotonically with the rising concentration of greenhouse gases in the atmosphere. Since the causes of the fluctuations in global temperature are not known, it is of course a problem to determine the share of the greenhouse effect in the global temperature.
The period of 15 years is probably one that, on the one hand, cautiously tries to anticipate a climate trend for 30 years, but is also long enough that it becomes difficult to justify a heat reservoir able to take up in full the additional heat from the monotonically rising greenhouse gas concentrations when the global temperature has not risen over the last 15 years. So it is a pointless dispute between a (political) definition and physics. Physics cannot rest on definitions; physics has to show what causes the global temperatures, including their short- and long-term periods, even when they stagnate on average for more than a decade. These anomalies are not noise in the sense of the thermal noise from lattice vibrations in electrical resistors; they have real physical causes, just like the climate anomalies known as the 'Little Ice Age'.

eduardo said...

@11 Mr. Doormann

you wrote 'Despite the unknown physical nature of the mechanism..'. Then this is no valid explanation in the modern sense.

We can for instance also 'explain' the orbits of the planets with Fourier analysis, but this explanation would be Ptolemaic. It would not include the real cause of the orbits (gravitation). I could be brave and 'explain' that the sun rises every day because I also have breakfast every day.

Indeed, there is no alternative explanation, so far, for the observed warming of the last 50 years that does not include GHG. This does not mean that climate models are perfect and provide accurate predictions. But I would really like to see an alternative physical theory that tells me why temperature is rising near the surface and has risen by 0.8 C in the 20th century, why it is cooling in the stratosphere, and why sea level has risen by 20 cm over the last 200 years (i.e. why the planet is gaining energy as a whole). In other words, we have to demand from any alternative theory the same level of accuracy and falsifiability that we require for GHG.
I may be wrong here because I cannot fathom all the papers that have been published, but I cannot remember any publication prior to 1998 that predicted the current stagnation. Please, correct me if you have better information.

I would also like to see a falsifiable prediction for the next years. If the 20th century warming has been caused only by natural mechanisms, when will temperature start to drop, and by how much? Can anyone be specific here?

Hans von Storch said...

Meteoman/14 - there is nothing special per se about 15-year trends; indeed, within the WMO (and its predecessors) it was agreed to derive "climate" from 30-year statistics, that is, means, trends, variability, extremes etc.
The reasoning is presumably related to the 35-year quasi-periods of Eduard Brückner, which were popular until the 1930s but are practically forgotten today, now that the hype about periodicities has faded. The 30 years are thus a social convention, one that has also proven its worth.

Why the interest in 15 years? Simply because little global warming has taken place in the last 15 years, and one asks: what if this continues, what would that mean? After all, the 30-year trend that eventually emerges is influenced to a considerable degree by the first 15 years.

But it is also a question for the climate models: to what extent do they realistically represent variability on all time scales, from hours to centuries? And here we find that such small 15-year trends, in the presence of a marked increase in greenhouse gas concentrations, are almost never produced by these models, which indicates that the models "cannot" or "will not" do this. This need not impair the models' ability to describe the effect of increased greenhouse gas concentrations, but it cannot be ruled out either. So there is a need for research, no more and no less.

Hans von Storch said...

Ed Hawkins/4 - No, the reviewers did not comment on a possible bias related to the usage of relatively many GISS-family scenarios. We used all the data available at the CMIP5 data base, made no selection.

Hans von Storch said...

Paul Matthews/2 - "As I am sure you are aware, Nature rejects the vast majority of papers it receives.
Has the paper been submitted to another journal?"
Yes, a rejection by Nature is by no means a catastrophe but rather the norm. But it was clear that we would have a hard time arguing against the reviewers, whose main points were "not really innovative" and "depends all on the warm year 1998". We are now extending the manuscript (for Nature it had to be very short), and will submit it well after the IPCC publication of the WGI report, in about half a year or so.

Hans von Storch said...

Karl Kuhn/1 - "Is the water vapor feedback effect part of these 'basic physical principles', and can measurements of water vapor content in recent decades be replicated by the models you investigated?

I infer ... that climate science and perhaps even climate simulation models CAN explain the warming period of the first half of the 20th century without invoking the greenhouse gas effect?"

Ad 1: part I, yes, part II: I do not know, I guess others can answer.

Ad 2: Yes, the first part of the climate variations in the 20th century could be explained by natural variations - see, for instance, the early study Hegerl, G., H. von Storch, K. Hasselmann, B.D. Santer, U. Cubasch, P.D. Jones, 1996: Detecting anthropogenic climate change with an optimal fingerprint method. - J. Climate 9, 2281-2306. Note the paper is 19 years old; first submission in August 1994.

Anonymous said...

Hans von Storch,

you gave here four possible explanations, but in the draft linked above I can only find three.
Did the fourth explanation presented here (I would call it the "bad luck hypothesis") emerge in the peer review process?


hvw said...

Hans von Storch:
We used all the data available at the CMIP5 data base, made no selection

This is hard to believe, as the CMIP5 database currently lists 43 models for which the relevant data are available. Even if you failed to download one or the other, as easily can happen, you either somehow "forgot" on the order of 20 models, or you made a selection indeed. (I'm not saying a planned one to bias the results, as the other poster implied).


Hans von Storch said...

We did not discuss the trivial explanation, the low-likelihood event of what you call "bad luck".
For scientists this is obvious and does not need to be discussed - at least I would presume so - but for a more general public it may be worth listing it explicitly.

Anonymous said...

Dear Sirs,

there is one point I did not understand when reading the paper: is the paper's claim that the models do not produce a 15-year trend this low at all, or that a projection of such a low 15-year trend does not occur in the period 1998-2012? Did I understand correctly that it is the second case?

If so, the focus on 1998 bothers me a little. Would the result not be more informative if one concentrated on the 15-year trend and determined to what extent the model runs produce 15-year trends with a rise similar to that of 1998-2012? Why does it have to be 1998 specifically? ENSO is not predictable, so why should the models be able to place an exceptionally strong ENSO event in the projection at the correct time?

Would it not be enough for the quality of the model if similar 15-year trends occurred, that is, independently of 1998? Similar to, for example, 1953-1968. Or 1963-1978.

If I have misunderstood the paper's claim, please ignore my comment. I am only an interested layman.

Kind regards,

eduardo said...

The CMIP5 site is a 'meta-site'. This means that the data are not actually stored there: it is a hub that redirects the user to the individual sites that actually store the data. Sometimes the files are broken, or the variables listed at the CMIP5 site are not actually available. We are checking this, though.

This question is, however, not relevant here. We have now redone the analyses with data from another source that directly stores the global averages, and the results remain unchanged. For instance, using the RCP4.5 runs (109 in total) until 2060, the HadCRUT4 temperature trend in 1998-2012 lies below the 2nd percentile of the RCP4.5 ensemble: the same as in our manuscript.

Other blogs reach similar conclusions, as do other published papers.

Perhaps in this case we should try to find out why in this period the model ensemble is barely compatible with observations, and in doing so improve models and in the end come up with better projections, instead of uncritically dismissing the message beforehand.

There may be simple explanations for the stagnation, e.g. that the heat is going into the deep ocean. But then, we have to find out why models have problems in sending this amount of heat to their respective model ocean.

eduardo said...


'If so, the focus on 1998 bothers me a little. Would the result not be more informative if one concentrated on the 15-year trend and determined to what extent the model runs produce 15-year trends with a rise similar to that of 1998-2012?'

That is exactly what we did.

Hans von Storch said...

_Flin_: all trends (originally non-overlapping, by now all of them; result unchanged) during the phase of linear growth of GHG emissions in A1B and RCP4.5 (that is, until 2060) were considered. These scenario calculations show how the model climates respond to rising GHG concentrations and declining man-made aerosol forcing. They know nothing about 1998 having been a strong El Niño event, but self-generated El Niños are present in these simulations.
The exceptional 1998 ENSO event makes the analysis more difficult, but in earlier times precisely the 1998 event was regarded as fitting nicely with the general warming; so the event has to "play along" in future as well. One explanation, however, may be precisely that the natural variability in the scenarios, which includes ENSO events, turns out too weak. That would also be bad, though, because the level for detection would then have been set too low.

Anonymous said...

@ HvS, Eduardo

From the result that only 2% or 5% of the model runs reproduce the trend since 1998, it is concluded, for example at Bishop Hill, that the models are thereby falsified.

I know that you of course take a much more nuanced view, but something fundamental is still not clear to me:

We know that solar activity was weaker in the period since 1998. We know that the period was dominated rather by La Niña events and that it began in 1998 with an unusual El Niño year.

In the sense of the "bad luck hypothesis", quite a lot comes together here. Is it then not logical that such phases occur rarely in the model runs? But how rare would be "normal", and how "unusually rare" are the numbers given in your paper?

Perhaps I am just being dense, in which case it would be nice to have that cleared up.


hvw said...

Dear Eduardo,

I am well aware that the relationship between the ESG database and actually existing and usable files is not perfect. It's a pain. However, I am sitting in front of actual files of tas for rcp45 from 44 models. Given that an unavoidable weak point of such an analysis is that you are restricted to an "ensemble of opportunity", I believe it is highly desirable to make sure to use anything that is available. If I were a reviewer, I'd be bitching big time if you presented only a subset and state that the incompleteness is "not relevant here".

ACCESS1-0 : 1
ACCESS1-3 : 1
bcc-csm1-1 : 2
bcc-csm1-1-m : 1
CanCM4 : 10
CanESM2 : 7
CCSM4 : 7
CESM1-CAM5 : 4
CESM1-CAM5-1-FV2 : 1
CMCC-CM : 10
CNRM-CM5 : 6
CSIRO-Mk3-6-0 : 13
EC-EARTH : 312
FGOALS-g2 : 27
GFDL-CM2p1 : 70
GFDL-CM3 : 59
GISS-E2-H : 90
GISS-E2-H-CC : 2
GISS-E2-R : 188
GISS-E2-R-CC : 4
HadCM3 : 20
HadGEM2-AO : 1
HadGEM2-CC : 5
HadGEM2-ES : 28
inmcm4 : 1
MIROC4h : 9
MIROC5 : 3
NorESM1-M : 2
NorESM1-ME : 1
numbers after the colon are number of files. That includes those referring to the period after 2060 though. Let me know if you want more, tracking-ids for example.

That said, I do not strongly believe that the results change significantly if you include all models.
The paper is nice and clear and somebody has to do this first step. Otherwise I agree with Andreas below that this cannot be all we've got to offer. This is an exploratory result which doesn't provide a robust answer to the question about the likelihood of this 15y trend happening conditional on the models having no collective error that would lead to the underestimation of variability on that timescale. More research is needed :).

MikeR said...

Drs. Storch and Zorita, I'm wondering why the ensemble of models is the right metric to be using. Would it be a good idea to identify which models failed, and are rejected, and which are not rejected (yet)? Why not get rid of the ones that didn't work, and proceed with what remains?

eduardo said...

Dear hvw,

thank you for your input. The number after the colon, as you said, represents the number of files, which may refer to 6-hourly, daily and monthly means for different sub-periods altogether. For instance, for model GFDL-CM3 you indicate 59 files. The CMIP5 site at Lawrence Livermore Nat. Lab. includes just 1 realization of model GFDL-CM3 for scenario rcp4.5 (I just checked this).

In my previous comment - maybe you overlooked it - I indicated that we have repeated the analysis, downloading the global means from the Climate Explorer for a total of 109 simulations, and the results are the same as in our manuscript.

Hans von Storch said...

Our observation addresses the degree of realism that the scenario simulations show.

These scenario simulations are used in almost all assessments of the impact of climate change. The "internal" factors mentioned should be represented by the models in the scenarios; perhaps this succeeds only inadequately. It may also be that other drivers are inadequately described, which would mean that relevant factors remain unaccounted for. All nice challenges for further research.

hvw said...

Dear Eduardo,

you are absolutely right that the file counts I listed are useless. Not because they refer to different time steps (it's all monthly) but because the files are split into different intervals. If you would like to compare with what you got from Climate Explorer (would that be regarded as an authoritative source anyway?), here are the numbers of realizations per model that I can see, of which you have about 74%:

ACCESS1-0 : 1
ACCESS1-3 : 1
bcc-csm1-1 : 1
bcc-csm1-1-m : 1
CanCM4 : 10
CanESM2 : 5
CCSM4 : 6
CESM1-CAM5 : 3
CESM1-CAM5-1-FV2 : 1
CNRM-CM5 : 1
CSIRO-Mk3-6-0 : 10
FGOALS-g2 : 1
GFDL-CM2p1 : 10
GFDL-CM3 : 1
GISS-E2-H : 15
GISS-E2-H-CC : 1
GISS-E2-R : 17
GISS-E2-R-CC : 1
HadCM3 : 10
HadGEM2-AO : 1
HadGEM2-CC : 1
HadGEM2-ES : 4
inmcm4 : 1
MIROC4h : 3
MIROC5 : 3
NorESM1-M : 1
NorESM1-ME : 1
Total: 148
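
Counting realizations rather than files can be sketched as below. This is only an illustration, assuming the usual CMIP5 file-name layout `variable_table_model_experiment_member_period.nc`; the function name and the file names in the example are made up.

```python
from collections import defaultdict


def realizations_per_model(filenames):
    """Count distinct ensemble members (e.g. r1i1p1) per model."""
    members = defaultdict(set)
    for name in filenames:
        if name.endswith(".nc"):
            name = name[:-3]
        parts = name.split("_")
        if len(parts) < 5:
            continue  # not in the assumed layout, skip it
        model, member = parts[2], parts[4]
        members[model].add(member)
    return {model: len(found) for model, found in members.items()}


# Hypothetical file names: two time slices of one run, and two separate runs.
files = [
    "tas_Amon_GFDL-CM3_rcp45_r1i1p1_200601-201012.nc",
    "tas_Amon_GFDL-CM3_rcp45_r1i1p1_201101-201512.nc",
    "tas_Amon_CanESM2_rcp45_r1i1p1_200601-210012.nc",
    "tas_Amon_CanESM2_rcp45_r2i1p1_200601-210012.nc",
]
counts = realizations_per_model(files)
print(counts)
```

Files that only differ in their time interval collapse into one realization, which is why file counts and realization counts differ so much.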

That brings me to another question: Apparently you are using multiple realizations for a model, if available. Doesn't that give undue weight to the models with many runs? In other words, would different realizations of the same model not be expected to show a similar, model-specific variability?

eduardo said...

dear hvw

thank you very much again. We will check out what we have missed.

Related to this, it is a pity that the CMIP5 site is so cumbersome. I think it is a missed opportunity to increase the transparency. I guess that it is a difficult task though.

eduardo said...


in some sense, estimations of the climate sensitivity based on Bayesian methods are based on what you are proposing. They essentially are weighted averages, the weights being a measure of how close a model is to past observations.
However, this quickly becomes a more fundamental question: if I am a pilot and one among three on-board computers disagrees with the other two, I would not build an average among the three. I would try to understand why this happens. On the other hand, it may very well happen that the model one would reject because it fails to reproduce the temperature trends is the one that produces a better annual cycle of, say, precipitation.
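
The weighted-average idea mentioned above can be sketched as follows. This is a minimal illustration, not the method of any particular study: the Gaussian weight, the `sigma` tolerance and all numbers are assumptions chosen for the example.

```python
import math


def skill_weighted_mean(values, hindcast_errors, sigma):
    """Average `values` with Gaussian weights exp(-0.5 * (error / sigma)**2).

    A model whose hindcast error is large compared with `sigma` gets
    little weight; equal errors reduce to the plain mean.
    """
    weights = [math.exp(-0.5 * (e / sigma) ** 2) for e in hindcast_errors]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


# Hypothetical sensitivities (K per CO2 doubling) and hindcast errors (K).
sensitivities = [2.0, 3.0, 4.5]
errors = [0.1, 0.1, 0.4]

weighted = skill_weighted_mean(sensitivities, errors, sigma=0.2)
unweighted = sum(sensitivities) / len(sensitivities)
print(weighted, unweighted)
```

The result depends entirely on which observable the error is measured against, which is the point of the pilot example: a different metric would rank the same models differently.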

I would essentially agree with you that one goal should be to disregard the worst models, but there are different opinions on this. In the end, the question boils down to 'what does an ensemble of models represent, when at most only one can be right ?'

The end of model democracy

Predicting weather and climate: Uncertainty, ensembles and probability

MikeR said...

Thank you, Dr. Zorita!

Anonymous said...

@HvS: Thank you very much for your answer.


Paul Matthews said...

It must be rather frustrating for you to have your paper rejected by Nature and then see today a paper published in Nature saying more or less the same thing
"Overestimated global warming over the past 20 years".

I wonder what is the difference between the two papers, apart from the names of the authors?

Hans von Storch said...

Paul, we are mostly interested in the ability of scenario simulations to describe the present stagnation, not in explaining the stagnation. That is quite different.
What I find difficult with the "other" paper is that it is again an a posteriori explanation (like cold European winters caused by less Arctic sea ice in the preceding fall), and just one. There are in principle others, and we would need to do some work to disentangle the plausibility of different explanations.

lucia said...

hvw said
Doesn't that give undue weight to the models with many runs? In other words, would different realizations of the same model not be expected to show a similar, model-specific variability?
They do. I've done this a different way, combining the distributions by model. I've counted each entry at the Climate Explorer as a model to estimate a typical variability for a model, but based the estimate on models with more than 1 run in the projection. (My method requires repeat runs from a model to estimate the variability due to initial conditions only.)

If you examine my figure in that post you'll see the variability of trends differs from model to model as does the mean trend.

The results are similar to von Storch and Zorita's.

I haven't organized the code to collect together some models listed in several cases (e.g. E2-H_p1, _p2, _p3 had better be considered 1 estimate of the variability; if this is done, likely E2-R and MPI-ESM should be similarly grouped).

hvw said...

lucia, thanks for the info. Your approach to dealing with such an unbalanced ensemble sounds like an improvement.

A new study seems to point to a link between ENSO-related SST patterns and the currently observed small global temperature trend.

I wonder whether something can be learned by sorting the models under consideration by their performance in capturing ENSO.

Another thought: If we assume (or better hope) that modelled global temperature variability doesn't change much with the system's position on a warming trend (and you and HvS and EZ apparently do that by considering the distribution of n-year trends stationary in a 55 year interval), then it might be worthwhile to examine the AMIP runs with respect to their decadal variability. But someone already did this, I suppose ...
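
The stationarity assumption matters because all n-year trends in a simulation are pooled into one distribution, and the observed trend is then placed in that distribution. A rough sketch of that bookkeeping is given below; the synthetic series, the white-noise assumption and all names are illustrative only, and overlapping windows are used here even though they are not independent samples.

```python
import random


def ols_slope(y):
    """Least-squares slope of y against 0, 1, 2, ... (units of y per step)."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def sliding_trends(series, n):
    """Slopes of every n-point window of an annual series."""
    return [ols_slope(series[i:i + n]) for i in range(len(series) - n + 1)]


def fraction_at_or_below(trends, observed):
    """Share of simulated trends that are as low as or lower than `observed`."""
    return sum(t <= observed for t in trends) / len(trends)


random.seed(0)
# Hypothetical run: 0.02 K/yr forced warming plus white noise over 55 years.
run = [0.02 * year + random.gauss(0.0, 0.1) for year in range(55)]
trends = sliding_trends(run, 15)
share = fraction_at_or_below(trends, 0.004)  # 0.4 C/century expressed in K/yr
print(len(trends), share)
```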

Hans von Storch said...

hvw - the new study published by Nature on a possible link to ENSO is certainly encouraging, but it is typical of how things are negotiated: somebody suggests one solution, which explains what happens but does not help to sort out our question of what is wrong with the scenario simulations, and it is only one solution. There may be others, and before declaring that our problem is solved we must be able to exclude other "solutions".

wflamme said...

After rolling two dice several times, does it make sense to question the dice model after having obtained two sixes?
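
For reference, the arithmetic behind the dice question can be written down in a few lines, assuming fair and independent dice. A single throw gives two sixes with probability 1/36, but the chance of seeing it at least once grows quickly with the number of throws.

```python
def prob_double_six_at_least_once(rolls):
    """Chance of at least one double six in `rolls` throws of two fair dice."""
    return 1.0 - (35.0 / 36.0) ** rolls


for rolls in (1, 10, 25):
    print(rolls, round(prob_double_six_at_least_once(rolls), 3))
```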

lucia said...

lucia, thanks for the info. Your approach to dealing with such an unbalanced ensemble sounds like an improvement.
I don't know if it is an improvement--but it has the potential for addressing whether the estimate of the variance in trends is over-dominated by models with smaller variances in trends which some like Ed Hawkins suspect to be the case.

In this regard: it is worth noting that if we examine residuals from the linear trends relative to what we see for the earth, on average the models have too much natural variability, not too little. Mind you: this test is dominated by variability at timescales less than the trend length, and also, some models have less small-scale variability than the observations. And also: the test is ambiguous (high model residuals could arise from excessive internal variability in models or from failure to correctly model volcanic eruptions). Nevertheless, the test can be done, whereas tests that compare variability at long time scales to earth variability have such low power as to be practically impossible. And this test, which has the advantage of being 'doable', does not point to individual models having too little internal variability on average.

I wonder whether something can be learned by sorting the models under consideration by their performance in capturing ENSO.
I was planning to apply an ENSO adjustment to the models, which must be done if one is going to compare ENSO-corrected observations to model outcomes. I grabbed the required model data, but haven't done it yet. (I've got to get off my duff and do it.)

If the models do simulate ENSO properly, this should narrow the variance in trends for models. As we have had La Niñas recently, it ought to move the earth trends more positive. How the two will pan out together I don't know -- but I anticipate the result to be similar to what's in the Fyfe paper.

FWIW: I prefer the comparisons without ENSO as more useful for a variety of reasons, including the fact that once one considers ENSO, one has a variety of choices to try to remove ENSO, and many choices means that one might hunt for the method that gives the answer the analyst 'prefers'.

Another thought: If we assume (or better hope) that modelled global temperature variability doesn't change much with the system's position on a warming trend
That's an important issue. But this assumption that the variance in 'n' year trends is identical over all periods is testable using the exact same methods we can use to test whether the variance in trends from different models differ from each other. So the assumption seems warranted (or at least is not inconsistent with the model-data available.)

We can compute the variability in trends across runs of identical models over matched periods and test if this variability changes over time. (The other test is to see if variability differs across models).
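
A check of this kind can be sketched as below. The comment does not say which test was used, so a simple permutation test on the variance ratio is substituted here; the function name and the trend values are hypothetical.

```python
import math
import random


def _mean_square(values):
    return sum(v * v for v in values) / len(values)


def spread_change_p_value(early, late, n_perm=2000, seed=0):
    """Permutation p-value for 'the spread of trends is equal in both periods'."""
    # Centre each sample on its own mean so that a shift in the mean trend
    # is not mistaken for a change in spread.
    mean_early = sum(early) / len(early)
    mean_late = sum(late) / len(late)
    dev_early = [t - mean_early for t in early]
    dev_late = [t - mean_late for t in late]

    def statistic(a, b):
        return abs(math.log(_mean_square(a) / _mean_square(b)))

    observed = statistic(dev_early, dev_late)
    pooled = dev_early + dev_late
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[:len(dev_early)], pooled[len(dev_early):]
        if statistic(a, b) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical 10-year trends (K/decade) across runs of one model.
early = [0.12, 0.25, 0.18, 0.05, 0.22, 0.15, 0.30, 0.10]
late = [0.20, 0.28, 0.17, 0.35, 0.24, 0.12, 0.31, 0.26]
p_value = spread_change_p_value(early, late)
print(p_value)
```

The same function can compare two models over a matched period instead of two periods of one model. With only a handful of runs per model the power of such a test is low, so a large p-value is weak evidence of equal variances.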

I have done so in the past with the AR4 models and there is no particular evidence the natural variability in trends increased (or decreased) over the 20th or modeled 21st centuries or that the variance differs from period to period.

In contrast, the same method used to detect whether variances differ across models confirms they do differ. The variance in trends is larger in some models than others. I need to repeat this and formalize it. (I think I mostly did 10 year trends too. ) I haven't done this check for the AR5 models mostly because I have a number of items on the 'to check' list.

I'd write more but blogger limits characters. Plus, I need to go actually implement the ENSO correction. :)

Hans von Storch said...

I do not think that one can order models according to skill or quality. Depends all on the metric, and there is no way of choosing a "best" metric. Why should ENSO be more important than extreme rainfall in Asia, than the MJO, or the formation of blockings, just to mention a few.

Also, when jumping on the ENSO part, you have made a choice among the three (or four) explanations for the inconsistency of observed recent trend vs A1B/RCP4.5 trends - you say: it is the natural variability. But how do you know it is not a lack of external forcing, or a possibly slight overestimation of the GHG response? Maybe we even had only bad luck, and this stagnation is a two in hundred rare event?

Does the natural variability explanation - which I personally find attractive - have a specific political utility, namely that we do not need to touch on the quality of the forcing nor on the quality of the response to forcing?

By the way, when the natural variability is not ok, and the models usually describe the full variability in the 20th century well, then the additional/missing natural variability must have been accounted for by forced variability. If there is too little natural variability, then the response is overestimated; if it is too large, then it is underestimated.

We need time and patience to deal with these issues and should not jump on the most convenient explanation why our scenarios fail in describing the recent (and quite possibly soon ending) stagnation.

lucia said...

Hans von Storch,
I don't know if your response was addressed to me.

Like you, I doubt we can rank models according to quality. I think hypothetically it could be done. But -- as you say, why prefer ability to mimic ENSO vs. MJO? Hypothetically, if one model was sufficiently bad at everything we could throw that one away.

On the 'ENSO part', the only reason I think it's worth examining whether ENSO is an explanation is that when presented with data showing the current observations are skirting or outside the range of the models, some people always immediately suggest it is ENSO, as hvw did just above. Since methods of accounting for ENSO in earth observations exist, when some suggest 'it's just ENSO', it can be worth looking into that issue and seeing whether applying the correction does change the result.

namely that we do not need to touch on the quality of the forcing nor on the quality of the response to forcing?
I actually favor these two as the more likely reasons because I don't think the main reason for the discrepancy is ENSO.

models usually describe the full variability in the 20th century well,
Do models describe it "well"? And how well? Collectively, the variability of 10 year trends in the models used in the AR4 exceeds the variability of 10 year earth trends in the 20th century by between 2% and 30%, depending on whether the comparison is made between models and HadCrut3, NCDC or GISTemp.

Collectively, the model variability in 10 year trends exceeds that of the earth despite the fact that (a) the earth includes measurement errors on top of other variability and (b) some of the AR4 models did not include volcanic or solar forcings. The effect of each factor individually should tend to make variability in observations of earth trends larger than in models -- and yet the variability of earth trends is, if anything, somewhat smaller. (The amount depends on whether one chooses HadCrut3, NCDC or GISTemp for the comparison.)

then the additional/missing natural variability must have been accounted for by forced variability
In fact, with some of the models in the AR4, we can see large variability. But if tabulated, the excess might be overlooked -- because those model runs contained no volcanic forcings. And so while the very large spikes in earth temperature frequently coincided with volcanic eruptions, those in the model simply occur due to that model's internal variability. For example, see echam5:

So, what we have here is a model whose variability of 10 year trends might not look so poor when variability of 10 year trends in single runs over the 20th century is tabulated and compared to that of earth trends, but which, to some extent, achieved that goal precisely by leaving out volcanic forcings which are thought to have caused a portion of the variability in 10 year trends for the earth.

Certainly there are other models whose variability seems possibly too small. But if we make the comparison in the aggregate, the variability of 10 year trends in individual models seems more likely on the high side than the low side.

We need time and patience to deal with these issues
I agree with this. Unfortunately, with only one earth one can't go to the lab and collect replicate earth observations, which would be very useful if we could have them.

eduardo said...

I am not sure that the models should be weighted by the number of realizations they provide. This would assume that the models are independent, which has been shown not to be true. Let us assume we have 50 realizations with model M. Perhaps 2 of these realizations have been done on another computer, or with another compiler, or someone changed a comma in the FORTRAN code. Formally, these two realizations belong to a different model, and yet in reality, they are almost the same model. If we compute the variance separately and combine them, these two realizations would be unduly overweighted.
We have two sources of variability for the trends: The structural (model) variability, and the internal variability. By weighting the ensemble, we are implying that the first is more important. Why ?
This question was indirectly addressed in the manuscript (supp. info). The ensemble is not a random sampling of a putative `model space`. Actually, we do not know what the ensemble represents, and so weighting the ensemble is not per se better.

lucia said...

I agree with you that weighting is not necessarily better. On the other hand, it's not necessarily worse either. To some extent, weighting by runs vs. weighting by model are just different ways, and getting similar results both ways merely shows a degree of robustness. That is: the result isn't emerging merely because of a somewhat arbitrary choice.

For example, while your example explains the difficulty with treating two things that claim to be different models as different when they are the same, a similar issue would hold if the 50 run model and the 2 run model really were different, with different parameterizations or solution methodology, but we weighted by runs. In this case, each model does provide an independent estimate of the variance of runs about a mean given that set of parameterizations. Meanwhile the difference in the mean between the two models gives an estimate of the effect of structural uncertainty.

Addressing your example where the same model is run 50 times and called "A" and then run 2 times and called "B", computing the variance by combining the two models wouldn't result in a great deal of bias in the computed variance, which will tend to be the same as if we pool all 52 and simply compute over the full 52. My understanding is the difficulty is merely that we will get a less precise estimate. And while the variance computed this way will be unbiased, the standard deviation will tend to have a low bias arising from the small sample size of 2 runs. So, weighting by model would, in this instance, be a suboptimal use of the data, but not truly horrible. Meanwhile if they had been different models, weighting by runs could result in model A's variance swamping the analysis.
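
The 50-run/2-run case can be checked with a small simulation. The sketch below assumes that "A" and "B" really are the same model, so all runs are drawn from one normal distribution with unit variance; the function and key names are made up for the illustration.

```python
import random


def sample_variance(xs):
    """Unbiased sample variance (divides by n - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def mean(xs):
    return sum(xs) / len(xs)


def compare_weightings(n_a=50, n_b=2, sigma=1.0, trials=2000, seed=1):
    rng = random.Random(seed)
    by_model, by_run, sd_small = [], [], []
    for _ in range(trials):
        a = [rng.gauss(0.0, sigma) for _ in range(n_a)]
        b = [rng.gauss(0.0, sigma) for _ in range(n_b)]
        var_a, var_b = sample_variance(a), sample_variance(b)
        by_model.append(0.5 * (var_a + var_b))   # each "model" counts once
        by_run.append(sample_variance(a + b))    # each run counts once
        sd_small.append(var_b ** 0.5)            # std. dev. from 2 runs only
    return {
        "mean_var_by_model": mean(by_model),
        "mean_var_by_run": mean(by_run),
        "spread_by_model": sample_variance(by_model) ** 0.5,
        "spread_by_run": sample_variance(by_run) ** 0.5,
        "mean_sd_two_runs": mean(sd_small),
    }


result = compare_weightings()
for key, value in result.items():
    print(key, round(value, 3))
```

Both variance estimates should average close to the true value of 1, with the by-model estimate scattering much more from trial to trial, and the standard deviation estimated from only 2 runs should average below 1.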

I elected to do by model because
(a) I wanted to look at individual models anyway to see how their means and variances looked relative to observations,
(b) back when the AR4 was published, the multi-model mean highlighted in graphs and tables was obtained by first computing model means and then averaging over model means. So, my graphs mimic that methodology and
(c) computing a pooled variance from individual models gives a cleaner estimate of typical internal variability stripped of the variance that springs from the structural uncertainty, and this cannot be done by computing the variance in trends over runs without first separating into models. As such, the variance weighted by model is a better model-based guide to variability arising from uncertainty in initial conditions. (Assuming models get 'weather' right, of course.)

I did by the way agree with the comment in your manuscript that the ensemble is not really a random sampling of putative 'model space' (while replicate runs in an individual model may be.)

eduardo said...


Fyfe, Gillett and Zwiers construct an empirical distribution of the difference of trends (model minus observations) based on bootstrapping (see their supplementary information). They do take into account the different number of realizations, but their scheme implies a much smoother weighting than just weighting by the number of realizations.

lucia said...

I looked at the supplemental in Fyfe and saw they don't weight by realizations. My discussion is addressing hvw's concern. I pointed out that I get the same thing weighting by model rather than realization.

I didn't plow through the details in Fyfe enough to know what happens in the case where the distribution of model runs about the model mean is normal and the distribution of model means is normal, and how that compares to what I did. I just skimmed. That gives me the gist, but I often have to sit down and think through limits to fully understand how methods relate.

I think hvw's concern when criticizing weighting by runs might be your paper where things are weighted by run/realization.

But as I noted: I get more or less the same results with different weightings and using a different method. So as a practical matter, I don't think the choice is making much difference. And also, I'm not claiming one weighting is necessarily better than the other especially given that we really are not able to pull models randomly from a set of 'all possible models'.

eduardo said...


the number of realizations enters in the estimation of the empirical distribution of trends under the null hypothesis

in the supp. info:
the deviation in the j-th trend for model i that is induced by internal variability. Since the model i ensemble is generally small, the deviations are smaller than would be representative of an infinitely large replication of runs for model i, and so to compensate for that loss of variance, multiply the difference $M_{ij} - M_{i\cdot}$ by $[N_i/(N_i - 1)]^{1/2}$.

So it is not a direct model weighting, but the number of realizations is taken into account indirectly
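
The correction quoted from the supplementary information can be sketched as follows. Deviations from a sample mean of N runs have, on average, only (N - 1)/N of the true variance, and the factor [N/(N - 1)]^(1/2) compensates for that. Skipping single-run models is an assumption made here, since the factor is undefined for N = 1; the trend values are invented.

```python
import math


def inflated_deviations(trends_by_model):
    """Deviations of each run's trend from its model mean, scaled by
    sqrt(N / (N - 1)); models with a single run contribute nothing."""
    out = []
    for model, trends in trends_by_model.items():
        n = len(trends)
        if n < 2:
            continue  # assumption: no internal-variability estimate from 1 run
        model_mean = sum(trends) / n
        scale = math.sqrt(n / (n - 1))
        out.extend(scale * (t - model_mean) for t in trends)
    return out


# Hypothetical trends (K/decade) for three models with 2, 1 and 3 runs.
example = {"A": [0.1, 0.3], "B": [0.2], "C": [0.1, 0.2, 0.3]}
deviations = inflated_deviations(example)
print(deviations)
```

The factor is largest for the smallest ensembles (about 1.41 for two runs, about 1.05 for ten), which is how the number of realizations enters without acting as a direct weight.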

JamesG said...

The stratosphere has not been cooling since 1995 so no need to find an explanation for that at all! Stratospheric cooling in fact was the official IPCC "fingerprint" for AGW and in any other field the hypothesis would have been rejected rather than the wrong-footed "experts" being allowed to suggest a slew of contradictory and unphysical excuses for a "warming masked by cooling". Whither Occam's razor?

Explaining the brief, minor and beneficial heating period of the 20th century is less useful than explaining historical cooling periods. What caused the recovery from the ice ages when CO2 was at its maximum levels? What caused the little ice age? What caused the drop after the 1940's, what causes the current plateau? As it happens the only plausible theories available for all of these are amplified solar forcing. CO2 cannot explain cooling at all. And since what causes the cooling likely also causes the heating then CO2 is not required to explain the 20th century.

Solar forcing was the dominant consensus theory for centuries. It is also still perfectly valid for both the Arctic and the US48 temperature datasets; the only ones with little likely influence from urban heat islands.

And if I was to explain that the current plateau should really have started in the 60's when sunspots levelled out but aerosol reduction from the clean air act caused a temporary cooling then you'd rightly say that I just made that up. Yet that is the current, accepted reasoning for the inability of the CO2 hypothesis to explain the post 40's drop in temperature. This juxtaposition demonstrates the facile logic that is allowed only if you are a catastrophist.

hvw said...

It is an undeniable fact that the cumulative comment count of this blog has remained stagnant for nine days now!

This is extremely unlikely according to our ensemble of blog comment simulations, in which only 2 in a hundred show a similar behavior. The paper was rejected by Nature, but this incidence still points not only at a possible sudden death of Klimazwiebel but puts in question hitherto undoubted results, on which our models are based, about the character of cumulative anything.