Friday, January 22, 2010

Guest contribution from Reinhard Böhm, ZAMG, Vienna

“Faking versus adjusting” – why it is wise to sometimes hide “original” data by Reinhard Böhm (Vienna)

Although it is one of my personal principles not to read other people’s correspondence, it was almost inescapable recently when “climategate” overwhelmed us. One of the frequently heard allegations drawn from the illegally published emails was that the original data were intentionally withheld from free public access in order to conceal the “tricks” applied to them to increase the amplitude of anthropogenic warming. Although I must confess that I am very much in favour of the idea of free data access for everybody, we must also be aware of the dangers implied in this nice principle. And I want to argue here that some of those “tricks” are simply necessary to make data collections fit for climate analysis – the community I am part of calls these tricks “homogenizing”.

In the field of analysing climate variability, trends, oscillations and other things that we nowadays tend to simplify under the umbrella of “climate change”, we must be aware that “original” climate time series never contain climate information exclusively. In fact there is much random noise in them and (even worse) systematic breaks or (worst of all) trends and other features that do not represent climate but growing cities, growing trees, technological progress in measuring instruments, data processing, quality-control mechanisms and a number of other non-climatic factors.

People from universities or other research institutes usually consider climate data coming from weather services to be a kind of “official” data of great quality. Working in a weather service, I am glad about this and I can confirm it. We spend much time and invest much money, manpower and expertise in our quality controls. But the aim is to produce data of internal and spatial physical consistency according to the current state of the respective measuring site. It is these data which are stored in the databanks, exchanged all over the globe, and published in yearbooks. It does not belong to the principal duties of weather services to look after the long-term stability of their data.

Therefore a free and unrestricted data policy in the field of longer climate time series – original data easily and comfortably accessible from institutions like CRU, NOAA, NASA and others – opens the door not only to serious research but also to (planned or unintentional) misuse under the quality seal of these institutions.

I want to illustrate this with one example. I found it some years ago in the best-selling book “State of Fear”. The author’s main intention is to reveal a presumed worldwide conspiracy of alarmist NGOs to draw as much attention as possible to the case of global warming. One of his arguments was only possible through NASA’s liberal data policy. Michael Crichton simply had to quickly download a number of obviously “original” long-term temperature series from some American cities and some rural sites, then select some urban ones with strong warming trends and some rural ones with weaker or even cooling trends, and the convincing argument “global warming is not real but an artefact of increasing urban heat islands” was ready for use – underpinned by “high-quality original data of a trustworthy American research institution”.
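
The mechanics of such a comparison are trivial to reproduce. The following sketch is my own illustration, not Crichton’s actual procedure: the station names and annual series are invented, and the only point is that a naive per-station trend comparison of raw, unhomogenized data contains no means of distinguishing climate from urban growth, relocations or instrument changes.

```python
import numpy as np

def linear_trend(series):
    """OLS trend (units per year) of an annual series."""
    years = np.arange(len(series))
    slope, _intercept = np.polyfit(years, series, 1)
    return slope

# made-up annual mean temperature anomalies, for illustration only
rng = np.random.default_rng(42)
stations = {
    "city A (urban)":    0.030 * np.arange(50) + rng.normal(0, 0.4, 50),
    "city B (urban)":    0.025 * np.arange(50) + rng.normal(0, 0.4, 50),
    "village C (rural)": 0.005 * np.arange(50) + rng.normal(0, 0.4, 50),
    "village D (rural)": -0.002 * np.arange(50) + rng.normal(0, 0.4, 50),
}

# raw trends say nothing about their non-climatic share
for name, series in stations.items():
    print(f"{name:20s} trend = {linear_trend(series):+.3f} °C/yr (raw, unhomogenized)")
```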

In real life we can show – but only after investing the additional and painstaking work of homogenizing – that such urban or other biases can be, have to be and in fact are removed in the respective high-quality datasets. This is no “faking” or “tricking” but the intention to provide a data basis fit for the special application of time series analysis. Being part of a group specialised in the field of homogenization, I do not want to bore the readers now with the details of our “tricks”. I only want to mention some basic findings from our experience:

  • No single long-term climate time series is a priori homogeneous (free from non-climatic noise)
  • On average, every 20 to 30 years a break occurs that significantly modifies a series
  • Many, but not all, of these individual breaks cancel out as random when the regional (or global) sample is analysed; even regionally or globally averaged series contain biases of the order of the real climate signal
  • There are a number of mathematical procedures which – preferably combined with metadata information from station history files – are able to detect and remove (or at least reduce) the non-climatic information (a minimal sketch follows after this list)
  • This is much work, so it should preferably be done by specialised regional groups close to the metadata – this produces the best results, is more effective and saves the time of research groups wanting to analyse the data
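
To give a flavour of how such procedures work, here is a minimal sketch of relative homogenization under simplifying assumptions: a candidate series is compared to a presumably homogeneous reference series, the difference series (in which the common climate signal largely cancels) is scanned for the single largest shift in its mean, and the segment before the detected break is adjusted. This is a toy illustration, not our operational method; the function names and synthetic data are invented, and real procedures must handle multiple breaks, seasonality and the construction of the reference series.

```python
import numpy as np

def detect_break(candidate, reference):
    """Locate the most likely break in a candidate series relative to a
    (presumed homogeneous) reference series, using the difference series.
    Returns the index of the first value after the break, the estimated
    shift and the test statistic."""
    diff = candidate - reference              # climate signal largely cancels out
    n = len(diff)
    best_idx, best_t = None, 0.0
    for k in range(10, n - 10):               # enforce a minimum segment length
        a, b = diff[:k], diff[k:]
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        t = abs(a.mean() - b.mean()) / se
        if t > best_t:
            best_idx, best_t = k, t
    shift = diff[best_idx:].mean() - diff[:best_idx].mean()
    return best_idx, shift, best_t

def adjust_before_break(candidate, break_idx, shift):
    """Shift the early segment so that it is consistent with the recent one."""
    adjusted = candidate.copy()
    adjusted[:break_idx] += shift
    return adjusted

# synthetic example: 100 "years" with a 0.6 degree step inserted at year 60
rng = np.random.default_rng(0)
climate = np.cumsum(rng.normal(0, 0.1, 100))   # shared low-frequency signal
reference = climate + rng.normal(0, 0.3, 100)
candidate = climate + rng.normal(0, 0.3, 100)
candidate[:60] -= 0.6                          # non-climatic break (e.g. a relocation)

k, shift, t = detect_break(candidate, reference)
homogenized = adjust_before_break(candidate, k, shift)
print(f"break detected at index {k}, estimated shift {shift:+.2f}, t = {t:.1f}")
```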

A number of such regional groups are active in the homogenizing business, but I must also clearly state that the job is not yet done completely and globally. We are working on it, and already now I can advise everyone to use original data only for controlling the quality of the respective homogenization attempts, not for the analysis itself, if the goal is a timeframe of 20 years or more – a length usually necessary to gain statistical significance given the high-frequency variability of climate.
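
To illustrate why roughly 20 years or more are needed, the following sketch (my own illustration, with invented trend and noise magnitudes) fits an ordinary least-squares trend to synthetic annual series of different lengths and checks whether the slope differs significantly from zero. Serial correlation, which would make the requirement even stricter, is ignored here.

```python
import numpy as np
from scipy.stats import linregress

def trend_significance(series, alpha=0.05):
    """OLS trend of an annual series and whether it differs from zero at level alpha.
    A rough check only: autocorrelation of the residuals is ignored."""
    years = np.arange(len(series))
    fit = linregress(years, series)
    return fit.slope, fit.pvalue, fit.pvalue < alpha

# synthetic annual temperatures: a 0.02 K/yr trend buried in 0.5 K interannual noise
rng = np.random.default_rng(1)
for n_years in (10, 20, 40):
    series = 0.02 * np.arange(n_years) + rng.normal(0, 0.5, n_years)
    slope, p, significant = trend_significance(series)
    print(f"{n_years:2d} years: slope = {slope:+.3f} K/yr, p = {p:.2f}, significant = {significant}")
```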

At the end I want to illustrate with one single but maybe astonishing example how strongly and how systematically a simple fact – the installation height of meteorological instruments at regular weather-service sites – has changed over the instrumental period. The two figures display the great variability, but also the average systematic trend, of the height above ground of the thermometers and rain gauges of a larger sample of long-term series in central Europe for which we were able to produce the respective metadata series. There obviously was a change in measuring philosophy from “preferably remote from surrounding obstacles” (on measuring platforms, towers, rooftops) to “near the ground”.

A research group using the “original data” would have had no chance to invest the time to go into these details. Such original data would have produced a significant “early instrumental bias”: too cold maximum temperatures, too warm minimum temperatures and too dry precipitation totals. The temperature biases are of the order of 0.5°C each, reducing the mean diurnal range (MDR) by as much as 1°C in some cases; the precipitation bias amounts to a deficit of nearly 10%.
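
For what such a metadata-based correction can look like in practice, here is a minimal sketch under stated assumptions: a documented relocation year is known from the station history, and the early segment is adjusted by the orders of magnitude quoted above (about 0.5°C for the temperature extremes, about 10% for precipitation). The function name, the default bias values and the example data are invented for illustration only; they are not the actual adjustments applied to any real series.

```python
import numpy as np

def correct_early_instrumental(tmax, tmin, precip, years, relocation_year,
                               tmax_bias=-0.5, tmin_bias=+0.5, precip_factor=0.9):
    """Apply metadata-based corrections to values measured before a documented
    relocation of the instruments (e.g. from a rooftop or facade to a screen
    near the ground). Bias magnitudes are illustrative orders only."""
    early = years < relocation_year
    tmax_h = np.where(early, tmax - tmax_bias, tmax)            # early maxima were too cold
    tmin_h = np.where(early, tmin - tmin_bias, tmin)            # early minima were too warm
    precip_h = np.where(early, precip / precip_factor, precip)  # early totals ~10% too low
    return tmax_h, tmin_h, precip_h

# tiny usage example with constant dummy values
years = np.arange(1850, 1860)
tmax = np.full(10, 20.0)
tmin = np.full(10, 10.0)
precip = np.full(10, 50.0)
tmax_h, tmin_h, precip_h = correct_early_instrumental(tmax, tmin, precip, years, 1855)
print(tmax_h, tmin_h, precip_h, sep="\n")
```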

I hope my plea for “tricking” is not misunderstood but regarded as what it is – an attempt to see things in a more differentiated and sophisticated way. A completely liberal data policy may seem the only acceptable and achievable alternative at first sight. But not every modification of the original data has the intention to “hide the truth” – on the contrary, the overwhelming majority of such attempts want to help to effectively unveil the truth.


27 comments:

@ReinerGrundmann said...

Your explanation of the need for treatment of raw data is well put. However, it does not follow that it must be kept away from the public or sceptical scientists. If people use raw data to make up unsubstantiated claims, people like you will have to step up and refute them. This is now much more difficult to do as a result of data secrecy. Skeptical people will not believe anything in terms of explaining a warming trend. Watch this space for the comments to come.

And you should have read the correspondence in order to know what the fight is about. It was not so much raw data but intermediate data and code that were requested through Freedom of Information requests. The inner circle resisted these requests as they thought an open controversy would damage their reputation. Others thought (and still think) it might damage the prospects of climate change policies.

Compare these risks to the actual damage done. Would we not all be better off had this debate about the hockey stick been fought out publicly, with both sides having access to the same data?

Hans von Storch said...

Reiner,
this contribution should not be read as a contribution to ClimateGate, but as an independent description of what the issue of homogeneity is about – to make us all better acquainted with the methodical background and the issues involved in the science. Your points may be well taken, but I would prefer if we discussed the issues addressed in this post, and did not always blend all the different threads at the same time.

@ReinerGrundmann said...

Hans, I am sorry, but Reinhard Boehm discusses the merits and dangers of freely available data from start to finish of his contribution. In fact, he tells us that he does not want to bore the readers with too much detail, and very little detail is given about how exactly homogenization is done.

Stan said...

Michael Crichton was a graduate of Harvard and Harvard Medical School. Simply excelling in medical school wasn't sufficiently challenging, so he wrote bestselling novels in his spare time. I would imagine that Crichton was a lot more intelligent than most climate researchers, and fully capable of dealing with the data in a responsible way. Given the sloppiness and corruption which underlies so many of the 'studies' we have seen from luminaries in the climate science field, that wouldn't be difficult.

But the essential argument here is that the data is too important to let anyone but the approved scientists deal with it. Given the evidence, I have every reason to think that the approved scientists will butcher it. The world would be better without any data at all than to have the data hidden by these people.

Rich said...

That climate data need several levels of analysis because they contain the impact of multiple effects, not all of them climatic, seems unarguable. But where is the problem in making "before and after" data series available?

Some people might make an ill-informed use of the "before" data, true, but surely the answer is ready to hand? "Here's what we did and why".

The effect of the Climategate emails is to make the rest of us mistrust that last step, so I think this post's title is unhelpful.

P Gosselin said...

Perhaps the following write-up can provide some information on the methodology of "homogenisation" used in climate science.
It's fresh off the press!
http://www.americanthinker.com/2010/01/climategate_cru_was_but_the_ti.html

itisi69 said...

It has been shown many times (the Harry_Read_This files, for instance) that climatologists are not very skilled in mathematics or statistics. Therefore the raw data need to be analysed by various parties; whether they are skeptic or not doesn't really matter. Steve McIntyre has stated many times that he is not a skeptic, he just wants to get the figures right.

TCO said...

You should read E Bright Wilson's ON SCIENTIFIC RESEARCH. It is critical to publish raw data as well as adjustments, since the method of adjustment may have been wrong and future researchers can still get value as long as the raw data is available.

Also, you really SHOULD read the "hide the decline" emails. It was blatant misgraphing: showing as data what was model. If this were done in a semiconductor physics paper, the PRL editors would have ripped the tits off of the authors. Very bad practice to sex up a graph like they did... and it makes me worry where else they play games.

Anonymous said...

Science is also about publishing the raw data, so others can reproduce the results. Everything else is unscientific.
It might be a problem to publish the raw data, or there might be misuse. But those problems then need to be addressed after they arise and after publishing the raw data. There can be no excuse for hiding the raw data; that would be a path to intransparency.
With best regards
Günter Heß

eduardo said...

With this comment I am not trying to excuse any of the 'hidings', but it should indeed be mentioned that they were not published in peer-reviewed papers, at least not egregiously. As far as I know, the hiding happens only in the IPCC 2001 Report and in the WMO 1999 Statement.

One could argue that the Mann et al. reconstruction of 1998 (the hockey stick) only goes up to 1980, but I think it is difficult to prove that the decline was hidden here on purpose. In the two cases mentioned above, it was, and it should not have been.

Anonymous said...

Your examples can also be countered with more open data. In this case, if the metadata (descriptions) of the data included the known factors, adjustments, site descriptions, etc, then there should be fewer misinterpretations of the data.

TCO said...

1. Agreed that the sexed graph of Jones was not in a paper. Still wrong. Still dishonest. Still shows advocacy eroding honor. Still a lie.

2. I have been following on blogs only, so correct me if I'm wrong, but I think JeanS showed that the team (well, Mike) did mix real temperature data into the graph of proxy data in both his 1998 and 1999 papers (and if you look at the comparison figure that Jean S shows, it's very evident that the effect was to hide a ~1940-1980 decline in proxy values). [This isn't even to mention the other games, like superposition of the real temperature data at the end and coloring it red and all that crap.] See this post:

http://climateaudit.org/2009/11/20/mike%E2%80%99s-nature-trick/#comment-202936

Anonymous said...

Reinhard. After all that has been revealed in recent months/years I DO NOT BELIEVE YOU.

The credibility of climate science is shot.

Free the data. Free the code.

Hans Erren said...

@eduardo
That makes it even worse; the target audience for the WMO paper were policymakers, and the graph was sitting on the cover of the brochure. The brochure would be lying on coffee tables and in magazine displays in the offices of policymakers. The graph shouts: "Look how good the measurements are, showing how unprecedented the current warming is". Whereas, as you know, if you omit the thermometer readings and add error bars, the picture is a completely different one.

Nobody looking at the cover of the brochure would bother to look up the scientific references, which were hidden away at a secondary level.

wflamme said...

In "19 Versions and Whadda You Get", McIntyre gives some insight into multiple versions of such best intentions – tongue in cheek, he calls the outcome a station record 'ensemble'.

Now Reinhard Boehm is afraid that offering access to Truth V.20 - the raw station data/metadata - will open the door for "(planned or unintentional) misuse".

But the illustration he gives is not too convincing. AFAIR, Crichton's (mis-)use of Truth V.20 never played an important role in the sceptics' camp. Just the opposite: it was the presence of Truth V.01 ... V.19 and the frequent absence of this V.20 that raised (and raises) suspicion.

eduardo said...

@14
Hans, I am not justifying misleading the IPCC or WMO readers at all.

TCO raised the point that, according to him, this could not have happened in the peer-reviewed literature in Physical Review Letters.
I just tried to explain that this didn't happen in the peer-reviewed literature in climate either, or at least not so clearly that a reviewer could have spotted it easily.


Do you think that policy makers read the IPCC reports, other than the SPM, or the WMO reports? I am not so sure. We would have to ask our sociologists for enlightenment.

TCO said...

I didn't say it couldn't happen in PRL... I said that if it did, the physicists would rip the tits off of the person lying about his temperature-versus-conductivity plot. I don't see the same type of response here in the climate world. And besides, Mike did mix the temps into the proxies in MBH98 and 99, which are in the literature.

Ask yourself what you would think of someone who decided to cover up a conductivity plunge (or rise) that did not follow some standard model, by taking the standard model and using that to smooth in the endpoints (in effect mixing in model data into the raw).

Günter Heß said...

I would agree with TCO. I think in a physics area a mixed graph improperly explained, like the one TCO cited, would get a consensus response: nobody would justify it, everybody would resent it. Justification of such a dubious graph in a research area immediately raises my suspicion, and also that of the scientists I am acquainted with.

Hans Erren said...

Why it is wise to never ever hide the original:
On Being a Scientist: Third Edition: 2009. ISBN-10: 0-309-11970-7 ISBN-13: 978-0-309-11970-2. 82 pages
Quote from page 8:
“Researchers who manipulate their data in ways that deceive others, even if the manipulation seems insignificant at the time, are violating both the basic values and widely accepted professional standards of science. Researchers draw conclusions based on their observations of nature. If data are altered to present a case that is stronger than the data warrant, researchers fail to fulfill all three of the obligations described at the beginning of this guide. They mislead their colleagues and potentially impede progress in their field or research. They undermine their own authority and trustworthiness as researchers. And they introduce information into the scientific record that could cause harm to the broader society, as when the dangers of a medical treatment are understated.”
http://pielkeclimatesci.wordpress.com/2010/01/18/national-academies-press-book-on-being-a-scientist-third-edition-2009/

Hans Erren said...

On homogenisation:
How do you know that a thermometer placed on the first floor on the shaded north facade of a building (the usual practice up to the 20th century) is on average warmer than a thermometer in a Stevenson screen on a grass field?

@ReinerGrundmann said...

Eduardo, you say
"One could argue that the Mann et al reconstruction of 1998 (the hockey stick) only goes up to 1980, but I think it s difficult to prove that the decline was hidden here on purpose. In the two cases mentioned above, it was and it should not have"

What sense would it make for Jones to claim that he
"just completed Mike's Nature trick of adding in the real temps to each series for the last 20 years (ie from 1981 onwards) amd from 1961 for Keith's to hide the decline"
--if Mann had not done this in his original paper?

eduardo said...

@ 21
Reiner,

the Mann et al. 1998 reconstruction goes up to 1980. In their paper they don't 'clearly' merge the reconstructions and the instrumental data (except perhaps to produce the time-smoothed reconstruction at the end point); they are shown in the same plot but with different line patterns.
So the question would be: did they stop at 1980 because they didn't want to show the 'decline' or because they didn't have updated proxy data?
I do believe that they knew about the decline. Jones' mail is clear, but those are his words, not Mann's. So it is difficult to prove; that is what I meant.

Other than that, I am not the one defending Mann et al. here. On the basis of Mann et al., I think you cannot claim that the year 1998 was the warmest in the millennium, since you would be comparing data sets of a different nature.

Stan said...

More reasons not to allow anyone to keep secrets or control the data: http://climateaudit.org/2010/01/23/nasa-hide-this-after-jim-checks-it/

Power corrupts. Absolute power corrupts absolutely.

Let the disinfecting sunshine in. The stench is overwhelming.

eduardo said...

@ 17
TCO wrote: 'I said that if it did, the physicists would rip the tits off of the person lying about his temperature versus conductivity plot. I don't see the same type of response here in climate world. ... Ask yourself what you would think of someone who decided to cover up a conductivity plunge ...'

I spoke up about this, so you don't need to ask me :-) If the AGUs and EGUs of the world didn't, they will have their own reasons. My feeling is that they didn't believe this affair was going to swell in the way it has. That even Schellnhuber is talking about IPCC reform would have been unthinkable in November.

Leigh Jackson said...

A thoughtful article exploring questions more difficult than some will allow. What matters most is honesty of intention. To hand over an axe to a woodsman is one thing; to hand it to an axe-murderer is another.

Leigh Jackson said...

Eduardo 22
Your comment echoes the conclusion of
the National Academies report "Surface Temperature Reconstructions for the Last 2000 Years":

"The basic conclusion of Mann et al. (1998, 1999) was that the late 20th century warmth in the Northern Hemisphere was unprecedented during at least the last 1,000 years. This conclusion has subsequently been supported by an array of evidence that includes both additional large-scale surface temperature reconstructions and pronounced changes in a variety of local proxy indicators, such as melting on ice caps and the retreat of glaciers around the world, which in many cases appear to be unprecedented during at least the last 2,000 years... The substantial uncertainties currently present in the quantitative assessment of large-scale surface temperature changes prior to about A.D. 1600 lower our confidence in this conclusion compared to the high level of confidence we place in the Little Ice Age cooling and 20th century warming. Even less confidence can be placed in the original conclusions by Mann et al. (1999) that “the 1990s are likely the warmest decade, and 1998 the warmest year, in at least a millennium” because the uncertainties inherent in temperature reconstructions for individual years and decades are larger than those for longer time periods and because not all of the available proxies record temperature information on such short timescales."

I cannot see the word "hiding" in the report. Peer-review failure by the National Academies?

MikeR said...

Maybe this is the right venue to ask for clarification of the recent claims that 90% of the world's thermometers have been dropped from the GISS calculations for recent years, predominantly leaving the ones in warmer areas. It seems to me a remarkable accusation, and it comes from someone who apparently put a lot of time into reproducing the GISS data analysis. See many posts at http://chiefio.wordpress.com/
Has there been a rebuttal, or a response?