Sampling biases in global temperature anomalies

Nick Stokes points out some fundamental problems with determining trends in surface temperatures, caused by the changing distribution of stations within a grid cell over time. Consider a 5×5 degree grid cell which contains a two-level plateau above a flat plain at sea level, as shown below. Temperature falls by about 6.5C per 1000m of height, so the real temperatures at different locations will be as shown. Therefore the correct average surface temperature for that grid would be something like (3*20+2*14+7)/6, or about 16C. What you actually measure will depend on where your stations are located. Since the number of stations and their locations are constantly changing with time, there is little hope of measuring any underlying trend of average temperature in that cell. You might even argue that an average surface temperature, in this context, is a meaningless concept.

The mainstream answer to this problem is to use temperature anomalies instead. To do this we must first define a monthly ‘normal’ temperature for each station over a 30-year period, e.g. 1961-1990. In a second step we subtract these ‘normals’ from the measured temperatures to get DT, the ‘anomaly’ for that month. We then average those values over the grid instead, to get the average anomaly for that measurement month compared to 1961-1990. Finally we can average over all months and all grid cells to derive the global annual temperature anomaly. The sampling bias has not really disappeared but has been partly subtracted. There is still an assumption that all stations within a cell react in synchrony, warming (or cooling) uniformly. This procedure also introduces a new problem for those stations with insufficient data within the selected 30-year period, and this can invalidate some of the most valuable older stations. Are there other ways to approach this problem?
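As a sketch of the per-station normal described above (the station data here are invented purely for illustration):

```python
import numpy as np

def station_anomalies(temps, years, base=(1961, 1990)):
    """Anomalies for one station and one calendar month.

    temps: array of that month's mean temperatures, indexed by `years`.
    The 'normal' is the mean over the base period; stations with no
    data in the base period cannot be used -- the problem noted above.
    """
    temps = np.asarray(temps, dtype=float)
    years = np.asarray(years)
    in_base = (years >= base[0]) & (years <= base[1])
    if not in_base.any():
        raise ValueError("station has no data in the base period")
    normal = np.nanmean(temps[in_base])  # the monthly 'normal'
    return temps - normal

# A fictitious station's January means, 1950-1999, with a slight trend
years = np.arange(1950, 2000)
temps = 5.0 + 0.01 * (years - 1950)
anoms = station_anomalies(temps, years)
```

By construction the anomalies average to zero over the base period, whatever the station's absolute temperature.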

For the GHCN V1 and GHCN V3 (uncorrected) datasets I wanted to use all stations, so I took a naive approach: I simply used monthly normals defined per grid cell, rather than per station, over the entire period.
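In code, the only difference from station-based normals is the grouping key: one normal per (cell, calendar month), computed over the whole record, so no station is excluded for lacking base-period data. A minimal sketch with made-up records:

```python
from collections import defaultdict

# Invented records: (cell_id, year, month, temperature)
records = [("cell_A", y, 1, 10.0 + 0.02 * (y - 1900))
           for y in range(1900, 1950)]

# One 'normal' per (cell, calendar month) over the entire period
sums = defaultdict(list)
for cell, year, month, t in records:
    sums[(cell, month)].append(t)
normals = {key: sum(v) / len(v) for key, v in sums.items()}

# Anomaly = measurement minus the cell's monthly normal
anomalies = [(cell, year, month, t - normals[(cell, month)])
             for cell, year, month, t in records]
```

The trade-off, as Nick Stokes notes below, is that the cell normal still depends on which stations happen to report, so the bias is reduced rather than removed.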


The figure compares annual anomalies calculated per monthly grid cell. After 1900 the agreement of GHCN V3 with CRUTEM4 is good. The original GHCN V1 data is shifted warmer than GHCN V3 by up to 0.1C. This difference is real.

A novel approach to this problem was proposed first by Tamino, but then refined by RomanM and Nick Stokes. I will hopefully simplify their ideas without too much linear algebra. Corrections are welcome.

Each station is characterised by a fixed offset \mu_i from the grid average. This remains constant in time because, for example, it is due to the station’s altitude. We can estimate \mu_i by first calculating all the monthly average temperatures T_{av} for the particular grid cell in which the station appears. Then by definition, for any of the monthly averages

T_{av} = \frac{1}{N_{stations}} \sum_i \left( T_i - \mu_i \right)

so now in a second step, by averaging over all the ‘offsets’ for a given station we can estimate \mu_i .

\mu_i = \frac{1}{N_t} \sum_{time} \left( T_i - T_{av} \right)

So having found the set of all station ‘offsets’ in the database, we can calculate temperature anomalies using all available stations in any month. I still think the anomalies have to be normalised to some standard year, but at least the bias due to a changing set of stations will be reduced, especially in the important early years.
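A minimal numerical sketch of the offset idea, with two invented stations in one cell: one at sea level, one about 1000m up (hence ~6.5C colder) which only starts reporting halfway through the record:

```python
import numpy as np

n_months = 120
t_cell = 15.0 + 0.002 * np.arange(n_months)          # underlying cell series
station = np.vstack([t_cell, t_cell - 6.5])          # fixed altitude offset
station[1, :60] = np.nan                             # station 2 starts late

# Step 1: naive monthly cell average over whatever stations report;
# this jumps down when the cold station appears
t_av = np.nanmean(station, axis=0)

# Step 2: each station's offset = time average of (T_i - T_av)
mu = np.nanmean(station - t_av, axis=1)

# Offset-corrected cell series: average of (T_i - mu_i) over reporting stations
corrected = np.nanmean(station - mu[:, None], axis=0)
```

A single pass reduces the spurious jump but does not remove it entirely; iterating (recomputing T_{av} from the corrected series and re-estimating \mu_i) converges towards the least-squares solution that RomanM and Nick Stokes describe.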

P.S. I will try this out when time permits.


About Clive Best

PhD in High Energy Physics. Worked at CERN, Rutherford Lab, JET, JRC, OSVision.

13 Responses to Sampling biases in global temperature anomalies

  1. Euan Mearns says:

    Clive, your V3 uncorrected, is that the data Nick supplied, or have you downloaded from GHCN? And how many stations in the V1 data and in the V3 data?

    The cyclic nature pre-1980 looks very similar to the compilations of V2 records I’ve been making.

    It’s the pre-1900 part that’s really interesting.

  2. Clive Best says:

    I downloaded V3 uncorrected direct from GHCN. There are 6039 stations in V1 and 7280 stations in V3. I haven’t looked in detail at the differences, but the general impression is that the past has cooled a little with respect to V1. Both have been processed in exactly the same way. Nick Stokes argues that my calculation of anomalies partly obscures the trend when there are few stations.

    Everything is a compromise. It is just a question of finding the least intrusive compromise.

  3. Nick Stokes says:

    My preference is the linear model approach. But I think a simple and practical alternative for anomalies is to fit a regression line to each station, and use the fitted value at a designated year as the normal. This has the stability merits of averaging, and avoids the trend drift problem. It would be better to weight the regression to favor a central block of years. But there is no rigid requirement for data in a period.

    I’ve described the approach here, with application here.
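    The regression-normal idea could be sketched as follows (station data invented; a real implementation would, as Nick suggests, weight the fit towards a central block of years):

```python
import numpy as np

def regression_normal(years, temps, ref_year=1975):
    """Fit a straight line to a station's series and return the
    fitted value at ref_year as that station's 'normal'."""
    years = np.asarray(years, dtype=float)
    temps = np.asarray(temps, dtype=float)
    slope, intercept = np.polyfit(years, temps, 1)
    return slope * ref_year + intercept

# Invented station: exactly linear, 9.0C in 1951 rising 0.01C/yr
years = np.arange(1951, 2001)
temps = 9.0 + 0.01 * (years - 1951)
normal = regression_normal(years, temps)   # fitted value at 1975
anomalies = temps - normal
```

Because the normal comes from a fit rather than a fixed-window mean, a station with no data in 1961-1990 can still be anomalised.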

  4. Hero Volkers says:

    I would suggest the following calculation method which is based on the assumption that the offset µ of a station with respect to the grid average is constant and which forgoes the need for computing the value of µ.

    The calculation for each grid cell is as follows:
    1. Calculate for each station in the cell the monthly increments of the observations. This eliminates µ.
    2. Calculate for each month the average of the available increments.
    3. Add up the averages starting from the chosen standard year.

    The differences between the available increments in step 2 are food for statisticians; they could give insight into the influence of the changing distribution of stations within a grid cell with time.
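    A minimal sketch of these three steps (invented data, two stations with different fixed offsets that cancel in the differences):

```python
import numpy as np

t_true = 12.0 + 0.01 * np.arange(100)                 # underlying cell series
stations = np.vstack([t_true + 2.0, t_true - 5.0])    # two fixed offsets

# 1. Per-station increments (first differences): the offsets drop out
increments = np.diff(stations, axis=1)

# 2. Average the available increments at each time step
mean_inc = np.nanmean(increments, axis=0)

# 3. Cumulative sum from the reference point reconstructs the series,
#    anchored at zero in the chosen standard year
reconstructed = np.concatenate([[0.0], np.cumsum(mean_inc)])
```

With complete data this reproduces the underlying series exactly (relative to the start); with gappy data the differencing step is where the stability issues Nick mentions below can creep in.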

    • Clive Best says:

      Nice. So the average of the increments for that month is the ‘anomaly’ compared to the same month one year ago?

      • Nick Stokes says:

        Clive, you could say that. The key thing is that you are averaging something that is reasonably homogeneous, so sampling issues are reduced. Hu McCulloch at CA was an enthusiast. There is an interesting discussion there, where strong points pro and con were made. Eric Steig joined in. FDM, though, has not carried the day, and I think it has stability issues.

      • volkerskh says:

        No, my original intention was the increase between consecutive months, but the method is equally well applicable to any other time frame.
        The link provided by Nick Stokes to the paper of Thomas C. Peterson et al was very helpful.
        They described the same method under the name of FDM (first difference method).
        They compared this method, applied to the GHCN, with 2 other methods (CAM and RSM), and came to the conclusion that the results did not differ significantly.
        I would prefer, however, FDM because it eliminates the need for the calculation of µ.
        (I have used FDM for the integration of partially overlapping time series of ground water levels).

  5. A C Osborn says:

    Everyone knows that Micro Climates exist and that some of them are not so “micro”.
    The current homogenisation methods totally destroy any input that those micro climates should have to the overall temperature by saying that they are “wrong” because they do not fit anything up to 1200km around them.
    They are even given biases to fit in with their surroundings, even though their temperatures are clearly real.
    BEST’s use of Regional Expectations is even worse; it is like a self-fulfilling prophecy.

    It is the one thing that I can’t get my head around: that scientists believe changing the past is the correct thing to do, when humans actually experienced and documented those historic periods as factual. The 2 most obvious are the USA years of the Dust Bowl and the Australian heat waves of the late 1800s, during which birds and bats fell out of the sky, trees died of heat stroke, and animals and humans suffered the same fate.
    Today those temperature records are ignored or re-written as “instrumental error”. What utter bullshit and hubris, to re-write history in such a fashion.

    • Clive Best says:

      I think the only way round this is to find a method that uses only ‘uncorrected’ data. There will always be uncertainty about global temperatures before ~1900 because there were so few stations. SST values must be even more uncertain as they are based on temperatures of buckets of sea-water.

      It is almost certain that there was surface warming of about 0.6C from 1980 to 2000, followed by a pause since then.

  6. Frank says:

    Clive said: “I think the only way round this is to find a method that uses only ‘uncorrected’ data.”

    I’d like to disagree – at least with respect to TOB corrections, corrections associated with new instrumentation, and possibly with respect to corrections for documented station moves. With TOB, we have a clear flaw in how the data were collected (changing TOB) and a validated method for correcting it. (The measured error in the validated correction method needs to be added to the other uncertainties.)

    What I find most objectionable is the assumption that all undocumented breakpoints should be corrected. As far as I can tell, far too many breakpoints are being identified for the above causes to be responsible. It is possible that stations’ “observing conditions” gradually deteriorate with time: dirt decreases the albedo of the screen, ventilation gets clogged, etc. When station maintenance restores original observing conditions, a breakpoint is created, and correcting that breakpoint INTRODUCES bias. Basically, every time an undocumented breakpoint is corrected, an untestable hypothesis has been made that correction of that breakpoint improves the record. No one knows whether this is true or not. So after correcting documented problems in the data, I think a range of results should be reported – with and without corrections. The best answer should properly convey the amount of warming with a fair assessment of the uncertainty in that amount.

    Unfortunately, when you go back before 1900, the modest number of thermometers weren’t always placed in a well-ventilated location continuously shielded from direct sunlight. Some countries apparently started using effective, well-ventilated screens out in the open (today’s technology) as early as 1850, and others closer to 1900. If you want to draw any conclusions from temperatures before 1900, you may want to look into this problem.

  7. Clive Best says:


    I am sure you are right that there are systematic problems with the early temperature data. Some of these may be addressed objectively such as time of observation. However, these effects should be small when averaged over a month.

    It is the automated and/or selective homogenisation of data that is suspect. This introduces a suspicion that a latent trend bias is built in. If we expect global warming to have occurred since 1850, but then see larger average temperatures than expected before 1900, we go looking for a reason.

    Currently I am looking into the linear station offset method discussed by Nick Stokes. The software is a bit of a nightmare, but so far I see a reduction in early temperatures, though not enough to match GISS/CRUTEM. After about 1900 all methods agree with each other. It is the early period which is suspect.

  8. Frank says:

    Clive: Many changes in TOB occurred in the US in the second half of the 20th century because observers were asked to record precipitation in the morning to minimize evaporation, and temperature was recorded at the same time. Validated correction methods, derived from continuous 24-hour readings, add about 0.2 degC to the overall US change. I gather that there isn’t as much metadata for TOB from the rest of the world, or systematic study of how to correct for it.

    I’ve been told that homogenization algorithms add about 0.2 degC to the change for the rest of the world, with about one correction per decade in the average record. Many corrections warm past temperatures, but more cool them. To my knowledge, no one understands why more of these corrections with undocumented causes cool the past. I have been suggesting that the albedo of station screens, and their ventilation, decrease with time and that these problems are fixed every decade or so. In that case, the station would have a slowly increasing warm bias that is removed every so often, restoring original “observing” conditions. The breakpoints caused by “maintenance” that restores original observing conditions shouldn’t be corrected.
