Untangling Global Temperatures

There are systematic problems in defining global “temperatures”. This post looks further into the problem by comparing uncorrected GHCN weather station data with CRUTEM using different methodologies. This is an ongoing study.


Comparisons of uncorrected GHCN V3 with CRUTEM4, and of uncorrected GHCN V1 with a contemporary CRUTEM (Jones 88).

I have calculated new temperature anomaly data from all stations in the GHCN V1 and GHCN V3 raw data. To avoid sampling bias, I applied a linear offset correction for each station and for each month, as discussed in the previous post and as originally proposed by Tamino. This avoids biases both from differing seasonal responses and from the differing altitudes of individual stations within a single grid cell.

The basic problem facing all analyses of global temperature data is that the set of stations and measurements evolves with time. For this reason, temperature anomalies for each station are traditionally used rather than raw temperatures. An anomaly is the difference between the measured temperature for one month and a ‘normal’ monthly value, which is usually pre-calculated over a fixed 30-year period. However, if we calculate such anomalies at the station level, we must discard all stations with no or few values within that fixed period. I wanted to avoid this and use ALL the available data. GHCN V3 contains 7280 stations, whereas CRUTEM4 has 5549 stations, including 628 new ones from the Arctic.

To make progress, I assume that there exists a true average temperature T_r for each month in a grid cell. We can estimate T_r by subtracting the net offsets of the stations present in each month. We then calculate normals from the monthly T_r values and use these to derive grid anomalies. This new method also avoids discarding stations that do not fall within the ‘normal 30 year window’, by correcting for their offsets in the cells in which they do appear. I also want to avoid linearly interpolating station data into months and years for which they have no data.

Each station is characterised by a monthly offset \mu_i from the grid (regional) average. This offset is assumed to remain constant in time because it arises from fixed local factors such as the station's altitude. We first calculate all the monthly average temperatures T_{av}. Then, for each station, we derive the offset for each month from the average of the grid cell in which it appears. This is our first estimate of \mu_i. There are 12 such offsets, one for each month of the year.

\mu_i = \frac{1}{N_t}\sum_{t}\left(T_i - T_{av}\right)
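The first estimate of the offsets can be sketched in Python as below. This is a minimal illustration, not the author's code: it assumes a hypothetical layout in which `readings[station][(year, month)]` holds the monthly temperatures of the stations inside one grid cell, and the function names `grid_average` and `station_offsets` are invented for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical layout: readings[station][(year, month)] = temperature (C)
# for all stations inside ONE grid cell.

def grid_average(readings):
    """T_av: plain average of whichever stations report in each (year, month)."""
    by_time = defaultdict(list)
    for series in readings.values():
        for ym, temp in series.items():
            by_time[ym].append(temp)
    return {ym: mean(temps) for ym, temps in by_time.items()}

def station_offsets(readings, t_av):
    """mu_i[month]: mean of (T_i - T_av) over every year in which station i
    reports that calendar month, giving up to 12 offsets per station."""
    offsets = {}
    for sid, series in readings.items():
        diffs = defaultdict(list)
        for (year, month), temp in series.items():
            diffs[month].append(temp - t_av[(year, month)])
        offsets[sid] = {m: mean(d) for m, d in diffs.items()}
    return offsets

# Two stations in one cell; B sits 2C colder than A (e.g. higher altitude).
readings = {"A": {(2000, 1): 10.0, (2001, 1): 11.0},
            "B": {(2000, 1): 8.0, (2001, 1): 9.0}}
t_av = grid_average(readings)
offsets = station_offsets(readings, t_av)
```

Here `offsets` comes out as +1C for A and -1C for B in January, i.e. each station's fixed departure from the cell average.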

You can then iterate, using the new offsets to derive a new set of offsets. In practice the second iteration changes the end result only very slightly. The estimate of the true grid average temperature T_r for a given month is then

T_r = \frac{1}{N_{grid}} \sum_{s}\left(T_{av} - \mu_s\right)
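The estimate and the iteration can be sketched as follows, under the same hypothetical data layout as the offset sketch. Because T_av is the mean of the N_grid stations present, averaging T_s - \mu_s over those stations gives the same T_r as the formula above. Interpreting "iterate" as re-deriving the offsets against the latest T_r is an assumption about the method, and `estimate_true_average` is an invented name.

```python
from collections import defaultdict
from statistics import mean

def estimate_true_average(readings, iterations=2):
    """readings[station][(year, month)] = temperature for ONE grid cell
    (hypothetical layout). Returns (T_r by (year, month), offsets)."""
    by_time = defaultdict(list)
    for sid, series in readings.items():
        for ym, temp in series.items():
            by_time[ym].append((sid, temp))
    # Starting reference: the plain average T_av of the stations present.
    reference = {ym: mean(t for _, t in obs) for ym, obs in by_time.items()}
    offsets = {}
    for _ in range(iterations):
        # mu_s[month] measured against the current reference.
        offsets = {}
        for sid, series in readings.items():
            diffs = defaultdict(list)
            for (year, month), temp in series.items():
                diffs[month].append(temp - reference[(year, month)])
            offsets[sid] = {m: mean(d) for m, d in diffs.items()}
        # T_r = mean over stations present of (T_s - mu_s).
        reference = {(y, m): mean(t - offsets[sid][m] for sid, t in obs)
                     for (y, m), obs in by_time.items()}
    return reference, offsets

# The warm station A stops reporting as the cold station B starts.
readings = {"A": {(2000, 1): 10.0, (2001, 1): 11.0},
            "B": {(2001, 1): 9.0, (2002, 1): 10.0}}
t_r, mu = estimate_true_average(readings, iterations=1)
```

In this toy cell the plain average is flat at 10.0C even though both stations warm by 1C a year; after one pass T_r is 9.5, 10.0 and 10.5. With only one year of overlap, further iterations keep moving the estimate (towards 9, 10, 11 here); the post reports that with the real station network the second iteration changes little.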

We then average the T_r values for each calendar month in each grid cell across the full time range to get ‘normal’ temperatures. One could instead select a fixed 30-year range for the normals, but it makes little difference. We then calculate one anomaly per month and per grid cell. These anomalies are first area-averaged and then annually averaged to give yearly global temperature anomalies. The results are shown below for V3 compared with CRUTEM4, and for V1 compared with a contemporary version of CRUTEM (Jones 1988).
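The normal, anomaly and averaging steps can be sketched as below. The post does not say how the area weighting is done; weighting each cell by the cosine of its centre latitude is an assumption here, as are the input layout `cells[(lat_centre, lon_centre)][(year, month)] = T_r` and the function name. Years with missing months are simply averaged over the months present.

```python
import math
from collections import defaultdict
from statistics import mean

def global_annual_anomalies(cells):
    """cells[(lat_c, lon_c)][(year, month)] = T_r (hypothetical layout)."""
    monthly = defaultdict(lambda: [0.0, 0.0])  # (year, month) -> [sum w*a, sum w]
    for (lat_c, _lon_c), series in cells.items():
        # 'Normal' for each calendar month: mean of T_r over the full record.
        by_month = defaultdict(list)
        for (_, m), t in series.items():
            by_month[m].append(t)
        normals = {m: mean(v) for m, v in by_month.items()}
        w = math.cos(math.radians(lat_c))  # assumed area weight
        for (y, m), t in series.items():
            acc = monthly[(y, m)]
            acc[0] += w * (t - normals[m])
            acc[1] += w
    # Area-weighted monthly means, then the mean of the months in each year.
    yearly = defaultdict(list)
    for (y, _m), (wsum_anom, wsum) in monthly.items():
        yearly[y].append(wsum_anom / wsum)
    return {y: mean(v) for y, v in sorted(yearly.items())}

cells = {(0.0, 0.0): {(2000, 1): 10.0, (2001, 1): 12.0},
         (60.0, 0.0): {(2000, 1): 0.0, (2001, 1): 0.0}}
result = global_annual_anomalies(cells)
```

The equatorial cell (weight 1) contributes anomalies of -1 and +1, the cell at 60° (weight 0.5) contributes zero, so the yearly values are about -0.67 and +0.67.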


There is fairly close agreement, except before 1920, where V3 is warmer, and after 1998, where V3 is cooler. Now compare V1 with the 1988 version of CRUTEM produced by Phil Jones.


In general there is good agreement between CRUTEM and V1, except before 1900. In both cases the net warming since the late 19th century is about 0.2C less than that shown by CRUTEM.

I can already hear objections to my methodology from Nick Stokes, who will likely argue that you should average station ‘anomalies’ in each cell rather than ‘temperatures’. Another objection could be that my normals are computed over the full time period rather than the standard 1961-1990. Let’s look at these in turn.

In principle I could first derive anomalies for each station by using the offsets and fixing the monthly Tm at one particular year, such as 1975. With the offset \mu_i defined as above, the station anomaly would then be (Tmes - \mu_i) - Tm. I could look into this if I find time, but I don’t believe it would make much difference.
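A sketch of that station-level alternative, under stated assumptions: `offsets[station][month]` follows the definition of \mu_i above (station minus grid average, so it is subtracted), `t_r[(year, month)]` is the estimated cell average, and 1975 is the fixed reference year as in the example. The data layouts and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def cell_anomalies_from_stations(readings, offsets, t_r, ref_year=1975):
    """readings[sid][(year, month)] = measured temperature;
    offsets[sid][month] = mu_i; t_r[(year, month)] = cell average.
    Returns the mean station anomaly per (year, month) in the cell."""
    per_time = defaultdict(list)
    for sid, series in readings.items():
        for (year, month), t_mes in series.items():
            # Offset-corrected reading minus the cell value in the reference year.
            corrected = t_mes - offsets[sid][month]
            per_time[(year, month)].append(corrected - t_r[(ref_year, month)])
    return {ym: mean(v) for ym, v in per_time.items()}

readings = {"A": {(1975, 1): 10.0, (1980, 1): 11.0},
            "B": {(1980, 1): 9.0}}
offsets = {"A": {1: 1.0}, "B": {1: -1.0}}
t_r = {(1975, 1): 9.0}
anoms = cell_anomalies_from_stations(readings, offsets, t_r)
```

Station anomalies are averaged within the cell here, which is the ordering Nick Stokes would prefer; the cell result (0.0 in 1975, 1.0 in 1980) would then go through the same area and annual averaging as before.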

My main objective was to use all the (raw) station data, independent of time span, to answer one question. The problem with a fixed 30-year normalisation period is that a significant number of stations have no data with which to define such ‘normals’. One could use the offsets to define station normals based on the grid average temperature, but this generates fake data. There is also a philosophical reason why this may be wrong.

Suppose that for some reason only winters have gradually warmed. By defining seasonal normals within a recent fixed time span, you risk skewing the winter months by reducing their natural seasonal variation.

My conclusion so far is that there are small but significant differences in temperature trends, depending on the definition of temperature anomalies and on data correction/homogenization. The overall warming trend varies by about 0.2C depending on how it is defined.


About Clive Best

PhD High Energy Physics. Worked at CERN, Rutherford Lab, JET, JRC, OSVision
This entry was posted in AGW, Climate Change, climate science, Science.

15 Responses to Untangling Global Temperatures

  1. Very impressive work, thank you.

  2. Hi Clive,

    Creating an analysis of global temperature can be tricky, and methodological choices can introduce bugs that can mess up your results, especially if you are comparing them to series produced by other groups (e.g. your GHCN v3 raw vs. CRUTEM4 graphs).

    I’ve spent a lot of time and effort over the last 5 years doing these comparisons. A good first test is this: does your method (using GHCN v3 adjusted) produce effectively the same result as NCDC’s record? If not, there may be some methodological issues in your approach. I can say with confidence that my code does quite a good job of replicating the results of the major groups using the same input data: http://rankexploits.com/musings/2010/replication/

    I’ve also published some academic papers using my code with folks at NCDC: ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/papers/hausfather-etal2013.pdf

    Unfortunately, it looks like the results you are getting are a consequence of flaws in your approach, not underlying aspects of the data. For example, here is what a comparison of GHCN v3 raw and CRUTEM4 should look like (using a common anomaly method with 5×5 lat/lon gridding and a land mask):

    Similarly, here are GHCN v3 raw, GHCN v3 adjusted, and CRUTEM4:

    I’ve looked at GHCN v1 vs. GHCN v3 raw in the past, back when E.M. Smith made a big deal out of it. I even wrote a post on the subject at WUWT here: http://wattsupwiththat.com/2012/06/22/comparing-ghcn-v1-and-v3/

    Unsurprisingly, GHCN v1 and GHCN v3 are effectively identical during the period of overlap. The exception is prior to 1880, where GHCN v1 had very few station records and many more were added in GHCN v2 and v3:

    • Clive Best says:

      Thanks Zeke,

      I think you are right that methodological choices affect the result. I am sure that if I use exactly the same methodology as CRU/GISS and normalise each station to a 30 year period then I should get essentially your result. However does that make the standard methodology correct or does it just show that it is applied consistently?

      Your V3 raw comparison with CRUTEM4 and V3 corrected shows the same feature as mine above: the early period is consistently warmer. So the only real difference is a slightly warmer trend after 1970. This is probably due to my taking a long seasonal average so that I can include all stations.

      So yes, there are only minor differences between V1 and V3. There are systematic differences due to the particular methodologies used. What is the best methodology?

      Since these trends are used to estimate climate sensitivity small differences can become important. CRUTEM4 increased recent trends compared to CRUTEM3 by changing the zonal distribution of stations.

    • E.M.Smith says:

      Do also read all the comments in that WUWT posting, where the critique is rebutted. Also, this looks at the First Differences complaint and finds it wanting.

      • Clive Best says:

        I read most of the thread on WUWT. It is exactly the same argument about methodologies. The situation is as follows.

        The orthodox method is to define station temperature anomalies relative to the 30-year period 1961-1990. These are gridded on a 5×5 degree grid and averaged, weighted by cos(lat), to form global monthly and annual averages. Since many stations overlap in GHCN and CRU (because GHCN imported all CRU stations), they give comparable results.
        Zeke has essentially shown this is the case except for the early period.
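        The gridding step of the orthodox method can be sketched for a single month as below. This is a simplified illustration, assuming station anomalies have already been computed: it averages stations within each 5×5 degree cell, weights each occupied cell by the cosine of its centre latitude, and leaves empty cells out. It omits the land mask and any other CRU processing details; the function names are invented.

```python
import math
from collections import defaultdict
from statistics import mean

def cell_index(lat, lon, size=5.0):
    """Row/column of the size x size degree cell containing (lat, lon)."""
    n_lat, n_lon = int(180 / size), int(360 / size)
    i = min(math.floor((lat + 90.0) / size), n_lat - 1)   # clamp lat = +90
    j = min(math.floor((lon + 180.0) / size), n_lon - 1)  # clamp lon = +180
    return i, j

def gridded_mean(stations, size=5.0):
    """stations: list of (lat, lon, anomaly) for ONE month.
    Stations are averaged within their cell first, so a dense cluster
    counts once; occupied cells are then weighted by cos(centre latitude)."""
    cells = defaultdict(list)
    for lat, lon, anomaly in stations:
        cells[cell_index(lat, lon, size)].append(anomaly)
    num = den = 0.0
    for (i, _j), anomalies in cells.items():
        lat_centre = -90.0 + (i + 0.5) * size
        w = math.cos(math.radians(lat_centre))
        num += w * mean(anomalies)
        den += w
    return num / den

month = [(1.0, 1.0, 1.0), (2.0, 2.0, 3.0),  # same 5x5 cell, centre 2.5N
         (61.0, 1.0, 0.0)]                   # cell centred on 62.5N
global_anomaly = gridded_mean(month)
```

The two tropical stations share a cell and contribute their mean (2.0) once, and the high-latitude cell gets roughly half the weight, so the result is about 1.37 rather than the plain station mean of 1.33.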

        However this method causes stations with poor coverage in the reference period to be dropped. Efforts to avoid that use interpolation or values derived from near neighbours in the same grid. This generates fake data which may or may not affect results.

        Is the orthodox method correct? Is it the only way to eliminate sampling biases? Does the use of a 30 year period change the result in some way? Is there groupthink happening? It seems to me perfectly reasonable to question this, especially as this has become such a political hot potato.

        If by trying another approach we get a slightly different result, it does not mean our result is wrong since there is no ‘correct result’. It just means that the methodology is different unless it can be shown to introduce artificial trends. Perhaps the use of a 30 year normal based on a warming climate itself introduces artificial trends!

    • Nick Stokes says:

      “A good first test is this: does your method (using GHCN v3 adjusted) produce effectively the same result as NCDC’s record? “

      TempLS does 🙂 Even using unadjusted. I did a check here. It’s still tracking very closely.

  3. Nick Stokes says:

    The key argument for using anomalies is this. When you take the mean of numbers (a sample) to represent a continuum average, then it is as if you did a proper integral with all unmeasured points assigned a value equal to the sample mean. That puts a heavy burden on good sampling. And with temperature, we don’t have much control.

    However, if you subtract your best prior estimate from the sample points, then sampling matters much less. The expected value for your sample anomalies is zero, and so it is for the mean. So where there is an implied substitution of a mean value in averaging, that won’t create a bias.

    So there isn’t any arbitrary period, 30 years or whatever, that is right. The criterion is that you subtract your best estimate, to avoid residual bias, before averaging anything. A long term (month) normal is a reasonable estimate, but not necessarily best. A grid cell mean is a much worse estimate of station values.
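    Nick's argument can be illustrated with a deliberately crude toy region: two low and two high stations, every station warming by 0.5C, and the high stations missing in the second month. The station names and numbers are invented for illustration.

```python
from statistics import mean

# Hypothetical region: two low-altitude and two high-altitude stations.
normals = {"valley1": 10.0, "valley2": 10.0, "peak1": 0.0, "peak2": 0.0}
true_change = 0.5  # every station warms by the same amount in month 2

month1 = dict(normals)  # all four stations report their normal value
month2 = {s: normals[s] + true_change for s in ("valley1", "valley2")}  # peaks missing

def mean_temperature(obs):
    # Averaging raw temperatures implicitly fills unsampled points with the sample mean.
    return mean(obs.values())

def mean_anomaly(obs):
    # Subtract each station's best prior estimate (its normal) before averaging.
    return mean(t - normals[s] for s, t in obs.items())

change_in_temps = mean_temperature(month2) - mean_temperature(month1)
change_in_anoms = mean_anomaly(month2) - mean_anomaly(month1)
```

    Averaging temperatures reports a spurious 5.5C jump purely because the cold stations dropped out, while averaging anomalies recovers the true 0.5C change.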

    I’m writing a post about this at the moment. Should be out in about 24 hrs.

    Here’s a reason why you have to worry about a fixed period. The basic model that you are fitting is
    T = L + G
    where T is actual temp, L is station offset and G is global difference (function of time only). There is a rank deficiency, because you could arbitrarily add something everywhere to L and subtract it from G. If you want L to be a mean temp, then G should have mean zero.

    But you can’t make G have mean zero over a variety of station ranges. It’s independent of station. And if you don’t, then for each station, your expected mean will be the mean of L+G. So offset L is not the mean.

    But you can make G have mean zero over 1961-90, and then L is the station mean from 1961-90.
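    The rank deficiency can be checked numerically. The sketch below fits T = L + G by alternating least squares, which is a swapped-in fitting method and not necessarily what TempLS does, and removes the free constant by forcing G to average zero over a chosen reference period. Station names, times and temperatures are invented.

```python
from collections import defaultdict
from statistics import mean

def fit_offsets(obs, ref_times, n_iter=200):
    """obs: dict (station, time) -> temperature. Fits T = L[station] + G[time]
    by alternating least squares; pins the free constant by making G
    average zero over ref_times."""
    L = {s: 0.0 for s, _ in obs}
    G = {t: 0.0 for _, t in obs}
    for _ in range(n_iter):
        by_s, by_t = defaultdict(list), defaultdict(list)
        for (s, t), temp in obs.items():
            by_s[s].append(temp - G[t])
        L = {s: mean(v) for s, v in by_s.items()}
        for (s, t), temp in obs.items():
            by_t[t].append(temp - L[s])
        G = {t: mean(v) for t, v in by_t.items()}
        # L + c and G - c fit equally well; choose c so mean(G over ref) = 0.
        c = mean(G[t] for t in ref_times)
        G = {t: g - c for t, g in G.items()}
        L = {s: l + c for s, l in L.items()}
    return L, G

# Station A reports at times 1-2, station B at times 2-3 (one overlap).
obs = {("A", 1): 10.0, ("A", 2): 11.0, ("B", 2): 1.0, ("B", 3): 2.0}
L_all, G_all = fit_offsets(obs, ref_times=[1, 2, 3])
L_ref, G_ref = fit_offsets(obs, ref_times=[2, 3])
```

    The fitted temperatures are identical either way; only the split between L and G moves. With the reference period [2, 3], which station B covers completely, L["B"] equals B's own mean over that period (1.5), as the comment says.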

    • clivebest says:

      I understand all this.
      I am just trying out different ‘best estimates’.

      In order to check everything I decided to follow the orthodox route and calculate the 12 monthly normals for the period 1961-1990. Then as a second step I calculate the station monthly anomalies for V3 ( both uncorrected and corrected) and average these in each grid cell. I discard stations which do not have at least 10 measurements within the 30 year period.

      1. I get very good agreement of V3C with CRUTEM4, but with a slightly cooler trend after 2000, more like CRUTEM3.

      2. I also get reasonable agreement with V3 uncorrected, but with a slightly warmer past and an even cooler trend after 2000.

      3. NCDC Land shows a slightly stronger warming trend than even CRUTEM4. This is similar to BEST results. This increase must be due to their time/spatial interpolation.

      4. The anomalies method described above gives almost exactly the same results as the orthodox normals but again the cooler trend increases.
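      The orthodox station step described in this reply can be sketched as follows. The reply says stations need "at least 10 measurements within the 30 year period"; applying that threshold separately to each calendar month is an assumption here, as are the per-station layout `series[(year, month)]` and the function name.

```python
from collections import defaultdict
from statistics import mean

def station_anomalies(series, start=1961, end=1990, min_years=10):
    """series[(year, month)] = temperature for ONE station (hypothetical layout).
    Returns anomalies relative to start-end monthly normals, or None if any
    calendar month the station reports has fewer than min_years base-period values."""
    base = defaultdict(list)
    for (year, month), t in series.items():
        if start <= year <= end:
            base[month].append(t)
    months = {m for _, m in series}
    if any(len(base[m]) < min_years for m in months):
        return None  # station discarded: too little data in the base period
    normals = {m: mean(base[m]) for m in months}
    return {(y, m): t - normals[m] for (y, m), t in series.items()}

long_record = {(y, 1): 5.0 for y in range(1961, 1976)}  # 15 Januaries in 1961-90
long_record[(2000, 1)] = 6.0
short_record = {(y, 1): 5.0 for y in range(1985, 1991)}  # only 6 Januaries

anoms = station_anomalies(long_record)
dropped = station_anomalies(short_record)
```

      The first station gets a 2000 anomaly of +1.0C, while the second is discarded outright, which is the loss of data the post set out to avoid.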

      I will write all this up fairly soon.

  4. Ron Graf says:

    Nick, I noticed in your comment on Paul Homewood’s blog post last July here that you say the TOBS (Time of Observation Adjustments) were done at station’s convenience in nonchalant fashion and that is why there is no graph blip for them. I find this very sloppy practice. This data should have been treated as important forensic evidence for science. Even as early as the 1960s we were concerned about the record, which was, I suppose, the purpose of TOBS. With the TOBS changes accounting for 0.25C of the recorded USA increase since the 1930s; when the difference between CAGW and slight warming is fought over far smaller statistical differences in trend; when the IPCC models' validity is hanging by slimmer margins; when there is a Paris summit that could result in commitments costing trillions of dollars, I ask: who couldn’t be bothered to choose a particular date to mandate a change of protocol for the station chiefs?

    Also, given the significance of the data and its “corrections,” shouldn’t a NOAA web page be devoted to providing access to all the data, corrections and explanations? The data going missing smacks of JFK’s missing brain. I’m glad we found it before we had to put Brad Meltzer on it. What are your thoughts?

    • Nick Stokes says:

      “were done at station’s convenience in nonchalant fashion and that is why there is no graph blip for them”
      Individual stations could change their observations times, by applying to NWS. NWS kept records of permissions given. No-one was “nonchalant”. The reason why changes don’t cause blips is that they weren’t synchronous. This plot shows how statistically they changed over the years:

      The dashed line shows a further check. Observers didn’t write the time of each observation – they had an agreed time – but they did write the temperature at that time, and knowing the diurnal cycle, you can tell if they are, on average, sticking to their agreement. It’s not bad.

      • Ron Graf says:

        Nick, thanks for your reply. I believe what you say is true. But don’t you agree, in hindsight, that it was an important enough issue to have been given a warning period and then firm implementation on, say, the start of a new year? Handling as they did likely made it seem less important to the stations to comply rigorously.

        • Nick Stokes says:

          Well, hindsight isn’t useful. But I don’t see how those strictures would have helped. There’s actually no difficulty about making a proper TOBS adjustment, as long as times were recorded.

        • Ron Graf says:

          Nick, you are not the problem, but part of the solution to data decoding, organizing and disseminating. I am just saying our well-paid officials could be a bit more organized and sensitive to outsiders possibly wanting access. Leaving things a mess could be concealment or incompetence. I’m sure all national tax agencies have little sympathy for those who keep their records in such a fashion that the taxpayer is the only one qualified to decode them.

          True, hindsight is not useful if there is no lesson. But are you saying there has been improvement? I see M Mann’s defenders claim the methods used in MBH98 were known to the researchers: there was no secret among the important people (see gatekeepers), therefore there was no foul.

          As Don Monfort wrote recently, “this is not … a backwater science, like entomology. … on why green eyed gnats prefer diddling on Thursdays.” The entire public and science community needs (and is paying for) transparency and competence. We may not agree on the value of hindsight, but I hope we can agree transparency needs to improve.
