A new analysis of Global Land Daily temperature data

This post describes my attempt to reproduce global temperatures from scratch. By ‘from scratch’ I mean using all the original raw temperature measurements from the NCDC daily weather archive, without adjustments.

Annual Land Global Temperature Anomalies. Before 1830 there are just a few stations in central Europe forming the ‘global’ average. The spike from 1880 to 1895 is the addition of large numbers of US stations. After 1950 the agreement with CRUTEM4 is very good.

The largest accessible archive of raw temperature measurements is the NCDC Daily Archive. It consists of 3 billion measurements from 106,000 weather stations starting in 1763. I have used all this data to calculate global temperature anomalies without any corrections and without discarding any data except where flagged as duplicates.

The method I use is based on icosahedral grids, which have the advantage of dividing the earth’s surface into cells of equal area. The connections to each grid point form hexagons like those on a football. I am using a 2562 node grid; for details see: Icosahedral Binning.
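As a rough illustration of the grid itself: one common way to build an icosahedral grid is to subdivide each triangular face of an icosahedron repeatedly and push the new points out onto the unit sphere. Four subdivisions give 10 x 4^4 + 2 = 2562 nodes, which matches the node count quoted above. The sketch below is an assumption about the construction, not the code used for this post; `icosphere` and `nearest_node` are hypothetical names, and assigning a station to its nearest node is just one plausible binning rule.

```python
import math

def _unit(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def icosphere(levels):
    """Return (nodes, faces) for an icosahedron subdivided `levels` times."""
    phi = (1 + math.sqrt(5)) / 2
    verts = [_unit(v) for v in [
        (-1, phi, 0), (1, phi, 0), (-1, -phi, 0), (1, -phi, 0),
        (0, -1, phi), (0, 1, phi), (0, -1, -phi), (0, 1, -phi),
        (phi, 0, -1), (phi, 0, 1), (-phi, 0, -1), (-phi, 0, 1)]]
    faces = [(0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
             (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
             (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
             (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1)]
    for _ in range(levels):
        cache = {}  # edge -> index of its midpoint, so shared edges reuse one node

        def mid(a, b):
            key = (min(a, b), max(a, b))
            if key not in cache:
                m = tuple(x + y for x, y in zip(verts[a], verts[b]))
                verts.append(_unit(m))
                cache[key] = len(verts) - 1
            return cache[key]

        new_faces = []
        for a, b, c in faces:
            ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
            new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
        faces = new_faces
    return verts, faces

def nearest_node(nodes, lat_deg, lon_deg):
    """Index of the grid node closest to a station at (lat, lon) in degrees."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    p = (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))
    # On the unit sphere the closest node is the one with the largest dot product.
    return max(range(len(nodes)), key=lambda i: sum(a * b for a, b in zip(nodes[i], p)))

grid, grid_faces = icosphere(4)
print(len(grid))  # 2562
```

Each station would then carry the integer returned by `nearest_node` as its grid location number.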


Example distribution for December 1980. These are the average anomalies formed from all stations within each grid cell. All cells are of equal area across the earth’s surface.

First I calculate the grid location numbers for all 106,000 stations. Those stations which share the same grid location are assumed to follow the same climate. I can then calculate the normal monthly temperatures for each grid point as the average over all member stations for that month over the 30 year period 1961-1990, which is the same normalisation period as that used by HadCRUT4.
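The normalisation step can be sketched roughly as follows. This is a minimal illustration under assumed data shapes, not the actual processing code: `records` is taken to be an iterable of `(station_id, year, month, temp_c)` tuples, `station_cell` a hypothetical mapping from station to grid location number, and `cell_normals` a made-up function name.

```python
from collections import defaultdict

def cell_normals(records, station_cell, start=1961, end=1990):
    """Mean temperature per (cell, month) over the normalisation period.

    Every measurement from every member station of a cell is pooled,
    so a cell's normal is the average over all of its stations.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (cell, month) -> [total, count]
    for station, year, month, temp in records:
        if start <= year <= end:
            acc = sums[(station_cell[station], month)]
            acc[0] += temp
            acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}

# Two stations share cell 7; the 1950 reading falls outside 1961-1990.
example_records = [
    ("A", 1970, 1, 2.0),
    ("B", 1980, 1, 4.0),
    ("A", 1950, 1, 10.0),
    ("C", 1975, 7, 20.0),
]
example_cells = {"A": 7, "B": 7, "C": 9}
example_normals = cell_normals(example_records, example_cells)
```

Here cell 7 gets a January normal from stations A and B together, while the 1950 value is ignored for the normal but remains available for the anomaly calculation later.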

The advantage of this normalisation method is that afterwards I can use it as a reference to derive temperature anomalies over the grid cell rather than for all stations individually. This means I can use every recorded station temperature covering any time period, because early stations ending before 1960 and newer ones starting after 1990 can still be included through their contribution to the average temperature in each cell. All 3 billion temperature measurements can therefore be processed. However, unlike all other studies, I am using no adjustments or homogenisation. So these results are based on the raw temperatures as originally recorded, and they are illuminating.
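To make the anomaly step concrete, here is a hedged sketch of how cell anomalies and a global mean might be formed once the cell normals exist. The function names, the `(station_id, year, month, temp_c)` record layout and the `normals` dictionary keyed by `(cell, month)` are all assumptions for illustration. Taking a simple unweighted mean over occupied cells is also an assumption; it relies on the cells being of equal area.

```python
from collections import defaultdict

def cell_anomalies(records, station_cell, normals):
    """Mean anomaly per (year, month, cell), measured against the cell normal.

    A station needs no data of its own inside the normalisation period:
    it only needs to sit in a cell that has a normal for that month.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for station, year, month, temp in records:
        cell = station_cell[station]
        normal = normals.get((cell, month))
        if normal is None:
            continue  # the cell has no normal for this month, so skip it
        acc = sums[(year, month, cell)]
        acc[0] += temp - normal
        acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}

def global_monthly_mean(anomalies):
    """Unweighted mean over occupied cells for each (year, month)."""
    by_month = defaultdict(list)
    for (year, month, _cell), value in anomalies.items():
        by_month[(year, month)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in by_month.items()}

# Stations A and B share cell 7; station C sits alone in cell 9.
demo_normals = {(7, 1): 3.0, (9, 1): -1.0}
demo_cells = {"A": 7, "B": 7, "C": 9}
demo_records = [("A", 1900, 1, 4.0), ("B", 1900, 1, 6.0), ("C", 1900, 1, 0.0)]
demo_anoms = cell_anomalies(demo_records, demo_cells, demo_normals)
demo_global = global_monthly_mean(demo_anoms)
```

Note that a month with only a handful of occupied cells still produces a ‘global’ value under this scheme, which is exactly the situation for the earliest years.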

Global Land temperature anomalies calculated from NCDC daily temperature records.

Clearly, before 1950 temperatures are much higher than in any other index, including Berkeley, which also uses data back to 1750. The reasons are as follows.

  1. There are only 2 or 3 stations recording temperatures back to 1750 and these are all in central Europe; some CET stations, however, are missing before about 1830. The number of stations and the area covered gradually grow as the corresponding temperature anomalies reduce, until around 1830 when a few US & Australian stations begin to appear.
  2. The spike from 1875 to 1895 is a sudden influx of US stations. This triples the spatial coverage and so dominates the global average. Exactly why the spike appears and then disappears 15 years later is unclear to me. However, pre-industrial temperatures depend critically on any adjustments made to US stations. My results show that the raw data disagrees strongly with CRUTEM4, GISS and NCDC itself. Interestingly, though, Berkeley sees a hint of the same trends before 1850.

Berkeley Earth Average Temperatures

Berkeley, however, use a completely different method, and the data is shown after adjustments and homogenisation have been applied. After 1950 the agreement with CRUTEM4 is rather good.

Detail comparison to CRUTEM4. After 1950 the agreement is good.

Adjustments and homogenisation make only small differences to the result after 1960. However, these have always slightly increased the net annual warming on land. Note also that the raw data implies higher average temperatures for the early 20th century.


The raw data apparently show much higher temperatures before 1950 than other datasets. Is this due to the normalisation method? Well, maybe it is. If you have just one station within a grid cell, as is the case before 1850, then its anomaly relative to the many-station average in 1985 may be biased. However, I wanted to use all temperature data, even those without coverage in the normalisation period. In general, though, I believe the raw data show higher mean temperatures than the ‘corrected’ data.

I was surprised to discover just how important the US stations are in setting the pre-industrial temperature baseline, as is evident from the large spike in 1880. This is because the US surface area is much larger than that of northern Europe, the only other location with significant coverage. Consequently USHCN corrections, which have been discussed many times before, are critical to determining how much the earth has warmed since the 19th century.

Finally, here is an animation of all the monthly distributions from 1868 onwards. The couple of stripes appearing around 1919 are cells which span the dateline, which I later corrected!

Processing this data takes around 30 hours of iMac computer time, but writing the algorithm and debugging it took far more time!

About Clive Best

PhD High Energy Physics. Worked at CERN, Rutherford Lab, JET, JRC, OSVision.

14 Responses to A new analysis of Global Land Daily temperature data

  1. Pingback: A new analysis of Global Land Daily temperature data – Climate Collections

  2. Lance Wallace says:

    I would be interested in the absolute temperatures from this dataset. As you know, the GCMs do not agree on the absolute temperatures, varying from about 12-15 C. Would be interesting to see what your 106,000 stations record as absolute temperatures. Perhaps by year instead of month to avoid the seasonal variation?

  3. Lou Maytrees says:

    Just an observation, yet your raw temperature measurements graph shows no ending of an LIA, so temperatures were 2-3°C warmer during the LIA than they are now?

    • Clive Best says:

      The early data is based on a tiny number of stations. I suspect NCDC is missing UK CET stations. Yes I can do absolute temperatures. I am about to get on a plane though !

  4. Mr Broccoli says:

    Fascinating Clive. As David Frost once said – of statistics prove anything, and statistics prove they do- It is very easy to get trapped into the mindset that fits the paradigm. You are remaining open minded and doing a great job of analysis. There would appear to be a great deal more to climate change than just CO2. Hopefully this sort of analysis will give insights into where to look for the other factors that have a bearing on climate.

  5. Recall that much of the variability in temperatures is caused by ENSO, which can essentially be captured by two data points at Darwin and Tahiti. The HadCRUT4 variation prior to 1950 matches the inverted SOI (excepting the intervals around 1938 and 1945). So those huge excursions in the raw data are definitely uncalibrated.

  6. Ron Graf says:

    Clive, on any new reconstruction I always check out the major volcanic events (though I think GCMs exaggerate them by about double to allow aerosols to mask modeled AGW mismatch with observed record). Your chart has a nice drop at 1813, nicely in sync with Tambora (which was 3X Krakatoa 1883). I also notice the BE chart dips before 1813 and is rising after Tambora. (Not best, or even good.)

    The transition in the degree of adjustments from pre to post 1950 is stark. I am guessing that all UHI occurred before 1950. Kidding. I know that CRUTEM does not adjust for UHI. So what is the explanation?

    Paul, do you have a link for SOI chart?

    • Clive Best says:


      If you look at the original GHCN V1 (1990) you will see much the same result as the one above for the 19th century, with a large Tambora drop in temperatures.

      Berkeley is based on a least squares (kriging) fit to temperatures in space and time which naturally smooths out any discontinuities. I don’t think it can ever detect sharp changes in climate either in space or in time.

      • Ron Graf says:

        Clive, Tambora was 1815 but the earliest GHCN chart is 1850. The BE chart has a significant drop starting ~1801 and is clearly recovering and beginning a warming trend by 1815. Your chart starts cooling later, ~1812, but is at full depth of dip by 1815, then starts recovering about 1817. The Dalton solar minimum is running along this same interval, which is confounding any conclusions. But my guess is that volcanic aerosols were too small a forcing to compete with solar and ENSO, even a one in 500-year eruption.

        • Clive Best says:

          Sorry, yes, of course that’s right. I really meant to say that the older GHCN versions showed much warmer temperatures in the 19th century than they do now. That can only be due to adjustments and homogenisation. There are no new historical measurements as far as I know.

          I am in Hong Kong right now so can’t run any new calculations.

  7. A C Osborn says:

    Is the correlation after 1950 due to CRUTEM basically using NCDC data?

    • Clive Best says:

      Everyone uses exactly the same core station data after 1950. The only differences between CRU, GISS, Berkeley, NCDC, Cowtan & Way, etc. are:

      1) The method of averaging (simple gridding, or extrapolation through fitting into regions where there are no stations, e.g. the Arctic).
      2) The normalisation period.
      3) Whether they include all US stations or filter them.
      4) Adjustments and ‘homogenisation’, which assumes that any station which does not give the expected result must be wrong, and is therefore nudged up or down until it does agree.

  8. Clive Best says:

    I am finally back from Australia and my Macbook packed up. As a consequence I was unable to do anything for the last 4 weeks. Meanwhile I have found the cause of the spike in temperatures between 1880 and 1896. One station in France, ‘PTE DE LA HAGUE’, uses a different default value for a missing reading from all the others. Instead of -9999 for TMAX it uses 9990! Consequently it contributes 999C to the average! So the spike is a complete artefact!
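A guard of the following kind would catch this sort of rogue missing-value code. It is only a sketch: it assumes raw values are stored in tenths of a degree C (which is why 9990 shows up as 999C), and the plausibility window of -90C to +60C is an assumed sanity check, not something from the original processing. `clean_tenths` and the constant names are hypothetical.

```python
MISSING_CODES = {-9999, 9990}    # -9999 is the usual code; 9990 is the rogue one
PLAUSIBLE_TENTHS = (-900, 600)   # assumed sanity window: -90.0C to +60.0C

def clean_tenths(raw):
    """Convert a raw value in tenths of a degree C to degrees C.

    Returns None for known missing-value codes and for anything outside
    the plausibility window, so it cannot leak into an average.
    """
    if raw in MISSING_CODES:
        return None
    low, high = PLAUSIBLE_TENTHS
    if not low <= raw <= high:
        return None
    return raw / 10.0

def mean_of_valid(raw_values):
    """Average only the readings that survive cleaning; None if none do."""
    valid = [t for t in map(clean_tenths, raw_values) if t is not None]
    return sum(valid) / len(valid) if valid else None

print(mean_of_valid([215, 9990, 225]))  # 22.0, rather than (21.5 + 999 + 22.5) / 3
```

The range check is the more general protection, since it would also reject any other unexpected placeholder value without having to know it in advance.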

    The next job is to calculate offsets for each station against the grid average for TMAX, TMIN, and TAVG. This should mostly eliminate any systematic coverage bias in a single cell for the early years. See:

