This post describes my attempt to reproduce global temperatures from scratch. By scratch I mean using all the original raw temperature measurements from the NCDC daily weather archive without adjustments.
The largest accessible archive of raw temperature measurements is the NCDC Daily Archive. It consists of 3 billion measurements from 106,000 weather stations starting in 1763. I have used all this data to calculate global temperature anomalies without any corrections and without discarding any data except where flagged as duplicates.
The method I use is based on an icosahedral grid, which has the advantage that each cell covers an equal area on the earth’s surface. The connections to each grid point form hexagons like those on a football. I am using a 2562-node grid; for details see: Icosahedral Binning.
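Assigning a station to its grid cell amounts to finding the nearest of the 2562 icosahedral nodes. A minimal sketch of that lookup, assuming the node coordinates are already available in the hypothetical arrays `node_lats` and `node_lons`:

```python
import numpy as np

def nearest_node(lat, lon, node_lats, node_lons):
    """Return the index of the grid node closest to (lat, lon), in degrees,
    using the haversine great-circle formula. node_lats/node_lons are
    hypothetical arrays holding the 2562 icosahedral node coordinates."""
    phi1, phi2 = np.radians(lat), np.radians(node_lats)
    dphi = phi2 - phi1
    dlam = np.radians(node_lons) - np.radians(lon)
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlam / 2) ** 2
    # 'a' is monotonic in great-circle distance, so argmin suffices
    return int(np.argmin(a))
```

The same index then serves as the "grid location number" for every measurement from that station.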
First I calculate the grid location numbers for all 106,000 stations. Stations that share the same grid location are assumed to follow the same climate. I can then calculate the normal monthly temperatures for each grid point as the average over all member stations for that month across the 30-year period 1961–1990, the same normalisation period as that used by HADCRUT4.
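The per-cell monthly normals can be sketched as a simple accumulation over the 1961–1990 window. The tuple layout `(cell, year, month, temp)` is my own assumed flattening of the archive, not the post's actual data structure:

```python
from collections import defaultdict

def monthly_normals(records):
    """Compute the per-cell monthly normals over 1961-1990.
    `records` is an iterable of (cell, year, month, temp) tuples -- a
    hypothetical flattened form of the station archive. Returns a dict
    mapping (cell, month) -> mean temperature over the 30-year window."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cell, year, month, temp in records:
        if 1961 <= year <= 1990:          # only the normalisation period counts
            sums[(cell, month)] += temp
            counts[(cell, month)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Measurements outside the window are simply ignored at this stage; they re-enter later when anomalies are formed.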
The advantage of this normalisation method is that I can then use each grid cell’s normals as the reference for deriving temperature anomalies, rather than normalising every station individually. This means I can use every recorded station temperature covering any time period: early stations ending before 1960 and newer ones starting after 1990 can still be included through their contribution to the average temperature of their cell. All 3 billion temperature measurements can therefore be processed. However, unlike all other studies, I apply no adjustments and no homogenisation, so these results are based on the raw temperatures as originally recorded, which is illuminating.
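Given those cell normals, the global figure for a month follows in two steps: anomalies are averaged within each cell, and then, because the icosahedral cells are (nearly) equal-area, the cell averages can be combined with a plain unweighted mean. A sketch under those assumptions, with the `(cell, month, temp)` tuples again being my own illustrative layout:

```python
from collections import defaultdict

def global_monthly_anomaly(measurements, normals):
    """Average anomalies per cell first, then average over cells.
    Equal-area cells mean the second average needs no area weighting.
    `measurements`: iterable of (cell, month, temp) for one month's data;
    `normals`: dict mapping (cell, month) -> 1961-1990 mean temperature.
    Measurements from any year can contribute, provided their cell has
    a normal; cells without one are skipped."""
    per_cell = defaultdict(list)
    for cell, month, temp in measurements:
        normal = normals.get((cell, month))
        if normal is not None:
            per_cell[cell].append(temp - normal)
    cell_means = [sum(v) / len(v) for v in per_cell.values()]
    return sum(cell_means) / len(cell_means)
```

This is the property that lets a station active only in, say, 1820–1850 still enter the record: its readings are compared against its cell's normal rather than its own.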
Clearly, before 1950 temperatures are much higher than in any other index, including Berkeley, which also uses data back to 1750. The reasons are as follows.
- There are only 2 or 3 stations recording temperatures back to 1750, all in central Europe, and some CET stations are missing before about 1830. The number of stations and the area covered gradually grow, while the corresponding temperature anomalies fall, until around 1830 when a few US & Australian stations begin to appear.
- The spike from 1875 to 1895 is due to a sudden influx of US stations. This triples the spatial coverage and so dominates the global average. Exactly why the spike appears and then disappears 15 years later is unclear to me. However, pre-industrial temperatures depend critically on any adjustments made to US stations. My results show that the raw data disagree strongly with CRUTEM4, GISS and NCDC itself. Interestingly, though, Berkeley sees a hint of the same trends before 1850.
Berkeley, however, use a completely different method, and their data are shown after adjustments and homogenisation have been applied. After 1950 the agreement with CRUTEM4 is rather good.
Adjustments and homogenisation make only small differences to the result after 1960. However, these adjustments have always slightly increased net annual warming on land. Note also that the raw data imply higher average temperatures for the early 20th century.
The raw data apparently show much higher temperatures before 1950 than other datasets. Is this due to the normalisation method? Perhaps it is. If there is just one station within a grid cell, as is the case before 1850, then its anomaly relative to the many-station average in 1985 may be biased. However, I wanted to use all the temperature data, even from stations without coverage in the normalisation period. In general, though, I believe the raw data show higher mean temperatures than the ‘corrected’ data.
I was surprised to discover just how important the US stations are in setting the pre-industrial temperature baseline, as evidenced by the large spike in 1880. This is because the US surface area is much larger than that of northern Europe, the only other region with significant coverage. Consequently, USHCN corrections, which have been discussed many times before, are critical to determining how much the earth has warmed since the 19th century.
Finally, here is an animation of all the monthly distributions from 1868 onwards. The couple of stripes appearing around 1919 are cells which span the dateline, which I later corrected!
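One plausible source of such a stripe is naively averaging the longitudes of a cell that straddles ±180°: the degrees-mean of +179° and −179° is 0°, so the cell gets drawn on the wrong side of the globe. A standard fix (a sketch, not necessarily the correction I actually applied) is to average unit vectors instead of raw degrees:

```python
import math

def mean_longitude(lons):
    """Mean longitude that wraps correctly across the dateline.
    Averages the unit vectors (cos, sin) of each longitude and takes
    atan2 of the sums; e.g. +179 and -179 average to 180, not 0."""
    x = sum(math.cos(math.radians(l)) for l in lons)
    y = sum(math.sin(math.radians(l)) for l in lons)
    return math.degrees(math.atan2(y, x))
```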
Processing this data takes around 30 hours of iMac computer time, but writing and debugging the algorithm took far longer!