There are systematic problems in defining a global “temperature”. This post looks further into the problem by comparing uncorrected GHCN weather station data with CRUTEM using different methodologies. This is an ongoing study.
I have calculated new temperature anomaly data from all stations in the GHCN V1 and GHCN V3 raw data. I applied a linear offset correction for each station and for each month, as originally proposed by Tamino and discussed in the previous post, to avoid sampling bias. This removes biases both from differing seasonal responses and from differing altitudes of individual stations within a single grid cell. The basic problem facing all analyses of global temperature data is that the set of stations and measurements evolves with time. For this reason temperature anomalies for each station are traditionally used rather than raw temperatures. An anomaly is the difference between the measured temperature for one month and a ‘normal’ monthly value, usually pre-calculated over a fixed 30 year period. However, if we calculate such anomalies at the station level, then we must discard all stations with zero or few values within the fixed period. I wanted to avoid this and use ALL available data. GHCN V3 contains 7280 stations, whereas CRUTEM4 has 5549 stations, which include 628 new ones from the Arctic.
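To make the discarded-station problem concrete, here is a minimal sketch with made-up data; the `station_anomalies` helper and its data layout are my own illustration, not code from this analysis:

```python
import numpy as np

def station_anomalies(temps, base=(1961, 1990)):
    """temps: {station: {year: array of 12 monthly temperatures}}.
    Returns per-station anomalies against fixed-baseline 'normals';
    stations with no data inside the baseline window are discarded."""
    out = {}
    for st, series in temps.items():
        in_base = [series[y] for y in series if base[0] <= y <= base[1]]
        if not in_base:                     # no baseline coverage -> dropped
            continue
        normal = np.mean(in_base, axis=0)   # 12 monthly 'normals'
        out[st] = {y: series[y] - normal for y in series}
    return out
```

A station reporting only from 1995 onwards simply vanishes from such an analysis, which is the loss the offset method described next is designed to avoid.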
To make progress, I assume that there exists a true average temperature $T_m$ for each month $m$ in a grid cell. We can estimate $T_m$ by subtracting the net offsets of those stations present in each month. We then calculate normals based on these monthly values and use them to derive grid anomalies. This new method also avoids discarding stations whose data fall outside the ‘normal 30 year window’, by correcting their offsets in the cells in which they do appear. I also want to avoid linearly interpolating station data to months and years where there are no measurements.
Each station is characterised by a monthly offset from the grid (regional) average. This offset is assumed constant in time because it reflects fixed characteristics of the station, for example its altitude. We first calculate all the grid-cell monthly average temperatures $T_m$ by straightforward averaging. Then for each station $s$ we derive the offset $\delta^s_m$ for each month from the average of the particular grid cell in which it appears, averaged over all years with data. This is our first estimate for $\delta^s_m$. There are 12 such offsets per station – one for each calendar month.
You can then iterate, using the offset-corrected temperatures to derive a new set of grid averages and offsets. In reality the second iteration changes the end result only a very small amount. The estimate of the true grid average temperature for a given month is then

$$T_m = \frac{1}{N_m}\sum_{s=1}^{N_m}\left(T^s_m - \delta^s_m\right),$$

where $T^s_m$ is the temperature measured by station $s$ in that month, $\delta^s_m$ is its offset from the grid average for that calendar month, and $N_m$ is the number of stations reporting that month.
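A minimal sketch of this iteration, assuming the station data for one grid cell sits in a (station × year × 12) array with NaN marking missing months; the array layout and function name are my assumptions:

```python
import numpy as np

def grid_monthly_means(T, n_iter=2):
    """T: (station, year, 12) array, NaN where a station has no data.
    Returns the estimated grid-cell monthly means (year, 12) and the
    12 monthly offsets per station (station-minus-grid convention)."""
    grid = np.nanmean(T, axis=0)          # first guess: plain station average
    offsets = np.zeros((T.shape[0], 12))
    for _ in range(n_iter):
        # each station's offset: its mean difference from the grid, per month
        offsets = np.nanmean(T - grid[None, :, :], axis=1)
        # improved grid mean: average the offset-corrected stations
        grid = np.nanmean(T - offsets[:, None, :], axis=0)
    return grid, offsets
```

With a station that runs persistently 2 °C warmer than its neighbours (because of altitude, say), the iteration recovers an offset difference of 2 °C and a grid mean whose year-to-year trend matches the underlying one.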
So we average the monthly grid temperatures across the full time range to get the ‘normal’ temperatures. You can instead select a fixed 30 year range for the normals, but it makes little difference. We then calculate one anomaly per month and per grid cell. These anomalies are first area averaged and then annually averaged to get yearly global temperature anomalies. The results are shown below for V3 compared to CRUTEM4, and for V1 compared to a contemporary version of CRUTEM (Jones 1988).
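The averaging order described above – grid anomalies, then an area-weighted global monthly mean, then a yearly mean – might look like the following sketch; the cos(latitude) area weighting and array layout are my assumptions:

```python
import numpy as np

def global_annual_anomaly(anom, lats):
    """anom: (year, month, cell) grid-cell anomalies, NaN for empty cells;
    lats: cell-centre latitudes in degrees.
    Cells are weighted by cos(latitude) (proportional to cell area),
    averaged to a global monthly anomaly, then the 12 monthly values
    are averaged to one yearly global anomaly."""
    w = np.cos(np.radians(np.asarray(lats)))
    present = ~np.isnan(anom)
    # weighted mean over cells, ignoring empty ones
    monthly = np.nansum(anom * w, axis=2) / np.sum(present * w, axis=2)
    return np.nanmean(monthly, axis=1)
```

Because empty cells drop out of both numerator and denominator, months with patchy coverage still yield an unbiased weighted mean of whatever cells report.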
There is fairly close agreement, except before 1920, where V3 is warmer, and after 1998, where V3 is cooler. Now look at V1 compared to the version of CRUTEM made by Phil Jones in 1988.
In general there is good agreement between CRUTEM and V1, except before 1900. In both cases the net warming since the late 19th century is about 0.2 °C less than that observed by CRUTEM.
Now I can already hear objections to my methodology from Nick Stokes, because he would likely argue that you should average station ‘anomalies’ in each cell rather than first averaging ‘temperatures’. Another objection could be that my normals span a longer time period than the standard 1961–1990. Let’s look at these in turn.
In principle I could first derive anomalies for each station by using the offsets and fixing the monthly normal Tm at one particular year such as 1975. The station anomaly would then be (Tmes − offset) − Tm. I can look into this if I find time, but I don’t believe it can make much difference.
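One reason to doubt it matters much: when the same stations report, the two orderings are algebraically identical by linearity. A toy check (the helper names are mine):

```python
import numpy as np

def cell_anomaly_avg_temps(t_mes, offsets, t_norm):
    # average the offset-corrected temperatures, then subtract the normal
    return np.mean(np.asarray(t_mes) - np.asarray(offsets)) - t_norm

def cell_anomaly_avg_anoms(t_mes, offsets, t_norm):
    # form one anomaly per station first, then average the anomalies
    return np.mean([(t - o) - t_norm for t, o in zip(t_mes, offsets)])
```

The two only diverge when the set of reporting stations changes from month to month – which is exactly the sampling problem the offset correction is meant to handle.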
My main objective was to use all the (raw) station data, independent of each station’s time span. The problem with a fixed 30-year normalisation period is that a significant number of stations have no data with which to define such ‘normals’. One could imagine using the offsets to define station normals based on the grid average temperature, but this amounts to generating fake data. There is also a more philosophical reason why a fixed recent period may be wrong.
Suppose that, for some reason, only winters have gradually warmed. By defining seasonal normals within a recent fixed time span, you fold part of that winter warming into the winter normals, skewing the winter anomalies and artificially reducing the natural seasonal variation.
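A toy illustration of this point, with numbers I have made up: give winters a warming that starts only in the last third of the record, and compare winter anomalies under full-period normals versus a recent-window normal:

```python
import numpy as np

years = np.arange(60)
# winters warm by 1 degree, but only from year 40 onwards
winter = np.where(years >= 40, 1.0, 0.0)

def anomaly(series, window):
    """Anomaly of a monthly series against a normal computed over the
    given index window (a sketch of the two normalisation choices)."""
    return series - series[window].mean()

w_full = anomaly(winter, np.arange(60))      # full-period normal
w_recent = anomaly(winter, np.arange(30, 60))  # recent 30-year normal
```

The recent window absorbs more of the winter warming into the normal, so the early winters look spuriously colder (−2/3 instead of −1/3 here), while unchanged summer months are unaffected – the seasonal contrast is distorted.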
My conclusion thus far is that there are small but significant differences in temperature trends depending on how temperature anomalies are defined and on data correction/homogenization. The overall warming trend varies by about 0.2 °C depending on the definition used.