Do systematic effects caused by the geographic location of weather stations affect global temperature anomalies?
I have been studying the station temperature data used in the Hadley CRU global temperature analysis. These data consist of all the land-based stations dating back to about 1700, each containing monthly averaged temperatures for every year of data. These are the data used to produce CRUTEM3 and HadCRUT3; HadCRUT3 also includes sea surface temperatures. I have been using and extending the Perl analysis programs kindly provided by Hadley – available here. As a by-product, I also developed a geographic station browser allowing you to view and plot all individual stations – for further details see also here. A comparison of the calculated anomalies based on the >3000 stations and the usual HadCRUT3 is shown below.
There are only small differences, which confirms that the main component of the observed temperature rise comes from the land data. However, it is important to understand exactly how these data points are derived.
1. Average monthly temperatures are calculated at each station by the following procedure: record the minimum and maximum temperature for each day, then take the average of the two. The monthly average is calculated from these daily values. These are then recorded in each station file along with metadata (latitude, longitude, station name, etc.).
2. So-called monthly “normals” are calculated for each station by averaging each individual monthly temperature over the years 1961 to 1990. Standard deviations are also calculated. The normals are then assumed to represent a “standard annual variation”.
3. Anomalies are defined for each station by subtracting the monthly normal from the measured value for that month. Stations without normals for 1961–1990, or where any anomaly exceeds 5 standard deviations, are excluded.
4. The world is divided into a 5×5 degree grid of 2592 points. For each month the grid is populated by averaging the anomalies of any stations present within each grid point. Most grid points are actually empty – especially in the early years. Furthermore, the distribution of stations with latitude is highly asymmetric, with over 80 percent of all stations outside the tropics.
5. The monthly grid time series is then converted to an annual series by averaging the grid points over each 12-month period. The result is a grid series of shape (36, 72, 160), i.e. 160 years of data.
6. Finally, the yearly global temperature anomalies are calculated by taking an area-weighted average of all the populated grid points in each year. The weighting formula is $Weight = cos( $Lat * PI/180 ), where $Lat is the latitude in degrees of the middle of each grid point. All empty grid points are excluded from this average.
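Steps 4–6 above can be sketched as follows. This is a minimal illustration in Python rather than the Perl of the actual CRU programs; the array layout, function name, and use of NaN for empty cells are my own assumptions.

```python
import numpy as np

def annual_global_anomaly(monthly_grid):
    """Area-weighted annual mean anomaly from a 5x5 degree grid.

    monthly_grid: array of shape (36 lat bands, 72 lon bands, 12 months)
    for one year, with np.nan marking empty grid points.
    """
    # Step 5: average the 12 monthly grids into one annual grid.
    # Cells empty in every month remain NaN.
    annual_grid = np.nanmean(monthly_grid, axis=2)

    # Step 6: weight each 5x5 cell by cos(latitude of the cell centre).
    lat_centres = np.arange(-87.5, 90.0, 5.0)           # 36 band centres
    weights = np.cos(np.radians(lat_centres))[:, None]  # shape (36, 1)
    weights = np.broadcast_to(weights, annual_grid.shape)

    # Exclude empty grid points from the weighted average.
    populated = ~np.isnan(annual_grid)
    return (np.sum(annual_grid[populated] * weights[populated])
            / np.sum(weights[populated]))
```

The cosine weighting compensates for the fact that a 5×5 degree cell near a pole covers far less area than one at the equator.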
Quality of the data
From 1850 to 1860 less than 5% of grid points contain data (Figure 3). This rises to 20% by 1940 and peaks at 30% from 1960 to 1990, before falling again to 23% currently. Figure 4 shows the latitude distribution, which demonstrates that over 80% of stations lie outside the Tropics at high latitudes (Europe, US, Russia, Australia, etc.). This can also be seen visually in the map displays of the Flash application (Figure 2).
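Coverage fractions like those quoted above can be checked directly from an annual grid; a minimal sketch, assuming NaN marks an empty 5×5 cell (the function name is mine, not CRU's):

```python
import numpy as np

def coverage_fraction(annual_grid):
    """Fraction of 5x5 grid cells containing at least one station value."""
    return np.count_nonzero(~np.isnan(annual_grid)) / annual_grid.size
```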
1. There is poor coverage over the main tropical warm zone (±25 degrees). The averages are biased towards high latitudes with large summer-to-winter swings. This also accentuates the temperature differences between the southern and northern hemispheres.
2. The stations are all on land and lack sea surface temperature measurements. However, the annual anomaly results are almost the same as those from Hadcrutem3VG, which includes sea surface temperature data. Therefore the land-based temperature data dominate the temperature trends.
The first exercise I did was to look directly at the temperatures rather than at the anomalies. This also shows how unbalanced averaging at high latitudes accentuates differences between the northern and southern hemispheres and the annual variations. The monthly temperature data for the full period are shown in Figure 5.
You can see how initially the discrepancy between north and south diminishes as more tropical stations are added. They then separate again after about 1920, before narrowing again recently. Note also how it appears to be the reduction in the minimum temperatures (e.g. for January – North) that drives the anomaly rise.
Could there still be systematic effects due to changes in the distribution and proportions of stations over time? Next I looked at the averaged temperatures for each of three regions: A) -20 < lat < 20 (Tropics), B) lat > 20 (northern latitudes), C) lat < -20 (southern latitudes). These results are shown below. The year 1863 actually had no measurements inside the tropics – hence the zero value.
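The three-region split can be sketched as below, operating on an annual temperature grid of shape (36, 72) with NaN for empty cells. This is a plain (unweighted) mean within each band for brevity; the band edges follow the text, while the function and region names are illustrative.

```python
import numpy as np

def regional_means(annual_grid):
    """Mean temperature in three latitude bands of a 5x5 degree grid."""
    lat_centres = np.arange(-87.5, 90.0, 5.0)   # centres of the 36 bands
    regions = {
        "tropics":  (lat_centres > -20) & (lat_centres < 20),
        "northern": lat_centres > 20,
        "southern": lat_centres < -20,
    }
    means = {}
    for name, band in regions.items():
        sub = annual_grid[band, :]
        # Report NaN when a whole region is empty (e.g. the tropics in 1863).
        means[name] = np.nan if np.all(np.isnan(sub)) else float(np.nanmean(sub))
    return means
```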
There are a couple of observations here. Firstly, note how excluding the tropics changes the “global average temperature” (B+C)/2 by only about 1 degree. Secondly, note how most of the temperature rise since 1980 is concentrated in northern latitudes.

I am assuming that the “experts” choose to work with anomalies rather than absolute temperatures because of known systematic problems. Anomalies measure the deltas between monthly temperatures and a standard (normal) set. In effect we are subtracting two large numbers from each other and averaging the residues. It is assumed that if the Earth is warming overall by say 0.5 degrees, then the global average of all the deltas measured by each individual station will also rise by 0.5 degrees. We are no longer measuring the global temperature as such, but rather changes in an evolving distribution of station measurements over several decades. The anomalies can still in principle be prone to systematic effects through over-sampling. A simple example of how this could happen: suppose North America rose by 1 degree while simultaneously the Sahara fell by 1 degree, leaving no net change in global temperature. The averaging algorithm would then produce a net global increase in temperature, because the US is over-sampled while the Sahara is under-sampled.
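The over-sampling example can be made concrete with invented numbers: ten stations in the warming region against one in the cooling region, so the true net change of zero is lost in a plain station average.

```python
# North America warms by +1 degree (10 stations, over-sampled);
# the Sahara cools by -1 degree (1 station, under-sampled).
# The true net global change is zero, but the plain average is not.
anomalies = [1.0] * 10 + [-1.0] * 1

biased_mean = sum(anomalies) / len(anomalies)
print(round(biased_mean, 3))  # → 0.818, well above the true value of 0.0
```

Area weighting of grid cells mitigates this, but only for the cells that contain stations at all; empty cells simply drop out of the average.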
Next I looked at possible effects of the normalisation method used to extract temperature anomalies for each station. The normal procedure followed by HAD-CRU is to calculate monthly averages for each month and each station between 1961 and 1990. These are then used to calculate anomalies for each station, which are averaged in each grid point. I decided instead to use the actual temperatures at each grid point, resulting from the average of these stations. In a second step I calculate the monthly normals for each month at each grid point by averaging all the available data. There is no particular reason to take a fixed time period for the normals, since anomalies are just deviations from the norm. First I generated temperature grids from 1850 to 2010. I then used the grid monthly time series to derive normals per month for each grid point. Then I subtract the normal from the grid temperatures to derive anomaly grids. Finally the area-weighted and annual averages are derived. How does this compare with the standard result?
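A sketch of this alternative normalisation: anomalies are formed per grid point against that grid point's all-years monthly mean, rather than per station against a fixed 1961–1990 baseline. It assumes a temperature grid of shape (lat, lon, year, month) with NaN for empty cells; the layout and names are illustrative, not from the actual code.

```python
import numpy as np

def grid_anomalies(temp_grid):
    """Anomalies relative to each grid point's own monthly normals.

    temp_grid: array of shape (36, 72, n_years, 12), NaN where empty.
    """
    # Normals: average each calendar month over every available year,
    # not just a fixed 1961-1990 window.
    normals = np.nanmean(temp_grid, axis=2, keepdims=True)  # (36, 72, 1, 12)

    # Anomaly = temperature minus that grid point's monthly normal
    # (broadcast over the year axis).
    return temp_grid - normals
```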
As you can see, the time trend changes dramatically! The results show clearly that prior to about 1910 the temperature anomalies are highly dependent on the normalisation method. This is not surprising, as the geographic coverage is < 14%. After 1920, however, this analysis shows that we can have a high confidence level in the published values.
1. The coverage of stations is concentrated at relatively high latitudes. There is far less coverage in tropical regions, which tends to exaggerate seasonal global temperature changes.
2. Sparse sampling of the 5×5 degree grid in the early years must lead to systematic errors, because there are so many empty grid points.
3. The standard use of temperature anomalies per station over the fixed period 1961–1990 is somewhat arbitrary. When anomalies are instead calculated against a longer reference period, using grid values rather than station values, significant differences are seen prior to ~1920.
4. The conclusion is that the land-based data are reliable after ~1920, but the earlier data are subject to systematic errors. The absolute temperature data are also affected when there are significant asymmetries in geographic sampling.