How accurate are the pairwise homogenisation algorithm applied to GHCN land temperature measurements and the instrumentation corrections made to sea surface temperatures?
Long-term station measurements are affected by a station’s physical relocation, by environmental changes (such as urban development) and by instrumentation changes. Station relocations are usually recorded in metadata, but not in a consistent way. An automated procedure, the pairwise homogenisation algorithm (PHA), has therefore been developed: it compares each station with its nearby neighbours to identify “break-points” in that station’s record relative to theirs. A statistically significant and persistent violation of relative homogeneity is presumed to be artificial. The GHCN data is updated daily, and the full pairwise algorithm is then also rerun daily.
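To make the idea concrete, here is a minimal sketch of break-point detection for a single station pair. This is not the actual PHA of Menne & Williams (2009), which applies SNHT-style tests across many pair combinations; the `window` and `threshold` parameters here are illustrative assumptions.

```python
import numpy as np

def find_breakpoints(target, neighbour, window=60, threshold=3.0):
    """Toy break-point detector for a pair of monthly temperature series.

    Flags months where the mean of the target-minus-neighbour series
    shifts by more than `threshold` standard errors between the `window`
    months before and after. Window and threshold are illustrative only.
    """
    # Differencing removes the climate signal shared by the two stations,
    # leaving only relative inhomogeneities.
    diff = np.asarray(target) - np.asarray(neighbour)
    breaks = []
    for i in range(window, len(diff) - window):
        before, after = diff[i - window:i], diff[i:i + window]
        shift = after.mean() - before.mean()
        se = np.sqrt(before.var(ddof=1) / window + after.var(ddof=1) / window)
        if abs(shift) > threshold * se:
            breaks.append((i, shift))  # candidate break at month i
    return breaks
```

The real algorithm then attributes each break to a specific station by requiring it to appear against many different neighbours, which is why it is run pairwise.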
Sea surface temperatures have been measured since 1850 using a succession of methods, from bucket temperatures and engine-inlet temperatures through to buoys and satellite data. Methods to correct for these instrumentation changes have been developed, the latest being the HadSST4 data, which incorporates satellite corrections to the recent buoy data.
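Conceptually, such corrections amount to adding an instrument-dependent bias offset to each reading before the series are merged. The offsets below are invented purely for illustration; the real HadSST4 corrections are time- and region-dependent and derived statistically.

```python
import numpy as np

# Purely illustrative offsets in °C -- NOT the HadSST4 values.
BIAS_OFFSET = {
    "bucket": +0.3,        # canvas buckets cool by evaporation, so read low
    "engine_inlet": -0.1,  # engine rooms tend to read slightly warm
    "buoy": 0.0,           # drifting buoys taken here as the reference
}

def correct_sst(temps, methods):
    """Apply an instrument-dependent offset to each SST reading."""
    return np.array([t + BIAS_OFFSET[m] for t, m in zip(temps, methods)])

print(correct_sst([15.2, 15.6, 15.4], ["bucket", "engine_inlet", "buoy"]))
```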
The overall result of both these updates has been to increase the apparent recent warming. This can be seen by comparing the uncorrected global temperature data with the corrected data, each calculated in exactly the same way by spherical triangulation.
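For reference, a global average by spherical triangulation can be sketched as follows: triangulate the measurement locations on the unit sphere (the 3-D convex hull of the unit vectors gives the spherical Delaunay triangulation) and weight each triangle’s mean temperature by its solid angle. This is a minimal sketch, not the exact code used for the figures.

```python
import numpy as np
from scipy.spatial import ConvexHull

def to_unit_vectors(lat, lon):
    """Convert latitude/longitude in degrees to 3-D unit vectors."""
    phi, lam = np.radians(lat), np.radians(lon)
    return np.column_stack((np.cos(phi) * np.cos(lam),
                            np.cos(phi) * np.sin(lam),
                            np.sin(phi)))

def solid_angle(a, b, c):
    """Spherical area of triangle (a, b, c) via Van Oosterom & Strackee."""
    num = abs(np.dot(a, np.cross(b, c)))
    den = 1.0 + np.dot(a, b) + np.dot(b, c) + np.dot(c, a)
    return 2.0 * np.arctan2(num, den)

def global_average(lat, lon, temp):
    """Area-weighted global mean temperature anomaly."""
    pts = to_unit_vectors(np.asarray(lat), np.asarray(lon))
    hull = ConvexHull(pts)  # hull facets = spherical Delaunay triangles
    temp = np.asarray(temp)
    areas = np.array([solid_angle(*pts[tri]) for tri in hull.simplices])
    tri_means = temp[hull.simplices].mean(axis=1)
    return np.sum(areas * tri_means) / np.sum(areas)  # areas sum to 4*pi
```

Because exactly the same averaging is applied to the raw and the corrected data, any difference between the two curves is due to the corrections alone.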
We see that the net effect is to increase the apparent warming since ~2000 by about 0.15 °C. How sure are we that these automated algorithmic corrections are correct? A recent paper has looked in detail at the effects of the pairwise algorithm on GHCN-V4, and the results are surprising. The authors downloaded every daily version of GHCN-V4 over a period of 10 years, providing a consistency check over time of the corrections as applied. Studying the European stations, they found that an average of 100 different pairwise corrections were applied during that time, while only 3% of these actually corresponded to documented metadata events, e.g. station relocations.
This implies that the algorithm is far too sensitive. You can see below how consistent these adjustments were, by counting how many times each one was repeated across the daily runs. The result is a consistency rate of just 16%; the rest are most likely spurious.
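A consistency rate of this kind can be computed by counting how many of the distinct adjustments recur across the daily runs. The sketch below assumes a hypothetical encoding of each run’s adjustments as (station, break date, shift) tuples; the paper’s actual bookkeeping will differ.

```python
from collections import Counter

def consistency_rate(daily_runs, min_repeats):
    """Fraction of distinct adjustments recurring in >= min_repeats runs.

    `daily_runs` is a list, one entry per daily GHCN update, of sets of
    (station_id, break_date, rounded_shift) tuples -- a hypothetical
    encoding of a run's adjustments, not the paper's actual format.
    """
    counts = Counter(adj for run in daily_runs for adj in set(run))
    persistent = sum(1 for n in counts.values() if n >= min_repeats)
    return persistent / len(counts) if counts else 0.0
```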
Just 19% of the adjustments made in V4 correspond to documented events in the associated metadata. There could of course be station moves or instrumentation changes that went undocumented, but if so we would expect the corresponding adjustments to be applied consistently from some particular date onwards. This is not observed: most changes occur very inconsistently or intermittently.
Another consideration is that comparing one station’s temperatures with those of its near neighbours should occasionally identify stations reading too hot, and reduce their recorded temperatures accordingly, so the corrections ought to go both ways. Yet the net effect of the adjustments is almost always a warmer trend than that in the raw measurement data.
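One simple test of that asymmetry is to compare the linear trend of each station’s raw and adjusted series: for unbiased corrections the mean trend change across stations should sit near zero. A minimal sketch, assuming hypothetical arrays of raw and adjusted station data:

```python
import numpy as np

def decadal_trend(years, temps):
    """Ordinary least-squares trend in °C per decade."""
    return 10.0 * np.polyfit(years, temps, 1)[0]

def mean_trend_change(stations):
    """Mean change in trend introduced by homogenisation.

    `stations` is a hypothetical list of (years, raw, adjusted) arrays.
    Unbiased corrections should leave this average near zero.
    """
    deltas = [decadal_trend(y, adj) - decadal_trend(y, raw)
              for y, raw, adj in stations]
    return np.mean(deltas)
```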
Here is one example. Click on the image to see the animation.