How accurate is the pair-wise homogenisation algorithm applied to GHCN land temperature measurements and the instrumentation corrections made to sea surface temperatures?
Long-term station measurements are affected by a station’s physical relocation, by environmental changes (such as urban development) and by instrumentation changes. Station relocations are usually recorded in metadata, but not in a consistent way. An automated algorithm called pairwise homogenisation has therefore been developed, which compares each station with its nearby neighbours to identify “break-points” in that station’s record relative to them. A statistically significant and persistent violation of relative homogeneity is presumed to be artificial. The GHCN data are updated daily, and the full pairwise algorithm is then also rerun daily.
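For readers unfamiliar with the idea, here is a toy sketch of single-pair break detection. This is not NOAA’s actual PHA code (which compares many neighbour pairs and requires corroboration across pairs before accepting a break); it only illustrates the core step of scanning the target-minus-neighbour difference series for a significant mean shift. The function name and threshold are my own choices.

```python
import numpy as np

def detect_break(target, neighbour, min_seg=10, threshold=5.0):
    """Find the strongest candidate break-point in the difference
    series between a target station and one neighbour.

    Returns (index, statistic) or None. A toy single-pair test;
    the real PHA requires corroboration across many pairs."""
    diff = np.asarray(target) - np.asarray(neighbour)  # relative series
    n = len(diff)
    best_idx, best_stat = None, 0.0
    for k in range(min_seg, n - min_seg):
        a, b = diff[:k], diff[k:]
        # two-sample t-like statistic for a mean shift at k
        pooled = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        stat = abs(a.mean() - b.mean()) / pooled
        if stat > best_stat:
            best_idx, best_stat = k, stat
    return (best_idx, best_stat) if best_stat > threshold else None

# Synthetic example: two stations share a regional signal; the target
# has an artificial 0.5 C step inserted at index 120.
rng = np.random.default_rng(0)
regional = rng.normal(0, 0.3, 240)
neigh = regional + rng.normal(0, 0.1, 240)
targ = regional + rng.normal(0, 0.1, 240)
targ[120:] += 0.5
print(detect_break(targ, neigh))   # break found near index 120
```

Note that the regional climate signal cancels in the difference series, which is the whole point of comparing neighbours rather than testing a station in isolation.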
Sea surface temperatures have been measured since 1850 using different methods, from bucket temperatures and engine-inlet temperatures through to buoys and satellite data. Methods to correct for these instrumentation changes have been developed; the latest HadSST4 data incorporate satellite-based corrections to recent buoy data.
The overall result of both these updates has been to increase the apparent recent warming. This can be seen by comparing the uncorrected global temperature data with the corrected data, each calculated in exactly the same way by spherical triangulation.
We see that the net effect is to increase the apparent warming since ~2000 by about 0.15C. How sure are we that these automated algorithmic corrections are correct? A recent paper has looked in detail at the effects of the pairwise algorithm on GHCN-V4, and the results are surprising. The authors downloaded all daily versions of GHCN-V4 over a period of 10 years, providing a consistency check over time of the corrections as applied. They studied European stations and found that an average of 100 different pairwise corrections were applied per station during that time, while only 3% of these actually corresponded to documented metadata events, e.g. station relocations.
This implies that the algorithm is far too sensitive. You can see below how consistent these adjustments were by counting how many times each one was repeated across the daily versions. This gives a consistency rate of just 16%; the rest are most likely wrong.
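To make the consistency idea concrete, here is a hypothetical illustration of counting how often each distinct break recurs across the archived daily runs. The function name and the “appears in at least 90% of archived versions” criterion are my own simplification, not necessarily the paper’s exact metric.

```python
from collections import Counter

def consistency_rate(daily_breaks, n_versions, persistent_frac=0.9):
    """daily_breaks: one entry per detection across all archived daily
    runs, e.g. ('ST001', '1975-06'). A break counts as 'consistent'
    if it appears in at least persistent_frac of the versions.
    Returns the fraction of distinct breaks that are consistent.
    (Illustrative only -- the paper's exact metric may differ.)"""
    counts = Counter(daily_breaks)
    consistent = sum(1 for c in counts.values()
                     if c >= persistent_frac * n_versions)
    return consistent / len(counts)

# Toy example: 3 archived runs; one break seen in all 3 runs,
# two other breaks each seen only once.
breaks = [('A', '1975-06')] * 3 + [('A', '1990-01'), ('B', '1982-03')]
print(consistency_rate(breaks, 3))   # 1 of 3 distinct breaks persists
```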
Just 19% of the adjustments made in V4 correspond to documented events in the associated metadata. There could of course be station moves or instrumentation changes that were never documented, but if so we would expect the corresponding corrections to appear consistently after some particular date. This is not observed: most changes occur very inconsistently or intermittently.
Another consideration is that a comparison of the temperature of one station with its near neighbours should occasionally identify those reading too hot and reduce the recorded temperature accordingly. Yet the trend always seems to be towards a warmer trend than that in the raw measurement data.
Here is one example. Click on image to see animation.
Hello Clive,
ref “Yet the trend always seems to be towards a warmer trend than that in the raw measurement data.”
Sorry, but does this imply that the warming may not be as great as once expected?
Thanks
Paul
Surely you’ve noticed that with every new release of HadCRUT5, GIStemp, Berkeley Earth, GHCN etc. the agreement with the hosepipe of normalised model “projections” always seems to get better.
If I am honest I would say that freezing winter nights and heavy snowfall have diminished since I was a small boy in London. However, warm sunny days in summer have not increased at all, unfortunately.
I would agree. I’ve noticed that opportunities to ice climb in the Lake District have declined since the late 80s. However, I’ve not noticed any difference in summer weather.
Thank you for another very interesting article.
Clive,
Well, congratulations on being the first sceptic to attempt the obvious calculation of what difference adjustment makes to the global result. But I don’t think it is fair to say that the whole difference between HadSST4 and ERSST V4 is adjustment. They are just different indices. I have calculated differences due to GHCN adjustments only, and they are much smaller.
” Yet the trend always seems to be towards a warmer trend than that in the raw measurement data.”
Often lazily said, but you can actually do calculations, and it just isn’t so. Here is a histogram of the trend differences between unadjusted and adjusted in GHCN V4. I restricted to records longer than 60 years, and the trend is for years where data for both are available.
There is a bias toward trend increase, but it is nowhere near as universal (or large) as commonly asserted without doing the arithmetic.
Nick,
I am not really a climate sceptic. I am probably more of a climate cynic 😉
I am not quite sure what your histogram is actually plotting; however, I usually trust your analysis. The net result, though, is still a small net increase in the warming trend.
The change from HadSST3 to HadSST4 also definitely increased the ocean warming trend.
Have a look at the V4 analysis- https://www.mdpi.com/2073-4433/13/2/285
This clearly shows there are also a lot of spurious adjustments.
Clive
Clive,
I subtracted unadjusted from adjusted temperatures for GHCN V4 stations which had more than 60 years of readings, and calculated the trend of the differences (so trends aren’t always over the same periods). Then I plotted a histogram of the trend differences. They are scattered about a mean of 0.3 C/Century, so there is a bias, but it just isn’t true, as often said, that adjustment always increases the trend.
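That calculation is easy to sketch. Below is a minimal illustration on a synthetic station (the real exercise uses the GHCN V4 unadjusted/adjusted station files; here they are stood in for by a made-up series with one rectangular +0.2 C correction). The trend of (adjusted minus unadjusted) gives the trend change in C/century for that station.

```python
import numpy as np

def trend_c_per_century(years, values):
    """Ordinary least-squares slope, converted to C/century."""
    ok = ~np.isnan(values)                    # skip any missing months
    slope_per_year = np.polyfit(years[ok], values[ok], 1)[0]
    return slope_per_year * 100.0

# Synthetic stand-in for one 70-year station record.
years = np.arange(1950, 2020, dtype=float)
rng = np.random.default_rng(1)
unadj = 14.0 + 0.01 * (years - 1950) + rng.normal(0, 0.2, len(years))
adj = unadj.copy()
adj[years >= 1985] += 0.2     # a rectangular +0.2 C correction

# The noise cancels in the difference, leaving the pure step,
# whose OLS trend is about +0.43 C/century for this record length.
print(round(trend_c_per_century(years, adj - unadj), 2))  # → 0.43
```

Repeating this over every qualifying station and binning the results gives the histogram Nick describes.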
The bias mainly reflects correction for the changes to enclosures, and the use of Stevenson screens, which eliminated spurious radiant heating. That needs to be done. Also GHCN V4 has a lot of US stations where TOBS was an issue.
Incidentally, I was in Brisbane for the “rain bomb” ~2 weeks ago. A spectacular amount of rain – 70cm! The airport taxiways were flooded, so our plane had to be towed to the runway before the engines could be started.
Yes, it can certainly rain in Brisbane. Melbourne had very little. For six months to late January I was mostly in England, with, I must say, very little inclement weather.
Very interesting. A greater public, unschooled in geospatial / homogenisation algorithms, might benefit from knowing that your post points to inconsistencies that are important to resolve soon. To augment your point, one can also simply compare any surface location’s temperature reports directly against reanalysis estimates of surface temperature at the same spot. Reanalyses such as ERA-I have impressive continuity of coverage, so questions of homogenisation seem moot there.
I, for one, routinely apply ERA-I and other resources to spot-check surface temperature records and other data, and examine their trends as well. There are examples, such as an observation that Amundsen surface station temperatures don’t appear to align with expectations based on hydrostatic equilibrium. It’s a high and dry region, so one expects temperature and pressure to trend more or less together. But over recent years at Amundsen, it seems that temperature trends one way while measured pressure trends another. Guess which way the temperature trends? Guess which satellite resource matches the Amundsen station’s pressure trend?
I hope all surface instrumental data and trends thereof, which influence climate policies, will not only be more appropriately homogenized, but also will be benchmarked where possible against the ERA-I. It seems unacceptable to let the surface observation tail wag the dog without any transparent quality assurance practices in place.
An assumption of most homogenisation methods is that step changes relative to neighbours are persistent, giving “rectangular” corrections. My experience is that raw land temperature data is full of transient perturbations. Here is an example of Tmax data from Boulia in Queensland. Using an estimate of regional average temperature variations (similar to Berkeley Earth results), it can be seen that the raw data has perturbations, but its overall temperature variation is a good match to the regional average. Not so the adjusted results from GHCNMv3 and ACORN-SAT, both of which have turned transient perturbations into invalid persistent corrections.
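The failure mode described above is easy to demonstrate with a toy series: if a detector assumes every break is persistent, the recovery at the end of a transient dip looks like an upward step, and everything after it gets corrected downward. This is purely illustrative and not any agency’s actual adjustment code.

```python
import numpy as np

# A flat series with a 10-sample transient dip (e.g. a run of wet years).
series = np.zeros(100)
series[40:50] = -1.0

# Suppose the detector flags a break at the recovery point (index 50)
# and, assuming persistence, measures the apparent step there.
k = 50
step = series[k:].mean() - series[:k].mean()   # apparent step = +0.2

# Applying a "rectangular" correction from the break onward converts
# the transient into a spurious persistent offset of -0.2 in all
# later data, even though nothing was wrong after the dip ended.
corrected = series.copy()
corrected[k:] -= step
print(round(float(corrected[k:].mean()), 3))   # → -0.2
```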
Interesting, but I can’t quite work out the differences myself.
Here’s a graph of some data I looked at:

Japanese stations with inhomogeneities versus other nearby stations. Has step-like behaviour.
Yes they are steps, but are they “rectangular”? Tokyo looks like it also had anomalous warming (UHI?), Oshima looks rectangular, the other 2 look more complex.
Yeah perhaps there’s UHI influence in Tokyo. It could also be related to local climate – I don’t know much about Japan, but towards the north trends are expected to depend on altitude.
Hachijo-Jima is >100 miles off the coast so I wasn’t surprised that everything else wiggles compared with it!
The apparent spread later on could also be because I baselined everything earlier before taking the difference. If I rebaselined everything things might look different but the jumps would still be at the red dashes.
I remember when it was all allegedly “microsite influences” so you shouldn’t just trust the metadata, how things change!
Are these results that surprising? Doesn’t seem to change anything we know about the rate of real-world warming. How does it change uncertainties in any of the grid-cell values or trends reported elsewhere? I don’t see that calculated.
But what do we know about the rate of real-world warming? We know that raw data (answer A) needs corrections, we know that all the global temperature products are in quite close agreement (answer B), but that is not surprising because they all use the same data, and very similar methods. What is the actual answer C? Climatology seems to have an answer (B) that it likes, and has little interest in examining its errors and inconsistencies, which it claims are insignificant because everyone agrees with answer B, and nobody is allowed to point out the absurdity of that argument.
Well, there are out-of-sample tests. Berkeley Earth does use muchly the same data, but different methods (point B of course).
Other things I weigh into my judgment:
1) USCRN is independent, isn’t it? So compare against that (already done of course, but a while ago)
2) The same groups have done SST corrections and those corrections validate against independent data (e.g. Argo, AVHRR/ATSR).
3) AIRS is independent. Not strictly measuring the same thing and has its own issues, but independence is nice.
I’d be interested to see homogenisation done just with rural stations as O’Neill et al. talk about. I thought it had already been done but can’t actually find a paper so maybe I imagined it.
Oh, and GNSS data are totally independent. The fact they basically agree with CMIP6 era AMIP outputs is quite incredible given the chain of things that have to go right for that to happen!
I weigh that into my thoughts too. Of course, it’s all Bayesian, I’m not saying that means “X is definitely right”, but it would be really interesting to see what would happen to the AMIP simulated troposphere if you tried what seem like outrageous differences to adjustments. Like just using the raw data.
ACORN-SAT version 2.2 is full of “statistical” adjustments: step-change inconsistencies detected in the data with no known cause. The following figure shows a bar chart of the number of such adjustments in each year, covering the full 96 stations for both Tmax and Tmin. Some of the spikes are understandable, such as those at the start/end of WW1 and WW2, but it is bizarre to have adjustments for unknown reasons in the 21st century at climate monitoring stations. I suspect that many of these (invalid) adjustments are due to transient perturbations caused by periods of heavy rain.