Kriging biases in global temperature data

I have made a new calculation of global temperatures using data from 7300 NOAA/NCDC V3c stations combined with HadSST3 ocean temperatures. For the ocean data I use cell locations only where measurements exist for a given month. I then make a (lat,lon) triangulation of all combined station/ocean locations for that month to form a global irregular grid structure. Then I use the IDL irregular gridding routine GRIDDATA to interpolate this triangulation onto a regular grid and thereby calculate global monthly and annual anomaly averages normalised to 1961-1990. Anomalies for each V3c station are calculated independently, relative to that station's own monthly averages over the 30 year period. The end result of this procedure is essentially a full global integration of irregularly interspersed measurements for each month. The annual average shown is then simply the 12 month average.
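The anomaly step described above can be sketched in a few lines of Python. This is only an illustration and not the IDL code used for the results in this post: the function names and the `(year, month) -> temperature` dictionary layout are assumptions made for the example.

```python
from statistics import mean

BASE_START, BASE_END = 1961, 1990  # normalisation period used in the post


def monthly_climatology(series):
    """Mean temperature per calendar month over the baseline period.

    `series` maps (year, month) -> temperature in deg C for ONE station
    (a hypothetical layout chosen for this sketch).
    Calendar months with no baseline data are left out of the result.
    """
    by_month = {}
    for (year, month), temp in series.items():
        if BASE_START <= year <= BASE_END:
            by_month.setdefault(month, []).append(temp)
    return {m: mean(v) for m, v in by_month.items()}


def anomalies(series):
    """Subtract each calendar month's own baseline mean from every reading."""
    clim = monthly_climatology(series)
    return {
        (y, m): t - clim[m]
        for (y, m), t in series.items()
        if m in clim  # skip months that have no baseline to compare against
    }


def annual_mean(monthly_values, year):
    """Simple 12 month average; returns None if any month is missing."""
    vals = [monthly_values.get((year, m)) for m in range(1, 13)]
    if any(v is None for v in vals):
        return None
    return mean(vals)
```

Because each station is referenced to its own 1961-1990 monthly means, a station that is uniformly colder or warmer than its neighbours still yields directly comparable anomalies.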

How does this compare to other ‘kriging’ methods which supposedly remove the coverage bias of Hadcrut4? What I discovered is that the end result depends critically on what grid spacing you interpolate onto. If you choose a fine grid spacing, such as the 1 degree used by Berkeley Earth, then you get an enhanced warming trend over recent years. If however you choose the same grid size as Hadcrut4 (5 degrees) then you get a reduced trend. This implies that a systematic error is introduced by the methodology. Here is the comparison.

Comparison of my values (CBEST) for 2 different grid spacings with Berkeley Earth (BEST), Cowtan & Way and HADCRUT4. BEST has been scaled up by 0.02C to compensate for its 1951-1970 baseline and uses their June value for the 12 monthly average (as recommended).

The 2 degree results are very similar to Berkeley Earth but give a slightly larger warming trend. However, using the same 5 degree target grid size as Hadcrut4 gives a much reduced warming trend. Cowtan and Way use the HADCRUT4 station data rather than V3C, and their result lies somewhere in the middle. Here is a detailed comparison of results for one month – September 2016.

2-degree target grid (CBEST)

The 2 degree resolution extends the expanse of each warm zonal area.

5 degree target grid (CBEST)

The 5 degree resolution is in line with that of HADSST3 and HADCRUT4

Cowtan & Way Version 2. The trend over Antarctica looks significantly different.

This is Cowtan and Way version 2 which reconstructs ocean and land separately and then blends them during the time period shown.

Original Hadcrut4 results without interpolation. White equates to missing data.

Does kriging actually improve the accuracy of global temperatures? While it is probably correct that Hadcrut4 has a ‘coverage bias’ over polar regions, what is even clearer is that interpolation to remedy this can itself introduce a systematic warming bias dependent on method and target grid size. The other temperature series all use data infilling based on ‘kriging’ type techniques.


12 Responses to Kriging biases in global temperature data

1. You can't compare methods using different input data.

Methodology 101.

The proper way to do the test is to select the SAME input data for all methods ( as we did)
And then compare. You failed 101

Then you also have to POST YOUR CODE. otherwise it’s not science.

Next, You cannot simply regrid our data to 5 degrees since the value of a grid depends on the average elevation of the grid. duh

• Clive Best says:

All data are essentially the same between the different groups, which overlap 90% with the station data in NCDC V3. Likewise the ocean data can all be traced back to the same measurements recorded in ICOADS.

Why does the value of station anomalies within a grid depend on their elevation? Temperatures do but anomalies don’t. That is why they are used.

I will post my code soon. I am still trying out different parameters.

• No the data are NOT essentially the same.
hadcrut has something like 4700 stations, today we have 19000 active stations and 40K+ total.
you can't merely assert they are the same you have to PROVE they are.
and they are not the same.

And the ocean products used by the groups also differ

Until you Normalize the input data ( the same data in) You have
NO defensible conclusion.

Go back to 101 for comparing methods

• You also have our baseline wrong.

2. Hans Erren says:

A kriging map is useless without the accompanying standard error map, could you show these as well?

3. Nick Stokes says:

I can’t seem to post – Error “invalid secuity token”

• Nick Stokes says:

The problem seemed to be using the html code for the degree symbol.

4. Nick Stokes says:

Clive,
“What I discovered is that the end result depends critically on what grid spacing you interpolate onto. If you choose a fine grid spacing, such as the 1 degree used by Berkeley Earth, then you get an enhanced warming trend over recent years.”
If you have a triangular mesh, which I think is a very good idea, then your task is to integrate the linear interpolation on that mesh. Interpolating onto a grid is not a very good way of doing it, but if you must, then the finer the mesh the better. The deviation of the 5deg grid just shows the failings of the method.

Exact integration is easy if you have access to the triangle areas, and can link them to nodes. Just form a weighted average with the weight for each node being the total area of triangles that it is a corner of.

Another arithmetically equivalent way is to form the average (of 3) value for each triangle, multiply by the area, and add, then divide by total area.
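The two equivalent recipes above can be written out as a minimal Python sketch. The function names are made up for the example, and it assumes flat (planar) triangle areas in whatever coordinates the points are given in, which is a simplification: on the globe the areas should really be measured on the sphere.

```python
def triangle_area(p, q, r):
    """Planar area of a triangle from three (x, y) vertices."""
    return abs((q[0] - p[0]) * (r[1] - p[1])
               - (r[0] - p[0]) * (q[1] - p[1])) / 2.0


def node_weighted_mean(points, values, triangles):
    """Weight each node by the total area of the triangles it is a corner of."""
    weights = [0.0] * len(points)
    for i, j, k in triangles:
        area = triangle_area(points[i], points[j], points[k])
        for n in (i, j, k):
            weights[n] += area
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def triangle_weighted_mean(points, values, triangles):
    """Average the 3 corner values per triangle, weight by area, then normalise."""
    total = 0.0
    total_area = 0.0
    for i, j, k in triangles:
        area = triangle_area(points[i], points[j], points[k])
        total += area * (values[i] + values[j] + values[k]) / 3.0
        total_area += area
    return total / total_area
```

Both functions take `triangles` as a list of index triples into `points` and `values`. They agree because every triangle contributes its area once to each of its three corners, so the node weights sum to three times the total area.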

• Clive Best says:

I am working on a new version which uses spherical triangulation. In other words all the triangles lie on the surface of the earth so angles no longer add up to 180 degrees.

Results look much more consistent and essentially agree with all the other temperature series but with a few subtle differences. I am in Hong Kong right now so don’t have time to write it up, but will do when I get to Australia !
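For the spherical triangulation mentioned above, the triangle areas have to be measured on the sphere itself. The following Python sketch shows one standard way to do that, using the spherical excess (the amount by which the three angles exceed 180 degrees). It is not the code behind the new version; the function names are invented for the example, and the excess is computed with the Van Oosterom and Strackee solid-angle formula.

```python
import math


def unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3D unit vector."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def spherical_triangle_area(p1, p2, p3, radius=1.0):
    """Area of the spherical triangle with (lat, lon) corners given in degrees.

    tan(E/2) = |a.(b x c)| / (1 + a.b + b.c + c.a), where E is the
    spherical excess in radians; the area is then E * radius**2.
    """
    a, b, c = (unit_vector(*p) for p in (p1, p2, p3))
    triple = abs(dot(a, cross(b, c)))
    denom = 1.0 + dot(a, b) + dot(b, c) + dot(c, a)
    excess = 2.0 * math.atan2(triple, denom)
    return excess * radius ** 2
```

As a check, the triangle joining the north pole to two equatorial points 90 degrees of longitude apart has three right angles (270 degrees in total) and covers one eighth of the sphere, so on a unit sphere its area is pi/2.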

• mwgrant says:

Mosh,

Using the article at the link as a basis for comparison, how does the BEST protocol measure up? differ? (Just a rhetorical question here. There are–if memory serves me correct–significant differences. So one naturally gets into geostatistical skinning of cats.)

I think that comparison of BEST (also C&W) with the old school* description laid out at the link would have helped back in earlier discussions.
——
More modern approach? Simulation.

It has been a while now for BEST. Any recent thoughts on a better BEST? or sidebar studies?