In recent posts I have been combining GHCN V3C station data with Hadley HADSST3 Ocean data. The station data are located at specific locations, while HADSST3 data are recorded sporadically within a regular 5×5 degree grid. In addition active station locations and active SST cells vary from month to month. This is particularly true for early years in the temperature series.

I have been using IDL to make a Delauney triangulation in (lat,lon) of all active measurements globally for each month, and then investigate different interpolations of this onto regular grids. However, following a proposal by Nick Stokes, I realised that this is unnecessary because the triangulation itself can produce an almost perfect global average. I believe this is probably the best spatial integration possible because it uses all available data and gives a direct result. There is no need for any interpolation or grid cell averaging. The last two posts showed how such interpolation can introduce biases. This is avoided using triangulation based spatial averaging. Here is the algorithm I used.

- Each triangle contains one measurement at each vertex. Use Heron’s formula to calculate the area of the triangle.
- Calculate the centroid position and assign this an anomaly equal to the average of all 3 vertex values. This centroid value is then used as the average anomaly for the whole triangular area.
- Use a spatial weighting for each triangle in the integration equal to cos(lat)*area. Where Lat is the latitude of the centroid.
- Calculate the global average =

Using this method every possible measurement is used and no extrapolation outside triangular boundaries is needed. Each month is normalised to the surface area covered only by the active triangles. This also means that for areas of dense station data like the US and Europe regional temperature variation can be studied at full resolution. There is no need to average nearby temperature measurements. I think Triangulation is a near perfect perfect solution also to regional and global temperature studies. It is something I will look at next.

To see how all this works in practice we first look at the triangulation results for January 1880 with rather sparse measurements.

In comparison here is January 2015 with dense data in the US and Europe. I am plotting each individual triangle, showing just how dense measurements are in the US. For regional studies it is best to suppress triangular borders and show just the colour coding.

For all anomalies I use the same normalisation period as Hadcrut4 1961-1990. To compare the results to other series I renormalise these also to 1961-1990. The offsets used for each series are shown in the plots. A detailed comparison of recent years is then shown below. The new result neatly lies between those temperature series which use infilling and Hadcrut4 which doesn’t.

A comparison of the monthly values shows a remarkably close agreement with Berkeley Earth.

In conclusion a new method calculating spatial averages of randomly dispersed temperature data has been developed based on triangulation. The original idea for this method is thanks to Nick Stokes. The method has potential benefits also for detailed regional studies.

Annual temperature series are available here

Monthly temperature series are available here

IDL code to generate all results is available here

Clive,

Yes, I think it’s a good scheme, and you have it working well. Regions are a problem in any scheme; with triangles restricted to joining nodes within the region, there can be a lot of fringe that isn’t covered. A related downside to the method is that it doesn’t fit well with land/sea masks that people use to try to allow for an abrupt change between sea and land. So you have triangles that are part sea and part land. I think that balances out.

My preferred way of dealing with regions is to do a global mesh, get the area based weights, and then assign all weights of nodes not in the region to zero, and then make the weighted average. It takes some account of that fringe, but still isn’t ideal.

I think that spherical harmonics fitting is another good scheme, about as good as triangulation. But also no good for regions.

Thanks Nick,

You’re right about regions. I will think about it, but maybe the simplest solution is to drop all the ocean data for large countries/continents.

Could you repeat it in a north pole equal area azimuthal projection, the arctic is the most problematic area.

You’re right. The arctic is the most problematic area if we forget the poor coverage in Africa. However, this method does not interpolate into the arctic at all – so there are no values at the North pole. You can see the coverage in this plot

Basically kriging extends the coverage into unmeasured areas based on the nearest values. This new calculation is based only on triangular areas which have measurement points at each vertex.

“Basically kriging extends the coverage into unmeasured areas based on the nearest values.”

No. That is a (poor) decision on the part of the analyst irrespective of methodology. Because one can do something does not mean that he or she should do it.

Another constraint that is specific to kriging is the effective range of the semi-variogram, correlation function or whatever. Judgement and craft trump algorithm.

Finally, from a qualitative point of view methodology and judgement are undertaken consistent with Tobler’s Law. Abandon that and you undercut the basis of the entire scheme.

Clive,

I had assumed that you were using the /SPHERE flag with TRIANGULATE, but I see now that you aren’t, although you did with griddata. I think you should. Then IDL with give a proper Delaunay mesh for a sphere, and it will mesh the whole Earth (and will do the S Pole station right). What is happening now is that you triangulate lat/lon, and it is Delaunay there, but not when mapped on to the sphere. You can see that toward the poles the triangles get very long and thin along the longitudes. That is not good for interpolation. Some of that would have happened anyway because of the SST grid, but it shouldn’t happen over land, which is more critical.

I did use /SPHERE initially for the interpolation with TRIGRID, but IDL then had problems. So I read the Coyote blog (an IDL expert) who agreed there was a problem with TRIGRID and recomended to use GRIDDATA with /SPHERE instead, which he says essentially does the same thing.

However, now I don’t do any interpolation at all and just calculate the area weighted average. I left TRIGRID as it was. So -yes I should try spherical gridding to see if that makes any difference.

I am now trying out the regional studies where it will be less important. I think this could be very nice. Here is the winter of 1942 which explains why Germany lost the war.

I am already getting into trouble with my wife for spending too much time on this !! We are supposed to be on holiday.

I just checked the documentation and it is more complicated than that. TRIANGULATE and TRIGRID work together so it is not as simple as specifying /SPHERE. It is a keyword SPHERE=f

Clive,

I presume you were looking at this post by Coyote. He’s looking at the interaction with gridding but you don’t need that (or TRIGRID) any more. I think the syntax you need (from here) is:

TRIANGULATE, X, Y, Triangles [, CONNECTIVITY=variable] , SPHERE=variable , /DEGREES , FVALUE=variable

The FVALUE is your T data, by node (it gets reordered). I don’t think you need connectivity or adjacency.

Then all you need are the triangles, as before, and the 3D coords that are in the SPHERE structure. I can’t see how to reference those, but it must be possible (they don’t seem to want you to look, but I assume you can). It’s much better working with 3D cartesian, because there is nothing special about poles. Instead of using Herrons formula, you can just make for each triangle the 3×3 matrix of coords and take its determinant. That is actually 2x the volume of the tetrahedron formed by the triangle and origin, but since the height is very nearly 1, it is a very good approx to 3x the area. Then you can proceed as before. No need for cos/lat weighting.

They have changed the order, so FVALUE and 3D are consistent, but different to the original lat/lon ordering.

I am on to it. Give me a couple of days !

I more or less have it working now.There are very occasionally weird results returned from spherical triangulation which result in undefined values – indexes to points which do not exist (I suspect). However in general it is now working and giving good results. I am still having some difficulty to visualise the triangulation results in 3-d , but nearly there.

Clive,

After criticising the thin triangles from your flat mesh, I thought I should look at my own spherical meshing, which has a different version of the same problem. Even with a Delaunay method on a sphere, the squeezing of nodes on a latitude near the poles makes for the same sort of stretched mesh. So I tried some more sophisticated culling, which I described here. I can afford more culling using the 2×2 grid of ERSST, but it may be a good idea even for HADSST3. Though I’m not sure if the clumping of nodes really did any harm, except for wasting a bit of computer time.

Nick,

I see you have already done much of the work on this already – looks brilliant!

Of course the ocean data have already been regularly gridded in 2 or 5 degree bins. All we really have are those missing grid points each month, so in early years 80% are missing while in latter years something like 10% are missing. (Spherical) gridding just allows us to cover those gaps, and these are especially large at the poles. 5 degree matches better the land based

I am not quite there yet ( especially visualisation) but am now getting very good agreement with all other methods..

It’s a beautiful idea !

I have some spherical results, They are not so different to non-spherical gridding, but a little warmer post 2010. However I am still not convinced that TRIANGULATE has done the spherical gridding properly. It is very confusing to interpret the output.

I need to use the output structure Sphere=s from Triangulate, but this turns out to be no easy task. IDL provides no documentation whatsoever. They clearly don’t want you to use it directly but simply pass it directly through to TRIGRID.

S is a structure with the following pattern for 1880 (much larger in later years)

XYZ DOUBLE Array[853, 3]

IEND LONG Array[853]

IADJ LONG Array[5118]

XYZ are the cartesian coordinates on a unit sphere. However the axes bear no relation to (Lat,Lon) Latitude seems to be a linear combination of X&Y (0.5x+0.5y?) while Lon also spans the z-axis.

IEND is a pointer to the last triangle for each point in xyz. The triangles are defined (I assume) in IADJ. No.triangles : 5118/3 = 1706 triangles as stored as triplet pointers into XYZ.

I thought it would be simple. However when I plot the grid as triangles I get strange results. Every time I think I have solved the riddle – I get a new surprise. I have written to Harris to see if they can help.

I won’t post anything until I am sure of the results.

Clive,

I think IADJ is an adjacency list of nodes. It will start with node 1, list the 6 or so connected to it, then node 2 etc. IEND will contain the pointers to where each nodes lists start. My guess is that IEND will say something like 1,7,13…..

You could reconstruct the mesh from this, but I think it should still be in the structure

trianglesthat was an argument and is returned just as for flat mesh. That is much simpler. For this list it doesn’t matter whether the nodes are in 2D or 3D. There should be 2*853 = 5118/3 triangles, and a total of 5118 integers liisted also intriangles.Nick,

Thanks for the tip. That sounds perfectly reasonable! Tomorrow I will look at linking IEND to IADJ as you suggest. Unfortunately TRIANGULATE does not return the

trianglevariable when run in spherical mode, so I can’t simply use the 2-D surface grid.Clive,

I see they give some more clues in the documentation of TRIANGULATE:

”CONNECTIVITY

Set this keyword to a named variable in which the adjacency list for each of the N nodes (xy point) is returned. The list has the following form:

Each element i, 0 ?i < N, contains the starting index of the connectivity list for node i within the list array. To obtain the adjacency list for node i, extract the list elements from LIST[i] to LIST[i+1]-1.

The adjacency list is ordered in the counter-clockwise direction. The first item on the list of boundary nodes is the subscript of the node itself. For interior nodes, the list contains the subscripts of the adjacent nodes in counter-clockwise order."

That is for flat, but it’s probably the same convention. The key thing is that they are ordered. That makes it fairly easy to convert back to triangles. They are arranged in what WebGL calls fans, and you can regenerate triangles from iadj by just connecting the outer nodes in order, going through fan by fan. So if iend[1]=1, Iiend[2]=7, then the first fan is made from the riangles (1,i[1],i[2]) (using i for iend) up to (1,i[6],i[1]) and so on.

The only trouble is, that would make every triangle twice. You can avoid that by having a vector over nodes which marks when they have been used in a triangle. Reject triangles with all 3 nodes already marked. That should give unique triangles.

The other way is to calc the areas with the determinant I’ve been suggesting. Half will have positive areas, half negative. Reject the negatives. That’s probably better, because you need the areas anyway.

Pretty bad that they make you do that, though.

Thanks for advice – I have got it working properly now. I was fearing that each triangle would be drawn 3 times – once for each node, but it seems that Iend list is curtailed to avoid this.

I ended up not using the determinant method because of sign problems, but position vectors and Heron’s method.

The results are almost the same as the flat (lat,lon) triangulation, but it was worth doing. I am off to the Barrier reef for 5 days but will write it up when I get back.

Why spatial weighting of cos(lat)*area? Cos(lat) is used as weighting with rectangular grid cells of fixed size in degrees, to allow for the change in area of such cells with latitude. But here you have the triangle area already in the weighting, so why cos(lat) also?

The reason is the same as for rectangular grids because the triangulation is done on a linear (lat,lon) grid. Therefore the calculated area is still in lat,lon and needs to be corrected for curvature.

I wanted to do a spherical triangulation meaning that triangles lie on the surface of a sphere. Then triangles could span the poles but became too complicated. In spherical coordinates you get the same result as (lat,lon) so you would probably have to use cartesian coordinates and in IDL it is not clear how to proceed. I think the net effect would be very small anyway because there are no data north of 75N.

Thanks. Had forgotten you would be still be working with latitude/longitude coordinates in calculating cell area, rather than directly obtaining sq. kilometres.

Working on errors in GHCN-M (and Karl et al 2015) errors in gridding data at the moment, so perhaps too prone to jump to conclusions.

You might like to interpret the following addition to the GHCN-M v3 status.txt file:

********************************************************************************

03/11/2017

User feedback indicated a problem with some mean temperature data for select stations in

Ireland. The problems were traced to a particular data source (MCDW), and for

the time being until that source is corrected, the data are now being sourced

to the UK Met Office “Climat” data (“K” source flag”), which are believed to

be the correct values. The data changeover to the UK Met Office has occurred, but

the source flag (“K”) for the corrected values was inadvertently left out. Those

source flags should be added within the next production cycle.

********************************************************************************

I appear to have got some relatively prompt action on this by suggesting that the Irish Met Office take it up with NOAA. (two months, whereas metadata errors I reported directly in GHCN-M v2 seven years ago remain uncorrected in GHCN-M v3. But the NOAA response is still botched. The data is only partially corrected, and still contains some of the rogue values I pointed out. Double digit winter month mean temperatures may not be extreme enough for automated quality checks, but when manually checking an error report I would have expected for example that a double digit mean temperature nearly 2 degrees warmer than the warmest mean temperature recorded in their previous 60 years for that station, nearly two and a half degrees warmer than the second warmest, and one of a number of occurrences of that exact rogue value which had been inserted in the record, would have stood out like a sore thumb. Not to mention missing values which should not be missing.

There is no indication here that errors in their own MCDW might be more widespread than just “select stations” (i.e. any station for which they had a record in the past five years) in a single country. Hopefully a new NOAA director when appointed will insist on a more diligent approach.