I have located an original version of the Global Historical Climatology Network (GHCN) published around 1990. It contains raw temperature data from 6039 weather stations around the world. Quality control procedures corrected a few impossible values, mainly due to typing mistakes, and removed any duplicate data. Otherwise these are the originally recorded temperatures. You are welcome to download the metadata and the temperature data in re-formatted csv files, which I hope are self-explanatory. The original ‘readme’ file with credits to authors can be downloaded here. Since 1990 there has been a continuous series of adjustments made to GHCN data for a variety of reasons. These include changes in station location, instruments and especially ‘data homogenisation’. These adjustments have had the net effect of cooling the past (pre-1930). The latest GHCN version is 3, which can be downloaded from NOAA.
So what I did next was to process the GHCN V1 data by first gridding the temperatures geographically in a monthly 5×5 degree grid, similar to CRUTEM4. I then calculated the monthly averages across all stations within one grid cell. The monthly temperature anomalies are then just the differences from these average values. Averaging stations within a grid cell is essentially the same thing as data homogenisation, because it assumes that nearby stations have the same climate. The annual temperature anomalies are the geographically weighted averages of the monthly values. So what did the original V1 data say about past temperatures?
There is clearly a huge difference before about 1930. So let’s compare each hemisphere separately.
For the southern hemisphere I compare GHCN V1 with a contemporary version of CRU dated 1988 (see below).
GHCN V1 was available just before the first IPCC assessment report in 1990. At the time CRU had also collected a smaller set of station data from around the world which mostly were included in GHCN. I also have a copy of this data from around 1988 which we can compare directly with V1. The global average temperature anomalies are shown below.
The agreement after 1900 is very good, but they disagree strongly in the 19th century. Now you can also see why the IPCC first assessment report (FAR) was so cagey about any global warming signal (“yet to emerge”). That was because there wasn’t any signal in the temperature data available at that time!
It is very interesting that the current GHCN data now practically matches Jones’ data, which had the major problem of the original data “getting lost somewhere” when it was asked for to check his results.
Now we can see why.
Clive, many thanks. For us mortals this is an amazing and valuable feat. A number of comments, observations and questions:
1) The data begin in 1850. Any idea why most GHCN records begin in 1880?
2) The warm past, pre-1880, is somewhat problematic, is it not? I’m guessing that this may have something to do with the introduction of screens? That Jones et al may have applied a cooling correction to old data. But then, is the picture of global warming based on corrections to data?
3) Struggling to understand your N Hem chart – there seems to be hardly any GHCN data?
4) Can you give the numerical split of stations between N and S hemispheres?
5) Not sure I understand your normalisation procedure. Have you summed all “januaries” (and so on) across time inside a grid block to get a datum, and is that datum then deducted from the monthly time-temperature series for that block? If so this is similar to my normalisation procedure.
6) How do you account for the difference between Best and BEST?
This is still very much a work in progress !
1) The GHCN data actually start before 1850. I just started the averaging in 1850 to be compatible with CRU. If you look at the station data you can find earlier data.
2) I am sure there are all sorts of reasons to ‘correct’ older data but then it is also human nature, even among scientists, to argue why the result should move to what you seek.
3) I was using filled histograms just like CRU does. I agree it is confusing as the 2 histograms are blending together. I think this may be clearer:
4) The coverage is good. This shows a map of where the stations are located.
5) Yes – that is correct. All temperature data are monthly averages. First you locate which [lat,lon] grid point the station slots into. Then you fill all the data into monthly grids. You calculate an average temperature for a single grid and a single month. Next you loop over each of the 12 months for a fixed time period and calculate the average value at a particular grid location for jan,feb,mar etc. Finally you subtract the mean from the individual monthly temperature values. This gives the “anomaly”.
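In Python, the procedure is roughly this (a minimal sketch – the function name and input tuple format are illustrative assumptions, not the actual code I ran):

```python
def grid_anomalies(records):
    """records: iterable of (lat, lon, year, month, temp) tuples.
    Returns monthly anomalies keyed by (ilat, ilon, year, month)."""
    # Accumulate station values into 5x5 degree cells per (year, month)
    sums, counts = {}, {}
    for lat, lon, year, month, temp in records:
        ilat = int((lat + 90) // 5)    # 5-degree latitude band index
        ilon = int((lon + 180) // 5)   # 5-degree longitude band index
        key = (ilat, ilon, year, month)
        sums[key] = sums.get(key, 0.0) + temp
        counts[key] = counts.get(key, 0) + 1
    cell_monthly = {k: sums[k] / counts[k] for k in sums}

    # Per-cell climatology: average each calendar month over all years
    clim_sum, clim_n = {}, {}
    for (ilat, ilon, year, month), t in cell_monthly.items():
        ck = (ilat, ilon, month)
        clim_sum[ck] = clim_sum.get(ck, 0.0) + t
        clim_n[ck] = clim_n.get(ck, 0) + 1
    clim = {k: clim_sum[k] / clim_n[k] for k in clim_sum}

    # Anomaly = cell monthly mean minus that cell's monthly climatology
    return {(i, j, y, m): t - clim[(i, j, m)]
            for (i, j, y, m), t in cell_monthly.items()}
```

The hemispheric and global averages then follow by area-weighting the cell anomalies.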
6) I haven’t looked at BEST yet. My instinct says that they can’t do any better than anyone else. Their early data looks to be somewhat strange.
Clive have you or are you going to make a submission to the GWPF Temperature Dataset Enquiry?
I would be quite happy to do that once I am sure of the results. Is there a deadline?
30th June 2015
Clive, I have managed to access data 🙂 for Alice Springs (though the csv format did not want to go into XL as it normally does). Comparison of V1, V2 and BEST raw. Overall it’s the same data, but with a couple of interesting departures.
Good. I had no problem reading it into XL. I could convert it to excel and upload that if it helps. Alice Springs seems to be getting slightly cooler if anything !
Some of the countries had ‘,’ included in the name ! This was the problem with the metadata csv file. I have now fixed it.
Thanks Clive, working good now.
That explains why the temperature graphs don’t move smoothly, but jump at the critical points. Excellent analysis.
I see an endless succession of these papers where someone finds some allegedly pristine ancient dataset and shows that it is different from some modern adjusted set. Of course. That is what adjustment does.
But for heavens sake, why not first check with GHCN V3 unadjusted? It’s in that directory you pointed to.
“So what I did next was to process the GHCN V1 data by first gridding the temperatures geographically in a monthly 5×5 degree grid, similar to CRUTEM4. I then calculated the monthly averages across all stations within one grid cell. The monthly temperature anomalies are then just the differences from these average values. Averaging stations within a grid cell is essentially the same thing as data homogenisation, because it assumes that nearby stations have the same climate.”
This is all wrong. CRUTEM average anomalies over the 5×5 grid. You have to calculate the monthly anomalies first – not easy if the data doesn’t cover the anomaly period. Then you can average those. There is no resemblance to homogenisation. Your graph does not show what v1 says about global temperatures. It is just wrongly calculated.
Nick, which file is the V3 unadjusted? Is that QCU?
Clive, I tried opening this before with no luck. If you were able to provide a csv file for V3 unadjusted like you have for V1 that would be very handy for me to have. Amongst other things it should have up to date data. Need to run cross checks then that V3 unadjusted is same as V1. If it is, then clearly V3 would be best to use.
And I think that Nick may be right about generating anomalies before gridding. Though if you have done it “all wrong” it’s surprising that your data fits CRUTEM4 over so much of the time series. I overcame the problem of stations falling outside of the base time period by averaging the whole station and calculating the anomaly from that. I’ve checked several data sets doing it this way and with a fixed base and find no material difference. Fixed base is better but using the whole station as base works pretty well in most instances. For convenience I have also been using the metANN data from GISS (DJFMAMJJASON). Where I have been calculating metANN from monthly I’ve been patching missing data from the same month in the prior year.
Roger has another way of doing anomalies called first difference that I’ve not been able to crack.
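As usually described, the first difference method averages each station’s year-to-year change across all stations reporting in both years, then cumulates the result – a hedged sketch (function name and input format are my assumptions):

```python
import numpy as np

def first_difference(series_list, years):
    """series_list: list of dicts {year: annual_mean_temp}, possibly gappy.
    Returns a reconstructed regional series (anomaly-like, starting at 0)."""
    diffs = []
    for y in years[1:]:
        # First differences from every station reporting in both y-1 and y
        d = [s[y] - s[y - 1] for s in series_list if y in s and (y - 1) in s]
        diffs.append(np.mean(d) if d else 0.0)  # no overlap: carry flat
    # Cumulative sum of the mean differences rebuilds the regional series
    return np.concatenate([[0.0], np.cumsum(diffs)])
```

Because only differences are used, no base period is needed, so gappy stations and records that end early still contribute.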
Yes, it’s QCU. I posted here a portal which gives access to GHCN unadjusted monthly, annually averaged, and also to the NOAA pages that tell you about it.
Nick, I tried your resource – heroic effort way beyond what I could do. For Cloncurry it gave me a few decade averages – maybe I’m not using it right.
I’m hoping Clive will have a csv web link to V3 unadjusted within a day or two – no pressure Clive 🙂
What I see so far in Australia, is that Cloncurry has strings of numbers the same to 6 decimals for BEST, Best V1 and GHCN V2. So what Clive has downloaded has definitely been used before. But in Cloncurry, Alice and Giles BEST and V2 occasionally depart from V1 suggesting that BEST raw is rooted in V2 adjusted. But too soon to draw any conclusions.
The data it shows are arranged by decade row: the 10 numbers following the year are the annual averages for that decade.
I’ve put here a 9 Mb zipfile with the full GHCN V3 unadjusted in csv format. There is a readme file which should explain. The format is the same as V1.
Apologies, the first zip file had a format error in the data. Should be OK now.
Many thanks Nick. Extremely helpful! So far I have just checked Cloncurry, but V1, V3 and BEST raw all give same to 6 decimals – it is the same data. V2 of course is not raw even though GISS refer to it as raw which has confused me much in recent months.
V2 also had an unadjusted version, which basically had the same data. V2 had the extra thing that it kept duplicates – where more than one data set was available, they included everything. Usually duplicates were identical where they overlapped, but not always. V3 settled on a single record.
Nick, part of the thing that has confused me is this passage from the GISS FAQ page that Gavin pointed me at:
UK Press reports in January 2015 erroneously claimed that differences between the raw GHCN v2 station data (archived here) and the current final GISTEMP adjusted data were due to unjustified positive adjustments made in the GISTEMP analysis. Rather, these differences are dominated by the inclusion of appropriate homogeneity corrections for non-climatic discontinuities made in GHCN v3.2 which span a range of negative and positive values depending on the regional analysis. The impact of all the adjustments can be substantial for some stations and regions, but is small in the global means. These changes occurred in 2011 and 2012 and were documented at that time.
To recap, from 2001 to 2011, GISS based its analysis on NOAA/NCDC’s temperature collection GHCN v2, the unadjusted version. That collection contained for many locations several records, and GISS used an automatic procedure to combine them into a single record, provided the various pieces had a big enough overlap to estimate the respective offsets; non-overlapping pieces were combined if it did not create discontinuities. In cases of a documented station move, the appropriate offset was applied. No attempt was made to automatically detect and correct inhomogeneities, assuming that because of their random nature they would have little effect on the global mean.
After October 2011, NCDC added no more data to GHCN v2, so GISS used its replacement GHCN v3.1 as the base data. One of its differences from GHCN v2 is that multiple records are replaced by a single record, obtained by using for each month the report from the highest ranked source without applying any offsets when switching from one source to another. The resulting discontinuities are handled by NCDC when creating the adjusted version. Since the multiple records used by the GISS procedure no longer were available, GISS switched to using the adjusted instead of the unadjusted version of GHCN v3.1.
He says raw but the link points at adjusted. It’s a good thing that the raw data is still all available if you know how to access it.
That last comment of mine needs to be corrected. The V2 source has three data options – 1) raw, 2) raw merged and 3) adjusted. If I understand recent email from Gavin correctly, GISS Temp used to use the raw but merged data.
What I don’t fully understand is why they do not now use the V3 raw.
“What I don’t fully understand is why they do not now use the V3 raw.”
Euan, GISTEMP has been around a long time. For many years they did their own homogenising, for lack of an alternative. When v2 developed an adjusted version, they had the option of using it, but no pressing need. But when V3 came out, something had to be done (treatment of duplicates had changed). Either modify their own, or use the GHCN adjustment. The Menne/Williams (GHCN) algorithm was well regarded, probably the best available. Why not?
Thanks for your comments.
I am well aware of exactly how CRUTEM processing works. I have their station data and have their code running which exactly reproduces the annual averaged anomaly data. Indeed the only station data used are the sub-set with pre-calculated anomalies between 1961 and 1990. However, the result of this is that they discard all station data which do NOT have a continuous temperature data record between 1961-1990. This means their station count is about half of the GHCN station count.
1. The choice of which period to use in order to calculate normals (1961-1990) is arbitrary. The final result must not depend on this arbitrary choice except for a small offset. The trend must remain unchanged, otherwise there would be an in-built bias based on that choice. So if they were to choose say 1941-1970, the selected stations will be different but the curves must remain the same shape to remain valid. In 1988 Phil Jones clearly must have used a different normalisation time period.
2. Berkeley Earth claims to process 39,000 stations with their ‘novel’ algorithms, so clearly they are not normalising monthly temperatures between 1961-1990 either. This whole process started because Euan and Roger Andrews found that several of the BEST processed station data seemed to be very different from those of GHCN V2. So somehow their novel algorithms had changed the underlying data.
3. I had been led to believe that only the first version of GHCN contained the raw instrument measurements. Strangely enough NOAA have on their FTP server GHCN V1 data for the pressure and rainfall BUT the temperature data has been removed. You are Australian and you know well that station data have been adjusted, perhaps for very good reasons. But the resultant warming trend looks suspicious to climate sceptics. Eventually I found an original GHCN1 version derived from the 9-track magnetic tapes on which it was originally stored.
4. Like BEST, I wanted to use all the station data, and I also wanted to use the raw values. So I have decided on my own arbitrary normalisation. This is different to CRU but it is not ‘wrong’. You can’t normalise ALL individual stations a priori to a fixed overlapping time period. So I decided instead to grid all the monthly temperature data. Each grid point for any given month will contain a varying number of contributing station data as stations come and go, or have periods of no data logging.
5. I calculate the average temperature for every month and for every grid point. I then calculate the 12 monthly (averaged) normals for each grid point over the entire time period (1850 – 1988) and then in a second pass I subtract these for each month to calculate ‘anomalies’. So now I have monthly anomaly grids from which I make a global, NH and SH average, and then make yearly averages. These are what are shown above. They are almost the same as CRUTEM after about 1900.
6. The only assumption is that within a single grid point the seasonal changes are the same for all stations. If there was perfect geographic coverage you wouldn’t need to calculate ‘anomalies’ – Instead you could calculate the global average temperature directly.
7. Where you are right is that if the raw temperature measurements are available in V3 then we should definitely use them instead. However we can now at least check whether this is really true.
Clive, that is a fairly robust response. I plan a short post on this, maybe tomorrow.
What Roger (and I) have found is that a S Hemisphere average based on V2 (slightly adjusted) has substantially less warming than GISS and BEST. Hadcrut4 is closer but still warmer.
The thing we would like to track in BEST is the station weighting process via comparison to regional expectation, which from memory ranges from ×2 down to ×1/13 – a factor of 26. Essentially stations that don’t tell the truth are weighted out of the system. Iteration guarantees truth.
And one thing I’m interested in is how regional averages may use N hemisphere data to shape S hemisphere data. Is N hemisphere warming imported to the S?
When you say here that CRUTEM does not use stations without full cover in the base period, I understand why they do it, but it will significantly reduce the number of old records they deploy. My recent post on N Scandinavia had lots of old records that stop.
If you want any of my spread sheets just let me know.
The issue of how to proceed with stations that do not have data in the reference period is ancient. In 1986, Phil Jones explained the issues very clearly, and also gave a good account of the need for homogenisation. He used the reference period 1951-1970. There are well known methods like RSM (reference station method, Jones again), first difference method, etc. I am surprised at your observation that CRUTEM simply abandons stations that don’t have data in the reference period.
If you don’t have a fixed anomaly base, then part of the climate signal can be transferred to the base averages that you are subtracting. There are other ways of avoiding this, and in TempLS I use the fitting of a least squares model. BEST later adopted a similar approach.
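The idea can be sketched like this – a toy alternating least-squares fit of a fixed station offset plus a common yearly signal, in the spirit of the TempLS/BEST approach but not their actual code:

```python
import numpy as np

def fit_offsets(T, iters=500):
    """T: (stations x years) array, with np.nan marking missing years.
    Fit T[s, y] ~ L[s] + G[y] by alternating least squares: L is a fixed
    offset per station, G the common climate signal. No base period is
    needed, so stations with no data in 1961-1990 are still usable."""
    ns, ny = T.shape
    G = np.zeros(ny)
    for _ in range(iters):
        L = np.nanmean(T - G[None, :], axis=1)  # best offsets given G
        G = np.nanmean(T - L[:, None], axis=0)  # best signal given L
    shift = G.mean()             # the decomposition is only fixed up to a
    return L + shift, G - shift  # constant, so centre G at zero
```

Because the offsets absorb the station means, none of the climate signal leaks into a shifting base.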
“I had been led to believe that only the first version of GCHN contained the raw instrument measurements.”
Well, it’s not so. Actually, the only GHCN that contains original measurements is GHCN Daily, which is kept current. GHCN Monthly, even in 1990, is averaged over a month, and that is not a trivial step. It is where TOBS is an issue. But there should be no substantial difference between the numbers in GHCN V1 and GHCN V3 Unadjusted, except that V3 has more of them, and may have sorted out some local record identification issues.
“You are Australian and you know well that station data have been adjusted, perhaps for very good reasons.”
Data is not singular. I don’t know why people find it hard to understand this simple proposition. Yes, people adjust their copies of the data for good reasons (to do with calculating continuum averages), but the data is not adjusted. It is unchanged in the Met office records, and is readily available in GHCN Daily and GHCN V3 monthly unadjusted.
“Eventually I found an original GHCN1 version derived from the 9-track magnetic tapes on which it was originally stored.”
Actually, v1 was issued on CD. There was no intent that the numbers would be variable.
“This is different to CRU but it is not ‘wrong’.”
It is wrong, and I explain why here. Certainly it won’t yield comparable results. The big problem with averaging anything before taking anomalies is that the population may change (missing values). Then the result depends on what kind (warm/cold) of stations went missing.
I read the Jones paper. It says: “For a station to be used in our analysis at least 15 years of data are required between 1951-1970. In some parts of the world, however, there were valuable long records that ended in 1950 or 1960. Clearly, it was desirable to retain these records…Fortunately, in most cases, reference period means could be estimated using data from nearby stations with accuracy better than 0.2C”
This is basically the same as what I am doing, except that they limit the normalisation period to 30 years. I can also do that and see the result.
Thanks for making the .csv file for V3 – it saves me doing it!
You are assuming that there is a wide difference in hot/cold stations within one grid cell, or in other words micro-climates. If that is the case then the whole process is somewhat flawed. For example if the whole of Southern England is represented by two stations – one in Hyde Park and the other in central Birmingham – then trends in anomalies will not be representative.
Does BEST use a fixed time base for anomalies ? How do they interpolate a colonial era station from 1780 onto the period 1961-1990 ? Linear interpolation !
“For a station to be used in our analysis at least 15 years of data are required between 1951-1970.”
Yes, that is the Common Anomaly Method, which I see that CRUTEM does use. It is a limitation.
“Does BEST use a fixed time base for anomalies ? How do they interpolate a colonial era station from 1780 onto the period 1961-1990 ?”
No, they don’t and neither do I. There is no need for interpolation, and no base period. The linear model handles the time shift.
“You are assuming that there is a wide difference in hot/cold stations within one grid cell, or in other words micro-climates. If that is the case than the whole process is somewhat flawed.”
Grid cells are big. The one that covers most of England goes from Lands End to Sunderland. The Scot one goes from Sunderland to the Shetlands. And that’s in a not very mountainous or continental country.
The point is that temperature is heterogeneous, but anomalies are fairly homogeneous.
I don’t know why CRUTEM has stations without shown anomalies. But they won’t include them in an average. How could they?
Here is a shaded plot of GHCN station anomalies for a month. The shading is such that each station has a color corresponding to its actual anomaly. Noting the fine gradations in temperature scale, you can see that anomalies are far more continuous than temperature.
I forgot to add that some of the station data provided by CRU does not contain pre-calculated anomalies. These are rejected by their processing. I have a world map interface where you can click on stations and get the plotted results. https://clivebest.com/world/Map-data.html
You can find stations without anomalies.
The provided fortran code makes it clear the original data was on 9-track tape.
Nick, I went back and recalculated all using dT station average, dT 65-74 base and dT 63-92 base. There is no material difference between any of these and in particular no difference between station average and 63-92. Where a station did not cover the base period I used station average instead – a quick fix, but given the small number of stations affected will make little / no difference.
An important point for me is that the tops and bottoms on this chart are flat. The data have a small positive gradient because higher temps are weighted to the front end.
Some parts of the world do show significant warming. Central Australia IMO is not one of them
”Predicting” weather / climate was the oldest profession – prostitution was the second oldest (from prostitution you get less rip-off, and at least you get something for your money). Regarding CO2, there are two versions. #1: CO2 makes a dimming effect – used in the 70’s: because of the CO2 dimming effect we’ll get an ice age by the year 2000. #2: the contemporary misleading effect is: because CO2 prevents heat being ”radiated” out into space, we’ll get global warming…?! (that version was used a few times over the last 150 years) That was THE GRANDMOTHER OF ALL LIES!
Using the temp for some place – to tell the temp on the WHOLE planet, is a sick pro’s joke!!!
THE TRUTH: heat created on the ground AND in the water is neutralized by the ”new cold vacuum” that penetrates into the troposphere every 10 minutes. From 2-10km altitude all the heat is neutralized. The thinner the air up high -> the more of that ”cold vacuum” penetrates in and out and neutralizes any extra heat. If there is no extra heat, that ”cold vacuum” just zooms out underutilized, or not utilized at all. Only occasionally super-heated gases from volcanoes and nuclear bomb explosions go above 10km, up to 12km-18km, and gases of millions of degrees heat are neutralized, BUT for the rest of the year, all that cold vacuum that zooms through is unused. Because the planet orbits around the sun into that ”cold vacuum” at 108 000km/h it means that that ”cold vacuum” cannot get overheated one bit!!! Bottom line: even if there was not one molecule of CO2 in the atmosphere – heat wouldn’t have ”radiated” out into the void; all the cooling is done in the troposphere!!! Heat from the ground ”radiates” only a few inches AND horizontal winds collect that heat / then ”vertical winds” disperse it a few km up into the thinner troposphere, where it is ”neutralized” by the constantly incoming new ”cold vacuum”. Heat from CO2 doesn’t radiate for more than a micron, and is directly cooled by the ”cold vacuum”. No ”BACK-RADIATION” at all!!! CO2 is NOT a greenhouse gas! Here is the truth, read every sentence and expose the scam: https://globalwarmingdenier.wordpress.com/2014/07/12/cooling-earth/
Euan Mearns says: ”have found is that a S Hemisphere average based on V2 (slightly adjusted) has substantially less warming than GISS and BEST. Hadcrut4 is closer but still warmer”
Euan, when the sun is on the S/H it is ”closer to the earth” because of the elliptical orbit, BUT because of the temperature self-adjusting mechanism the earth has – the ”overall” temp is always the same! For you guys it shows a different temp, for two reasons:
1] on the northern hemisphere are more thermometers, than on the S/H
2] the southern hemisphere has more water – where there is more water, day temp is cooler BUT night temp is warmer = overall it is the same. BUT because the shonky science uses only the hottest minute in 24h and ignores all the other 1439 minutes = it is created for confusion and misleading… nothing worse than a grown-up person misleading himself… tragic… tragic…
Let me try to get this straight.
First you must be correct that a changing population of stations gives rise to biases in the average temperature. For example if there is a plateau 1000m high in the middle of the area, then the average temperature will depend on how many stations are included on that plateau. To avoid this we have to use normals calculated at each station and to measure ‘anomalies’ relative to this ‘normal’. So if all stations show the same difference then the climate has warmed in that region. Or if the average of all the anomalies increases over time then the local climate is warming.
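A toy numerical example makes the bias concrete (invented numbers, assuming a cold plateau station that closes partway through the record):

```python
import numpy as np

# Two stations, neither warming: a 15 C valley station reporting throughout,
# and a 5 C plateau station (1000 m higher) that closes after year 4.
valley = np.full(10, 15.0)
plateau = np.full(10, 5.0)
plateau[5:] = np.nan            # station stops reporting

# Naive average of raw temperatures jumps by 5 C when the cold station closes
raw_mean = np.nanmean(np.vstack([valley, plateau]), axis=0)

# Averaging per-station anomalies instead removes the spurious jump
anoms = np.vstack([valley - np.nanmean(valley),
                   plateau - np.nanmean(plateau)])
anom_mean = np.nanmean(anoms, axis=0)
```

The raw average shows 5 C of “warming” from stations that never warmed; the anomaly average stays flat at zero.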
There are two ways to calculate normals: 1) monthly 2) annual, depending on which time resolution you want. In the first case we have 12 normals and in the second just one. The next choice is the time period over which you define the normal. The community seems to have chosen 30 years, currently 1961-1990. This has to be a compromise, since why should that particular period be considered ‘normal’? Instead it is most likely chosen because it has the highest number of stations available.
Why 30 years and not 10years?
Why 30 years and not 60 years?
Consider a hypothetical climate which is warming at 0.2C/decade. For the annual normalisation it doesn’t really matter how you make the normals: the trend will always emerge.
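This base-period invariance is easy to check numerically for such a hypothetical linear warming (a quick sketch):

```python
import numpy as np

# A hypothetical climate warming at 0.2 C/decade over 1900-2000
years = np.arange(1900, 2001)
temps = 0.02 * (years - 1900)

trends = []
for lo, hi in [(1961, 1990), (1941, 1970), (1991, 2000)]:
    sel = (years >= lo) & (years <= hi)
    anom = temps - temps[sel].mean()          # anomalies w.r.t. this base
    trends.append(np.polyfit(years, anom, 1)[0])
# Every base period yields the same 0.02 C/yr trend; only the offset shifts
```

The complication in real data is not the choice of period itself but which stations survive the requirement of covering it.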
This process of detecting global warming from station data is fraught with biases. Are you sure that the current methods have the least in-built bias ?
Clive, I too am still a bit uncertain about how you have normalised. I think you say you populate grid cells with temperatures, take an average for the grid cell and anomalies from that average. I think what Nick is saying is that all stations should be converted to anomalies first – station anomalies – and grid cells then populated with these anomaly stacks where they can be averaged.
Nick actually already replied to this on his site. I mostly agree with what he says but there are still some oddities which I will discuss later. see: http://moyhu.blogspot.co.uk/2015/03/central-australian-warming.html
Clive, can you do me a favour, can you take a look at station 62103953000, which is Valentia in Ireland, in your V1 copy of GHCN.
I have looked at V3 and the Dataset starts in 1961.
What has happened to all the original Valentia data going back to the 1800s?
Valentia is well known for having a very long, very flat raw temp record.
I am having trouble finding the original data to compare it to.
The V1 data starts in 1869. Here it is :
The year is appended to the ID (one less zero than V3). The 12 monthly temperatures are in 10ths of a degree C.
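A minimal parser for records in this shape might look like the following (the exact layout is my assumption from the description above; missing-value flags, if present, are not handled):

```python
def parse_v1_line(line):
    """Split a V1-style record: the first token is the station ID with the
    4-digit year fused on the end, followed by 12 monthly values in tenths
    of a degree C."""
    tokens = line.split()
    head = tokens[0]
    station_id, year = head[:-4], int(head[-4:])
    temps = [int(t) / 10.0 for t in tokens[1:]]  # tenths of degC -> degC
    return station_id, year, temps
```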
That is great, thanks
The GHCN V3 record for Valentia also starts in 1869. The data description page is here.
Nick, thanks, I had already realised that.
I started by looking at TMax, that is the dataset that is incomplete.
TMin may also be as well.
TAve is pretty well complete.
A few years ago I did an article on Judith’s Climate Etc. about hadSST3 adjustments:
In it I noted that they had removed the majority of the long term variability for the majority of the record. It was just this pre-1900 cooling that had been severely attenuated by their speculative “corrections”.
It is also worth comparing to Jevrejeva’s sea level analysis which shows that rate of change went from -ve to +ve some time around 1870, not 1960 when CO2 is supposed to have become significant.
This shows that what the adjustment takes out is very similar to 2/3 of the original ICOADS SST data. The only bit they don’t seem to play down is the recent, post-1980, warming.
Zeke has reminded me that he and Steve Mosher posted an analysis of GHCN V1 vs V3 at WUWT. It’s very thorough. Histograms – even a recon. The recon results are identical for V1 and V3 unadjusted. I did my own look at Iceland here.
Certainly it looks convincing at first sight, I agree. However they identify some underlying differences as shown in their overlapping raw station differences. There is also an evident step function in V3 stations at 1895 which implies they dropped a lot of early V1 stations for some reason. Why ?
Why also does the station number drop dramatically after 1990? This is true of all datasets CRU, V3 etc. One would imagine there would be an effort to increase coverage not decrease it, especially as this is the period of strongest warming.
The global anomalies trend they show indeed looks the same for all 4 data sets. But then perhaps it should do, as they must all be using the same set of stations due to the 1961-1990 normalisation.
For that reason I want to use all the stations and develop your linear model further. In the meantime my original (biased) normalisation shows small differences between V1 and V3 of order 0.1C (see next post).
“Why also does the station number drop dramatically after 1990?”
That’s because of the nature of the project. V1 was a grant-funded, archiving project. It came at the end of a period when vast amounts of hand-written etc data had been digitised by the national met offices. They wanted to collect the result in a central repository.
As archiving, they put in every decent dataset they could find. It wasn’t until about 1997 that NOAA was persuaded to undertake maintenance. Updating monthly is a very different proposition to a one-off inclusion of a record in an archive. Ongoing cooperation from other nations is required. So they rationalised.
In the GHCN inventory, of 7280 stations, there are 1921 from the US. 847 from Canada. 254 from Turkey, and 57 from Brazil. There is no need to maintain 847 stations from Canada.
There are some errors in the csv data file. I’m correcting them now, and will post shortly when I have completed the corrections, and make the corrected file available. The errors seem to be confined (so far) to stations with data earlier than 1800, where the station id and year have not been separated, giving a “new” station id and data for January to November only. The earliest data appears to come from 1701.
I have another v1 data set dating from 1994, but also with data just to 1990. When I’ve completed corrections I’ll compare the two. I suspect that they may be the same data, but with slightly more metadata.
This is identical to the version I downloaded from http://cdiac.ornl.gov/ftp/ndp041/, dated 28 July 1992 (1994 above was my memory at fault)
The corrected data, both as csv and txt (extension .doc, added to enable WordPress upload, may be deleted – these are not Microsoft Word documents):
I’ll post additional metadata later today.
The additional metadata (again extension .doc added which may be deleted):
Station names in this file differ from those in Clive Best’s csv file in that some contain commas, and so are unsuitable for reading as a simple csv file. (Note that one station in Clive Best’s csv file, CENTRO MET.ANTARTICO”VICE, contains a double quote mark which may cause a problem when the csv file is read). The latitude and longitude coordinates for each station are identical in the two versions, as are the start-years and end-years.
Four additional values are added for each station. The elevation follows the longitude. Two additional values from the original inventory follow the end-year. These are described in the readme file:
MISSING is the percent of the record with missing data.
DISC is a code which can be used to identify a time series which
contains a “gross” discontinuity (i.e., one which was readily
identified when the time series was plotted and analyzed
visually). If DISC is 1, then the station has a major
discontinuity. If DISC is 0, then the station has no major
discontinuities. However, it could still contain more subtle
Finally, I have added a nightlight luminance for each station, for anyone who may wish to try adjusting the data following Gistemp procedures. These luminance values are taken from the F16_2006 version rather than the deprecated earlier version still used by GISS. (If there is a demand for this, I can generate luminance values using the deprecated version and add these to the file). Generally, with a relatively small number of exceptions, urban/rural classification is the same for both F16_2006 and deprecated versions. Experience with GHCN v3 indicates that it is correction of location coordinates which leads to more frequent classification changes. With GHCN v3 approximately 20% of stations outside the US, Canada and Mexico which are also WMO stations show changed urban/rural classification when the WMO coordinates are substituted for those in the GHCN inventory file. The coordinates used to determine luminance correspond to the latitude and longitude coordinates given in the inventory file, and as these coordinates have not been corrected the luminance values may in some cases correspond to a location sufficiently distant from the station to give a misleading urban/rural classification. 2034058101 KUWAIT INTL AIRP is a good example of erroneous coordinates, located at sea rather than at the airport. I have not corrected any coordinates in the v1 inventory file, and do not at present plan to do so. (I am gathering corrections for the v3 inventory coordinates).
It looks like you did a more thorough job than I did!
I converted the data to CSV files because I know many people use excel for their analysis. I had no problem myself loading the csv into my old MAC version of excel so assumed they were OK.
Then I saw some of the place names!! – so did a regular expression substitution of all the ‘,’ s
I tend to avoid csv with data such as place names, which may contain characters which can throw off csv. Importing into Excel as fixed width fields works fine when the original data, as here, is indeed fixed width.
I spotted the csv failure for the 1700’s and CENTRO MET.ANTARTICO”VICE quite quickly as I have a habit of sanity checking new data where possible by finding the minimum and maximum of columns where appropriate, and this quickly identified problems by showing values in the three year columns which could not be right.