A Ghost from the Past

I have located an original version of the Global Historical Climatology Network (GHCN) published around 1990. It contains raw temperature data from 6039 weather stations around the world. Quality control procedures corrected a few impossible values, mainly due to typing mistakes, and removed any duplicate data. Otherwise these are the originally recorded temperatures. You are welcome to download the metadata and the temperature data in re-formatted csv files, which I hope are self-explanatory. The original ‘readme’ file with credits to the authors can be downloaded here. Since 1990 there has been a continuous series of adjustments made to GHCN data for a variety of reasons, including changes in station location and instruments, and especially ‘data homogenisation’. These adjustments have had the net effect of cooling the past (pre-1930). The latest GHCN version is 3, which can be downloaded from NOAA.

So what I did next was to process the GHCN V1 data by first gridding the temperatures geographically onto a 5×5 degree monthly grid, similar to CRUTEM4. I then calculated, for each month, the average across all stations within one grid cell. The monthly temperature anomalies are then just the differences from these average values. Averaging stations within a grid cell is essentially the same thing as data homogenisation, because it assumes that nearby stations share the same climate. The annual temperature anomalies are the geographically weighted averages of the monthly values. So what did the original V1 data say about past temperatures?
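The procedure just described can be sketched in a few lines of Python. This is only a toy illustration of the method, not the actual code; the function and variable names are mine, and the cos-latitude weighting stands in for the geographic weighting mentioned above:

```python
import math
from collections import defaultdict

def grid_cell(lat, lon):
    """Map a station position to a 5x5 degree cell index."""
    return (int((lat + 90) // 5), int((lon + 180) // 5))

def monthly_anomalies(stations):
    """stations: list of (lat, lon, {(year, month): temp}).
    Averages stations within each 5x5 cell, takes anomalies against
    the cell's monthly normal over the whole period, then returns a
    cos(lat)-weighted global mean per (year, month)."""
    # 1. Collect station temperatures per (cell, year, month)
    cells = defaultdict(list)
    for lat, lon, record in stations:
        for (year, month), t in record.items():
            cells[(grid_cell(lat, lon), year, month)].append(t)
    cell_mean = {k: sum(v) / len(v) for k, v in cells.items()}

    # 2. Per-cell monthly normals over the full period
    normals = defaultdict(list)
    for (cell, year, month), t in cell_mean.items():
        normals[(cell, month)].append(t)
    normal = {k: sum(v) / len(v) for k, v in normals.items()}

    # 3. Anomaly per cell, then cos(lat)-weighted global average
    out = defaultdict(list)
    for (cell, year, month), t in cell_mean.items():
        lat_mid = cell[0] * 5 - 90 + 2.5          # cell-centre latitude
        w = math.cos(math.radians(lat_mid))
        out[(year, month)].append((t - normal[(cell, month)], w))
    return {k: sum(a * w for a, w in v) / sum(w for _, w in v)
            for k, v in out.items()}
```

Annual anomalies are then simple averages of the twelve monthly values.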

[Figure: Global-compare]

There is clearly a huge difference before about 1930. So let’s compare each hemisphere separately.

[Figure: NH-compare-Crutem4]

For the southern hemisphere I compare GHCN V1 with a contemporary version of CRU dated 1988 (see below).
[Figure: SH-compare-CRU86]

GHCN V1 was available just before the first IPCC assessment report in 1990. At the time CRU had also collected a smaller set of station data from around the world, most of which were included in GHCN. I also have a copy of this data from around 1988, which we can compare directly with V1. The global average temperature anomalies are shown below.

[Figure: CompareJones-V1]

The agreement after 1900 is very good, but they disagree strongly in the 19th century. Now you can also see why the IPCC first assessment report (FAR) was so cagey about any global warming signal (“yet to emerge”). That was because there wasn’t any signal in the temperature data available at that time!


51 Responses to A Ghost from the Past

  1. A C Osborn says:

    It is very interesting that the current GHCN data now practically matches Jones’ data, which had major problems with original data “getting lost somewhere” when it was asked for to check his results.
    Now we can see why.

  2. Euan Mearns says:

    Clive, many thanks. For us mortals this is an amazing and valuable feat. A number of comments, observations and questions:

    1) The data begin in 1850. Any idea why most GHCN records begin in 1880?
    2) The warm past, pre-1880, is somewhat problematic, is it not? I’m guessing that this may have something to do with the introduction of screens? That Jones et al may have applied a cooling correction to old data. But then, is the picture of global warming based on corrections to data?
    3) Struggling to understand your N Hem chart – there seems to be hardly any GHCN data?
    4) Can you give the numerical split of stations between N and S hemispheres?
    5) Not sure I understand your normalisation procedure. Have you summed all “januaries” (and so on) across time inside a grid block to get a datum, and then deducted that datum from the monthly time-temperature series for that block? If so, this is similar to my normalisation procedure.
    6) How do you account for the difference between Best and BEST?

    • Clive Best says:

      Euan,

      This is still very much a work in progress !
      1) The GHCN data actually start before 1850. I just started the averaging in 1850 to be compatible with CRU. If you look at the station data you can find earlier data.
      2) I am sure there are all sorts of reasons to ‘correct’ older data but then it is also human nature, even among scientists, to argue why the result should move to what you seek.
      3) I was using filled histograms just like CRU does. I agree it is confusing as the two histograms blend together. I think this may be clearer:

      4) The coverage is good. This shows a map of where the stations are located.

      5) Yes – that is correct. All temperature data are monthly averages. First you locate which [lat,lon] grid point the station slots into. Then you fill all the data into monthly grids. You calculate an average temperature for a single grid point and a single month. Next you loop over each of the 12 months for a fixed time period and calculate the average value at a particular grid location for jan, feb, mar etc. Finally you subtract the mean from the individual monthly temperature values. This gives the “anomaly”.
      6) I haven’t looked at BEST yet. My instinct says that they can’t do any better than anyone else. Their early data looks somewhat strange.

  3. Euan Mearns says:

    Clive, I have managed to access data 🙂 for Alice Springs (though the csv format did not want to go into XL as it normally does). Comparison of V1, V2 and BEST raw. Overall it’s the same data, but with a couple of interesting departures.

  4. Cytokinin says:

    That explains why the temperature graphs don’t move smoothly, but jump at the critical points. Excellent analysis.

  5. Nick Stokes says:

    Clive,
    I see an endless succession of these papers where someone finds some allegedly pristine ancient dataset and shows that it is different from some modern adjusted set. Of course. That is what adjustment does.

    But for heaven’s sake, why not first check with GHCN V3 unadjusted? It’s in that directory you pointed to.

    “So what I did next was to process the GHCN V1 data by first gridding the temperatures geographically in a 5×5 monthly degree grid, similar to CRUTEM4. I then calculated the monthly averages across all stations within one grid cell. The monthly temperature anomalies are then just the differences from these average values. Averaging stations within a grid cell is essentially the same thing as data homogenisation, because it assumes that nearby stations have the same climate. “

    This is all wrong. CRUTEM averages anomalies over the 5×5 grid. You have to calculate the monthly anomalies first – not easy if the data doesn’t cover the anomaly period. Then you can average those. There is no resemblance to homogenisation. Your graph does not show what v1 says about global temperatures. It is just wrongly calculated.

    • Euan Mearns says:

      Nick, which file is the V3 unadjusted? Is that QCU?

      Clive, I tried opening this before with no luck. If you were able to provide a csv file for V3 unadjusted, like you have for V1, that would be very handy for me to have. Amongst other things it should have up-to-date data. We would then need to run cross-checks that V3 unadjusted is the same as V1. If it is, then clearly V3 would be best to use.

      And I think that Nick may be right about generating anomalies before gridding. Though if you have done it “all wrong”, it’s surprising that your data fits CRUTEM4 over so much of the time series. I overcame the problem of stations falling outside the base time period by averaging the whole station record and calculating the anomaly from that. I’ve checked several data sets doing it this way and with a fixed base and find no material difference. A fixed base is better, but using the whole station as base works pretty well in most instances. For convenience I have also been using the metANN data from GISS (DJFMAMJJASON). Where I have been calculating metANN from monthly data I’ve been patching missing data from the same month in the prior year.
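      The whole-station fallback described here can be sketched as follows (my own toy illustration, not Euan’s spreadsheet; note that the choice of base only shifts the anomalies by a constant, so trends are unaffected):

```python
def anomalies(series, base_years=None):
    """series: {year: temp}. Returns anomalies against the mean over
    base_years, falling back to the whole-station mean when the station
    has no data in the base period (the 'quick fix' described above)."""
    base = ([t for y, t in series.items() if y in base_years]
            if base_years else [])
    if not base:                      # no coverage in the base period
        base = list(series.values())  # use the whole record instead
    mean = sum(base) / len(base)
    return {y: t - mean for y, t in series.items()}
```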

      Roger has another way of doing anomalies called first difference that I’ve not been able to crack.
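      For reference, the first difference method can be sketched in a few lines: each station contributes its year-to-year temperature differences, the differences are averaged across stations, and the regional series is their running sum. This is my reading of the published method, not Roger’s implementation:

```python
def first_difference(stations):
    """stations: list of {year: temp}. Builds a regional series by
    averaging year-on-year first differences across stations and
    accumulating them. Gaps simply contribute no difference."""
    diffs = {}
    for record in stations:
        years = sorted(record)
        for a, b in zip(years, years[1:]):
            if b == a + 1:   # only consecutive years give a difference
                diffs.setdefault(b, []).append(record[b] - record[a])
    series, level = {}, 0.0
    for year in sorted(diffs):
        level += sum(diffs[year]) / len(diffs[year])
        series[year] = level
    return series
```

Because only differences are used, no base period is needed and stations with different absolute temperatures combine naturally.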

      • Nick Stokes says:

        Euan,
        Yes, it’s QCU. I posted here a portal which gives access to GHCN unadjusted monthly, annually averaged, and also to the NOAA pages that tell you about it.

        • Euan Mearns says:

          Nick, I tried your resource – heroic effort, way beyond what I could do. For Cloncurry it gave me a few decade averages – maybe I’m not using it right.

          I’m hoping Clive will have a csv web link to V3 unadjusted within a day or two – no pressure Clive 🙂

          What I see so far in Australia is that Cloncurry has strings of numbers the same to 6 decimals for BEST, GHCN V1 and GHCN V2. So what Clive has downloaded has definitely been used before. But in Cloncurry, Alice and Giles, BEST and V2 occasionally depart from V1, suggesting that BEST raw is rooted in V2 adjusted. But too soon to draw any conclusions.

          • Nick Stokes says:

            Thanks, Euan,
            The data it shows are in years, by decade row. The 10 numbers following the year are the annual averages.

            I’ve put here a 9 Mb zipfile with the full GHCN V3 unadjusted in csv format. There is a readme file which should explain. The format is the same as V1.

          • Nick Stokes says:

            Apologies, the first zip file had a format error in the data. Should be OK now.

          • Euan Mearns says:

            Many thanks Nick. Extremely helpful! So far I have just checked Cloncurry, but V1, V3 and BEST raw all give the same values to 6 decimals – it is the same data. V2 of course is not raw, even though GISS refer to it as raw, which has confused me much in recent months.

          • Nick Stokes says:

            Euan,
            V2 also had an unadjusted version, which basically had the same data. V2 had the extra thing that it kept duplicates – where more than one data set was available, they included everything. Usually duplicates were identical where they overlapped, but not always. V3 settled on a single record.

          • Euan Mearns says:

            Nick, part of the thing that has confused me is this passage from the GISS FAQ page that Gavin pointed me at:

            UK Press reports in January 2015 erroneously claimed that differences between the raw GHCN v2 station data (archived here) and the current final GISTEMP adjusted data were due to unjustified positive adjustments made in the GISTEMP analysis. Rather, these differences are dominated by the inclusion of appropriate homogeneity corrections for non-climatic discontinuities made in GHCN v3.2 which span a range of negative and positive values depending on the regional analysis. The impact of all the adjustments can be substantial for some stations and regions, but is small in the global means. These changes occurred in 2011 and 2012 and were documented at that time.

            To recap, from 2001 to 2011, GISS based its analysis on NOAA/NDCD’s temperature collection GHCN v2, the unadjusted version. That collection contained for many locations several records, and GISS used an automatic procedure to combine them into a single record, provided the various pieces had a big enough overlap to estimate the respective offsets; non-overlapping pieces were combined if it did not create discontinuities. In cases of a documented station move, the appropriate offset was applied. No attempt was made to automatically detect and correct inhomogeneities, assuming that because of their random nature they would have little effect on the global mean.

            After October 2011, NCDC added no more data to GHCN v2, so GISS used its replacement GHCN v3.1 as the base data. One of its differences from GHCN v2 is that multiple records are replaced by a single record, obtained by using for each month the report from the highest ranked source without applying any offsets when switching from one source to another. The resulting discontinuities are handled by NCDC when creating the adjusted version. Since the multiple records used by the GISS procedure no longer were available, GISS switched to using the adjusted instead of the unadjusted version of GHCN v3.1.

            He says raw but the link points at adjusted. It’s a good thing that the raw data is still all available if you know how to access it.

            http://data.giss.nasa.gov/gistemp/station_data_v2/
            http://data.giss.nasa.gov/gistemp/FAQ.html

          • Euan Mearns says:

            That last comment of mine needs to be corrected. The V2 source has three data options – 1) raw, 2) raw merged and 3) adjusted. If I understand recent email from Gavin correctly, GISTEMP used to use the raw but merged data.

            What I don’t fully understand is why they do not now use the V3 raw.

          • Nick Stokes says:

            “What I don’t fully understand is why they do not now use the V3 raw.”
            Euan, GISTEMP has been around a long time. For many years they did their own homogenising, for lack of an alternative. When v2 developed an adjusted version, they had the option of using it, but no pressing need. But when V3 came out, something had to be done (treatment of duplicates had changed): either modify their own adjustments, or use the GHCN adjustment. The Menne/Williams (GHCN) algorithm was well regarded, probably the best available. Why not?

    • Clive Best says:

      Nick,

      Thanks for your comments.

      I am well aware of exactly how CRUTEM processing works. I have their station data and have their code running, which exactly reproduces the annual averaged anomaly data. Indeed the only station data used are the sub-set with pre-calculated anomalies between 1961 and 1990. However, the result of this is that they discard all station data which do NOT have a continuous temperature record between 1961-1990. This means their station count is about half of the GHCN station count.

      1. The choice of which period to use in order to calculate normals (1961-1990) is arbitrary. The final result must not depend on this arbitrary choice except for a small offset. The trend must remain unchanged, otherwise there would be an in-built bias based on that choice. So if they were to choose say 1941-1970, the selected stations would be different but the curves must remain the same shape to remain valid. In 1988 Phil Jones clearly must have used a different normalisation time period.

      2. Berkeley Earth claims to process 39,000 stations with their ‘novel’ algorithms, so clearly they are not normalising monthly temperatures between 1961-1990 either. This whole process started because Euan and Roger Andrews found that several of the BEST processed station data seemed to be very different from those of GHCN V2. So somehow their novel algorithms had changed the underlying data.

      3. I had been led to believe that only the first version of GHCN contained the raw instrument measurements. Strangely enough, NOAA have on their FTP server GHCN V1 data for pressure and rainfall BUT the temperature data has been removed. You are Australian and you know well that station data have been adjusted, perhaps for very good reasons. But the resultant warming trend looks suspicious to climate sceptics. Eventually I found an original GHCN1 version derived from the 9-track magnetic tapes on which it was originally stored.

      4. Like BEST, I wanted to use all the station data, and I also wanted to use the raw values. So I have decided on my own arbitrary normalisation. This is different to CRU but it is not ‘wrong’. You can’t normalise ALL individual stations a priori to a fixed overlapping time period. So I decided instead to grid all the monthly temperature data. Each grid point for any given month will contain a varying number of contributing stations as stations come and go, or have periods of no data logging.

      5. I calculate the average temperature for every month and for every grid point. I then calculate the 12 monthly (averaged) normals for each grid point over the entire time period (1850 – 1988) and then in a second pass I subtract these for each month to calculate ‘anomalies’. So now I have monthly anomaly grids from which I make a global, NH and SH average, and then make yearly averages. These are what are shown above. They are almost the same as CRUTEM after about 1900.

      6. The only assumption is that within a single grid point the seasonal changes are the same for all stations. If there was perfect geographic coverage you wouldn’t need to calculate ‘anomalies’ – Instead you could calculate the global average temperature directly.

      7. Where you are right is that if the raw temperature measurements are available in V3 then we should definitely use them instead. However, we can now at least check whether this is really true.

      cheers

      • Euan Mearns says:

        Clive, that is a fairly robust response. I plan a short post on this, maybe tomorrow.

        What Roger (and I) have found is that a S Hemisphere average based on V2 (slightly adjusted) has substantially less warming than GISS and BEST. Hadcrut4 is closer but still warmer.

        The thing we would like to track in BEST is the station weighting process via comparison to regional expectation, which from memory ranges from ×2 down to ×1/13, a factor of 26. Essentially, stations that don’t tell the truth are weighted out of the system. Iteration guarantees truth.

        And one thing I’m interested in is how regional averages may use N hemisphere data to shape S hemisphere data. Is N hemisphere warming imported to the S?

        When you say here that CRU do not use stations without full coverage in the base period, I understand why they do it, but it will significantly reduce the number of old records they deploy. My recent post on N Scandinavia had lots of old records that stop.

        If you want any of my spread sheets just let me know.

      • Nick Stokes says:

        Clive,
        The issue of how to proceed with stations that do not have data in the reference period is ancient. In 1986, Phil Jones explained the issues very clearly, and also gave a good account of the need for homogenisation. He used the reference period 1951-1970. There are well known methods like RSM (reference station method, Jones again), first difference method, etc. I am surprised at your observation that CRUTEM simply abandons stations that don’t have data in the reference period.

        If you don’t have a fixed anomaly base, then part of the climate signal can be transferred to the base averages that you are subtracting. There are other ways of avoiding this, and in TempLS I use the fitting of a least squares model. BEST later adopted a similar approach.
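        The least-squares idea can be illustrated with a toy model: treat each reading as a station offset plus a common yearly signal, and fit both by alternating updates. This is only a sketch of the general approach, not TempLS or BEST code:

```python
def lsq_model(readings, iters=200):
    """readings: list of (station, year, temp). Fits
    temp ~ offset[station] + signal[year] by alternating least-squares
    updates; returns the yearly signal, centred on zero.
    No base period is needed."""
    stations = {s for s, _, _ in readings}
    years = {y for _, y, _ in readings}
    offset = {s: 0.0 for s in stations}
    signal = {y: 0.0 for y in years}
    for _ in range(iters):
        # best station offsets given the current signal
        for s in stations:
            resid = [t - signal[y] for s2, y, t in readings if s2 == s]
            offset[s] = sum(resid) / len(resid)
        # best yearly signal given the current offsets
        for y in years:
            resid = [t - offset[s] for s, y2, t in readings if y2 == y]
            signal[y] = sum(resid) / len(resid)
    mean = sum(signal.values()) / len(signal)
    return {y: g - mean for y, g in signal.items()}
```

Overlapping stations pin down each other’s offsets, so records that start or stop anywhere can contribute.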

        “I had been led to believe that only the first version of GCHN contained the raw instrument measurements.”
        Well, it’s not so. Actually, the only GHCN that contains original measurements is GHCN Daily, which is kept current. GHCN Monthly, even in 1990, is averaged over a month, and that is not a trivial step. It is where TOBS is an issue. But there should be no substantial difference between the numbers in GHCN V1 and GHCN V3 Unadjusted, except that V3 has more of them, and may have sorted out some local record identification issues.

        “You are Australian and you know well that station data have been adjusted, perhaps for very good reasons.”
        Data is not singular. I don’t know why people find it hard to understand this simple proposition. Yes, people adjust their copies of the data for good reasons (to do with calculating continuum averages), but the data itself is not adjusted. It is unchanged in the Met office records, and is readily available in GHCN Daily and GHCN V3 monthly unadjusted.

        “Eventually I found an original GHCN1 version derived from the 9-track magnetic tapes on which it was originally stored.”
        Actually, v1 was issued on CD. There was no intent that the numbers would be variable.

        “This is different to CRU but it is not ‘wrong’.”
        It is wrong, and I explain why here. Certainly it won’t yield comparable results. The big problem with averaging anything before taking anomalies is that the population may change (missing values). Then the result depends on what kind (warm/cold) of stations went missing.

        • Clive Best says:

          In 1986, Phil Jones explained the issues very clearly, and also gave a good account of the need for homogenisation.

          I read the Jones paper. It says: “For a station to be used in our analysis at least 15 years of data are required between 1951-1970. In some parts of the world, however, there were valuable long records that ended in 1950 or 1960. Clearly, it was desirable to retain these records…Fortunately, in most cases, reference period means could be estimated using data from nearby stations with accuracy better than 0.2C”

          This is basically the same as what I am doing, except that they limit the normalisation period to 30 years. I can also do that and see the result.

          Thanks for making the .csv file for V3 – it saves me doing it!

          You write

          The big problem with averaging anything before taking anomalies is that the population may change (missing values). Then the result depends on what kind (warm/cold) of stations went missing.

          You are assuming that there is a wide difference between hot and cold stations within one grid cell – in other words, micro-climates. If that is the case then the whole process is somewhat flawed. For example, if the whole of Southern England is represented by two stations – one in Hyde Park and the other in central Birmingham – then trends in anomalies will not be representative.

          Does BEST use a fixed time base for anomalies ? How do they interpolate a colonial era station from 1780 onto the period 1961-1990 ? Linear interpolation !

          • Nick Stokes says:

            “For a station to be used in our analysis at least 15 years of data are required between 1951-1970.”
            Yes, that is the Common Anomaly Method, which I see that CRUTEM does use. It is a limitation.

            “Does BEST use a fixed time base for anomalies ? How do they interpolate a colonial era station from 1780 onto the period 1961-1990 ?”
            No, they don’t and neither do I. There is no need for interpolation, and no base period. The linear model handles the time shift.

            “You are assuming that there is a wide difference in hot/cold stations within one grid cell, or in other words micro-climates. If that is the case than the whole process is somewhat flawed.”
            Grid cells are big. The one that covers most of England goes from Lands End to Sunderland. The Scot one goes from Sunderland to the Shetlands. And that’s in a not very mountainous or continental country.

            The point is that temperature is heterogeneous, but anomalies are fairly homogeneous.

            I don’t know why CRUTEM has stations without shown anomalies. But they won’t include them in an average. How could they?

          • Nick Stokes says:

            Here is a shaded plot of GHCN station anomalies for a month. The shading is such that each station has a color corresponding to its actual anomaly. Noting the fine gradations in temperature scale, you can see that anomalies are far more continuous than temperature.

        • Clive Best says:

          I forgot to add that some of the station data provided by CRU does not contain pre-calculated anomalies. These are rejected by their processing. I have a world map interface where you can click on stations and get the plotted results. http://clivebest.com/world/Map-data.html
          You can find stations without anomalies.

        • Clive Best says:

          The provided fortran code makes it clear the original data was on 9-track tape.

          C FORTRAN data retrieval code to read and print the GHCN station
          C inventory files (Files 8-11 on the first magnetic tape)...
          C
          C Variable declarations...
          C
                INTEGER COUNTRY, STATION, ELEV, FIRST, LAST, DISC
                REAL LAT, LON, MISSING
                CHARACTER * 25 NAME
          C
          C Initialize a record counter...
          C
                NREC = 0
          C
          C Read in one line of data...
          C
             10 READ (1, 1, END=99) COUNTRY, STATION, NAME, LAT, LON, ELEV,
               *FIRST, LAST, MISSING, DISC
          C
              1 FORMAT (I3, I7, 2X, A25, 1X, F6.2, 1X, F7.2, 1X, I4,
               *1X, I4, 1X, I4, 1X, F4.1, 1X, I1)
          
        • Euan Mearns says:

          Nick, I went back and recalculated all using dT station average, dT 65-74 base and dT 63-92 base. There is no material difference between any of these, and in particular no difference between station average and 63-92. Where a station did not cover the base period I used the station average instead – a quick fix, but given the small number of stations affected it will make little or no difference.

          An important point for me is that the tops and bottoms on this chart are flat. The data have a small positive gradient because higher temps are weighted to the front end.

          Some parts of the world do show significant warming. Central Australia IMO is not one of them

  6. Nick Stokes
    ”Predicting” weather / climate was the oldest profession – prostitution was the second oldest (from prostitution you get less rip-off, and at least you get something for your money). Regarding CO2, there are two versions. #1: CO2 makes a dimming effect – used in the 70’s: because of the CO2 dimming effect we’d get an ice age by the year 2000. #2: the contemporary misleading version is: because CO2 prevents heat being ”radiated” out into space, we’ll get global warming…?! (that version has been used a few times over the last 150 years) That was THE GRANDMOTHER OF ALL LIES!
    Using the temp for some place – to tell the temp of the WHOLE planet – is a sick pro’s joke!!!

    THE TRUTH: heat created on the ground AND in the water is neutralized by the ”new cold vacuum” that penetrates into the troposphere every 10 minute. From 2-10km altitude all the heat is neutralized. The thinner the air up -> the more of that ”cold vacuum” penetrates in and out and neutralizes any extra heat. If no extra heat, that ”cold vacuum” just zooms out underutilized, or not utilized at all. Only occasionally super-heated gases from volcanoes and nuclear bombs explosions go above 10km up to 12km-18km, and gases of million degrees heat is neutralized, BUT: for the rest of the year, all that cold vacuum that zooms trough, is unused. Because the planet orbits around the sun into that ”cold vacuum” at 108 000kmh -it means that: that ”cold vacuum” cannot get overheated one bit!!! Bottom line: even if there was not one molecule of CO2 in the atmosphere – heat wouldn’t have ”radiated” out in void; all the cooling is done in the troposphere!!! Heat from the ground ”radiates” only few inches AND: horizontal winds collect that heat / then ”vertical winds” disperse it few km up into the thinner troposphere, where is ”neutralized” by the constantly coming in new ”cold vacuum” Heat from CO2 doesn’t radiate for more than a micron, and is directly cooled by the ”cold vacuum” No ”BACK-RADIATION” at all!!! CO2 is NOT a greenhouse gas! Here is the truth, read every sentence and expose the scam: https://globalwarmingdenier.wordpress.com/2014/07/12/cooling-earth/

  7. Euan Mearns says: ”have found is that a S Hemisphere average based on V2 (slightly adjusted) has substantially less warming than GISS and BEST. Hadcrut4 is closer but still warmer”

    Euan, when the sun is on the S/H is ”closer to the earth” because of the elliptical orbit, BUT: .because of the temperature self adjusting mechanism the earth has – is same ”overall” temp always! For you guys It shows different temp, for two reasons:

    1] on the northern hemisphere are more thermometers, than on the S/H
    2] southern hemisphere has more water – where is more water – day temp is cooler BUT night temp is warmer = overall is same BUT: because the shonky science uses only the hottest minute in 24h and ignores all the other 1439 minutes = is created for confusion and misleading … nothing worse than grown up person misleading himself… tragic.. tragic…

  8. Clive Best says:

    Nick,

    Let me try to get this straight.
    First, you must be correct that a changing population of stations gives rise to biases in the average temperature. For example, if there is a plateau 1000m high in the middle of the area, then the average temperature will depend on how many stations are included on that plateau. To avoid this we have to use normals calculated at each station and to measure ‘anomalies’ relative to this ‘normal’. So if all stations show the same difference then the climate has warmed in that region. Or if the average of all the anomalies increases over time then the local climate is warming.
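    The plateau example can be put in numbers (a toy sketch with made-up values: two valley stations near 15 C and a plateau station about 6.5 C colder, everything warming by 0.5 C, with the plateau station dropping out of the later period):

```python
def average(values):
    return sum(values) / len(values)

# Two valley stations near 15 C and one plateau station roughly 6.5 C
# colder (about 1000 m at a typical lapse rate). The region warms by
# 0.5 C, but the plateau station drops out of the later period.
early = {"valley1": 15.0, "valley2": 15.2, "plateau": 8.5}
late = {"valley1": 15.5, "valley2": 15.7}

# Averaging raw temperatures shows a spurious +2.7 C jump, because the
# cold station vanished from the sample.
raw_change = average(list(late.values())) - average(list(early.values()))

# Anomalies against per-station normals recover the true +0.5 C change.
normals = {"valley1": 15.0, "valley2": 15.2, "plateau": 8.5}
anom_change = (average([late[s] - normals[s] for s in late])
               - average([early[s] - normals[s] for s in early]))
```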

    There are two ways to calculate normals: 1) monthly or 2) annual, depending on which time resolution you want. In the first case we have 12 normals and in the second case just one. The next choice is the time period over which you define the normal. The community seems to have chosen 30 years, currently 1961-1990. This has to be a compromise, since why should that particular period be considered ‘normal’? Instead it is most likely chosen because it has the highest number of stations available.

    Why 30 years and not 10years?
    Why 30 years and not 60 years?

    Consider a hypothetical climate which is warming at 0.2C/decade. For the annual normalisation it doesn’t really matter how you make the normals; the trend will always emerge.

    This process of detecting global warming from station data is fraught with biases. Are you sure that the current methods have the least in-built bias ?

    • Euan Mearns says:

      Clive, I too am still a bit uncertain about how you have normalised. I think you say you populate grid cells with temperatures, take an average for the grid cell, and compute anomalies from that average. I think what Nick is saying is that all stations should be converted to anomalies first – station anomalies – and grid cells then populated with these anomaly stacks, where they can be averaged.

      • Clive Best says:

        Nick actually already replied to this on his site. I mostly agree with what he says, but there are still some oddities which I will discuss later. See: http://moyhu.blogspot.co.uk/2015/03/central-australian-warming.html

        Clive,
        The basic issue is this: when you average, you are trying to ascertain the mean of a population from a sample. If the population is heterogeneous, you have to do a lot of work to ensure that the sample is representative. The more homogeneous it is, the less that is required.

        Say you are polling on an issue where men and women think differently. You have to be sure to get equal numbers in the sample, or to re-weight. But if there is no indication they think differently, it doesn’t matter.

        With temp, you get the sample there is, and can’t improve it. So you have to try to improve homogeneity – ie deal with a quantity that has about the same distribution at each point. Or at least the same expected value.

        That is what anomaly does. You try to subtract your best estimate of the expected value, so the residue is zero everywhere. Then there isn’t a predictable difference when the sample changes. It’s safe to average.

        An anomaly based on a single point is a big improvement on nothing at all. A wider base gives more stability. Exactly how wide isn’t critical. 30 years is a reasonable compromise, balanced against the problem of finding stations reasonably represented in the range. 20 years would be fine too.

        “For the annual normalisation it doesn’t really matter how you make the normals – the trend will always emerge.”
        I did an example here to show why this isn’t true. Just take one station with a uniform rise of 0.1°/decade from 1900 to 2000. Then split it into two stations, one pre-1950, one post-1950. If you use each segment’s average as the anomaly base, the trend is drastically reduced – to about 0.025°/decade. The reason is that the anomalies you subtract themselves form a step function in time, which contains most of the trend.
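        Nick’s split-station example can be checked numerically. Here is a minimal sketch with synthetic data (NumPy assumed; the ramp, split point and baselines follow his description):

```python
import numpy as np

# Synthetic station: uniform rise of 0.1 °C/decade from 1900 to 2000
years = np.arange(1900, 2000)
temp = 0.01 * (years - 1900)

# Trend from the intact record, in °C/decade
full_trend = np.polyfit(years, temp, 1)[0] * 10

# Split into two "stations" (pre-1950 and post-1950) and use each
# segment's own mean as its anomaly base
first, second = temp[:50], temp[50:]
anoms = np.concatenate([first - first.mean(), second - second.mean()])
split_trend = np.polyfit(years, anoms, 1)[0] * 10

print(round(full_trend, 3))   # 0.1
print(round(split_trend, 3))  # 0.025 – the subtracted means carry the trend
```

        The two per-segment means form a step in time, and subtracting them removes most of the trend, exactly as described.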

  9. A C Osborn says:

    Clive, can you do me a favour and take a look at station 62103953000, which is Valentia in Ireland, in your V1 copy of GHCN.
    I have looked at V3 and the Dataset starts in 1961.
    What has happened to all the original Valentia data going back to the 1800s?
    Valentia is well known for having a very long, very flat raw temp record.
    I am having trouble finding the original data to compare it to.

    • Clive Best says:

      The V1 data starts in 1869. Here it is :

      60503953001869   87   93   70  111  108  138  163  155  140  123  101   66
      60503953001870   68   54   76   98  113  143  161  164  148  120   81   49
      60503953001871   62   88   87  104  128  142  147  158  132  119   82   73
      60503953001872   76   85   86   92  104  129  154  153  140   98   83   79
      60503953001873   75   55   69   97  115  139  147  148  128  103   84   94
      60503953001874   79   81   88   99  118  144  156  152  132  113   97   64
      60503953001875   96   63   76  102  119  131  145  160  153  112   82   71
      60503953001876   76   81   66   92  111  130  154  157  133  122   98   83
      60503953001877   81   86   74   92  106  141  142  147  128  121   92   84
      60503953001878   83   88   86  101  118  138  167  162  145  118   66   41
      60503953001879   53   64   74   81  100  127  131  139  125  106   81   69
      60503953001880   68   79   92   92  117  137  149  171  152   84   87   81
      60503953001881   37   66   78   91  123  129  144  139  132  111  112   73
      60503953001882   87   88   91   95  118  129  139  147  122  108   86   65
      60503953001883   79   74   57   87  104  134  135  146  134  112   92   79
      60503953001884   88   78   81   87  117  131  149  149  141  111   78   76
      60503953001885   69   74   68   83   95  129  152  152  131   94   93   71
      60503953001886   55   63   64   90  103  138  150  149  139  114   92   63
      60503953001887   75   74   64   77  111  163  164  156  130   97   72   62
      60503953001888   72   48   53   79  112  138  139  148  130  112   97   83
      60503953001889   75   69   73   81  111  138  146  142  142   98   96   84
      60503953001890   79   63   74   90  112  130  137  140  144  124   86   52
      60503953001891   59   83   61   86   97  144  143  139  136   99   75   83
      60503953001892   56   61   53   89  116  133  147  147  126   79   92   79
      60503953001893   65   67   93  115  137  158  158  167  134  107   75   74
      60503953001894   58   81   86   99  103  139  143  141  128  108   91   85
      60503953001895   42   27   73   93  117  146  143  147  146   93   84   77
      60503953001896   76   84   86  101  126  150  152  146  137   78   69   65
      60503953001897   47   86   74   84  109  139  160  150  125  122  100   82
      60503953001898   91   75   63   90  107  137  159  160  159  122   93   94
      60503953001899   67   77   77   90  114  152  158  179  141  113  106   75
      60503953001900   73   42   55   98  113  140  160  150  141  109   83   90
      60503953001901   70   46   61   84  123  134  156  150  139  106   77   65
      60503953001902   76   50   81   83  101  136  148  149  138  114   96   78
      60503953001903   69   86   76   82  110  138  150  139  134  106   87   63
      60503953001904   69   62   62   88  109  134  152  142  132  112   88   83
      60503953001905   81   72   73   90  114  146  159  141  124   93   68   88
      60503953001906   77   61   74   80  102  145  148  156  141  112   92   73
      60503953001907   68   61   87   82  107  122  149  146  147  102   83   74
      60503953001908   64   81   67   79  122  137  151  152  131  137  108   88
      60503953001909   77   68   60   94  115  125  143  154  133  112   68   66
      60503953001910   69   68   77   78  112  137  147  148  129  121   75   82
      60503953001911   71   73   71   84  121  141  165  164  137  115   78   79
      60503953001912   72   72   78   99  122  132  143  126  127  106   91   90
      60503953001913   68   69   72   83  105  128  148  155  143  115  100   75
      60503953001914   72   80   78  100  110  138  144  154  145  115   90   63
      60503953001915   67   56   66   95  123  145  145  151  148  117   66   74
      60503953001916   91   58   51   87  108  123  150  168  145  124   87   54
      60503953001917   47   47   64   70  122  134  157  149  137   95   72   62
      60503953001918   66   88   76   88  123  138  149  153  123  103   89   92
      60503953001919   63   65   56   85  125  133  141  157  133  108   52   84
      60503953001920   75   85   74   83  111  136  135  144  136  127  104   72
      60503953001921   92   75   82   94  108  145  176  144  141  138  104  100
      60503953001922   77   79   73   68  119  131  130  136  131  104   89   79
      60503953001923   83   80   81   81   99  127  157  147  127  106   61   82
      60503953001924   79   64   73   82  112  132  141  141  130  117   93   90
      60503953001925   84   68   70   82  104  141  143  149  126  120   68   66
      60503953001926   79   91   83  101  108  131  169  159  146  102   79   68
      60503953001927   73   80   85   88  121  126  149  149  127  123   84   66
      60503953001928   83   87   73   84  118  135  148  147  133  118   95   75
      60503953001929   58   75   90   90  113  133  148  148  150  110   90   76
      60503953001930   66   44   72   85  113  139  142  140  138  118   87   78
      60503953001931   72   73   82   93  113  140  147  150  132  114   97   93
      60503953001932   93   59   75   83  110  147  151  166  137  108   99   84
      60503953001933   62   67   89  102  123  137  166  164  151  119   87   62
      60503953001934   82   65   76   81  111  153  170  147  136  114   87   99
      60503953001935   74   78   86   94  117  134  156  154  140  112   81   63
      60503953001936   64   72   84   84  115  139  149  158  148  126   89   85
      60503953001937   77   79   54  101  119  136  150  159  137  109   84   68
      60503953001938   80   74   97   96  112  137  141  154  139  121  109   70
      60503953001939   66   87   77   95  122  148  144  159  147  100  108   64
      60503953001940   62   81   87   99  126  159  144  159  138  110   92   79
      60503953001941   52   62   73   82  107  145  151  147  150  127   87   94
      60503953001942   74   60   91  104  119  139  149  158  136  107   62   98
      60503953001943   84   89   91  108  117  141  158  149  131  116   95   74
      60503953001944   95   66   80  109  118  139  157  161  131  108   90   81
      60503953001945   48   94   98  109  123  138  156  159  151  136  107   92
      60503953001946   77   86   81   98  119  130  147  141  141  122  101   72
      60503953001947   65   28   68   95  112  140  146  173  147  127   99   80
      60503953001948   78   77  106   95  121  133  148  153  143  121  118   90
      60503953001949   91   91   89  106  114  151  165  163  158  131   97   91
      60503953001950   87   78   98   95  130  149  157  149  133  115   82   52
      60503953001951   63   51   64   83  106  138  154  146  136  118   96   79
      60503953001952   59   71   88   93  122  132  159  154  122  113   81   74
      60503953001953   73   73   76   82  122  133  148  151  142  112   98   96
      60503953001954   67   67   81   97  112  129  136  140  129  126   86   91
      60503953001955   61   41   56  106  104  137  164  178  151  106   95   92
      60503953001956   68   44   93   91  116  130  146  136  142  109   89   86
      60503953001957   73   68  106   96  116  141  154  153  136  117   83   77
      60503953001958   71   79   72   92  107  131  146  152  150  118   99   68
      60503953001959   61   86   86   93  123  141  153  158  152  136   87   77
      60503953001960   68   57   90   96  126  147  144  148  133  104   85   68
      

      The year is appended to the station ID (which has one less zero than in V3). The 12 monthly temperatures are in tenths of a degree C.
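      For anyone reading these rows programmatically, a minimal parsing sketch based on that description (the function name is mine): a 10-digit station id with the 4-digit year appended, followed by 12 monthly values in tenths of a degree C.

```python
def parse_v1_row(line):
    """Split a GHCN V1 row: 10-digit id + 4-digit year, then 12 months."""
    fields = line.split()
    station_id = fields[0][:10]                      # e.g. '6050395300'
    year = int(fields[0][10:14])                     # e.g. 1869
    months = [int(v) / 10.0 for v in fields[1:13]]   # tenths of °C -> °C
    return station_id, year, months

sid, year, months = parse_v1_row(
    "60503953001869   87   93   70  111  108  138  163  155  140  123  101   66")
# sid == '6050395300', year == 1869, months[0] == 8.7 (January, in °C)
```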

    • Nick Stokes says:

      The GHCN V3 record for Valentia also starts in 1869. The data description page is here.

      • A C Osborn says:

        Nick, thanks, I had already realised that.
        I started by looking at TMax; that is the dataset that is incomplete.
        TMin may be as well.
        TAve is pretty well complete.

  10. Greg Goodman says:

    A few years ago I did an article on Judith’s Climate Etc. about hadSST3 adjustments:
    http://judithcurry.com/2012/03/15/on-the-adjustments-to-the-hadsst3-data-set-2/

    In it I noted that they had removed most of the long-term variability over most of the record. It was precisely this pre-1900 cooling that had been severely attenuated by their speculative “corrections”.

    It is also worth comparing with Jevrejeva’s sea level analysis, which shows that the rate of change went from negative to positive some time around 1870, not 1960, when CO2 is supposed to have become significant.

    • Greg says:

      This shows that what the adjustment takes out is very similar to two thirds of the original ICOADS SST data. The only part they don’t seem to play down is the recent, post-1980, warming.

  11. Nick Stokes says:

    Zeke has reminded me that he and Steve Mosher posted an analysis of GHCN V1 vs V3 at WUWT. It’s very thorough – histograms, even a reconstruction. The reconstruction results are identical for V1 and V3 unadjusted. I did my own look at Iceland here.

    • Clive Best says:

      Nick,

      It certainly looks convincing at first sight, I agree. However, they identify some underlying differences, as shown in their overlapping raw station differences. There is also an evident step function in the V3 stations at 1895, which implies they dropped a lot of early V1 stations for some reason. Why?

      Why also does the station number drop dramatically after 1990? This is true of all the datasets – CRU, V3, etc. One would imagine there would be an effort to increase coverage, not decrease it, especially as this is the period of strongest warming.

      The global anomaly trend they show does indeed look the same for all four data sets. But then perhaps it should, as they must all be using the same set of stations due to the 1961-1989 normalisation.

      For that reason I want to use all the stations and develop your linear model further. In the meantime, my original (biased) normalisation shows small differences between V1 and V3 of order 0.1°C (see next post).

    • Nick Stokes says:

      “Why also does the station number drop dramatically after 1990?”
    That’s because of the nature of the project. V1 was a grant-funded archiving project. It came at the end of a period when vast amounts of hand-written and similar data had been digitised by the national met offices. They wanted to collect the results in a central repository.

      As archiving, they put in every decent dataset they could find. It wasn’t until about 1997 that NOAA was persuaded to undertake maintenance. Updating monthly is a very different proposition to a one-off inclusion of a record in an archive. Ongoing cooperation from other nations is required. So they rationalised.

    In the GHCN inventory, of 7280 stations, there are 1921 from the US, 847 from Canada, 254 from Turkey, and 57 from Brazil. There is no need to maintain 847 stations from Canada.

  12. There are some errors in the csv data file. I’m correcting them now and will post shortly, when I have completed the corrections, to make the corrected file available. The errors seem to be confined (so far) to stations with data earlier than 1800, where the station id and year have not been separated, giving a “new” station id and data for January to November only. The earliest data appears to come from 1701.

    I have another v1 data set dating from 1994, but also with data just to 1990. When I’ve completed corrections I’ll compare the two. I suspect that they may be the same data, but with slightly more metadata.
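    The repair for those fused rows can be sketched roughly like this (an assumed csv layout of station id, year, then the monthly values; the field widths follow the V1 format, a 10-digit id plus a 4-digit year):

```python
def repair_row(fields):
    """Re-split a row whose first field fused a 10-digit id with a 4-digit year."""
    if len(fields[0]) == 14 and fields[0].isdigit():
        return [fields[0][:10], fields[0][10:]] + fields[1:]
    return fields

# A fused pre-1800 row gains its missing year column back:
# ['60503953001701', '87', ...] -> ['6050395300', '1701', '87', ...]
```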

  13. This is identical to the version I downloaded from http://cdiac.ornl.gov/ftp/ndp041/, dated 28 July 1992 (the 1994 above was my memory at fault).

    The corrected data, both as csv and txt (the extension .doc was added to enable WordPress upload and may be deleted – these are not Microsoft Word documents):

    https://oneillp.files.wordpress.com/2015/05/clivebest_data-csv.doc
    https://oneillp.files.wordpress.com/2015/05/clivebest_data-txt.doc

    I’ll post additional metadata later today.

  14. The additional metadata (again extension .doc added which may be deleted):

    https://oneillp.files.wordpress.com/2015/05/temp-statinv-txt.doc

    Station names in this file differ from those in Clive Best’s csv file in that some contain commas, and so are unsuitable for reading as a simple csv file. (Note that one station in Clive Best’s csv file, CENTRO MET.ANTARTICO”VICE, contains a double quote mark which may cause a problem when the csv file is read.) The latitude and longitude coordinates for each station are identical in the two versions, as are the start-years and end-years.

    Four additional values are added for each station. The elevation follows the longitude. Two additional values from the original inventory follow the end-year. These are described in the readme file:

    MISSING is the percent of the record with missing data.

    DISC is a code which can be used to identify a time series which
    contains a “gross” discontinuity (i.e., one which was readily
    identified when the time series was plotted and analyzed
    visually). If DISC is 1, then the station has a major
    discontinuity. If DISC is 0, then the station has no major
    discontinuities. However, it could still contain more subtle
    discontinuities.

    Finally, I have added a nightlight luminance for each station, for anyone who may wish to try adjusting the data following Gistemp procedures. These luminance values are taken from the F16_2006 version rather than the deprecated earlier version still used by GISS. (If there is demand for this, I can generate luminance values using the deprecated version and add these to the file.) Generally, with a relatively small number of exceptions, urban/rural classification is the same for both the F16_2006 and deprecated versions.

    Experience with GHCN v3 indicates that it is correction of location coordinates which leads to more frequent classification changes. With GHCN v3, approximately 20% of stations outside the US, Canada and Mexico which are also WMO stations show a changed urban/rural classification when the WMO coordinates are substituted for those in the GHCN inventory file.

    The coordinates used to determine luminance correspond to the latitude and longitude coordinates given in the inventory file, and as these coordinates have not been corrected, the luminance values may in some cases correspond to a location sufficiently distant from the station to give a misleading urban/rural classification. 2034058101 KUWAIT INTL AIRP is a good example of erroneous coordinates, located at sea rather than at the airport. I have not corrected any coordinates in the v1 inventory file, and do not at present plan to do so. (I am gathering corrections for the v3 inventory coordinates.)

    • Clive Best says:

      Thanks Peter,

      It looks like you did a more thorough job than I did!
      I converted the data to CSV files because I know many people use Excel for their analysis. I had no problem myself loading the csv into my old Mac version of Excel, so I assumed they were OK.

      Then I saw some of the place names!! – so I did a regular-expression substitution to remove all the commas.

      • I tend to avoid csv with data such as place names, which may contain characters that can break csv parsing. Importing into Excel as fixed-width fields works fine when the original data, as here, is indeed fixed width.

        I spotted the csv failure for the 1700s and CENTRO MET.ANTARTICO”VICE quite quickly, as I have a habit of sanity-checking new data where possible by finding the minimum and maximum of columns where appropriate; this quickly identified problems by showing values in the three year columns which could not be right.
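        That kind of column min/max sanity check can be sketched as follows (illustrative data and thresholds, not the actual file):

```python
def column_ranges(rows):
    """Return (min, max) for each column of a list of numeric rows."""
    return [(min(col), max(col)) for col in zip(*rows)]

# Monthly temperatures in tenths of a degree C; 9999 mimics a bad parse.
rows = [[87, 93, 70],
        [68, 54, 76],
        [9999, 88, 87]]

# Flag columns whose range falls outside plausible temperature limits
suspect = [i for i, (lo, hi) in enumerate(column_ranges(rows))
           if lo < -700 or hi > 600]
# suspect == [0] – the first column contains an impossible value
```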

Leave a Reply