Zeke’s Wonder Plot

Zeke Hausfather, who works for Carbon Brief and Berkeley Earth, has produced a plot that shows almost perfect agreement between CMIP5 model projections and global temperature data. It is based on RCP4.5 models and a baseline of 1981-2010. First, here is his original plot.

I have reproduced his plot and essentially agree that it is correct. However, I also found some interesting quirks. First, here is my version of his plot, to which I have added the CMIP5 mean for comparison with the new blended TOS/TAS mean. I have also included the latest HadCRUT4.6 annual values in purple.

Original plot with RCP4.5 model ensemble members overlaid and unblended model mean shown in red. HadCRUT4.6 annual values have been added in purple.

The apples-to-apples comparison (model SSTs blended with model land 2m temperatures) reduces the model mean by about 0.06C. Zeke has also smoothed the temperature data by using a 12-month running average. This has the effect of exaggerating peak values compared with annual averages. To see this, simply compare HadCRUT4 (annual) in purple with his Hadley/UEA curve.
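The effect of the running average on peak values can be illustrated with a minimal sketch. The series below is synthetic (an assumed 12-month warm spell straddling a year boundary), not real temperature data, and the function names are only illustrative.

```python
import numpy as np

def annual_means(monthly):
    """Calendar-year means of a monthly series whose length is a multiple of 12."""
    return np.asarray(monthly, dtype=float).reshape(-1, 12).mean(axis=1)

def running_mean_12(monthly):
    """12-month running means, one value per complete 12-month window."""
    kernel = np.ones(12) / 12.0
    return np.convolve(np.asarray(monthly, dtype=float), kernel, mode="valid")

# Synthetic anomalies (assumption, not real data): three flat years with a
# 12-month warm spell of +0.6 C running from July of year 2 to June of year 3.
monthly = np.zeros(36)
monthly[18:30] = 0.6

annual = annual_means(monthly)
running = running_mean_12(monthly)
print("annual means:", annual)
print("peak annual mean:", annual.max())
print("peak 12-month running mean:", running.max())
```

The calendar-year means are just the running means sampled at every twelfth window, so the running-mean peak can never be lower than the highest annual value. It is higher whenever a warm spell straddles the turn of the year.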

So now what happens if you change RCP?

Here is the result for RCP2.6, which has less forcing than RCP4.5.

The same plot but now overlaid with RCP2.6 model ensemble and mean.

The model spread and the mean have increased slightly. So the model mean and grey shading should also rise slightly.

Next, does the normalisation (baseline) affect the result?

Effect of changing normalisation period. Cowtan & Way uses kriging to interpolate HadCRUT4.6 coverage into the Arctic and elsewhere.

Yes, it does. Shown above is the result for a normalisation period of 1961-1990. First, look how the two lowest model projections now drop further down, while the data now seemingly lie below both the blended (thick black) and the original CMIP average (thin black). HadCRUT4 2016 is now below the blended value.
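A minimal sketch of why the baseline matters when two series have different trends. Both series here are synthetic straight lines with assumed warming rates, and `rebaseline` is a hypothetical helper, not code from any of the datasets discussed.

```python
import numpy as np

def rebaseline(years, values, start, end):
    """Return anomalies relative to the mean over start..end inclusive."""
    years = np.asarray(years)
    values = np.asarray(values, dtype=float)
    in_base = (years >= start) & (years <= end)
    return values - values[in_base].mean()

years = np.arange(1961, 2021)
# Synthetic series (assumptions): "model" warms at 0.020 C/yr, "obs" at 0.016 C/yr.
model = 0.020 * (years - 1961)
obs = 0.016 * (years - 1961)

def gap_in_year(year, start, end):
    """Model minus observation in a given year after re-baselining both."""
    gap = rebaseline(years, model, start, end) - rebaseline(years, obs, start, end)
    return float(gap[years == year][0])

for start, end in [(1961, 1990), (1981, 2010)]:
    print(f"baseline {start}-{end}: model - obs in 2016 = {gap_in_year(2016, start, end):+.3f} C")
```

Re-baselining only shifts each curve by a constant, so the trends are unchanged. But the later the baseline, the closer the two curves are forced together at the recent end of the plot.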

This improved model agreement has nothing to do with the data itself but instead is due to a reduction in warming predicted by the models. So what exactly is meant by ‘blending’?

Measurements of global average temperature anomalies use weather stations on land and sea surface temperatures (SST) over oceans. The land measurements are “surface air temperatures” (SAT), defined as the temperature 2m above ground level. The CMIP5 simulations, however, used SAT everywhere. The blended model projections use simulated SAT over land and TOS (the models' sea surface temperature) over oceans. This reduces all model predictions slightly, thereby marginally improving agreement with data. See also Climate-lab-book.

The detailed blending calculations were done by Kevin Cowtan, using a land mask and an ice mask to define where TOS and SAT should be used in forming the global average. I downloaded his Python scripts and checked the whole algorithm, and it looks good to me. His results are based on the RCP8.5 ensemble. These are the results I get using his Python code.
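The idea of blending can be sketched as follows. This is not Kevin Cowtan's script: it is a toy illustration that assumes air temperature is used over land and sea ice and TOS over open ocean, with cos(latitude) area weights on a tiny made-up grid. All names and numbers are assumptions.

```python
import numpy as np

def area_weights(lats_deg, nlon):
    """cos(latitude) weights for a regular lat/lon grid, normalised to sum to 1."""
    w = np.cos(np.deg2rad(lats_deg))[:, None] * np.ones((1, nlon))
    return w / w.sum()

def blended_global_mean(sat, tos, land_frac, ice_frac, lats_deg):
    """Global mean using SAT over land and sea ice, TOS over open ocean."""
    # Assumed weighting: fraction of each cell represented by air temperature.
    air_frac = land_frac + (1.0 - land_frac) * ice_frac
    blended = air_frac * sat + (1.0 - air_frac) * tos
    return float((blended * area_weights(lats_deg, sat.shape[1])).sum())

# Toy 3x4 grid of anomalies in C (all values are illustrative assumptions).
# Real model TOS fields are missing over land and would need filling first.
lats = np.array([-60.0, 0.0, 60.0])
sat = np.full((3, 4), 1.0)      # air warms by 1.0 C everywhere
tos = np.full((3, 4), 0.9)      # sea surface warms slightly less
land = np.zeros((3, 4))
land[:, 0] = 1.0                # one column of land
ice = np.zeros((3, 4))
ice[2, 3] = 1.0                 # one ice-covered ocean cell

sat_only = blended_global_mean(sat, sat, land, ice, lats)
blended = blended_global_mean(sat, tos, land, ice, lats)
print(f"SAT everywhere: {sat_only:.3f} C, blended: {blended:.3f} C")
```

In this toy case the blended mean is lower than the SAT-only mean simply because the ocean cells contribute the smaller TOS anomaly.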

RCP8.5 ensemble. The original projections are in blue and the blended ones in red. The ensemble mean is reduced by up to 0.07C. Data shown are Cowtan & Way.

Agreement between the data (Cowtan & Way) and the models has now definitely improved, but the models are still running warmer from 1998 to 2014.

Here finally is my 1950-2050 overview, where the blended RCP4.5 result has been added.

The solid blue curve is the CMIP5 RCP4.5 ensemble average after blending. The dashed curve is the original.

Again the models mostly lie above the data after 1999.

This post is intended to demonstrate just how careful you must be when interpreting plots that seemingly demonstrate either full agreement of climate models with data, or else total disagreement.

In summary, Zeke Hausfather, writing for Carbon Brief, was able to show an almost perfect agreement between data and models through 1) a clever choice of baseline, 2) the choice of RCP for the blended models and 3) the use of a 12-month running average. His plot is 100% correct. However, exactly the same data plotted with a different baseline and using annual values (exactly like those in the models) instead of 12-month running averages show that the models still lie consistently above the data. I know which one I think best represents reality.

About Clive Best

PhD High Energy Physics. Worked at CERN, Rutherford Lab, JET, JRC, OSVision

30 Responses to Zeke’s Wonder Plot

  1. Windchaser says:

    Here’s my Reviewer #3 comment:

    What’s the argument for using a 1961-1990 baseline over any other one? (If baseline matters this much, and you can’t give a strong argument for why a particular baseline is better, that suggests an additional, unaccounted source of uncertainty in the comparison).

    With regard to running averages: it seems like the appropriate thing to do is use the 12-month RA for both. This reduces statistical noise; you’re not sub-sampling with a climatologically arbitrary (i.e., Jan-Dec) average out of the 12 possible yearly averages; you’re including them all. If you *do* pick a specific 12 month period to use, this certainly reduces the statistical strength of the comparison. Better to use a running average, not a specific one.

    I’d nix HadCRUT entirely, or mask the model results to use the same coverage. There’s no point in including a comparison that we know is wrong.

    Last, there’s one other thing I’d want to see, if doing a proper comparison between models to date and observations: I’d run the simulations with the actual forcings to date, including volcanic and solar forcings to date. (Which were cooler than average; cooler than the actual forcings used in these simulations).

    With all that, you get the best comparison of models to observations. You’ve matched the timelines, the forcings, and the geographical coverage, so the model results are now an apples-to-apples comparison to the observations, unless I’m missing something.

    Then, to look at the importance of the result, I’d want to know how much the models over-/under-estimated the sensitivity, with error bars. Was the model mean an ECS of 3.0C/doubling, but the observations show 2.9? Or do the observations show 2.5? Etc.

    • dpy6629 says:

      There was a paper on the forcing adjustments by Outten et al showing that in one model they made no significant difference in the surface temperature result. I’d be careful too here as there is considerable uncertainty in forcing estimates.

  2. Clive Best says:

    Thanks for the review.

    The reason to use 1961-1990 is that it optimises coverage in the station data.

    GISS and Berkeley use 1951-1980 (I think). Models should not really care which normalisation period is chosen.

    I don’t like running averages because each point is not independent. They add an artificial smoothing, so that, for example, an El Nino that peaks over just 3 months in one year will appear accentuated. Furthermore, we are comparing yearly values of the models, so there should be just one data value to compare rather than 12.

    If you don’t like HadCRUT then take Cowtan and Way or Berkeley as a yearly average, but don’t use artificially generated moving 12-month averages.

    Regarding forcings, I am sure that right now modellers are tuning them for CMIP6 to match the last 5 years. I expect also that they will produce blended projections directly for AR6.

  3. dpy6629 says:

    Thanks for doing the work, Clive. I’m not sure I totally believe the model difference in TAS and SST idea, since the theoretical justification is weak and there is very little data. Nic Lewis dealt with this at Judith’s (July 2016) in response to the Richardson et al paper. He found only a 2% trend difference between the late 1800s and recent decades in the model output. Also, there is some data from buoys in the tropical Pacific that shows no significant difference in the real world.

  4. dpy6629 says:

    Clive, Why would the Cowtan and Way adjustment only be significant over the last 15 years and not before?

    • Clive Best says:

      I am not sure of the real answer to that. Officially it is because the Arctic has been warming faster than the rest of the surface recently, and because C&W krigs (extrapolates) over the whole Arctic it gives a larger warming. Of course this effect cannot continue for ever.

      The Arctic warms more for another reason. To increase temperatures in Africa from 30 to 31 C requires far more energy than increasing Arctic winter temperatures from -30C to -28C.

      • Hans Erren says:

        Indeed, that is the derivative of Stefan-Boltzmann: dT = dE/(4σT³)
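As a rough numerical check of that formula, here is a small sketch that treats the surface as an ideal black body (a strong simplification, assumed only for illustration) and evaluates dE/dT = 4σT³ at the two temperatures mentioned above.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def blackbody_flux(t_celsius):
    """Black-body emission in W/m^2 for a temperature in degrees C."""
    return SIGMA * (t_celsius + 273.15) ** 4

def flux_per_degree(t_celsius):
    """dE/dT = 4 sigma T^3, in W/m^2 per K."""
    return 4.0 * SIGMA * (t_celsius + 273.15) ** 3

print(f"dE/dT at +30 C: {flux_per_degree(30.0):.2f} W/m^2 per K")
print(f"dE/dT at -30 C: {flux_per_degree(-30.0):.2f} W/m^2 per K")
print(f"+30 -> +31 C: {blackbody_flux(31.0) - blackbody_flux(30.0):.2f} W/m^2")
print(f"-30 -> -28 C: {blackbody_flux(-28.0) - blackbody_flux(-30.0):.2f} W/m^2")
```

Under this idealisation each degree of warming at +30 C costs nearly twice as much extra emission as a degree at -30 C, so a 1 C step in the tropics and a 2 C step in the Arctic winter come out at a similar size.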

      • dpy6629 says:

        Yes but it would seem that the same thing would have been at work say in the 1990’s when C&W is virtually identical with HADCRUT. Why the increasing effect of the infilling?

      • phil chapman says:

        Two comments:

        1) Global Warming is a misnomer. What it actually means is a reduction in the equator-to-pole gradient, mostly in the NH, so Arctic Warming is more accurate. Most of the warming since 1880 has been at high latitudes. Even at the peak of the Hothouse Earth, 100 Mya, the low-latitude temperature was almost unaffected — but tropical conditions extended to high latitudes, and there were palm trees and crocodiles in the Arctic.

        2) Using a trailing running average displaces the average from the data. It is better to use a centered average (i.e. an average over the preceding and following n points). This means the average is over 2n+1 points, so it must cover an odd number. It is thus better to average over 11 or 13 months (i.e., n = 5 or 6) rather than 12.
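The displacement described in point 2 can be seen in a minimal sketch. The series is a synthetic triangular peak (an assumption, not real data), smoothed once with a trailing 13-month mean and once with a centered mean over 2n+1 = 13 points.

```python
import numpy as np

def trailing_mean(x, window):
    """Mean of the current point and the window-1 points before it."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    for i in range(window - 1, len(x)):
        out[i] = x[i - window + 1 : i + 1].mean()
    return out

def centered_mean(x, n):
    """Mean over the n points before, the point itself, and the n points after."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    for i in range(n, len(x) - n):
        out[i] = x[i - n : i + n + 1].mean()
    return out

# Synthetic monthly series (assumption): a symmetric triangular peak at month 30.
months = np.arange(61)
series = np.maximum(0.0, 1.0 - np.abs(months - 30) / 10.0)

trail = trailing_mean(series, 13)
centre = centered_mean(series, 6)  # 2n+1 = 13 points

print("data peak at month", int(np.argmax(series)))
print("centered 13-month peak at month", int(np.nanargmax(centre)))
print("trailing 13-month peak at month", int(np.nanargmax(trail)))
```

The two smoothed curves have the same shape. The trailing version is simply the centered one shifted later by n months.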

        • oz4caster says:

          Phil, I agree on your (1) point and would go one step further and call it “Arctic winter warming”. The Arctic summers have not shown much if any warming recently. Although, I do somewhat hesitate to use the word “warming” when applied to Arctic winters, since they are so cold. In reality they are just not quite so cold now as they were 30 years ago.

          My guess is that most of the Arctic winter warming effect is ocean circulation related in slowly and periodically transferring heat to the Arctic for dissemination to space to balance the ocean heat content from input in the tropics. From what I have read, there is historical anecdotal evidence for such cycling over the last few hundred years. Time will tell, but it may take decades or centuries of modern observations to be more confident.

  5. Pingback: Climate Models Cover Up | Science Matters

  6. oz4caster says:

    Clive, why not compare the model forecasts directly to reanalysis data, such as ERAI (Copernicus), CFSR, or NCAR/NCEP R1? These approaches to assessing global mean surface temperature are compatible with the climate models, so no adjustment is necessary. So just replace the HadCRUT4.6 with Copernicus and forget about adjusting the climate model forecasts.

    • Clive Best says:

      The reanalysis data is limited to recent times, so we have to rely on historical measurements before ~1980. However, reanalysis data allows you to compare absolute temperatures to models. This is what you get!

      • oz4caster says:

        Thanks Clive. The NCAR/NCEP R1 was run back to 1948. You can get the monthly GMST output provided by Climate Reanalyzer here, and from that generate GMST anomalies relative to 1961-1990 if you like. At the link, for “Dataset” select the “Reanalysis [1st Gen] – NCEP/NCAR V1 (1948-2017)” then click the “Plot” button, which will plot a graph of annual averages. Below the plot, click the “CSV” link to download a comma-separated table of the associated monthly GMST estimates. That reanalysis uses a 2.5 degree lat/long grid which is probably fairly close to what is used in most long-range climate models.

      • dpy6629 says:

        Clive, Once again thanks for doing this work. It looks to me that the reanalysis is even further below the models than HADCRUT.

        It’s harder to argue with, as I assume it’s TAS everywhere, just like the model outputs. And it’s also the basis for initializing the models with initial conditions. If it were way off, weather forecasts would also be a lot less skillful.

    • Olof R says:

      I suggest to use the ERA5 instead, the new state-of-the-art reanalysis from ECMWF.
      KNMI Climate Explorer has data from 1996 through Oct 2018, but there is data available from 1979 at the Copernicus Climate Data Store (too huge for me to handle though).

      Anyway, a model/obs comparison from 1996 til now looks like this:

      The RCP scenario trends (1996-2018) are between 0.22 and 0.25 C/decade, ERA5 0.23 C/decade, and Berkeley Earth 0.19 C/decade.
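For readers who want to reproduce trend figures of this kind, here is a minimal sketch of a least-squares trend in C/decade. It assumes an ordinary linear fit to monthly anomalies, which may differ from the method used for the numbers above. The input series is synthetic.

```python
import numpy as np

def trend_per_decade(time_years, values):
    """Ordinary least-squares slope, converted from per year to per decade."""
    slope, _intercept = np.polyfit(time_years, values, 1)
    return 10.0 * slope

# Synthetic monthly anomalies (assumption): 0.2 C/decade plus a periodic wiggle.
t = 1996.0 + np.arange(23 * 12) / 12.0  # monthly time axis in decimal years
anoms = 0.02 * (t - 1996.0) + 0.1 * np.sin(2.0 * np.pi * t)

print(f"trend: {trend_per_decade(t, anoms):.2f} C/decade")
```

With only about two decades of data, differences of a few hundredths of a degree per decade between series are within what start and end dates alone can produce.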

      • oz4caster says:

        Olof R, thanks. I was not aware that ERA5 was available on KNMI and I will check it out. I have been following Copernicus for a couple of years now and set up an account there a couple of years ago, but now I can’t find my login info. In looking there recently I get the impression that unless you work for a privileged agency, you must pay to download data sets. Since I am retired and independent with a daughter in college, paying for data is not an option. I can barely keep up with the four times per day download of the NOAA CDAS pgbh files (~91 MB each) in order to extract the 2-meter air temperature grid to compile daily GMSATA averages (plotted here).

        I’ve read that ERA5 has hourly output, which is excellent if you have the computer capacity and can write scripts to automate the analyses. Hopefully someone will take up this adventure and share online, because it should be very worthwhile and interesting. Although, I suspect Copernicus copyrights may not allow near real-time dissemination of the analyses except to a private paying audience unfortunately.

        I have not seen what NOAA is up to for improving CDAS. I know they are working on a replacement for GFS (FV3 model) that has variable grid spacing.

        • Olof R says:

          I would be happy if it were possible to download monthly ERA5 data from the Copernicus CDS. With daily and subdaily data, although reducing it from 24 to 4 times daily, the complete download (1979-2018) is still much too large for me to handle.

          Hopefully, ERA5 data will show up in more user friendly sites, like KNMI Climate explorer or ESRL/WRIT.

          Copernicus will introduce a near real-time daily reanalysis, ERA5T, in early 2019.

  7. Ron Graf says:

    Thanks for doing this, Clive. You’ve aptly demonstrated the power of just a couple of choices to create a step difference in a chart. When the forces of bias have billions of dollars and thousands of careers riding on them, one can imagine a whole staircase here unrevealed.

  8. Ron Graf says:

    I think one of the largest biases is that the models project no volcanic aerosol cooling. They correct for it in hind-view. And since they get to place their own “estimate” of what the cooling effect was they can match the climate record with it, deceptively creating an image of past accuracy (and validation). The 1991 Mt. Pinatubo eruption is the best recent example.

    • Clive Best says:

      Yes they get the aerosol forcing right when they know what the answer should be.

      • Ron Graf says:

        Yes, although aerosols from major volcanic eruptions are unpredictable we can predict with certainty that when the eruption occurs modeled and observed temperature will converge and plot in lock step, erasing any gap of divergence accumulated prior.

  9. Howard Dewhirst says:

    Open any textbook on any subject where modelling is involved and you will find something like the following: “Only if the model effectively mimics reality can it be trusted to provide accurate ***** forecasts.” How does the IPCC get away with their models’ failure to even come close to reality? Why do governments not ask for a ‘Please explain’ before committing to such things as a carbon tax?

  10. Pingback: Zeke’s Wonder Plot – Climate Collections

  11. David Laufer says:

    Clive – thanks for this very detailed and informative post and thread.
    Question: Is there a chart that demonstrates that weather prediction has become more accurate since, say, 1950 and today? We know that weather prediction absorbs a lot of supercomputer time, but has the result of all that computing power made weather prediction more accurate? Thanks!

  12. John Carr says:

    Clive, I think you are very generous to Zeke Hausfather in describing his choice of baseline as “clever”. He has a plot from 1970-2020 and normalizes everything to the middle range 1981-2010, so it is no surprise that the data and models agree for most of the plot. Your choice of 1961-1990 is much more reasonable. If the only analysis is staring at the plot to see whether data and models agree, with your choice you see the discrepancy, while with Zeke’s it is camouflaged. Clearly, if the analysis goes beyond the one picture to include changes from 1970 to 2020, the re-normalization makes no difference, but with just the picture it is paramount.

    • Clive Best says:

      Absolutely right. The funny thing was that Berkeley Earth started out as an attempt by Richard Muller to bring in an independent fresh look after the leaked climate-gate emails showed some dishonest practices at work. Instead, Berkeley Earth has now morphed into the most alarmist temperature series, as demonstrated here. I think this is partly due to career progression, which I can understand.
