Arthur’s Questions about Figure 9.5

Yesterday, Arthur Smith asked some good questions about my version of Figure 9.5. I answered in comments, but I think it’s worth repeating ‘above the fold’. These are Arthur’s questions:

Hi Lucia,

your graph doesn’t match the IPCC one very closely – why? Just looking around the 1998 El Nino peak, there are the following clear differences:

* IPCC peak of observed data (black curve) is very close to 0.9; your peak (black curve) is maybe 0.84.

* In the IPCC curve, the first model data point over 0.9 is in 1995. In your graph, the first model point over 0.9 is 1986 or 87 and you’re already hitting 1.2 not long after 1995.

* After the 1998 peak, the IPCC curve also shows observed temperatures above the model mean from 2001 through 2005 (looks like). Your curve shows observed temperatures below the model mean after 1998.

Why the differences? You picked a different collection of models to compare against? Why?

These are obviously good questions, because one expects to be able to replicate a figure based on the documentation once data are available. Unfortunately, it’s not quite as easy as one might think. So, I’ll respond to each observation:

IPCC peak of observed data (black curve) is very close to 0.9; your peak (black curve) is maybe 0.84

According to the figure caption under Figure 9.5(a) in the AR4, “All data are shown as global mean temperature anomalies relative to the period 1901 to 1950, as observed (black, Hadley Centre/Climatic Research Unit gridded surface temperature data set (HadCRUT3); Brohan et al., 2006)”. Hadley actually provides two products under the “HadCrut3” label. The annual average values are available at:

* NH/SH
* Area Average

I re-baselined each relative to 1901 to 1950 and plotted both results:

Figure 1: HadCrut 1901-1950 baseline

Note that using the annual average data, the maximum is less than 0.80°C, not over 0.90°C as it appears in IPCC Figure 9.5. Anyone can repeat this easily using Excel.
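
For anyone who would rather script it than use a spreadsheet, here is a minimal Python sketch of the re-baselining step. The file name and column layout are my assumptions; adjust them to whichever HadCrut download you grab.

    import numpy as np

    # Assumed layout: two columns, year and annual-mean anomaly in
    # deg C; the real HadCrut annual files may need a little parsing.
    data = np.loadtxt("hadcrut3_annual.txt")
    years, anoms = data[:, 0], data[:, 1]

    # Re-baseline: subtract the 1901-1950 mean so anomalies are
    # relative to that period, per the Figure 9.5(a) caption.
    base = (years >= 1901) & (years <= 1950)
    rebaselined = anoms - anoms[base].mean()

    peak = rebaselined.argmax()
    print("Peak annual anomaly: %.2f C in %d"
          % (rebaselined[peak], int(years[peak])))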

The plots I showed yesterday were based on 12-month averages computed from monthly data. This raises the peak to about 0.84°C, as shown on my graph. Those monthly data are also available; anyone can download them and compare to IPCC Figure 9.5(a).
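
For the 12-month version, the only extra step is a running mean over the monthly series. Another sketch, again assuming the monthly file has been parsed into columns:

    import numpy as np

    # Assumed: second column holds monthly anomalies in date order;
    # the real HadCrut monthly file needs parsing into this form.
    monthly = np.loadtxt("hadcrut3_monthly.txt")[:, 1]

    # Trailing 12-month running mean via a simple moving average.
    running = np.convolve(monthly, np.ones(12) / 12.0, mode="valid")

    # A 12-month window need not align with calendar years, so it can
    # straddle the 1997-98 El Nino and peak above the annual value.
    print("Peak 12-month average: %.2f C" % running.max())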

So why is the HadCrut peak about 0.1°C higher in IPCC Figure 9.5(a) than in figures created using the current versions of the HadCrut data? I don’t know. I could speculate… but I don’t know.

“In the IPCC curve, the first model data point over 0.9 is in 1995. In your graph, the first model point over 0.9 is 1986 or 87 and you’re already hitting 1.2 not long after 1995.

Why the differences? You picked a different collection of models to compare against? Why?”

I think Arthur is discussing the ‘yellow’ traces in the AR4. The figure caption for Figure 9.5 in the AR4 says these correspond to “individual simulations”.

My graph certainly includes slightly different runs than are used in the IPCC figure. The reason is simple: It’s difficult to figure out which runs are used in IPCC figure 9.5.

Here’s the scoop: IPCC Figure 9.5(a) includes only 14 of the 23 AR4 AOGCM models. The caption says, “Simulations are selected that do not exhibit excessive drift in their control simulations (no more than 0.2°C per century). Each simulation was sampled so that coverage corresponds to that of the observations.” So, they excluded runs from more than 1/3 of the models.
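
The drift criterion itself is easy to state in code, even if nobody published the results of applying it. Here is a sketch of how one might test a control run against the 0.2°C per century limit; the function and the synthetic example are mine, not anything from the AR4:

    import numpy as np

    def excessive_drift(years, control_anoms, limit=0.2):
        """True if the control run's linear trend exceeds `limit`
        deg C per century -- the criterion stated in the caption."""
        slope_per_year = np.polyfit(years, control_anoms, 1)[0]
        return abs(slope_per_year) * 100.0 > limit

    # Synthetic example: a control drifting 0.3 C/century should fail.
    yrs = np.arange(1900, 2000)
    fake = 0.003 * (yrs - 1900) + 0.05 * np.random.randn(yrs.size)
    print(excessive_drift(yrs, fake))  # almost certainly True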

To exactly reproduce the graph, I would first need to figure out which 9 models to toss. I’ve traced the path leading from the figure caption to Appendix 9.C, to FAQ 9.2 Figure 1, and back to Appendix 9.C, and did not discover which 14 models the IPCC authors selected when showing how well the models hind-cast.

Based on the appearance of Figure 9.5, one might guess that, for the most part, the runs without volcanic forcing were removed. Since 14 models were forced with volcanic aerosols, this might seem reasonable. However, that’s not what the caption for Figure 9.5 actually says. There is also a problem with the theory: model CGCM3.1 is listed as including volcanic forcings in a table in Chapter 10 of the AR4, yet that model appears to have been left off. (Otherwise, its trace is cut off above 1.0°C.)

So, did the IPCC authors leave off runs not forced with volcanic aerosols? Or did they leave off only those with excess drift? If someone can find a list, that would be great. Better yet, if anyone can find documentation of the amount of drift in each model’s 20th-century control, I’d love to see it. I’m sure I’m not the only one who would like to know which models exhibited more than 0.2°C per century of climate drift in control runs corresponding to the 20th century.

Now, assuming I had a list that told me precisely which models to exclude from the hind-cast, to perfectly replicate the graph in Figure 9.5(a) I would need to obtain the PCMDI data and recompute the monthly data masking as discussed in a cited reference. Presumably, that could be done. I’ve heard rumors it’s a snap, so presumably I could quickly and easily write a script to do as great a job as Santer evidently did with the CNRM3.0 runs for the tropical troposphere (see discussion by SteveM).
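
In outline, that masking amounts to blanking model grid cells wherever HadCrut reports no observation, then taking an area-weighted average of what survives. A rough sketch of the idea only; this is my reading of the procedure, not the IPCC authors’ actual code, and it assumes the model and observed fields sit on a common lat-lon grid:

    import numpy as np

    def masked_global_mean(model_field, obs_field, lats):
        """Area-weighted global mean of a model temperature field,
        sampled only where the observations exist (NaN = missing).
        Both fields are 2-D [lat, lon] arrays on a common grid."""
        sampled = np.where(np.isnan(obs_field), np.nan, model_field)
        weights = np.cos(np.deg2rad(lats))[:, None] * np.ones_like(sampled)
        valid = ~np.isnan(sampled)
        return (sampled[valid] * weights[valid]).sum() / weights[valid].sum()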

But I didn’t want to reproduce graphs that look like this, so I took a short cut, relying on Geert Jan of The Climate Explorer, whose scripts create temperature fields that don’t look like lumpy top hats. I downloaded runs from the 25 models under the SRES A1B scenario from the Climate Explorer, except for the runs from “Essence”. To create the model mean shown in red, I ignored the two models not used in the IPCC. I chose this scenario because the caption for Figure 9.5(a) says, “Those simulations that ended before 2005 were extended to 2005 by using the first few years of the IPCC Special Report on Emission Scenarios (SRES) A1B scenario simulations that continued from the respective 20th-century simulations, where available”.

I did include the runs from those extra models as individual traces, so my plot includes 59 runs.
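
The bookkeeping for the red model mean is then straightforward: average the runs from the models I kept, but plot every trace. A sketch with a hypothetical data layout; the excluded names below are placeholders, and averaging within each model before averaging across models is my convention, not necessarily the AR4 authors’:

    import numpy as np

    # Hypothetical layout: runs[model] is a list of equal-length arrays
    # of annual anomalies, one per realization, already re-baselined.
    EXCLUDED = {"model_a", "model_b"}  # placeholders, not the real two

    def ensemble_mean(runs):
        """Average each model's realizations first, then average across
        the models kept, so heavily sampled models aren't over-weighted.
        (That weighting convention is my choice, not the AR4 authors'.)"""
        per_model = [np.mean(r, axis=0) for name, r in runs.items()
                     if name not in EXCLUDED]
        return np.mean(per_model, axis=0)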

Why are the “highs” for the runs I display higher than those in the IPCC AR4? Whatever they did, it appears that the two CGCM3.1 models, which supposedly include volcanic forcings, are among the models the authors removed. I created a graph comparing observations to the hind-cast by CGCM3.1 so you can compare the two:

Figure 2: Comparison of CGCM3.1 Simulations to Observed Surface Temperatures.

It’s also possible the masking used to sample data to match the HadCrut grid made a difference, resulting in slightly higher temperatures for the realizations. However, I suspect the main reason for the difference is that the two CGCM3.1 models were among the 9 the IPCC authors dumped when illustrating the ability of AR4 models to hind-cast.

So, yes: because the documentation does not clearly call out which models are used in IPCC Figure 9.5, I almost certainly used a slightly different collection of models than those in Figure 9.5.

Bear in mind: the projections in Chapter 10 evidently include all simulations, even those removed to display the fidelity of the “hind-cast” in Chapter 9. So, my choice shows how the models used for projections performed when hind-casting the 20th century, while the IPCC choice shows how a subset of models did on the hind-cast. Also, I did not apply special masks; the IPCC authors did.

The difference in what our graphs show is subtle, and it may or may not be important to those deciding whether the skill of the hind-cast gives us confidence in the models’ ability to forecast. It clearly makes sense to exclude the models that did not experience volcanic forcing when comparing in the hind-cast. But I’m not entirely sure that’s what the authors did when creating Figure 9.5(a) in the AR4.

6 thoughts on “Arthur’s Questions about Figure 9.5”

  1. Unfortunately the temps used here (GISS etc.) to compare A with B are really, really unreliable and, in my view, definitely not credible (read: Urban Heat Island Effect, spreading/extrapolating theoretical Arctic temperatures, etc.).
    http://wattsupwiththat.com/2009/05/10/a-report-on-the-surfacestations-project-with-70-of-the-ushcn-surveyed/
    and more, etc.
    Brooks, Ashley Victoria. M.S., Purdue University, May, 2007. Assessment of the Spatiotemporal Impacts of Land Use Land Cover Change on the Historical Climate Network Temperature Trends in Indiana.

    Christy, J.R., W.B. Norris, K. Redmond, and K.P. Gallo, 2006, Methodology and results of calculating Central California surface temperature trends: Evidence of human-induced climate change?, J. Climate, 19, 548-563.

    Hale, R. C., K. P. Gallo, and T. R. Loveland (2008), Influences of specific land use/land cover conversions on climatological normals of near-surface temperature, J. Geophys. Res., 113, D14113, doi:10.1029/2007JD009548.

    This is why it would seem there is no interest in using satellite data to play with here. Ironically, the HADCRUT team that said UHI wasn’t important is now starting a project to show how hot it is going to become in cities due to AGW UHI. What a joke! I think the AGW story is rapidly collapsing all around and it’s time we all moved on… In fact, Junk Science has dropped it as a main subject due to lack of interest, LOL.

  2. Lucia–
    You say in your post, “So the(y) excluded runs from more than 2/3s of the models.” Maybe I’m not understanding correctly, so correct me if I am wrong, but didn’t you mean “from more than 1/3 of the models” (9 out of 23)?

  3. Why do you not write to the authorities in southern hemisphere countries to ask how the results that they collect compare with all of GISS, HADCRUT and KNMI? Try as I might, I cannot find that the Australian rural data reconcile with any of those 3. An example is the 1998 high, which scarcely shows a blip on the Australian rural graphs I have looked at (which are but a sub-set of the total). Also, it is not uncommon for an Australian rural station to show negative, zero or tiny positive warming in the last 40 years, especially the coastal rural ones and remote places like Macquarie Island.

    It is hard to reconcile a hindcast to data with big question marks as to reliability. As a minimum, the model should be able to explain why Macquarie Island (and the 3 main Australian Antarctic bases, Mawson, Davis and Casey) show trivial or no warming since their commencement about 1955.

    In strict terms, the failure of a hindcast model to reproduce these results should falsify it.

  4. Shallow–Yes, I meant 1/3rd. Fixed. Thanks.

    Geoff– Valid models should hindcast well. The hindcasts show a little too much warming. I’m planning to get the control runs to see if “drift” is the issue. But the batches I have show too much warming.

  5. lucia (Comment#13670)

    I’m rather concerned with the many sites that show NO warming. If you find that there is too much modelled warming in the control runs, does this make more sites show cooling after correction? Over the last 40 years?

  6. Geoff–
    If a control run showed a warming drift, you subtract that drift when correcting. But I don’t know what the control runs show.
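
    Roughly, the correction would look like this; a sketch only, since I haven’t looked at the control runs yet:

        import numpy as np

        def drift_corrected(years, forced_run, control_run):
            """Fit a linear trend to the control run, then subtract
            the zero-mean trend line from the forced run."""
            slope = np.polyfit(years, control_run, 1)[0]
            return forced_run - slope * (years - years.mean())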
