GMST: Projections vs Observations
I’m still on preliminaries for the analysis I’ve been doing. But today, I’ll show two graphs, and then discuss the general goals of the analysis I will be posting soon.
Here are some graphs comparing observations of global mean surface temperature (GMST) to “hindcast/projections” from the AR4. The graphs are based on an ensemble consisting of 38 runs from 19 separate models. Observations are from NOAA/NCDC, GISS and HadCrut; land/ocean merges were selected because the IPCC compares projections to land/ocean merges in its documents. The anomalies for each data set and each of the 38 model runs are calculated relative to their average temperature from 1980-1999; this was the convention used in the AR4 projections.
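For readers who want to replicate the baselining step, here is a minimal sketch. It is not the code used for these graphs. The function name `to_anomalies` and the `(year, month)` keyed dictionary are hypothetical choices, and the sketch assumes a single 1980-1999 mean is subtracted from every month rather than a separate mean per calendar month.

```python
from statistics import mean


def to_anomalies(series, baseline=(1980, 1999)):
    """Shift a monthly series so its mean over the baseline years is zero.

    series: dict mapping (year, month) -> temperature (hypothetical layout).
    baseline: inclusive (first_year, last_year) of the reference period.
    """
    start, end = baseline
    base_values = [t for (year, _month), t in series.items() if start <= year <= end]
    if not base_values:
        raise ValueError("series has no data in the baseline period")
    base = mean(base_values)
    # Subtract the same baseline mean from every month in the series.
    return {key: t - base for key, t in series.items()}
```

Applying the same function to each observational data set and to each of the 38 runs puts everything on the common 1980-1999 baseline.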
The discussion today will be limited to the two graphs and a bit of motivation for the upcoming statistical analyses. I’ll also be providing links to “model data” so those who might wish to replicate my analyses can find it. There will be no important statistical tests.
Data comparison
Figure 1 illustrates monthly data. The red solid curve indicates the GMST averaged over all 38 runs in the ensemble; the red dashed curve indicates the 2C/century central tendency from the IPCC. The pink solid lines indicate ±1 standard deviation for GMST predicted over the 38 model runs. If spaghetti curves for individual models were shown, roughly 2/3 of “model data” would fall inside the pink ±1 standard deviation at all times; “model data” would fall outside those bounds roughly 1/3 of the time.
The remaining colors are observations, with blue indicating the “merge 3 average” over NOAA, GISS and HadCrut:
As you can see, the model data and observations seem to agree in some “average” sense. Having set everything to common 1980-1999 baselines, the temperature anomalies for data and observations are forced to agree, on average, during that period. The model projections shown in red were based on model runs/SRES etc. available after 2001, when the TAR was published; that model data formed the foundation for projections in the AR4. Note that since 2001, the rate of increase in observed temperature has been small or negative. The observations are currently dancing around the lower ±1 standard deviation bound for model predictions.
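The red curve and pink band in Figure 1 are simple per-month statistics over the runs. A rough sketch of that calculation follows. It assumes each run is an equal-length list of anomalies on the same monthly time axis, and it uses the sample (n-1) standard deviation; the name `ensemble_mean_and_band` is hypothetical, and the plotted band may have been computed differently.

```python
from statistics import mean, stdev


def ensemble_mean_and_band(runs):
    """Return (mean, lower, upper) lists across runs at each time step.

    runs: list of equal-length lists, one anomaly series per model run.
    lower/upper are the ensemble mean minus/plus one sample standard deviation.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to compute a spread")
    n_steps = len(runs[0])
    if any(len(run) != n_steps for run in runs):
        raise ValueError("all runs must cover the same time steps")
    means, lower, upper = [], [], []
    # zip(*runs) yields, for each month, the values from all runs.
    for values in zip(*runs):
        m = mean(values)
        s = stdev(values)
        means.append(m)
        lower.append(m - s)
        upper.append(m + s)
    return means, lower, upper
```

If the run-to-run scatter were roughly normal, about 68% of runs would sit inside the band at any given month, which is where the “roughly 2/3” figure comes from.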
Since many people prefer to examine filtered data, I computed 12 month lagging averages for the data shown above. This is shown below:
For now, enjoy the graphs! I’ll update those as new data arrive; we’ll be able to see whether the data revert back toward model projections or stray away.
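The 12 month lagging average used for the second graph can be sketched as a trailing mean: each point is the average of that month and the 11 months before it. The function name `lagging_average` is hypothetical, and returning `None` for months without a full window is an assumption about how the start of the record is handled.

```python
def lagging_average(values, window=12):
    """Trailing mean over the current value and the (window - 1) before it.

    Entries before a full window is available are returned as None.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just fell out of the trailing window.
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out
```

Because the window only looks backward, the filtered curve lags the monthly data; a centered average would not, but it could not be extended to the most recent month.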
Upcoming analyses
Since I haven’t posted any results, people have been speculating. No, I’m not looking at long term persistence. That’s probably an interesting thing to look at, but I usually go for easy questions and tests first!
The goal of my analyses will be to provide information one might use to answer a fairly narrow question. The generic question is: Is the spread in any observable predicted over the collections of GCM runs underlying the projections in the IPCC AR4 a realistic measure of “weather noise”? More precise statements of the question will be required (and provided) as I do a variety of tests.
Roughly, the type of question I am asking is: Do those pink ±1 standard deviation curves represent the scatter in GMST due to earth weather only, or something else? Candidates for “something else” include uncertainty in our ability to predict climate trends, due to uncertainty in parameterizations for various physical processes.
As it happens, I think the truth of the statement, “No. The spread in any observable is not a realistic measure of ‘earth weather noise’ only” is obvious.
Moreover, I think any fair reading of the IPCC AR4, written and published before the temperature dropped in January 2008, tells us the answer is, “No”. The spread is influenced by “weather noise”, but the spread differs from true “earth weather noise” for many reasons, including different choice of parameterizations, and different methods of implementing numerics.
However, though I think the answer to the question is obvious, it appears that some others believe the spread is a measure of weather noise. So, the obvious must be shown– statistically! More on all that later.
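To make the distinction concrete, here is one rough way to split the ensemble spread at a single month into a within-model part (run-to-run “model weather noise”) and a between-model part (differences among model averages). This is only an illustrative sketch, not one of the statistical tests promised above. The function name `split_spread` and the dictionary layout are hypothetical.

```python
from statistics import mean, variance


def split_spread(values_by_model):
    """Split ensemble spread at one month into within- and between-model parts.

    values_by_model: dict mapping model name -> list of GMST anomalies, one
    per run of that model (hypothetical layout).
    Returns (within, between): the pooled within-model sample variance, using
    only models with at least two runs, and the sample variance of the
    per-model means.
    """
    model_means = [mean(v) for v in values_by_model.values()]
    if len(model_means) < 2:
        raise ValueError("need at least two models")
    between = variance(model_means)
    multi_run = [v for v in values_by_model.values() if len(v) >= 2]
    if not multi_run:
        raise ValueError("need at least one model with two or more runs")
    # Pool the within-model variances, weighting by degrees of freedom.
    dof = sum(len(v) - 1 for v in multi_run)
    within = sum((len(v) - 1) * variance(v) for v in multi_run) / dof
    return within, between
```

Note that the variance of the per-model means is itself inflated by weather noise when each mean comes from only a few runs, so the “between” number overstates the true disagreement among models. Only the models with several runs in this ensemble contribute to the “within” estimate.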
For now, I’ll just provide details I can point back to later!
What data will I be using?
To create the graphs, and perform analysis, I downloaded model runs from: The Climate Explorer. I downloaded cases that I could use to create data sets that spanned from 1900 to 2099, with the future projected using the SRES A1B scenario. So, my “ensemble” includes the following models and corresponding runs:
- BCCR BCM2.0: 1 run.
- CGCM3.1 (T47): 5 runs.
- CGCM3.1 (T63): 1 run.
- CNRM CM3: 1 run.
- CSIRO Mk3.0: 1 run.
- GFDL CM2.0: 1 run.
- GFDL CM2.1: 1 run.
- GISS AOM: 2 runs.
- GISS EH: 3 runs.
- GISS ER: 1 run.
- FGOALS g1.0: 3 runs.
- INM CM3.0: 1 run.
- MIROC3.2 (hires): 1 run.
- MIROC3.2 (medres): 3 runs.
- ECHO G: 3 runs.
- MRI CGCM 2.3.2: 5 runs.
- ECHAM5/ MPI-OM: 3 runs.
- UKMO HadGEM1: 1 run.
- UKMO HadCM3: 1 run.
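As a quick arithmetic check, the list above can be tallied to confirm it adds up to the 38 runs from 19 models quoted at the top of the post. The dictionary name `RUNS_PER_MODEL` is just an illustrative choice; the counts are copied from the list.

```python
# Number of SRES A1B runs per model, copied from the list above.
RUNS_PER_MODEL = {
    "BCCR BCM2.0": 1,
    "CGCM3.1 (T47)": 5,
    "CGCM3.1 (T63)": 1,
    "CNRM CM3": 1,
    "CSIRO Mk3.0": 1,
    "GFDL CM2.0": 1,
    "GFDL CM2.1": 1,
    "GISS AOM": 2,
    "GISS EH": 3,
    "GISS ER": 1,
    "FGOALS g1.0": 3,
    "INM CM3.0": 1,
    "MIROC3.2 (hires)": 1,
    "MIROC3.2 (medres)": 3,
    "ECHO G": 3,
    "MRI CGCM 2.3.2": 5,
    "ECHAM5/ MPI-OM": 3,
    "UKMO HadGEM1": 1,
    "UKMO HadCM3": 1,
}

total_runs = sum(RUNS_PER_MODEL.values())
n_models = len(RUNS_PER_MODEL)
# Models with a single run cannot say anything about run-to-run scatter.
single_run_models = [name for name, n in RUNS_PER_MODEL.items() if n == 1]
```

Here `total_runs` is 38 and `n_models` is 19. Eleven of the 19 models contribute only one run, so any estimate of within-model variability rests on the remaining eight.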
The following two sets which meet my criteria are currently omitted:
- IPSL CM4 is listed as having 1 SRES run. I encountered an error message trying to access the run.
- PCM is listed as having 1 SRES A1B run and 2 historical runs. On downloading, there are 4 A1B’s and only 2 historical runs. I have enquired about this issue.
That’s all for today!
Yep. No big proclamations. Just a catchall to show some graphs, tell you where the data come from, and explain the motivation for what’s to come.
Questions welcomed.
Written by lucia.




Comments
John F. Pittman (Comment#5254) August 27th, 2008 at 2:24 pm
Is it stated for each model that aerosols were included?
lucia (Comment#5256) August 27th, 2008 at 2:31 pm
John–
The modeling groups are permitted a range of choices within the constraint of the SRES. So, some models include aerosols. But the parameterization they chose is up to the professional judgment of the modelers.
I think some of the spread in model projections is due to each group making slightly different choices.
There is a table in the AR4 detailing some choices for each model.
Chad (Comment#5257) August 27th, 2008 at 2:36 pm
Lucia-
I noticed when I was importing the data for sresa1b a while back that they don’t all have a common time interval. Some start in 1850,1860,2000,2001 or 2004 like GISS ER. When calculating a trendline, I think it might be a good idea to calculate it over an interval where the data represents the same number of models throughout. I suspect that calculating over the whole interval of time with a different number of models on different intervals might skew the results in some way. Just a thought.
Speaking of GISS ER, did you notice that strange dip around 2100?
lucia (Comment#5258) August 27th, 2008 at 2:43 pm
Yes! I noticed that dip, and didn’t even blog about the weirdness. It’s weird– isn’t it?
I’ve got a note in to Greet Van “xxx” about some of the data sets.
I’m not sure what you mean about “the same number of models throughout”. In this post, I didn’t compute any trend lines. The red curve is just the average over the 38 runs. There are 38 runs all the way from 1980-2030 (and even well before.) I added the dashed 2C/century line so people could see the average is increasing at 2C/century– as the IPCC said it does for these cases.
For now, when doing analyses, I will be using these 38 runs for precisely the reason you mention. For some analyses, it’s convenient to have matching numbers. But, as it happens, I’m never looking at anything prior to 1900, so the different start dates for the different runs won’t affect anything I’m currently looking at.
Clark (Comment#5259) August 27th, 2008 at 2:48 pm
Why different number of runs for individual models?
Wouldn’t the s.d. of your model average GMST change if you changed the number of runs assigned to specific models? If so, doesn’t this prove by simple logic that the s.d of the model mean does not in any way represent weather noise?
lucia (Comment#5260) August 27th, 2008 at 2:53 pm
Clark-
The modeling community gets together in various forums and comes up with SRES etc. They propose a variety of challenges. Then individual groups decide which SRES to run, how many of each case to run etc.
The statistical tests will be based on what you are describing as simple logic. However, the tests are made systematic so we can quantify and estimate the probability that one would get a particular result given some particular hypothesis.
So, I always try to explain the hypothesis, explain why that particular hypothesis is relevant as a matter of simple logic and then apply the test.
One test is going to relate to the residuals to ordinary least squares fits, one will relate to the variability of 8 year trends… and so on.
Fred Nieuwenhuis (Comment#5266) August 27th, 2008 at 6:43 pm
“Having set everything to common 1980-1999 baselines, the temperature anomalies for data and observations are forced to agree, on average, during that period.” I’ll put in the token skeptic comment regarding using GISTemp, Hadcrut and NOAA: with all the historical and current adjustments to these datasets, one could state rather that the “observed” temperatures are forced to agree with the models…
lucia (Comment#5268) August 27th, 2008 at 7:05 pm
Fred–
If I were to show the full hindcast, we’d see that the data don’t fully agree with the models. Certainly, they don’t agree with individual model runs.
TomVonk (Comment#5283) August 28th, 2008 at 8:51 am
The figure 2 is interesting .
So all the complex models come with is T = at + b or in other words dT/dt = a ?
Why bother with LES when the truth is the simplest form and a 0D model – a straight line ?
There is one striking exception : years 92 – 93 .
In general the red doesn’t even get the right sense of variation of the blue , let alone the right value .
But in 92-93 it gets both remarkably right .
As this is filtered , the interpretations are tricky .
Is there something special in those early 90ies years that would explain that remarkable behaviour ?
Daniel Klein (Comment#5284) August 28th, 2008 at 11:05 am
TomVonk-
Yes, Mt. Pinatubo. Which happened before AR4. You do the math.
Francois O (Comment#5288) August 28th, 2008 at 2:28 pm
Lucia,
Are my eyes playing tricks on me, or is the spread between the one-SD curves widening over time? If so, why would that be?
MarkR (Comment#5289) August 28th, 2008 at 2:31 pm
So if we do an annual correlation between Merge L & O , and Models, and find that 92-93 is the period that correlates best, but that it was the period with the most un Model like conditions, and that these conditions were reported before the Models were designed, what should we conclude?
PI (Comment#5291) August 28th, 2008 at 2:41 pm
TomVonk,
It’s not correct to directly compare the variation in the red and blue curves. The red curve is the average of many different model runs, and as such will generally be less variable than the observations, which are just a single realization of the weather. Averaging together a bunch of different “model weather” realizations tends to hide the variability present in the models because high temperature fluctuations in some runs cancel with low temperature fluctuations in others.
The red line matches the blue better at Mt. Pinatubo because all the models have Pinatubo in them, so when you average them together you still get a net cooling. By contrast they don’t all get the big 1998 peak from El Nino, because while they do have internal ENSO variability, it’s stochastic and the El Ninos they generate happen at random times with varying strengths. So the El Ninos and La Ninas mostly cancel each other out in the average. (Sometimes people compare the models to an “ENSO corrected” set of observations to compensate.)
lucia (Comment#5292) August 28th, 2008 at 2:45 pm
Francois–
The spread does increase over time. This is a symptom of the fact that the spread is not “weather noise!”
The spread is the spread in model projections of GMST at any month. Since the individual models project slightly different things on average, the spread is due to “model weather noise” (that is, the variability for an individual model) plus the spread between the average that we would get if we ran any particular model one bajillion times.
Since we baselined everything to 1980-1999, the spread is minimized in that region. Basically, we’ve forced the difference in model averages to zero during the baseline period, and in that period, the spread is dominated by “model weather noise”.
lucia (Comment#5293) August 28th, 2008 at 2:46 pm
Pi–
Actually, some models don’t have Mt Pinatubo in them. I’d need to look up the proportion. But, some models do have Mt. Pinatubo in them.
You’ll notice the model spread is a little larger near the eruption. The fact that some models include the eruption and others don’t likely contributes to the spread.
PI (Comment#5294) August 28th, 2008 at 2:51 pm
lucia,
Have you made the 38 temperature time series available for download anywhere?
PI (Comment#5295) August 28th, 2008 at 2:55 pm
I made a comment about El Chichon in 1982, but I got the years mixed up. We do see El Chichon in the observations; I’m not sure why we don’t see it much in the models, unlike Pinatubo.
lucia (Comment#5296) August 28th, 2008 at 3:32 pm
Pi.
No. They are available from the climate explorer. I’m not going to take the responsibility of maintaining an archive which would require me to institute some form of quality control. In any case, I’m not sure it would be permitted. It’s not “my” data and copyright might be an issue.
BarryW (Comment#5315) August 29th, 2008 at 9:06 am
Not sure whether this has any bearing on what you’re doing right now, but I wanted to pass it along.
http://people.iarc.uaf.edu/~sa....._LIA_R.pdf
Jonathan asked me to respond to Penguindreams: Here’s the response. | The Blackboard (Pingback#5326) August 29th, 2008 at 3:47 pm
[...] We could calculate the central tendency from the models ourselves! Should you feel the need to see the average computed over the model runs that have been made available at the climate explorer, I’ve plotted them here [...]
Result of Hypothesis Tests: Very Low Confidence 2C/century Correct. | The Blackboard (Pingback#5574) September 29th, 2008 at 10:34 am
[...] It’s September 29, and I’m finally reporting the test of the hypothesis that the global means surface temperatures are rising at a rate of 2C/century, ( which is ‘purt dang close to the trend represented by the average of models in the IPCC AR4: That is to say, it represents a central tendency for the predictions of IPCC models used in the AR4. For graph of average of model runs compared to 2C/century, read 1.) [...]
Effect of Including Volcanic Eruptions on Hindcast/Forecast of GMST | The Blackboard (Pingback#5776) October 10th, 2008 at 10:16 am
[...] As usual, I’ll use the data I downloaded from The Climate Explorer, discussed earlier here. [...]
Let’s apply the method in Santer17 to GMST? (Part 1) | The Blackboard (Pingback#5865) October 16th, 2008 at 9:58 am
[...] Since this is a blog, documentation will be less involved than one might expect of a journal article. I will take the liberty of assuming you all have access to Santer17 which itself describes the caveats to the test. To minimize typing of equations, I will refer to various hypotheses and equations numbers using the values in that paper. Also, I will only compare models to one data set for GMST. It is the one I call “Merge 3” and represents the average of NOAA/NCDC, GISS Land/Ocean and HadCrut. Also, I have not obtained all runs included in the IPCC AR4 and extending to the current period. Instead, I have used the runs discussed in an earlier post. [...]
Santer Method Applied Since Jan 2001: Average based on 38 IPPC AR4 models rejected. | The Blackboard (Pingback#5947) October 23rd, 2008 at 1:20 pm
[...] For this test, I use the 38 runs downloaded from The Climate Explorer are discussed here. [...]