I’m still on preliminaries for the analysis I’ve been doing. But today, I’ll show two graphs, and then discuss the general goals of the analysis I will be posting soon.
Here are some graphs comparing observations of global mean surface temperature (GMST) to “hindcast/projections” from the AR4. The graphs are based on an ensemble consisting of 38 runs from 19 separate models. Observations are from NCDC/GISS and HadCrut; land/ocean merges were selected because the IPCC compares projections to land/ocean merges in its documents. The anomalies for each data set and each of the 38 model runs are calculated relative to their average temperature from 1980-1999; this was the convention used in the AR4 projections.
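For anyone who wants to replicate the baselining, here is a minimal sketch of that step in Python. The file name and column layout are hypothetical; it assumes the monthly series have been collected into one table with a column per model run or observational series.

```python
import pandas as pd

# Hypothetical CSV of monthly GMST, one column per model run or observational
# series, indexed by month.
gmst = pd.read_csv("gmst_monthly.csv", index_col=0, parse_dates=True)

# Anomalies relative to each series' own 1980-1999 mean (the AR4 convention).
baseline = gmst.loc["1980-01":"1999-12"].mean()
anomalies = gmst - baseline
```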
The discussion today will be limited to the two graphs and a bit of motivation for the upcoming statistical analyses. I’ll also provide links to the “model data” so those who wish to replicate my analyses can find it. There will be no important statistical tests.
Data comparison
Figure 1 illustrates monthly data. The red solid curve indicates the GMST averaged over all 38 runs in the ensemble; the red dashed curve indicates the 2C/century central tendency from the IPCC. The pink solid lines indicate ±1 standard deviation for GMST over the 38 model runs. If spaghetti curves for individual models were shown, at any given time roughly 2/3 of the “model data” would fall inside the pink ±1 standard deviation bounds and roughly 1/3 would fall outside them.
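The red curve and the pink bounds amount to something like the following, continuing the sketch above and assuming `anomalies` here holds only the 38 model-run columns:

```python
# Ensemble statistics across the 38 runs at each month.
ensemble_mean = anomalies.mean(axis=1)        # red solid curve
ensemble_sd = anomalies.std(axis=1, ddof=1)   # sample standard deviation
upper = ensemble_mean + ensemble_sd           # upper pink curve
lower = ensemble_mean - ensemble_sd           # lower pink curve
```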
The remaining colors are observations, with blue indicating the “merge 3 average” over NOAA, GISS and HadCrut:
As you can see, the model data and observations seem to agree in some “average” sense. Having set everything to common 1980-1999 baselines, the temperature anomalies for models and observations are forced to agree, on average, during that period. The model projections shown in red were based on model runs/SRES etc. available after 2001, when the TAR was published; that model data formed the foundation for projections in the AR4. Note that since 2001, the rate of increase in observed temperature has been small or negative. The data are currently dancing around the lower ±1 standard deviation bound for model predictions.
Since many people prefer to examine filtered data, I computed 12-month lagging averages of the data shown above. This is shown below:
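The filtering is just a trailing 12-month mean; a one-line sketch, again using the hypothetical `anomalies` frame from above:

```python
# 12-month lagging (trailing) average: each point is the mean of the current
# month and the 11 preceding months.
smoothed = anomalies.rolling(window=12).mean()
```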
For now, enjoy the graphs! I’ll update those as new data arrive; we’ll be able to see whether the data revert back toward model projections or stray away.
Upcoming analyses
Since I haven’t posted any results, people have been speculating. No, I’m not looking at long term persistence. That’s probably an interesting thing to look at, but I usually go for easy questions and tests first! 🙂
The goal of my analyses will be to provide information one might use to answer a fairly narrow question. The generic question is: Is the spread in any observable predicted over the collections of GCM runs underlying the projections in the IPCC AR4 a realistic measure of “weather noise”? More precise statements of the question will be required (and provided) as I do a variety of tests.
Roughly, the type of question I am asking is: Do those pink ±1 standard deviation curves represent the scatter in GMST due to earth weather only, or something else? Candidates for “something else” include uncertainty in our ability to predict climate trends, due to uncertainty in parameterizations for various physical processes.
As it happens, I think the truth of the statement, “No. The spread in any observable is not a realistic measure of ‘earth weather noise’ only,” is obvious.
Moreover, I think any fair reading of the IPCC AR4, written and published before the temperature dropped in January 2008, tells us the answer is “No”. The spread is influenced by “weather noise”, but it differs from true “earth weather noise” for many reasons, including different choices of parameterizations and different methods of implementing the numerics.
However, though I think the answer to the question is obvious, it appears that some others believe the spread is a measure of weather noise. So, the obvious must be shown– statistically! More on all that later. 🙂
For now, I’ll just provide details I can point back to later!
What data will I be using?
To create the graphs and perform the analysis, I downloaded model runs from The Climate Explorer. I downloaded cases that I could use to create data sets spanning 1900 to 2099, with the future projected using the SRES A1B scenario. So, my “ensemble” includes the following models and corresponding runs:
- BCCR BCM2.0: 1 run.
- CGCM3.1 (T47): 5 runs.
- CGCM3.1 (T63): 1 run.
- CNRM CM3: 1 run.
- CSIRO Mk3.0: 1 run.
- GFDL CM2.0: 1 run.
- GFDL CM2.1: 1 run.
- GISS AOM: 2 runs.
- GISS EH: 3 runs.
- GISS ER: 1 run.
- FGOALS g1.0: 3 runs.
- INM CM3.0: 1 run.
- MIROC3.2 (hires): 1 run.
- MIROC3.2 (medres): 3 runs.
- ECHO G: 3 runs.
- MRI CGCM 2.3.2: 5 runs.
- ECHAM5/MPI-OM: 3 runs.
- UKMO HadGEM1: 1 run.
- UKMO HadCM3: 1 run.
The following two models meet my criteria but are currently omitted:
- IPSL CM4 is listed as having 1 SRES run. I encountered an error message trying to access the run.
- PCM is listed as having 1 SRES A1B run and 2 historical runs. On downloading, there are 4 A1B runs and only 2 historical runs. I have enquired about this issue.
That’s all for today!
Yep. No big proclamations. Just a catchall to show some graphs, tell you where the data come from, and explain the motivation for what’s to come. 🙂
Questions welcomed.
Is it stated for each model that aerosols were included?
John–
The modeling groups are permitted a range of choices within the constraints of the SRES. So, some models include aerosols. But the parameterization they choose is up to the professional judgment of the modelers.
I think some of the spread in model projections is due to each group making slightly different choices.
There is a table in the AR4 detailing some choices for each model.
Lucia-
I noticed when I was importing the data for sresa1b a while back that they don’t all have a common time interval. Some start in 1850, 1860, 2000, 2001, or 2004, like GISS ER. When calculating a trendline, I think it might be a good idea to calculate it over an interval where the data represent the same number of models throughout. I suspect that calculating over the whole interval of time with a different number of models on different intervals might skew the results in some way. Just a thought.
Speaking of GISS ER, did you notice that strange dip around 2100?
Yes! I noticed that dip, and didn’t even blog about the weirdness. It’s weird– isn’t it?
I’ve got a note in to Geert Van “xxx” about some of the data sets.
I’m not sure what you mean about “the same number of models throughout”. In this post, I didn’t compute any trend lines. The red curve is just the average over the 38 runs. There are 38 runs all the way from 1980-2030 (and even well before). I added the dashed 2C/century line so people could see the average is increasing at 2C/century– as the IPCC said it does for these cases.
For now, when doing analyses, I will be using these 38 runs for precisely the reason you mention. For some analyses, it’s convenient to have matching numbers. But, as it happens, I’m never looking at anything prior to 1900, so the different start dates for the different runs won’t affect anything I’m currently looking at.
Why different number of runs for individual models?
Wouldn’t the s.d. of your model-average GMST change if you changed the number of runs assigned to specific models? If so, doesn’t this prove by simple logic that the s.d. of the model mean does not in any way represent weather noise?
Clark-
The modeling community gets together in various forums and comes up with the SRES etc. They propose a variety of challenges. Then individual groups decide which SRES to run, how many of each case to run, etc.
The statistical tests will be based on what you are describing as simple logic. However, the tests are made systematic so we can quantify and estimate the probability that one would get a particular result given some particular hypothesis.
So, I always try to explain the hypothesis, explain why that particular hypothesis is relevant as a matter of simple logic and then apply the test.
One test is going to relate to the residuals from ordinary least squares fits, one will relate to the variability of 8-year trends… and so on.
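To give a flavor of the ingredients (this is only a sketch, not the test itself; the window and the “obs” column name are placeholders):

```python
import numpy as np

def ols_trend_and_residuals(series):
    """OLS fit of a monthly anomaly series against time; returns the trend
    in degrees C per year and the residuals about the fit."""
    t = np.arange(len(series)) / 12.0                # time in years
    slope, intercept = np.polyfit(t, series.values, 1)
    residuals = series.values - (slope * t + intercept)
    return slope, residuals

# Example: an 8-year trend in a single series beginning January 2001.
window = anomalies.loc["2001-01":"2008-12", "obs"]
trend_per_year, resid = ols_trend_and_residuals(window)
```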
“Having set everything to common 1980-1999 baselines, the temperature anomalies for data and observations are forced to agree, on average, during that period.” I’ll put in the token skeptic comment regarding using GISTemp, Hadcrut and NOAA: with all the historical and current adjustments to these datasets, one could state rather that the “observed” temperatures are forced to agree with the models… 🙂
Fred–
If I were to show the full hindcast, we’d see that the data don’t fully agree with the models. Certainly, they don’t agree with individual model runs.
Figure 2 is interesting.
So all the complex models come up with is T = at + b, or in other words dT/dt = a?
Why bother with LES when the truth is the simplest form and a 0D model – a straight line?
There is one striking exception: years 92-93.
In general the red doesn’t even get the right sense of variation of the blue, let alone the right value.
But in 92-93 it gets both remarkably right.
As this is filtered, the interpretations are tricky.
Is there something special in those early ’90s years that would explain that remarkable behaviour?
TomVonk-
Yes, Mt. Pinatubo. Which happened before AR4. You do the math.
Lucia,
Are my eyes playing tricks on me, or is the spread between the one-SD curves widening over time? If so, why would that be?
So if we do an annual correlation between the Merge L & O and the Models, and find that 92-93 is the period that correlates best, but that it was the period with the most un-Model-like conditions, and that these conditions were reported before the Models were designed, what should we conclude?
TomVonk,
It’s not correct to directly compare the variation in the red and blue curves. The red curve is the average of many different model runs, and as such will generally be less variable than the observations, which are just a single realization of the weather. Averaging together a bunch of different “model weather” realizations tends to hide the variability present in the models because high temperature fluctuations in some runs cancel with low temperature fluctuations in others.
The red line matches the blue better at Mt. Pinatubo because all the models have Pinatubo in them, so when you average them together you still get a net cooling. By contrast they don’t all get the big 1998 peak from El Nino, because while they do have internal ENSO variability, it’s stochastic and the El Ninos they generate happen at random times with varying strengths. So the El Ninos and La Ninas mostly cancel each other out in the average. (Sometimes people compare the models to an “ENSO corrected” set of observations to compensate.)
Francois–
The spread does increase over time. This is a symptom of the fact that the spread is not “weather noise!”
The spread is the spread in model projections of GMST at any month. Since the individual models project slightly different things on average, the spread is due to “model weather noise” (that is, the variability for an individual model) plus the spread between the averages we would get if we ran each particular model one bajillion times.
Since we baselined everything to 1980-1999, the spread is minimized in that region. Basically, we’ve forced the difference in model averages to zero during the baseline period, and in that period, the spread is dominated by “model weather noise”.
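A quick way to see this in the numbers, continuing the earlier sketch (the comparison decade is arbitrary):

```python
# Cross-run spread averaged over the baseline window versus a later decade.
# After baselining, between-model differences are near zero in 1980-1999, so
# the spread there is mostly "model weather noise"; the larger spread later
# also reflects the models' diverging averages.
sd_baseline = anomalies.loc["1980-01":"1999-12"].std(axis=1, ddof=1).mean()
sd_later = anomalies.loc["2020-01":"2029-12"].std(axis=1, ddof=1).mean()
print(sd_baseline, sd_later)
```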
Pi–
Actually, some models don’t have Mt. Pinatubo in them. I’d need to look up the proportion. But some models do have Mt. Pinatubo in them.
You’ll notice the model spread is a little larger near the eruption. The fact that some models include the eruption and others don’t likely contributes to the spread.
lucia,
Have you made the 38 temperature time series available for download anywhere?
I made a comment about El Chichon in 1982, but I got the years mixed up. We do see El Chichon in the observations; I’m not sure why we don’t see it much in the models, unlike Pinatubo.
Pi.
No. They are available from the Climate Explorer. I’m not going to take the responsibility of maintaining an archive, which would require me to institute some form of quality control. In any case, I’m not sure it would be permitted. It’s not “my” data and copyright might be an issue.
Not sure whether this has any bearing on what you’re doing right now, but I wanted to pass it along.
http://people.iarc.uaf.edu/~sakasofu/pdf/Earth_recovering_from_LIA_R.pdf