Where does data fall on the IPCC projection chart?

I’ve been downloading monthly surface temperature ‘model data’ for the SRES A1B projections from the Climate Explorer and mostly just looking at it. So far, I’ve downloaded data for FGOALS, GISS EH, CGCM 3.1 (T47) and ECHO-G. The “model weather noise” looks, shall we say, “interesting”. But I’m going to defer discussing that while I figure out how to run a consistent statistical test. Meanwhile, I’ve done the following:

  1. Removed the annual variability.
  2. Rebaselined each ‘run’ of a model to the average temperature from 1980-1999 inclusive for that specific model run. (That is, “run 0” for model ECHO-G has its own baseline, “run 1” for ECHO-G has its own baseline.) IPCC projections in the AR4 are communicated relative to this period.
  3. Computed the average temperature over all runs, weighting equally, and also computed the standard deviation over all runs.
  4. Rebaselined HadCrut, GISS LandOcean and NOAA NCDC data using their average temperature from 1980-1999.

This method of re-baselining has the effect of showing us how that specific realization of weather varied over time. Here’s a plot.

IPCC Models Compared to Data (click for larger)

The central tendency of model temperature anomaly is shown in red; the pink denotes 1 standard deviation calculated over the 13 model runs. The average temperature anomaly of NOAA, GISS and HadCrut is shown in blue.

Comparing the models to data, we see the models and data fall in the same range during 1980-1999 (inclusive). This is not surprising, as rebaselining each realization forces agreement on average during the rebaselining period. The ability of models to project should be gauged by whether or not the data and models diverge after 2001. Currently, the earth’s weather falls below the 1-standard-deviation range for the models.

Note that the temperatures currently fall below one standard deviation of the model spread, as can be seen by comparing the data to the 1-standard deviation ranges for model predictions shown in dark pink.
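
As a toy illustration of that comparison (the thirteen run values and the observed anomaly below are invented, not the downloaded data): given per-run anomalies for a single year, check whether an observed value lands inside the mean ± 1 standard deviation band.

```python
# Toy check of the "within one standard deviation" comparison.
from statistics import mean, stdev

runs = [0.55, 0.62, 0.48, 0.70, 0.51, 0.58, 0.66,
        0.44, 0.60, 0.53, 0.72, 0.49, 0.57]   # one value per model run
obs = 0.38                                     # observed anomaly, same year

m, s = mean(runs), stdev(runs)
inside = (m - s) <= obs <= (m + s)
print(f"model mean {m:.3f}, sd {s:.3f}, inside 1-sd band: {inside}")
```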

In this post, I’m not doing any particular analysis to test any hypothesis. However, I would like to remind readers that model spread is not necessarily “earth weather noise”. The model spread results from two factors:

  1. “model weather noise”– that is, the internal variations in predictions for a single model, like, for example, ECHO-G. If any model were capable of correctly modeling the earth, this “model weather noise” would have the same statistical properties as “earth weather noise”. However, it’s not clear model weather noise has properties identical to “earth weather”. (This is something I plan to examine.) and
  2. differences in the average trend predicted by each model. So, for example, if modelers could run both CGCM 3.1 and GISS model E a million times, and then average, the two models might predict somewhat different results on average. This difference contributes to the spread of model predictions.

Which factor dominates depends on the relative magnitude of the differences across models versus the magnitude of true weather noise. The first includes the effects of parameterizations, gridding, initial conditions, and/or decisions by modelers to include or exclude volcanic eruptions. A quick comparison of “model weather” during periods free of volcanic eruptions suggests the magnitude and properties of “model weather” are rather dissimilar on each “model planet”. I’ll be able to say something more concrete later.
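The two contributions to model spread can be separated in the style of a one-way analysis of variance. This sketch uses two hypothetical “models” with invented runs; the real calculation would group the downloaded runs by model.

```python
# Sketch of splitting total run-to-run spread into within-model ("model
# weather noise") and between-model (trend difference) components, one-way
# ANOVA style. The two toy "models" and their runs are invented.
from statistics import mean

models = {
    "model_A": [0.50, 0.56, 0.47],  # runs of one toy model
    "model_B": [0.70, 0.76, 0.64],  # runs of a second, warmer toy model
}

all_runs = [r for runs in models.values() for r in runs]
grand = mean(all_runs)
n = len(all_runs)

within = sum((r - mean(runs)) ** 2
             for runs in models.values() for r in runs) / n
between = sum(len(runs) * (mean(runs) - grand) ** 2
              for runs in models.values()) / n

# The two components add up to the total (population) variance of the runs.
print(f"within-model: {within:.4f}  between-model: {between:.4f}")
```

In this toy case the between-model term dominates; whether that holds for the real runs is exactly the question the post defers.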

For now, what I can say is: compared to the model data I’ve downloaded, the earth temperatures are on the low end of the projected spread. Moreover, the year 2001, where I happen to start the slopes for my hypothesis tests, is also on the low end of the spread.

7 thoughts on “Where does data fall on the IPCC projection chart?”

  1. Raven–
    I don’t think these describe “weather noise”. So, whatever % I use, they would describe the range of surface temperatures in the model spread. I’ll be making these fancier as I download more data.

    In particular, I’ll add confidence intervals after I do a full comparison of “model weather noise” to “real earth weather noise”. That test is my main reason for downloading. But to fully do it I need to a) consult with someone about correlation functions, b) get all the models downloaded and organized, and c) figure out how to do the test I want to do!

    However, I think, by Gavin’s recent expression of a theory of testing models, the data have to fall outside the 95% confidence interval for the spread of model runs. That would be roughly 2 sigma– but the exact number will depend on how many runs I end up with. (FWIW, if one were to apply the theory that one can’t say the models are “off” unless the data fall outside 2 sd for individual runs, Rahmstorf et al. should never have suggested the data were “high” compared to the TAR prediction when he and his co-authors wrote a paper suggesting they were high. Eyeballing, these data are “low” by more of a margin than those data were high! Also, he should never have shown the figure he showed. He should have dug up the TAR model runs, shown the data compared to the individual model runs, and shown the data were outside the 95% range!)

  2. Lucia,

    You say in the second “1.” above:
    “If any model were capable of correctly modeling earth, this “model weather noise” would have the same statistical properties as “earth weather noise.”

    Assume for the moment that you and I both had in our hot little hands the “correct model,” or at least one that was good for 100 year predictions within the bounds that nature gives us. Then assume we wanted to make the thing more realistic by adding stochastic elements to it, and we each did. Our resulting model noise would not necessarily be the same, yet per assumption we each started with a good enough model.

    Now the actual climate models are not, of course, assembled from a given good-enough model with random elements then inserted to make the final outputs stochastic — at least I hope not. But, to put it in the terms of regression models, there is no a priori reason why a modeler can’t get the deterministic part correct while messing up the stochastic part.

  3. Lucia,

    PS: My comment above notwithstanding, the comparison of model noise to between-model variation (ceteris paribus) is very interesting. I would very much like to see what you get. While the analysis may run under your “Another Boring Post” heading, the implications of some possible ratios should definitely interest even the most lay reader.

  4. Martin–
    I’m discussing GCMs above. GCMs are not regression models. They are based on solving a set of PDEs that attempt to model conservation of mass, momentum, and energy. If models, based on the correct physics, did correctly model the earth, the model weather noise would have the same properties as the earth’s weather noise.

    In terms of physical models of this sort, there actually are reasons why, if the random parts are wrong, the mean goes wrong. Or at least, if the random parts aren’t replicated well enough, the mean ends up wrong.

  5. Lucia,

    Let me ask you this: are GCMs based on data gathered from and about our little planet? If the answer is yes, then the difference between GCMs and regression models is simply one of mathematical techniques, data sets, and constraints — not of underlying philosophy. The format Y = f( X, e | B ) — where X is a vector of just about anything, including the antiderivatives of various parts of Y (or maybe I should say that all or parts of Y (think of it as a row vector of a set of endogenous variables) can be the total or partial derivatives of parts of X) — can be considered to include even GCMs. For there to be a philosophic difference, the GCM modeler would have to be removed from any observation of the meteorological data they are trying to model. Yeah, they should be, but they aren’t. Hence the lack of difference, or, to use the words of G. B. Shaw, “… we have established that. Now we are haggling over the price.”

    As for the requirement to get the random parts right, I think you mean get the distributions of the random parts right. Even in physics — or maybe especially in physics — one does not get the random parts right, only the distributions, tendencies, quantile ranges, or what have you. Then, given those parameters or parameter proxies and a finite time range, I give you the Weierstrass polynomial approximation theorem with a finite set of splines, and we have Y = f( X, e | B ) for which the ‘e’ may be degenerate or mis-specified by the f, but the whole thing fits reality within the limits posed by nature for the dimensions of the specification of Y, some N by P matrix of endogenous variables.

  6. Their noise is IMO highly unrealistic. Not nearly enough high-order LTP. Can’t wait to see your analysis. Consider using autocorrelation functions (ACFs) of varying window size and position. Of course, the major problem with AR modeling is that if the noise is truly Hurst-like (or, say, 1/f), the ACF coefficients may not converge. Good chance of that. An interesting question for a qualified statistician. Possibly already in the literature. Good luck.
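
    A rough sketch of the suggested ACF diagnostic, run on a toy AR(1) series rather than any model output (the series, its coefficient, and the lags are all invented for illustration):

    ```python
    # Sample autocorrelation function (ACF) of a synthetic AR(1) series.
    import random

    def acf(x, max_lag):
        """Sample autocorrelation at lags 0..max_lag."""
        n = len(x)
        m = sum(x) / n
        c0 = sum((v - m) ** 2 for v in x) / n
        return [sum((x[t] - m) * (x[t + k] - m)
                    for t in range(n - k)) / (n * c0)
                for k in range(max_lag + 1)]

    random.seed(0)
    x = [0.0]
    for _ in range(999):                  # AR(1) with coefficient 0.7
        x.append(0.7 * x[-1] + random.gauss(0, 1))

    rho = acf(x, 5)
    print([round(r, 2) for r in rho])     # rho[0] = 1 by construction
    ```

    For short-memory noise like this, the sample ACF decays quickly; for Hurst-like noise it decays much more slowly, which is the convergence worry raised above.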

    P.S. Nice new look to The Blackboard.

Comments are closed.