As many readers know, I like to compare observations to projections, rotating through various methods. Some statistical methods have more power (meaning they correctly detect statistically significant differences between a theory and data using smaller amounts of data). Other methods have less power. Simply comparing the observations of global mean surface temperature (GMST) to the value projected right now can be shown to have less statistical power for detecting deviations, but it's still worth looking at.
The figure below compares the 12-month lagging average of GMST to the 12-month running average based on the mean of 29 IPCC AR4 runs that included volcanic forcing before 2000 and project past 2000 using the A1B SRES scenario:

The temperatures for the observations and the individual runs are anomalies relative to their own monthly averages from Jan 1970-Dec 1999. The ±95% uncertainty intervals represent the spread of all weather in all models, based on the assumption that the deviations from the mean are normally distributed.
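For readers who want to see the mechanics, here's a minimal sketch of the two calculations just described: anomalies relative to the Jan 1970-Dec 1999 monthly averages, and a ±95% envelope from the spread of the runs under the normality assumption. The array names (`obs`, `runs`) and function signatures are hypothetical; this is not the code used to produce the figure.

```python
# Minimal sketch, assuming a hypothetical monthly series `obs` and an array
# `runs` of shape (n_runs, n_months), both starting Jan 1970, with `months`
# giving the 0..11 calendar-month index of each sample.
import numpy as np

def monthly_anomalies(series, months, base_start=0, base_end=360):
    """Subtract each calendar month's Jan 1970-Dec 1999 (360-month) mean."""
    series = np.asarray(series, dtype=float)
    months = np.asarray(months)
    base = series[base_start:base_end]
    base_months = months[base_start:base_end]
    clim = np.array([base[base_months == m].mean() for m in range(12)])
    return series - clim[months]

def envelope_95(run_anoms):
    """±95% interval for 'all weather in all models', assuming the spread of
    run anomalies about the multi-run mean is normally distributed."""
    mean = run_anoms.mean(axis=0)
    sd = run_anoms.std(axis=0, ddof=1)
    return mean, mean - 1.96 * sd, mean + 1.96 * sd
```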
As you can see:
- Using the simple eyeball test, the IPCC AR4 simulations were in quite good agreement with observations during the period between publication of the IPCC TAR and the AR4. This was a period when analysts would have had the opportunity to make decisions such as tweaking model parameters, screening model runs, or deciding whether to weight model runs when finalizing the projections in the AR4.
- Using the simple eyeball test, we see temperatures dropped after publication of the document, but they remain within the ±95% uncertainty intervals for all weather in all models.
It is worth noting that the IPCC projections are, evidently, based on averaging over models, not runs. The uncertainty bands in figures in the IPCC AR4 are evidently ±1 standard deviation for the average trend over all models, rather than for all possible weather in all possible models. Since ±95% corresponds to roughly two standard deviations, their uncertainty bands would be tighter than the ones shown here. (I'll eventually put together a graph using the weighting described verbally in the IPCC. I'm pondering how best to do it; no matter how I look at it, the temperature projections in my files always look much noisier than the ones in the IPCC AR4. I suspect smoothing was done. . . )
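To make the distinction concrete, here is a hedged sketch of the two ways of building a band: pooling every run from every model ("all weather in all models") versus first averaging each model's runs and then taking the spread of the model means. The dictionary `runs_by_model` is hypothetical, and neither function claims to reproduce the actual IPCC AR4 calculation.

```python
# Sketch only: `runs_by_model` is a hypothetical dict mapping model name to an
# array of run anomalies with shape (n_runs_for_that_model, n_months).
import numpy as np

def band_all_weather(runs_by_model, k=1.96):
    """~±95% band from the pooled spread of every run in every model."""
    all_runs = np.vstack(list(runs_by_model.values()))
    mean = all_runs.mean(axis=0)
    sd = all_runs.std(axis=0, ddof=1)
    return mean - k * sd, mean + k * sd

def band_model_means(runs_by_model, k=1.0):
    """±1 SD band from the spread of per-model means; averaging each model's
    runs first partly cancels the 'weather' noise, so this band is tighter."""
    model_means = np.vstack([r.mean(axis=0) for r in runs_by_model.values()])
    mean = model_means.mean(axis=0)
    sd = model_means.std(axis=0, ddof=1)
    return mean - k * sd, mean + k * sd
```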
if this ain’t a picture of a massively cooling world I don’t know what it is..
http://www.osdpd.noaa.gov/PSB/EPS/SST/data/anomnight.11.17.2008.gif
In hindsight I think the lack of solar activity may be beginning to kick in on the SST. Wonder what Leif has to say on this, or… David Archibald LOL
Vincent, the image is interesting. I'm thinking of looking at mid-month images for the past year.
Lucia, sorry to be a pest, but being non-statistical and having a 'debate' with an equally non-statistical AGW supporter: are you saying the model simulations/predictions are consistent with the observations to a confidence level of 95% during the period 2000-2007 only, or during the whole period of the graph?
Roger–
In this graph, the individual GMST observations are "not inconsistent" with all possible weather in all possible models.
This test has very low power. What that means is that it takes longer than necessary to show wrong things are wrong.
There is a sort of general principle that if you have two possible statistical tests, you pick the one with greater power over the one with less power. So, using this graph as the criterion for testing models is unwise, because the method has low power! (That is, unless your goal really and truly is to avoid admitting an incorrect theory is wrong for as long as you possibly can.)
Still, even though it's a low-power test, it is interesting to look at. But tests of the slopes have higher power.
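To illustrate what "power" means here, below is a toy Monte Carlo sketch comparing how often two checks flag a model whose trend is wrong: an endpoint-inside-the-band style check versus a simple OLS slope comparison. All the numbers (trend offset, noise level, AR(1) coefficient) are made up for illustration and are not calibrated to GMST data or to the tests I actually run; the point is only the qualitative one that the slope comparison tends to reject a wrong trend sooner.

```python
# Toy power comparison, assumptions invented for illustration only.
import numpy as np
rng = np.random.default_rng(0)

def simulate(n_months=96, true_trend=0.0, model_trend=0.02 / 12, sigma=0.1, rho=0.6):
    """One synthetic anomaly series: true_trend per month plus AR(1) noise,
    alongside a model that projects model_trend per month."""
    t = np.arange(n_months)
    noise = np.zeros(n_months)
    e = rng.normal(0, sigma, n_months)
    noise[0] = e[0]
    for i in range(1, n_months):
        noise[i] = rho * noise[i - 1] + e[i]
    return t, true_trend * t + noise, model_trend * t

def endpoint_test(t, obs, model, sigma=0.1, rho=0.6):
    """'Eyeball-style' check: is the latest observation outside a ±95% band
    built from the stationary AR(1) spread around the model value?"""
    band = 1.96 * sigma / np.sqrt(1 - rho**2)
    return abs(obs[-1] - model[-1]) > band

def slope_test(t, obs, model):
    """Trend check: is the fitted OLS slope significantly different from the
    model's slope? (Ignores autocorrelation, so it's only a toy.)"""
    slope, intercept = np.polyfit(t, obs, 1)
    resid = obs - (slope * t + intercept)
    se = np.sqrt(resid.var(ddof=2) / ((t - t.mean()) ** 2).sum())
    model_slope = (model[-1] - model[0]) / (t[-1] - t[0])
    return abs(slope - model_slope) > 1.96 * se

trials = [simulate() for _ in range(2000)]  # flat truth vs. ~0.02 C/yr model trend
power_endpoint = np.mean([endpoint_test(t, o, m) for t, o, m in trials])
power_slope = np.mean([slope_test(t, o, m) for t, o, m in trials])
print(f"endpoint test rejects {power_endpoint:.0%}, slope test rejects {power_slope:.0%}")
```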
Oh, also: the observations are inside the ±95% over the entire graph shown. It would be amazing if the observations fell outside the ±95% between 1970-1999. The baselines for the anomalies are defined to force the averages to be equal during that period. So, the deviations are mostly going to happen outside that region.
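Here's a tiny illustration of that rebaselining point, with made-up series names: once each series is expressed as anomalies from its own 1970-1999 average, the difference between them averages to roughly zero over the baseline window by construction, so any disagreement is pushed outside it.

```python
# Tiny sketch: `obs` and `model_mean` are hypothetical monthly series starting
# Jan 1970; the first 360 months are the 1970-1999 baseline window.
import numpy as np

def anomalies(series, base=slice(0, 360)):
    """Express a series as anomalies from its own baseline-window mean."""
    series = np.asarray(series, dtype=float)
    return series - series[base].mean()

# diff = anomalies(obs) - anomalies(model_mean)
# diff[:360].mean() is ~0 by construction; look outside the window for drift.
```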