Yesterday, I showed that if we treat a time series beginning in January of a specified year (e.g. 2001) as the sum of “linear trend + ARMA(1,1) ‘noise’”, then for start years in the early 21st century (a.k.a. ‘the forecast period’), the GISTemp trend is too low to be consistent with the nominal trend of “0.2C/decade” in the AR4. We cannot, however, draw the same conclusion if we test forecasting ability using data from the hindcast.
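For concreteness, here is a minimal sketch of the sort of fit involved, assuming the monthly anomalies from the chosen start month onward are already loaded into an array. The function name and `anoms` are illustrative, not my actual script:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def trend_and_ci(anoms):
    """Fit monthly anomalies as (constant + linear trend) with ARMA(1,1) errors.
    Returns the trend and the half-width of its ~95% confidence interval,
    both in C/decade."""
    t = np.arange(len(anoms)) / 120.0  # elapsed time in decades
    fit = ARIMA(np.asarray(anoms, dtype=float), exog=t,
                order=(1, 0, 1), trend="c").fit()
    # Parameter order: const, exog coefficient (the trend), ar.L1, ma.L1, sigma2
    slope, se = fit.params[1], fit.bse[1]
    return slope, 1.96 * se
```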
That said, the multi-model mean trend is not constant during the hindcast. Some models included the effects of volcanic aerosols in their simulations. Consequently, trends computed with start years near the eruption of Pinatubo are higher than 0.2C/decade in the multi-model mean (just as they are in the data). If we are going to compare observed and modeled trends using a range of start years, consistency requires us to compare trends with similar start years.
Since this is so, I promised in comments that I’d replace the trace for 0.2C/decade with the computed multi-model mean trend based on the models used in the AR4. I have runs forced using the A1B SRES handy, so those are the ones I’ve superimposed on the graph. I also computed the ±95% confidence intervals for our estimate of the multi-model mean trend. These are shown in slate grey below; I sketch the computation after the bulleted list.
Oddly enough, it is difficult to fully evaluate which cases reject and which fail to reject based on the graph above. Using the eyeball test, we can confidently state:
- We would reject the multi-model mean trend as too warm relative to the observed trend if we based our judgement on trends computed starting in 2001 or 2002. For these cases, the observed trend falls below the lower bound of the 95% confidence interval for the multi-model mean.
- We would fail to reject the multi-model mean as too warm if we based our judgement on trends computed starting in 1999 or 1985. In those cases, the upper bound of the 95% confidence interval for the observations lies above the multi-model mean.
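As promised above, here is a sketch of how the multi-model mean trend and its ±95% interval might be computed. It treats the per-model A1B trends as independent draws and uses a Student-t interval for the mean; the variable names are illustrative:

```python
import numpy as np
from scipy import stats

def multimodel_mean_ci(model_trends, alpha=0.05):
    """Mean trend across models and the half-width of its 95% CI,
    treating each model's mean trend as an independent draw."""
    trends = np.asarray(model_trends, dtype=float)
    n = trends.size
    mean = trends.mean()
    se = trends.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    halfwidth = stats.t.ppf(1.0 - alpha / 2.0, df=n - 1) * se
    return mean, halfwidth
```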
Because confidence intervals do not sum linearly, other cases require computation. Unfortunately, I don’t quite know how to estimate the confidence intervals in this particular case. So, as an expedient, I estimated the 95% confidence interval for the difference of the two quantities by computing what I call a “pooled confidence” interval: the square root of the sum of the squares of the two confidence intervals. (Monte Carlo will be required to determine whether this method results in an interval that is too strict or too lenient. But I suspect it will be close to correct.)
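This is the sort of Monte Carlo check I have in mind, as a sketch: draw two independent “trend estimates” with known scatter, pool their 95% intervals in quadrature, and count coverage. For independent normal errors the pooling is exact, so coverage should come out near 95%; departures from those assumptions are what the real test needs to probe. The spreads below are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
sd_obs, sd_mod, n_trials = 0.05, 0.03, 100_000  # placeholder spreads (C/decade)

# 95% half-widths for each quantity, then pooled in quadrature
ci_obs, ci_mod = 1.96 * sd_obs, 1.96 * sd_mod
pooled = np.sqrt(ci_obs**2 + ci_mod**2)

obs = rng.normal(0.0, sd_obs, n_trials)   # simulated observed-trend estimates
mod = rng.normal(0.0, sd_mod, n_trials)   # simulated model-mean-trend estimates
covered = np.abs(mod - obs) <= pooled     # true difference is zero here
print(f"coverage: {covered.mean():.3f}")  # ~0.950 for independent normals
```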
I then computed the ratio of the difference between the multi-model mean trend and the trend based on GISTemp to the pooled confidence interval at each year. If this method of pooling confidence intervals is correct (and it would be in some limits), then when this ratio is greater than 1, we reject the hypothesis that the multi-model mean is correct and decree that it is biased warm with a confidence level of 95%. If it is less than -1, we decree the multi-model mean is biased cold; in between, we fail to reject. Of course, these conclusions are contingent on accepting the statistical model of (linear trend + ARMA(1,1)) during the trend periods and the notion that mean trends from individual models are independent of one another.
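A minimal sketch of that ratio, assuming the per-start-year trends and 95% half-widths for GISTemp and the multi-model mean are already in hand (array names hypothetical):

```python
import numpy as np

def pooled_ratio(trend_mod, ci_mod, trend_obs, ci_obs):
    """(Model mean trend - observed trend) / pooled 95% half-width.
    > 1 rejects the multi-model mean as biased warm; < -1 as biased cold;
    values in between fail to reject."""
    pooled = np.sqrt(np.asarray(ci_mod)**2 + np.asarray(ci_obs)**2)
    return (np.asarray(trend_mod) - np.asarray(trend_obs)) / pooled
```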
For the years tested: if we were to base our judgement on trends since 1980, 1995, 2001, 2002 or 2003, we would reject the multi-model mean as too warm. It’s worth noting that the trend since 1980 is interesting because projections in the AR4 are stated relative to the mean of the 20 years from Jan 1980 to Dec 1999. So, the trend since 1980 happens to begin in the first year the authors of the IPCC selected for their baseline; 1990 would be the center of their baseline.
Comparisons in the 21st century are also interesting because those are in the forecast period. I think 2001 is the most rational year for the first test of a forecast, but others differ and think 2000 is the preferred year.
These are the results I’m getting right now. As I noted, I’m not sure my method of pooling confidence intervals is correct, but I suspect it’s close. I’ll be testing a bit later. Naturally, the weather will do what it will do while I am adding to this code. If it begins to warm rapidly, the observed trends will rise and the discrepancies will shrink. If a big El Nino hits, the comparisons may start to show no rejections.
Here is the graph with UAH. Note that for start years near 1980, the RSS and UAH trends are both below 0.2C/dec.
Example: with UAH clicked it reads:
Temperature Anomaly trend
Jan 1980 to Nov 2010
Rate: 1.3953°C/Century;
From -0.211°C to 0.220°C
But Tamino’s paper clearly shows the upper uncertainty intervals well below 0.2. What’s the deal?
Perhaps I’m not understanding the 2nd graph, but wouldn’t you expect the distribution of black circles to be random around 0? Doesn’t the fact that they are all close to or outside the 95% bound on the warm side indicate a systematic bias in all the models?
Or does it just mean that recent temps are at the low end of the expected range, and picking some other end point would have found results near the -1 mark?
Lucia –
The “about 0.2” K/decade figure bothered me as being too loose. So I looked at the multi-model mean temperature for the A1B scenario in AR4 WG1 Figure 10.5. By hand I added a trend line for the near term (2000-2020), and estimated its slope at 0.22 K/decade. Perhaps someone has access to the underlying numerical values and can evaluate that slope more precisely?
HaroldW –
Yes. The fits and the value in the table indicate “about 0.2C/dec” was rounded down relative to the multi-model mean. I guess I can add the 0.2C/dec for reference to the figure above; then you can see.