May T Anomalies: Cooler than April.

I was beginning to think HadCrut May anomalies wouldn’t be posted until July! They are in now. I can now report that all three main observational groups report that the May anomaly was lower than the April anomaly. NOAA/NCDC was the first to report, and as I mentioned there, I’m showing graphs with start dates of Jan 2000 this month. (I plan to continue to haphazardly rotate between 1980, 2001 and 2001.)

In today’s post, I’ll show data from HadCrut (NH&SH), GISTemp and NOAA/NCDC compared to the multi-model mean for the A1B SRES, which I computed from runs downloaded from KNMI’s climate explorer. I’ll begin by showing the 25-month running mean of monthly anomalies compared to projections under the A1B scenario. Trends computed from monthly data are also shown. All data are baselined using Jan 1980-Dec 1999:
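For anyone who wants to reproduce the processing before examining the graph: here is a minimal sketch in Python of the re-baselining and the centered 25-month running mean. It assumes a monthly anomaly series with a DatetimeIndex; the function name is just for illustration, and my actual processing may differ in detail.

```python
import pandas as pd

def rebaseline_and_smooth(series, base_start="1980-01", base_end="1999-12", window=25):
    """Re-express a monthly anomaly series relative to its Jan 1980-Dec 1999 mean,
    then apply a centered 25-month running mean."""
    anomaly = series - series.loc[base_start:base_end].mean()
    smoothed = anomaly.rolling(window, center=True).mean()
    return anomaly, smoothed
```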

Examining this graph, two features jump out at me:

  1. The 25-month smoothed mean has remained largely below the multi-model mean projection during the entire projection period. This is true whether we consider the projection period to begin in Jan 2000 or Jan 2001. GISTemp’s and NOAA’s 25-month means did rise above the multi-model mean for a very brief period early in the decade, but never rose above the 1-σ uncertainty for the estimate of the multi-model mean. (That is: they never went above the upper solid grey trace.)
  2. Computing trends beginning in Jan 2000, when the earth was under the influence of a La Nina, and ending in May 2011, when it is near the end of a La Nina, the trends associated with observations range between 0.06 C/decade and 0.12 C/decade, considerably lower than the 0.205 C/decade associated with the multi-model mean over the same period. (The trend for the multi-model mean is not affected by the timing of Earth’s ENSO cycle. A minimal sketch of the trend calculation appears just after this list.)
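The trend calculation itself is just ordinary least squares on the monthly anomalies, converted to C per decade. A minimal sketch (the uncertainty treatment, which is where the interesting choices are, is discussed further below):

```python
import numpy as np

def ols_trend_per_decade(y):
    """Ordinary least-squares trend of a monthly anomaly series, in C per decade."""
    t = np.arange(len(y))                  # time in months
    slope_per_month = np.polyfit(t, y, 1)[0]
    return slope_per_month * 120.0         # 120 months per decade
```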

Some visitors have wondered whether the 1-σ uncertainties for the model means, indicated by the dashed grey lines, represent what we expect from “weather noise”. I’ve previously discussed why I think they do not, showing a graph in which runs from models with multiple runs were given identical colors. I’ll reproduce a version using the time scale shown above:


To my eyes, it appears the spread in temperature anomalies in the models is not primarily the result of “weather noise” in models; an appreciable portion arises from differences in forced trends across models. The earth’s temperature anomaly falls within the spread of anomalies for all model runs, but is clearly inconsistent with some of the “warmer” models (for example, ncar-ccsm30). The reason the earth’s observed trend is inconsistent with the multi-model mean is that the multi-model mean is running hot: the selection of models on which it is based is biased hot. That is: the models’ mean trends are not equally likely to be “too hot” and “too cold”; a larger portion appear to be “too hot”.

Individual groups:
I previously discussed the comparison of trends and intercepts for NOAA and the multi-model mean. I also briefly discussed three separate tests that could be done to compare whether the observed data agree with the multi-model mean projection. The results of these tests are not independent; they merely represent different ways of quantifying any deviation. Because the tests are statistical and the uncertainty in the data affects the outcome of each test slightly differently, each test has slightly different statistical power and also slightly different ‘features’ (or bugs if you prefer). Also: all three tests are subject to the caveat that the results are affected by the statistical model I chose to represent the residuals to a straight line. I’m not going to discuss that in detail today (and have not fully explored all “features” anyway) but I’m going to be showing the outcome of each test on the figures below:

I’ll begin by comparing the GISTemp monthly anomalies and 13-month smoothed temperatures to 25-month smoothed projections since Jan 2000:

In 2000, GISTemp was below the multi-model mean temperature and outside the 1σ spread of 25-month smoothed temperatures for all model means. At the time, many would have attributed this to the strong 1999 La Nina. Since 2000, GISTemp has warmed at a rate of 0.12 C/decade, which, despite the computation starting during a La Nina, is slower than the mean rate of 0.205 C/decade for models forced with the A1B SRES.

If Jan 2000 is deemed the appropriate start date for testing whether observations are lagging the multi-model mean, a t-test for the difference between the observed and multi-model mean trends results in d*=-1.26 if I assume residuals to the trend can be described by red noise. The quick and dirty (and sometimes wrong) short cut for assessing statistical significance is to deem the multi-model mean and observations to differ if the absolute value of d* is greater than 2, so this would indicate the difference in trends is not statistically significant. You can also see that the legend indicates the upper 95% confidence interval using red noise is 0.245 C/decade; this exceeds 0.205 C/decade. So, the multi-model mean is consistent with the trend computed since 2000; so is 0 C/decade. (For those wondering: this would appear to be the “Phil Jones” criterion used to decree whether warming since 1995 is statistically significant.)
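For those who want to see the mechanics, the ‘red noise’ version of the trend uncertainty amounts to inflating the ordinary least-squares standard error for lag-1 autocorrelation in the residuals. A minimal sketch (the function name is mine for illustration; exactly how the uncertainty in the multi-model mean trend enters d* is discussed in earlier posts and not repeated here):

```python
import numpy as np

def trend_and_se_rednoise(y):
    """OLS trend of a monthly series together with a standard error
    inflated for lag-1 autocorrelation ('red noise') in the residuals."""
    n = len(y)
    t = np.arange(n)
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (slope * t + intercept)
    r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]        # lag-1 autocorrelation
    se_ols = np.sqrt(np.sum(resid ** 2) / (n - 2) / np.sum((t - t.mean()) ** 2))
    se_red = se_ols * np.sqrt((1.0 + r1) / (1.0 - r1))   # AR(1) inflation factor
    return slope, se_red

# d* is then, roughly, (observed trend - multi-model mean trend) / standard error,
# with both trends expressed in the same units (e.g. C/decade).
```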

However, there are many who dispute ‘red noise’ as an appropriate model for the residuals. For that reason, I resorted to an unconventional method of identifying the ARIMA(p,0,q) model with p≤4 and q≤4 that gave the largest uncertainty intervals on the trend, and recomputed d*. Using these uncertainty intervals, the d* for the difference in trend is -0.98. The absolute value is less than 2.0, so using the short cut we do not deem this statistically significant. (We wouldn’t if we ignored the short cut either.)
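The ‘largest uncertainty interval’ search is just a brute-force loop over (p, q). A minimal sketch, assuming statsmodels is used to fit a linear trend with ARIMA errors (the function name is illustrative and my actual implementation may differ):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def widest_trend_se(y):
    """Fit a linear trend with ARIMA(p,0,q) errors for every p,q <= 4 and
    return the largest standard error found for the trend coefficient."""
    t = np.arange(len(y), dtype=float)
    worst_se = 0.0
    for p in range(5):
        for q in range(5):
            try:
                res = ARIMA(y, exog=t, order=(p, 0, q), trend="c").fit()
            except Exception:
                continue                      # some (p, q) fits fail to converge
            # parameters are ordered: constant, exog (trend), AR terms, MA terms, sigma2
            se = np.asarray(res.bse)[1]       # SE of the trend coefficient
            worst_se = max(worst_se, se)
    return worst_se
```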

So, if someone insists that 2000 is the year for testing, currently the multi-model mean is not-inconsistent with trends computed using GISTemp. (As many know, I favor 2001 as the most objective start year; this is based on when the SRES were published, and it is also the year in which most modeling groups formally end their 20th century runs and start their projections. For that choice, the d* values for trends are -2.42 and -2.02, computed using red noise and ARIMA respectively.)

Having noted the difference in trends computed since Jan 2000, I would like to draw attention to a second test. This one compares the difference in the 137-month means (Jan 2000 to May 2011) and can only be applied to periods with start and end dates entirely outside the baseline. Jan 2000 represents the first possible start date to apply this test when temperatures are rebaselined using Jan 1980-Dec 1999. (Note: A diffuse explanation of this test, focusing on statistical power and explaining what is actually being compared, can be found in an earlier post.)

In this test, the difference in the means is compared, with the uncertainty estimated based on residuals to a linear fit. That is: under the assumption that the trend in the multi-model mean and the trend in the observations are non-zero and equal to each other, we estimate the probability that the 137-month observed mean and multi-model mean temperatures would differ as much as observed. If I assume the residuals to the fit can be described using ‘red’ noise, I find the normalized difference is d*=-2.38; if I assume the residuals are described using ARIMA, I obtain d*=-1.89. The former is statistically significant; the latter is not (but it’s close; because temperatures remain well below the multi-model mean, I can state with some confidence that the latter will pass the threshold by year’s end).
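In code, the ‘red noise’ version of this second test looks roughly like the following (a sketch only; the ARIMA variant substitutes the wider standard errors discussed above, and my actual implementation may differ in detail):

```python
import numpy as np

def mean_difference_dstar(obs, model):
    """Normalized difference between the 137-month means of observations and the
    multi-model mean.  The uncertainty for each series is estimated from residuals
    to a linear fit, with the effective sample size reduced for lag-1
    autocorrelation ('red noise')."""
    def mean_and_se(y):
        n = len(y)
        t = np.arange(n)
        slope, intercept = np.polyfit(t, y, 1)
        resid = y - (slope * t + intercept)
        r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
        n_eff = n * (1.0 - r1) / (1.0 + r1)        # effective sample size under AR(1)
        return y.mean(), resid.std(ddof=2) / np.sqrt(n_eff)

    m_obs, se_obs = mean_and_se(np.asarray(obs))
    m_mod, se_mod = mean_and_se(np.asarray(model))
    return (m_obs - m_mod) / np.sqrt(se_obs ** 2 + se_mod ** 2)
```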

Readers will notice a third d* value indicated in the figure. It is the d* for the normalized sum of the two previous d*’s. Had I thought up this parameter back in 2008, I’d consider it a “robust to cherry picking” value with greater power than the test on trends. I’m going to defer discussing it, though: formal tests show it’s less powerful than the d* computed based on intercepts only, and at this point in the discussion it happens to be giving very strong rejections while the more statistically powerful test on the means alone is not. But the numerical result will be indicated for reference on future graphs.

For now, the summary for GISTemp: if we pick Jan 2000 as the start date for comparing models, the multi-model mean trend is consistent with GISTemp. The 137-month multi-model mean anomaly is inconsistent with GISTemp if we assume residuals are modeled using ‘red’ noise, but consistent if we use the largest uncertainty intervals computed using ARIMA(p,0,q) with p≤4 and q≤4.

HadCrut
Examining HadCrut NH&SH, we get somewhat different conclusions:

With HadCRUT:

  1. If we pick Jan 2000 as the start date for comparing models: the multi-model mean trend is inconsistent with HadCrut; the 137-month mean anomaly is also inconsistent with the observed temperature. The absolute values of the d* values for trends are 2.81 and 2.31, computed assuming residuals are ‘red’ or ‘arima’ respectively; both are larger than 2. So, even starting the comparison in Jan 2000, the observed trends are inconsistent with the multi-model mean.
  2. If we pick Jan 2000 as the start date: the 137-month multi-model mean anomaly is inconsistent with the observed 137-month mean using either ‘red’ noise or the wider uncertainty intervals computed using ‘arima’. The d* values are -3.06 and -2.53 respectively.

NOAA/NCDC
I’ve already discussed NOAA/NCDC, but I’m reproducing the graph with the d* values added:


For NOAA, both the multi-model mean trend and the intercept computed since 2000 are inconsistent with the observations.

Summary

Using 2000 as the start date for analysis, and assuming ‘red noise’ (i.e. “Phil Jones-like” noise) to model residuals from a linear fit, the multi-model mean trend under A1B forcing is inconsistent with observed trends based on NOAA and HadCrut, but remains ‘not-inconsistent’ with GISTemp. The 137-month (i.e. Jan 2000-May 2011) multi-model mean anomaly is inconsistent with all three observational data sets if residuals are modeled using “red noise”; it is inconsistent with NOAA/NCDC and HadCrut but remains not-inconsistent with GISTemp if we use maximal-uncertainty ARIMA to estimate the uncertainty intervals.

For those wondering about cherry picking: if the test is repeated starting with data in 2001, the absolute value of the d* for trends will increase, resulting in greater apparent inconsistency based on trends, but the d* for the mean over the analysis period will decrease, resulting in less apparent inconsistency for means. Qualitatively, the reported results would be similar to those found starting in 2000: if residuals are modeled as ‘red noise’, the multi-model mean is inconsistent with NOAA/NCDC and HadCrut NH&SH. Whether it is inconsistent with GISTemp depends on whether we compare trends or period means and whether we model the residuals using ‘red noise’ or uncertainty-interval-maximizing ARIMA.

If I play Carnac, barring a Pinatubo- or Agung-size eruption, or the discovery that NOAA/NCDC’s and HadCrut’s temperature estimates are biased very low, I predict that in two years quite a bit of cherry picking will be required to not reject the hypothesis that observed temperatures are rising as rapidly as the multi-model mean. I also think the observed trends since both 2000 and 2001 will be positive. I could be wrong; I’m not betting any money on it. We’ll all still be watching temperatures in two years; we’ll see if I was wrong.

16 thoughts on “May T Anomalies: Cooler than April.”

  1. lucia
    Insightful.
    PS How do you “haphazardly rotate between 1980, 2001 and 2001.”?

  2. With HadCRUT:
    1.If we pick Jan 2000 as the start date for comparing models: The multi-model mean trend is in consistent with GISTemp; the 137 month mean anomaly is inconsistent with the observed temperature

    Should you change GISTemp to HadCRUT?

  3. Lucia,

    Can you give a link to the IPCC model data you used to construct trends of the individual models?

  4. Lucia, can you give a link to the source of your Hadcrut data? The official CRU website still only provides data up to April 2011.

  5. I know you don’t want/like/do speculation about motives, but something jumps out at me. (And I know that you’ve done it before, but …… things change/develop).
    Anyway, is (do you think that?) a significant difference opening between the GISS product and the other two? GISS vs each?
    Maybe I’m just suspicious, but foxes and hen-coops spring to mind.
    I would greatly appreciate it were you (time & other efforts permitting) to keep an eye on these.

  6. GISS extrapolates over the poles. This means that any of the following could be true:

    1) GISTemp will be noisier because it overweights the temperature anomalies in a ring around the pole.
    2) GISTemp might better detect the true trend if the rate of warming is different at the poles relative to the equator.

    The two are not mutually exclusive. The intended purpose of extrapolating over the poles is to reduce bias and better detect the true trend, because it is thought the poles warm more than the equator. However, extrapolation does mean that the stations near the poles are weighted more heavily than would occur if you just didn’t include the poles in the calculation. So, GISTemp is likely to be ‘noisier’ in the sense of having greater monthly variability.

    Offhand, I can’t think of other reasons for differences between GISTemp and the others, but I think the two I’ve noted are widely accepted as reasons.

  7. Yes. I think many of us are aware of these purported reasons.
    You are kinder than I.
    Models, schmodels.
    Particularly when extrapolating (isn’t there a ‘Thou shalt not …. ‘ somewhere?)

  8. I think I would only be gently enquiring. After all, activism is not necessarily synonymous with objectivity.

  9. Heretic– we have to wait a while to see if there is a significant difference between GISS and the others. This topic comes up all the time.

Comments are closed.