Do IPCC projections falsify?
(Are Swedes Tall?)
Recently, Gavin at Real Climate suggested that the IPCC projections don’t falsify. He also explained the reasons he thinks they do not.
Today, I will explain that the IPCC projections do indeed falsify in any sense that is meaningful. But if you don’t think so, I think I will demonstrate that you must also think we can use the average height of people from all countries to correctly predict the average height of Swedes.
My main counterargument to Gavin’s post is made by means of an analogy, and will be illustrated by a synthetic experiment comparing the “predictions” of the average height of Swedes to the actual “measured” (aka synthetically generated) heights. The point will be: if a predictive model (for heights or climate) is biased but contains lots of “model noise”, the falsification will manifest itself precisely in the way we are seeing, and which is illustrated in the figure below:

Figure 1: Comparison of the distribution of Swedes’ heights to predictions of the height of Swedish men based on men of four other nationalities (based on a synthetic experiment). Note: The central tendency of the “prediction” and the 1-sigma uncertainty bound on the means of the “models” on which it is based both fall outside the uncertainty bounds for the group of interest: Swedish men. The full interpretation of this graph is deferred until later in the post.
What do Swedes have to do with climate change?
Nothing really. I bring them up because I’m pretty sure the difference between Gavin’s answer and mine to the question of falsification comes from asking different questions, and I think a simple example using heights helps me explain the answers to these questions:
- Is the mean trend in surface temperature over time predicted by the IPCC consistent with the temperature trends we have been experiencing? (That is: is 2C/century consistent with the trend we’ve seen? )
- Is the lowest uncertainty bound the IPCC shows the public consistent with the trend in GMST (global mean surface temperature) we have seen since 2001?
I think these questions are important to the public and policy makers. They are the questions people at many climate blogs are asking and they are the questions many voters and likely policy makers would like answered.
I think the answer to both questions is “No, the IPCC predictions are inconsistent with recent data. ”
What question is Gavin answering? I don’t know. I have my guesses, but, preferring not to waste time arguing against a strawman, I won’t go there.
On to men’s heights and climate change.
Let us now imagine a fictional panel of “height-o-logists” who will do their best to “predict” the height of Swedish men. However, they will be restricted as follows:
- The height-o-logists will have access to data for men’s heights in Norway, Vietnam, Malta and Portugal.
- The panel will not be permitted to know that Swedes are more like Norwegians than like Vietnamese etc.
- The height-o-logists will then average over all countries to develop a “model” that “predicts” the range of heights of Swedes.
How is this similar to Climate models?
Obviously, this clunky “model” to predict the height of Swedes is not a climate model. But it shares these factors:
- None of the “models” is the real thing we want to predict. Vietnamese men share some similarity with Swedish men: they are both homo sapiens. However, just as Vietnamese aren’t Swedes, the GISS Model E is not, strictly speaking, the planet earth.
- The panels don’t have enough information to know which “sub-model” most closely resembles the thing they wish to predict. My height-o-logists don’t know Norwegians are similar to Swedes; the climatologists don’t know which climate model contains the best set of parameterizations.
- The panel makes a prediction based on the average of all possible “models”. We will call their full model the VMPN model (the Vietnamese - Maltese - Portuguese - Norwegian height model).
What does the panel create?
After much contemplation, the height-o-logists realize they have access to four “models” of humans. Since they don’t have access to Swedes, they decide to create a model by averaging over all the four “model” groups.
To mimic this process, I created a synthetic “super ensemble model”, whose properties I will not describe at length except to say:
- The super ensemble model includes four “sub-models”, each of which creates a population of heights with an average height that matches a particular nationality out of the four sampled (for example, Norwegians). Because each model predicts a different average height, following the analogy, this mimics the effect of the IPCC including a range of models that predict different average trends in mean surface temperature under different forcings and histories.
- Each height sub-model also includes a random number generator to create “height noise” to mimic the effect of variations in human height on the average. In our analogy, this mimics the “weather noise” which exists in the real world.
- Each height sub-model also includes “regional, ethnic, generational ” noise in the measurement of the height within a country. In our analogy, this mimics the variability climate modelers introduce when including a range of different initial conditions into their model.
I ran the synthetic “sub-models” then “measured” 6 individuals and calculated an average. I ultimately created 56 averaged heights, and created this histogram showing the number of outcomes in each of the 56 “runs”:
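For readers who want to try something similar, here is a minimal Python sketch of a four-country "super ensemble". It is not the spreadsheet used for the figures in this post: the per-country means, the two noise levels, and the equal split of 14 batches per country are all placeholder assumptions chosen only so that the sketch yields 56 batch averages of 6 men each.

```python
import random
import statistics

# Hypothetical per-country mean heights in cm. These are placeholders,
# NOT the values used to generate the figures in this post.
ASSUMED_MEANS_CM = {
    "Vietnam": 162.0,
    "Malta": 169.0,
    "Portugal": 172.0,
    "Norway": 180.0,
}
ASSUMED_HEIGHT_SD_CM = 7.0  # assumed person-to-person "height noise"
ASSUMED_GROUP_SD_CM = 2.0   # assumed regional/ethnic/generational noise per batch
BATCH_SIZE = 6              # men "measured" per batch, as in the post
BATCHES_PER_COUNTRY = 14    # assumed equal split: 4 countries x 14 = 56 batches


def batch_average(rng, country_mean):
    """Average height of one batch of men drawn from one country's sub-model."""
    group_offset = rng.gauss(0.0, ASSUMED_GROUP_SD_CM)
    heights = [
        rng.gauss(country_mean + group_offset, ASSUMED_HEIGHT_SD_CM)
        for _ in range(BATCH_SIZE)
    ]
    return statistics.mean(heights)


def run_ensemble(seed=0):
    """Return the batch averages for every sub-model in the ensemble."""
    rng = random.Random(seed)
    return [
        batch_average(rng, country_mean)
        for country_mean in ASSUMED_MEANS_CM.values()
        for _ in range(BATCHES_PER_COUNTRY)
    ]


averages = run_ensemble()
ensemble_mean = statistics.mean(averages)
ensemble_sd = statistics.stdev(averages)
print(f"{len(averages)} batch averages, mean {ensemble_mean:.1f} cm, "
      f"standard deviation {ensemble_sd:.1f} cm")
```

With these placeholder inputs, most of the spread in the batch averages comes from the differences between countries rather than from the noise within a batch, which is the feature of the ensemble the analogy relies on.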

Figure 2: Histogram of average height measured from batches of 6 men. The horizontal bars represent the number of samples in a 5 cm wide “bin”, the smooth yellow curve is the equivalent Gaussian curve, the solid yellow line is the average of all heights, the dashed yellow lines represent the ±1 sigma bands on heights of groups of 6 men, and the orange dashed lines represent the 95% uncertainty bands for average heights of 6 men, calculated based on a Gaussian assumption. (Yes, they are “pseudo-error bars” because this distribution is not actually Gaussian. Note, however, that 3 realizations lie on the high side of these “pseudo-error” bars.)
Note that the graph provided by the “height-o-logists” includes more information than is conveyed in IPCC documents. The IPCC documents communicate a) the mean of the predicted trend in temperature (analogous to the mean height) and b) the ±1 sigma uncertainty intervals on the mean trend (analogous to the vertical dashed lines).
To give more detail, I added the number count data to this diagram. This permits the reader to compare it to a graph Gavin provided at Real Climate:
What are the predicted heights?
Based on what they know, the height-o-logist panel decides to make the following prediction about those elusive Swedes:
- The predicted average height of Swedes is 171.2 cm.
- The standard deviation in the average height of all six-measurement batches used in our model is 5.9 cm.
- The height-o-logists explain that their findings are robust. For example, they explain that if they remove 1/2 the samples, they get the same average answer– but with more “height noise”. They can also note that the model also shows the rich are taller than the poor in all countries– a robust finding.
In the IPCC analogy, these correspond to the best estimate of the global mean surface temperature (GMST) for a particular year, and its standard deviation, on graphics like this:

Figure 4: In this figure, the “mean” temperature as a function of time is communicated by the IPCC with the bold solid curves. The spread of the haze communicates the 1-sigma uncertainty bands for the predicted value of the underlying trend as reported by the IPCC. I have highlighted the “uncertainty” interval near the year 2007 with a vertical yellow line.
How are the height predictions analogous to the IPCC temperature projections?
These stated predictions are equivalent to the IPCC providing estimates of the central tendency at any time (say 2008) and the 1-sigma standard deviation of central tendencies predicted by all climate models. There is no “weather” or “height noise” in these predictions.
Now let’s test against the “real” world!
For outsiders to test against the real world, we must now collect data and treat it in a way that lets us test like to like: That is, average height of Swedes to average predicted height. (For climate modeling we collect data to compare average temperature trends to average temperature trends.)
How do we do this?
In the case of the height study, we go to Sweden and “measure us some Swedes!”
Since this is a simulated study, I “synthesized” the height of 6 Swedes using a random number generator set to provide an average height of 181 cm. Even though I only “measured” 1 batch of 6 Swedes, using the magic of Excel, I also computed 95% uncertainty intervals on the estimate of the average height of Swedes based on my sample of 6 Swedes. (Similarly, even though I can only sample 1 realization of the earth’s global mean surface temperature, I can calculate the average trend consistent with measurements over time, and also estimate the uncertainty in the true underlying trend.)
Here is an illustration of the outcome of one particular experiment:

Figure 5: The true average height of Swedes is compared to the outcome from one measurement from a sample of 6. Note that the true mean falls between the 95% uncertainty bands: this is expected to occur in 19 of 20 experiments based on 6 measurements.
So, what does this figure tell me?
The graph indicates that based on the 6 measurements of Swedes the current best estimate of the height of Swedish men is 182.8 cm (illustrated with a solid purple line). This happens to be greater than the “true” value of 181 cm illustrated in red. (In a normal experiment, we would not know the 181 cm, but I know this because this is a synthetic experiment.)
Having computed the uncertainty around my estimate, I would report that based on 6 measurements, the true height of Swedes is 182.8 cm ±3.8 cm with a confidence of 95%. These uncertainty intervals are illustrated with purple dashed lines above; they arise purely due to “height noise” in the specific sample population. Notice the red vertical line illustrating the true mean falls inside these uncertainty intervals.
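The post does not say which formula Excel was asked to apply, so the following is only a sketch of one common choice: a two-sided 95% interval on the mean of 6 measurements using the Student-t distribution with 5 degrees of freedom. The six heights in `hypothetical_swedes_cm` are made up for illustration and are not the synthetic sample behind Figure 5.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 5 degrees of freedom (n = 6).
T_CRIT_95_DF5 = 2.571


def mean_with_95_half_width(sample):
    """Return (sample mean, 95% half-width) for a sample of exactly 6 values."""
    n = len(sample)
    if n != 6:
        raise ValueError("T_CRIT_95_DF5 is only valid for samples of 6")
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return mean, T_CRIT_95_DF5 * standard_error


# Hypothetical heights in cm, invented for this example only.
hypothetical_swedes_cm = [178.0, 181.5, 183.0, 185.5, 180.0, 187.0]
mean_cm, half_width_cm = mean_with_95_half_width(hypothetical_swedes_cm)
print(f"estimated mean: {mean_cm:.1f} cm, 95% half-width: {half_width_cm:.1f} cm")
```

The key point is that the half-width depends only on the scatter within the sample and the sample size; nothing from the height-o-logists' model enters the calculation.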
So, with regard to Swedes, without knowing anything about the predictions of the height-o-logists, we can say that, based on the data from Swedes — not the height-o-logist model– the average height falls within a certain range. This estimate has nothing to do with predictive models, or model uncertainties.
The range is based on the properties of the Swedes we happened to sample.
The analogy to climate.
In the same sense, with regard to estimating the temperature trends experienced on earth: We can calculate a mean trend over a period of time and also estimate the uncertainty in the underlying trend based on the “weather noise” of the actual earth.
These are the sorts of trends and uncertainty intervals I computed in posts discussing falsification of the IPCC projections, and which can be read here as well as in many previous blog posts. (What we found was the trends consistent with weather since 2001 are inconsistent with IPCC projections for the central tendency.)
In both the case of the Swedes and the case of the empirically determined temperature trend, the estimates themselves and the stated uncertainty bounds can be questioned based on data or some physical understanding. But the prediction of the climate or height models are largely irrelevant to the empirical estimate.
So how do these compare to the “predictions”?
Of course, I already showed readers the outcome.
This is how the real data for Swedes compares to the predictions:
Notice the following:
- The central tendency predicted by models falls outside the 95% confidence intervals for all possible heights for Swedes. This can be observed by noticing the solid yellow line indicating the central prediction does not fall between the dashed purple lines indicating 95% confidence interval for the Swedes.
This is analogous to what I find when falsifying the IPCC projections for temperature trends: The central tendency predicted by the IPCC falls outside the confidence intervals consistent with recent measurements of the real, honest to goodness earth.
Conclusion: the predicted value of the central tendencies are falsified relative to the “true” value.
- The 1-sigma uncertainty intervals for the predicted height fall outside the 95% confidence interval for all possible values consistent with real Swedes. The 1-sigma error bars also fall short of the central tendency for Swedes. This can be observed by noticing the dashed yellow lines indicating the 1-sigma intervals for the prediction do not fall between the dashed purple lines indicating the 95% confidence interval for the Swedes.
This is analogous to what I found when falsifying the IPCC projections in March. The +1 sigma trend predicted by the IPCC falls outside the 95% confidence intervals for the real temperature consistent with the measurements on earth.
So, the full region between the 1 sigma error bars is falsified.
- The average height of Swedes falls inside the 95% uncertainty bands for the full range model height outcomes. This can be seen by noticing the solid purple line for the average height of Swedes falls inside orange dashed lines for the 95% bands of the individual model outcomes.
In fact, in the model I concocted 3 groups of 6 Norwegians and 1 group of Portuguese ended up “taller” than the average of Swedes. So, 4 out of 55 “realizations” were taller than the Swedes.
But: Notwithstanding Gavin’s laser like focus on this point, this feature does not unfalsify the two previous items! The full model gives biased predictions.
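The first two comparisons in the list above are simple arithmetic on numbers already quoted in this post (171.2 cm and 5.9 cm for the prediction, 182.8 cm ±3.8 cm for the Swedes). A short Python sketch makes the check explicit; the variable and function names are illustrative only.

```python
# Numbers quoted earlier in the post.
PREDICTED_MEAN_CM = 171.2     # central tendency of the VMPN model
PREDICTED_SIGMA_CM = 5.9      # 1-sigma spread of the batch averages
SWEDES_MEAN_CM = 182.8        # best estimate from the 6 measured Swedes
SWEDES_HALF_WIDTH_CM = 3.8    # 95% half-width on that estimate


def inside(value, low, high):
    """True when value lies within the closed interval [low, high]."""
    return low <= value <= high


swedes_low_cm = SWEDES_MEAN_CM - SWEDES_HALF_WIDTH_CM
swedes_high_cm = SWEDES_MEAN_CM + SWEDES_HALF_WIDTH_CM

# Item 1: is the predicted central tendency consistent with the Swedes?
central_consistent = inside(PREDICTED_MEAN_CM, swedes_low_cm, swedes_high_cm)

# Item 2: is even the upper 1-sigma bound of the prediction consistent?
upper_one_sigma_cm = PREDICTED_MEAN_CM + PREDICTED_SIGMA_CM
upper_consistent = inside(upper_one_sigma_cm, swedes_low_cm, swedes_high_cm)

print("central tendency inside Swedes' 95% interval:", central_consistent)
print("upper 1-sigma bound inside Swedes' 95% interval:", upper_consistent)
```

Both checks come out false: the upper 1-sigma bound of the prediction sits roughly 2 cm below the lower end of the interval estimated from the Swedes themselves.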
What is the significance of the fact that the heights of Swedes do fall in the full range for models? It means the models are biased and imprecise.
In the case of the “height” analogy we know why they are biased: All of the models are “shorter” on average than Swedes. The falsification of the mean is not due to “height noise” in individual men. The issue of “height noise” is accounted in the uncertainty intervals for the heights of Swedes and is reflected in the uncertainty intervals shown with purple dashed lines.
In the “height” analogy, the average height for Swedes falls inside the total range of heights predicted by the height models because even though
a) the models are biased on average it is also true that
b) some of the models happened to be close to right.
In particular, the Swedes tend to fall in the range of height for Norwegians.
So, the whole VMPN model is biased, despite the inclusion of Norwegians.
Interestingly, if we kept collecting lots and lots of data on Swedes, we would likely continue to find they fall in the range of heights for Norwegians. The result is: even though, overall, the “height” model as a whole is distinctly biased, the average heights of Swedes will tend to fall inside the full span of the model predictions.
Is this model useful for planning purposes? I guess that depends. But if you requisition uniforms for the Swedish army based on that model, expect to run out of uniforms for tall men quickly.
What does this mean about the IPCC predictions?
When compared to earth, the IPCC AR4 predictions appear biased and falsified for that reason. So, one might want to consider this when making plans for the future.
The fact that some models in the lower end may be predicting things accurately doesn’t magically erase the issue of bias. The fact that the climate model has huge uncertainty bars doesn’t “unfalsify” the full model with regard to the question about the central tendency:
The central tendency predicted by the model appears biased relative to the weather data.
What is the cause of the bias? Beats me!
In the case of IPCC predictions, we might suspect that at least some of the models contained in the ’super ensemble’ over-predict temperature increase on earth during the current period of time.
But some models may be ok. Just as Norwegians are a fairly decent model for predicting the height of Swedes, some of the individual models used by the IPCC may be less biased relative to the true earth.
In this regard, it might be best if future panels engage in a winnowing process to remove individual models that appear less trustworthy from the full collection of models used to make projections. I suspect they will, but this doesn’t retroactively fix the AR4 projections, which appear biased.
So… about that falsifiability issue.
Because Gavin brought up such a novel idea for not-falsifying models I need to comment on Roger Pielke Jr’s frequent discussions of falsifiability.
One of the interesting things about Gavin’s method is that, oddly enough, the use of many models results in a humongounourmous range of “weather” that can be consistent with model predictions. If this were due solely to the range of variability of weather on earth, that would be fine. But a sizeable amount is due to the “climate parameterization noise”, which causes a sizeable spread in predictions. Insisting that we cannot observe that the central tendency of the predictions is clearly biased does, indeed, result in the “unfalsifiability” problem often brought up by Roger Pielke Jr.
Why? Because no matter how biased the VMPN-height model is relative to Swedes, the fact that it contains Norwegians means that the heights of Swedes will always be contained in some realizations in the ensemble.
But the average will still always be wrong, and the VMPN model has no skill. But evidently, we are not permitted to observe that this average is wrong. For… some… reason.
In conclusion
The IPCC projections remain falsified. Comparison to data suggests they are biased. The statistical tests account for the actual weather noise in data on earth.
The argument that this falsification is somehow inapplicable because the earth data falls inside the full range of possibilities for models is flawed. We know why the full range of climate models is huge: It contains a large amount of “climate model noise” due to models that are individually biased relative to the system of interest: the earth.
I will continue to admit what I have always admitted: When applying hypothesis tests at a significance level of 5%, one does expect to be wrong 5% of the time. It is entirely possible that the current falsification falls in the category of the 5% incorrect falsifications. If this is so, the “falsified” diagnosis will reverse, and we won’t see another one anytime soon.
However, for now, the IPCC projections remain falsified, and will remain so until the temperatures pick up. Given the current statistical state (a period when large “type 2” error is expected) it is quite likely we will soon see “fail to falsify” even if the current falsification is a true one. But if the falsification is a “true” falsification, as is most likely, we will see “falsifications” resume. In that case, the falsification will ultimately stick.
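The 5% rate of incorrect falsifications mentioned above can be illustrated with a small simulation in the height analogy. This is a sketch under stated assumptions, not the test applied to the temperature data: it draws batches of 6 Swedes from a population whose true mean really is 181 cm, uses a Student-t test at the 5% level, and assumes a person-to-person standard deviation of 7 cm (a placeholder value).

```python
import math
import random
import statistics

T_CRIT_95_DF5 = 2.571   # two-sided 95% Student-t critical value, 5 degrees of freedom
TRUE_MEAN_CM = 181.0    # the "true" Swedish mean used in the post
ASSUMED_SD_CM = 7.0     # assumed person-to-person spread (placeholder)
BATCH_SIZE = 6


def falsely_rejects(rng):
    """Test the TRUE mean against one random batch; True means a wrong rejection."""
    sample = [rng.gauss(TRUE_MEAN_CM, ASSUMED_SD_CM) for _ in range(BATCH_SIZE)]
    standard_error = statistics.stdev(sample) / math.sqrt(BATCH_SIZE)
    return abs(statistics.mean(sample) - TRUE_MEAN_CM) > T_CRIT_95_DF5 * standard_error


def false_rejection_rate(trials=20000, seed=1):
    """Fraction of batches that reject a hypothesis that is actually correct."""
    rng = random.Random(seed)
    rejections = sum(falsely_rejects(rng) for _ in range(trials))
    return rejections / trials


rate = false_rejection_rate()
print(f"fraction of correct hypotheses rejected: {rate:.3f}")
```

The fraction should land close to 0.05, which is the sense in which any single falsification at this level carries a 1-in-20 chance of being a false alarm.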
For now, all we can do is watch the temperature trends of the real earth.
Update: I realized I’d left the word “model” out of “climate model noise”.
Comments
lucia (Comment#2769) May 14th, 2008 at 1:43 pm
It declined from March? Wow!
I haven’t mostly been expecting increases right now.
John V (Comment#2772) May 14th, 2008 at 2:55 pm
lucia,
After just a quick read I can see a couple of major issues with your response to Gavin. I don’t have much time right now so I’ll just mention them quickly:
#1: In the graph showing “Error bars according to Gavin” you took the error bars on a 7-year trend and extended them out 100 years. Of course they look too large. The 20-year trend would look much more reasonable.
#2: Your analogy to the height of Swedes is only applicable if the height of Swedes changes over time. How does ENSO fit into your analogy? What about the Schwabe cycle? You can wave these off as “weather noise” but your confidence intervals do not include this noise.
John V (Comment#2773) May 14th, 2008 at 3:00 pm
Also, don’t forget that the IPCC prediction is *not* for a constant trend. The IPCC model results which form the basis of the prediction clearly show that. What you have falsified is a constant trend of 2.0C/century. The confusing thing from a scientific pov is why you insist on calling the constant trend the IPCC prediction.
lucia (Comment#2774) May 14th, 2008 at 3:03 pm
JohnV–
Yes, the “weirderman” lines for uncertainty bars by Gavin apply only up to 7 years. It is very difficult to make these short and with the correct slope with the graphics software I have.
The ENSO issue falls in the category of meaningful issues with regard to discussing calculation of the uncertainty bars for the experimental data. As you know, it is the sort of question I take quite seriously and has been raised by you and others. Of course… I’ve addressed it in a not statistically rigorous way here. (There are flaws in that analysis because the MEI contains a lagged variable, so C-O isn’t quite right. As I’ve mentioned, figuring out exactly how to deal with that is something I am planning to do. In the meantime, that stands as a blog-quality estimate which indicates that given the several switches experienced during the current period, ENSO should have mostly averaged out.)
If there are other cycles, I’m willing to do similar back-of-the-envelope estimates of the magnitude of the effect and discuss them.
Oh… I think I noticed Gavin thinks the solar effect isn’t large?
lucia (Comment#2776) May 14th, 2008 at 3:06 pm
JohnV:
About the non-constant trend: What are you talking about? The IPCC projections for the underlying trend are nearly constant in the first 30 years of this century.
The falsification assumed the temperature itself will not vary monotonically, but will vary around an underlying trend due to the natural variability of “weather”.
And no, the analogy of heights applies in terms of statistics. This is an issue about how one falsifies predictions of averaged properties. Averaged trends are averages, have error bars etc.
It is an analogy, but analogies share some features and don’t share others. That’s the way they work.
John V (Comment#2777) May 14th, 2008 at 3:15 pm
lucia,
I agree that the “IPCC projections for the underlying trend are nearly constant in the first 30 years”.
Where we disagree is that you’ve taken a 7-year trend, found that it is not consistent with the underlying trend, and concluded that the IPCC prediction is falsified. The IPCC prediction for 7-year trends is all over the place. The model results clearly show 7-year trends in the range of the observed trend.
Here’s how I see it:
- IPCC prediction = model results (by definition);
- Model results include the observed trend;
- Therefore, IPCC prediction includes the observed trend;
lucia (Comment#2778) May 14th, 2008 at 3:22 pm
JohnV–
The IPCC projections have a central tendency. The IPCC communicates that. That central tendency is inconsistent with the data we have.
If your point is that some individual models predict lower trends than the others, and those models may be ok, then I agree with you. The models that predict trends below the 1 sigma uncertainty intervals may be ok. That’s what I say in the article. Just as the “Norwegian” model give pretty good results for the “Swedes” some of the models may be ok.
But falsifications to test skill are done on central tendencies of the overall projections. That central tendency is higher than consistent with the data we have recorded since the projections were made. The fact that the collection of models used all together results in a set of projections that is both imprecise and biased does not, in any way, make it impossible to show the predictions are biased.
Biased means falsified.
Larry Bolz (Comment#2779) May 14th, 2008 at 3:36 pm
A quarter drop in the month to month anomaly, over the entire planet, for an entire month is “whopping”? Next we’ll be hearing a three-quarter positive linear trend over 125 years is important.
Besides .26 lower than the March anomaly, it’s also .23 lower than last year’s April anomaly, .10 lower than the average of the preceding 10 April anomalies or .10 higher than the average of the preceding 30 April anomalies.
So this April is +.41, which is within plus/minus .10 of the decade and three decade averages for the month.
So far that puts the year at +.35 What’s maybe also curious is that DJF is at +.27 (the lowest it’s been since 1994, although close to 2001 and 1997). So we seem to be tracking for a year of an anomaly around +.35 plus or minus .5 or so, so far.
That would be more consistent with 1977-2000 at an average of +.23 (trend around +.3, .1 to .4) than 2001-2007 of +.54 (trend around +.15, .45 to .60).
The important thing to remember here is these are non-physical conceptual numbers and we can look at them and combine them all we wish.
Larry Bolz (Comment#2780) May 14th, 2008 at 3:39 pm
With the models, I agree when Roger Pielke Jr said:
I am sure that some model somewhere has foretold how the next 20 years will evolve (and please ask me in 20 years which one!). And if none get it right, it won’t mean that any were actually wrong. If there is no future over the next few decades that models rule out, then anything is possible. And of course, no one needed a model to know that.
Don’t get me wrong, models are great tools for probing our understanding and exploring various assumptions about how nature works. But scientists think they know with certainty that carbon dioxide leads to bad outcomes for the planet, so future modeling will only refine that fact. I am focused on the predictive value of the models, which appears to be nil. So models have plenty of scientific value left in them, but tools to use in planning or policy? Forget about it.
Those who might object to my assertion that models are of no practical use beyond political promotion, can start by returning to my original question: What can be observed in the climate over the next few decade that would be inconsistent with climate model projections? If you have no answer for this question then I’ll stick with my views.
John V (Comment#2781) May 14th, 2008 at 4:00 pm
lucia,
Do you agree that the model results define the IPCC prediction? If so, then the IPCC prediction is not falsified.
You have demonstrated that *short-term* observations are inconsistent with the *long-term* central tendency of the IPCC prediction. I’m not arguing that.
What I’m saying is that the short-term observations are *consistent* with the short-term predictions. Apples with apples.
Boris (Comment#2782) May 14th, 2008 at 4:12 pm
About the non-constant trend: What are you talking about? The IPCC projections for the underlying trend are nearly constant in the first 30 years of this century.
Haven’t we been through this before? You’re still comparing climate predictions with observed weather and thinking the observed weather can falsify the climate predictions. It’s just not true, even if Roger really really wants unforced variability to be the same as forced variability. No amount of blog posts or lanky Scandinavians will turn weather into climate. Only time can do that.
I asked Roger, and of course never got an answer:
What observations in the weather of the next two weeks would falsify the hypothesis that summer is approaching?
Thinking about this question might help people understand the difference between variability (weather/weather) and signal (seasonal/CO2).
lucia (Comment#2783) May 14th, 2008 at 4:18 pm
JohnV-
I think the predictions as published in the IPCC AR4, in tables and figure define the IPCC predictions themselves.
There may be many ways to interpret and present the underlying model runs. The way the IPCC selected and presented constitutes their predictions.
It is perfectly possible to estimate an underlying climate trend that is predicted to vary smoothly using short periods of data. The fact that the period is short results in large uncertainty intervals; longer periods result in smaller uncertainty intervals. The large uncertainty intervals associated with short amounts of time usually result in large rates of “failure to falsify” even highly biased, inaccurate predictions.
But that doesn’t mean we can never falsify. True falsifications can happen with short amounts of data. In this instance, we got a falsification of an underlying 2C/century trend.
lucia (Comment#2784) May 14th, 2008 at 4:24 pm
Boris–
GMST isn’t reported at two week intervals. So, with regard to that: there are truly no two week trends that could falsify warming. Moreover, if we could get GISS or Hadley to scurry and provide this data, it would a) be possible to answer that and b) show the answer requires weather variability outside that ever experienced in any two week period ever recorded.
So, as a practical matter, the answer to the two week question is: There is no weather over two weeks that can falsify. But 8 years is enough to falsify a prediction for an underlying trend of 2C/century. It would certainly be enough to falsify a prediction of 10 C/century, had that been the prediction. The amount of data it ends up taking to falsify a prediction that is wrong depends on a) how far off the projection is, b) how much weather variability really exists on the real earth and c) some degree of chance.
Arthur Smith (Comment#2785) May 14th, 2008 at 4:58 pm
Hi Lucia - from the discussion on Schwartz we know the temperature record has significant correlation from one year to the next, and month to month is certainly highly correlated. So in reality, even if there are no large-scale oscillatory “weather” effects on the temperature cycle introducing their own correlations (like the solar cycle), you still know that a 7-year record fundamentally does not give you that many independent degrees of freedom.
With your Swedish height analogy, it’s as if your 6 independent height measurements turn out to have been made from individuals in just 4 different families, who are themselves only a little more distantly related. That is, the low standard deviation in your 6 measurements of real Swedes (the 7 year record) is *artificially low* because of the genetic similarity (year-on-year autocorrelation).
You would get a better comparison even with the same number of data points by using years that are further apart (individuals that are not genetically related). As has been pointed out here before, I think.
Second point is, as I think John V is trying to get at here, and Gavin’s central point: when you look at the trends in the model runs, the range of trends is very large if you only look at 7 years, or 10 years. But when you increase the length of the run you are looking at to 20 years, 30 years, the standard deviation in the trend drops significantly. The IPCC trend and standard deviation as published is based on the model 20-30 year or more numbers, not on the range of 7 to 10 year trends. That’s the big problem: you’re not comparing apples to apples, in even a reasonable fashion.
Maybe an analogy would be if in your heights model you start by measuring individuals of any age, for which you get a huge range of heights, but the average comes out just a bit less. But your Swedish measurements are only of adult males, while the height-o-logists were actually talking about the model measurements for juvenile females. The original numbers could be limited to adult males too, and the standard deviations in the individual models become much less, the average goes up quite a bit, and you get numbers that are more definitive for falsification or non-falsification. Right now you just don’t have enough years of records to falsify - and if you include a significantly longer time record, the trend *does* agree with IPCC (but of course you argue it predates the latest IPCC report - but then go back and look at the predictions of the first reports).
By the way, your 95% vertical bars for the Swedish heights look awfully narrow compared to the Gaussian figure underneath…
EJ (Comment#2786) May 14th, 2008 at 6:05 pm
Lucia,
Nice work. Should be in a text somewhere.
The problems climate science faces, IMHO, are due to scale: time scales, spatial scales, etc. Just eyeing the GCMs, I don’t see much decadal variation in most models’ predictions to “wait for”. In other words, the majority of predictions show no 8-year period of no increase or of actual decline.
The question I have is simple.
If a model predicts no flat or downward eight year trend anywhere in the range of say 100 years into its future, then why wouldn’t that falsify that particular model immediately? I don’t think the claim that 8 years is too short a dT should apply if said model has no prediction of any such trend.
Al Fin (Comment#2787) May 14th, 2008 at 8:32 pm
The problem is that current models are not good enough to be falsified. Lucia makes the mistake of taking the models seriously enough to falsify, when they do not deserve that much respect.
Modelers wish to be taken seriously, without having their projections seriously examined or falsified. It is worth the price of admission to watch them squirm on the hook.
lucia (Comment#2789) May 14th, 2008 at 8:42 pm
Al Fin: All models can be compared to data. The problem with the modelers is they want to insist that we use the uncertainty intervals from the ensemble of models which include both weather noise and the spread due to the different climate parameterizations selected by modelers. They don’t want to permit use of the uncertainty intervals based on weather alone.
Using that rule, the models can never be falsified unless nearly every model in the batch is biased in the same direction. Worse, as a result of weather noise, the whole lot has to be strongly biased to falsify.
That’s why I use the uncertainty based on the weather data only.
EJ (Comment#2790) May 14th, 2008 at 8:50 pm
But haven’t you falsified nearly “every model in the batch”?
jmrSudbury (Comment#2791) May 14th, 2008 at 9:03 pm
Everyone seems to be forgetting something. Gavin’s error bars are wrong. They suggest that only temperature matters, but that is not the case. The level of CO2 matters as well. The level of CO2 has risen more than expected, so only the models that use a high CO2 emissions assumption qualify. The Constant scenario, the orange and light orange section of Figure 4 above, does not qualify. Actually even the blue and light blue section should be disqualified from consideration.
In other words, the lower error bar should have a definitely positive slope. Estimating from that graph, I estimate that lower error bar should be 1 degree over 60 years or 0.167 C per decade. I suspect that even the mean for the GISS and HadCRUT datasets would be outside the 95 percentile.
John M Reynolds
lucia (Comment#2792) May 14th, 2008 at 9:26 pm
EJ– I haven’t looked at individual models. I only examine the final prediction of the AR4 and compare to uncertainties.
John– According to the AR4, the projections between 2000-2003 were nearly the same for all the SRES. So, the predictions/projections are not expected to differ much.
But yes, in general, if the GHGs exceed the SRES, then in principle, the earth’s temperature should have increased even faster. But since the IPCC didn’t project how much, I don’t think we can quibble with Gavin’s uncertainty bounds for that reason.
Boris (Comment#2793) May 14th, 2008 at 9:35 pm
Let me ask my question a little bit better:
What weather could be observed in two weeks in Colorado that would falsify the hypothesis that summer is approaching?
Now, Lucia, you said that the answer would be that we’d have to see something that had never been seen before in two weeks of weather in order to falsify the hypothesis. Why do you think that is true?
Raven (Comment#2795) May 14th, 2008 at 9:56 pm
Boris says:
What weather could be observed in two weeks in Colorado that would falsify the hypothesis that summer is approaching?
That would depend on the meaning of the word “summer”. The generally accepted definition is based on a calendar, which means that summer is going to come no matter what weather occurs, so your question is irrelevant.
If you mean to suggest that summer is defined to be a period of hotter than normal temperatures then even a two week period of -60C temperatures could not falsify the hypothesis because the hypothesis is supported by years of repeatable empirical data (i.e. we know the summer is a period of hotter weather because that is what has been observed for 1000s of years). That kind of repeatable process of prediction followed by experimental outcome makes the hypothesis that summer is coming a scientific theory/fact rather than a hypothesis.
Unfortunately, your analogy breaks down completely when it comes to the effect of CO2 because we have no experimental evidence that conclusively supports the hypothesis. All we have is a single uncontrolled experiment with so many variables that no reasonable person can claim to know for sure how much of the outcome can be attributed to CO2. This makes the falsification of the IPCC projections significant because it tells us that there are likely better hypotheses out there and we need to go look for them.
John V (Comment#2797) May 14th, 2008 at 10:55 pm
lucia,
In response to Boris’ thought experiment about two week temperature trends that could falsify the arrival of summer, you replied that it would require “weather variability outside that ever experienced in any two week period ever recorded”.
I think this is getting close to the central problem with the way you are using a 7-year trend. Your method is similar to looking at the trend for the first 2 weeks of May, carefully correcting for auto-correlation, and comparing it to the average trend for the first 2 weeks of May.
An example:
Where I live, the first week of May was *very* cold and the second week was average. The trend was probably something like 1.5C/day. For fun, I’ll put the 95% confidence intervals at 0.5C/day to 2.5C/day (using Cochrane-Orcutt of course). My local meteorologist says the central tendency of the trend for May is 0.1C/day. I therefore conclude that I have falsified my local meteorologist. The average trend is at least 0.5C/day.
Obviously that’s wrong. For the 2 week trend, it’s obvious that you would look at historical trends to understand the expected uncertainty. You can not just “use the uncertainty based on the weather data only”. Why is that not necessary for a 7-year trend?
=====
I can think of a few ways to quantify and/or compensate for the uncertainty in the 7-year trends:
SIMULATION:
Use one or more AOGCMs to determine the expected uncertainty in 7-year trends.
This is what Gavin has done at RealClimate. He found that the standard deviation of 8-year trends was 2.1C/century. Using this estimate for the uncertainty in a 7-year trend, the current observations are consistent with 2.0C/century.
Result: Current observed trend agrees with IPCC prediction
—
OBSERVATION:
Use one of the well-known temperature series and calculate every possible 7-year trend. Also calculate an estimate of the underlying trend using 30 years or some other suitably long period. Subtract the centred 30-year trend from the centred 7-year trend at each year. Throw out trends involving volcanoes.
I did this using OLS on yearly GISTEMP. The differences between the 7-year trend and the “underlying trend” (33-year trend) have a standard deviation of 2.5C/century.
Result: Current observed trend agrees with IPCC prediction
Incidentally, I get a standard deviation of 2.2C/century when using 8-year trends. This is in good agreement with the model-generated uncertainty.
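The OBSERVATION procedure described above can be sketched as follows. This is not John V's actual calculation: it runs on a synthetic series (a hypothetical 0.02 C/year trend plus random noise) rather than GISTEMP, it skips the step of throwing out volcano years, and the function names are made up for illustration.

```python
import random
import statistics

def ols_slope(values):
    """Least-squares slope of values against their index (units per step)."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def trend_difference_spread(series, short=7, long=33):
    """Std. dev. of (centred short-window trend minus centred long-window trend)."""
    half_short, half_long = short // 2, long // 2
    diffs = []
    for centre in range(half_long, len(series) - half_long):
        short_trend = ols_slope(series[centre - half_short:centre + half_short + 1])
        long_trend = ols_slope(series[centre - half_long:centre + half_long + 1])
        diffs.append(short_trend - long_trend)
    return statistics.stdev(diffs)

# Synthetic stand-in for a yearly temperature record (not real data).
rng = random.Random(0)
series = [0.02 * year + rng.gauss(0.0, 0.1) for year in range(120)]
spread = trend_difference_spread(series)  # in C/year; multiply by 100 for C/century
```

With a real record, `series` would be replaced by the yearly anomalies, and years affected by volcanic eruptions would need to be filtered out before taking the standard deviation.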
—
ADJUSTMENT:
Do something like Douglass & Clader (2002) and attempt to determine the effect on the trend of known cycles (primarily ENSO and solar). Adjust the observed trend for the known cycles. You have done this for ENSO and looked at it for the solar cycle.
From my back-of-envelope calculations using Douglass & Clader (2002), the observed trend is in good agreement with an underlying trend of 2.0C/century after compensating for ENSO and the solar cycle. (IIRC, I got 1.8C/century after compensating).
Result: Current observed trend agrees with IPCC prediction
This is my preferred method because it reduces the uncertainty. Like you I don’t like ridiculously large uncertainty intervals, but without adjusting they are necessary.
—
I apologize for the long comment. To recap:
Please explain why it is not necessary to estimate trend uncertainties for 7-year trends.
Len Ornstein (Comment#2798) May 14th, 2008 at 11:24 pm
Lucia:
As Arthur Smith notes, in Figure 1, your graphed, purple, barred, vertical lines appear to mark a 68% confidence interval, not a 95% interval. This slightly changes part of your argument.
The important point that you and Roger seem to make, in less than a direct way, is that combining small numbers of real, biased GCMs as if they were random samples of an unbiased set of possible GCMs does not lead to histograms which can legitimately be assigned ’standard’ confidence intervals. They should not be analyzed as if a symmetrical Gaussian is the proper model to fit to such histograms. In so far as Gavin – and the IPCC – may convey a sense that that’s a proper statistical procedure, they would be wrong. But I’m not sure that’s what they’re trying to do – or are perceived to be doing.
The fact that GCM outputs are biased with respect to one another is well known. Therefore some, or all, must be biased with respect to real world data. This makes your ‘discovery’ of bias no surprise. Unbiased models are better than biased models. But biased models are not automatically ‘falsified’, in the Popperian sense.
The IPCC is trying to provide some guidance for what may be looming catastrophes, given a set of less-than-perfect tools – the current GCMs. But even 95% confidence always leaves a 1 in 20 chance of being wrong. Until the data are ‘better’, we’ll have to live with substantial uncertainty.
The ‘converted’ and the ‘deniers’ should be kept on their toes – as Roger tries to do – most of the time.
Martin Ringo (Comment#2799) May 14th, 2008 at 11:37 pm
Lucia,
I am going to differ with you on the basics, the height of Swedish men example. What you have there is a straightforward comparison of means from two samples: the Swedish men (sample S) and the Vietnamese, Maltese, Portuguese, Norwegian (sample VMPN), and the test is not a comparison of one statistic versus a parameter. The null hypothesis should be that the mean of S is equal to the mean of VMPN, and the test is the traditional comparison of sample means from two different-size samples. Further, if the VMPN is guaranteed to have 25% Norwegian men and those men have the same population mean as the Swedish men, the chances are you won’t be able to reject unless you get large N (which can be calculated) because the difference in population means amongst the V, M, P and N parts will persist.
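A two-sample comparison of means of the kind described above might look like the sketch below. The comment does not say which variant of the test is intended; Welch's unequal-variance form is used here as an assumption, and the height values are invented purely for illustration.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    se_sq_a = statistics.variance(sample_a) / n_a  # squared std. error of mean A
    se_sq_b = statistics.variance(sample_b) / n_b  # squared std. error of mean B
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se_sq_a + se_sq_b)
    dof = (se_sq_a + se_sq_b) ** 2 / (se_sq_a ** 2 / (n_a - 1) + se_sq_b ** 2 / (n_b - 1))
    return t_stat, dof

# Hypothetical heights in cm (not data from the post).
swedes = [178.0, 181.0, 179.0, 182.0, 180.0, 180.0]
vmpn = [160.0, 164.0, 169.0, 171.0, 170.0, 174.0, 179.0, 181.0]
t_stat, dof = welch_t(swedes, vmpn)
```

The statistic would then be compared with a Student t critical value at `dof` degrees of freedom; a wide spread in the VMPN sample inflates the denominator, which is why a large N is needed before the null is rejected.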
This point applies also to the IPCC projections unless they wish to argue that such projections should be viewed as non-random variables. The problem with the IPCC projections is twofold. First, there is the problem of interpreting the distribution of the forecast (e.g. are we meant to take the error bars as standard deviation estimates from a normal population?). Second, the forecast standard errors, assuming that we could get them, should be inclusive of the error in the drivers (the explanatory variables of the model: the greenhouse gases, the solar input, the atmospheric particulate matter, etc.) for the forecast period. What you really want to compare is the IPCC ex post forecast, i.e. with the known driver values plugged in but all the endogenous variables left for the forecast. (I have little idea about the mechanics of that with a model like a GCM, and there is lots of room to fudge when there are dynamics involved.) In the absence of such a forecast with its forecast standard errors, there isn’t a lot of alternative other than to treat the lower bound of the forecast band as a parameter to compare with estimated values all expressed as trend versus trend.
Assuming for the moment that Gavin is correct in the sense that the forecast standard errors are larger than you assumed (yes, I know that he didn’t express it that way, but that is what it boils down to), then the Type II (Beta) Error is that much larger, and the empirical content of the forecast/projection is that much less. I can say that global temperature is going to be constant plus or minus 10 degrees C. That statement is tough to falsify, but it is also almost void of meaningful empirical content. This point is something you made very well in some earlier post. It didn’t generate much excitement (falsifying is much sexier), but the issue of the Type II Error is equally important. Hence, the importance of holding the IPCC’s feet to the fire with regard to making forecasts in a form that can be falsified. That is: Mr./Ms. Modeler, what is the level of temperature trend for an X year period that would falsify the model as an explanation of the determination of global temperatures?
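The trade-off described above between wide forecast errors and the Type II error can be illustrated with a small calculation. This sketch assumes a two-sided z-test with normally distributed errors; the 2.1 C/century figure is the standard deviation quoted earlier in the thread, while the 2.0 C/century true difference and the 1.0 C/century alternative are hypothetical.

```python
from statistics import NormalDist

def type_ii_error(true_diff, std_error, alpha=0.05):
    """Probability of failing to reject when the true difference is true_diff."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = true_diff / std_error
    power = z.cdf(-z_crit - shift) + 1 - z.cdf(z_crit - shift)
    return 1 - power

# Same hypothetical true difference (C/century), two different std. errors.
beta_tight = type_ii_error(2.0, 1.0)
beta_wide = type_ii_error(2.0, 2.1)
```

The wider standard error gives the larger `beta`: a forecast with broad error bars is harder to falsify precisely because it is more likely to survive even when it is wrong.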
EJ (Comment#2800) May 14th, 2008 at 11:37 pm
“Until the data are better”, we’ll have to live with substantial uncertainty.
What a statement. Does this mean that until our thermometers, as adjusted, match the models, the future is uncertain?
Who knew?
pliny (Comment#2801) May 15th, 2008 at 12:45 am
Lucia,
I have to agree with John V on this one. I objected some time ago, as I think others did, that you are falsifying a prediction that the IPCC never made. It seems to me that if they ever had been called on to make a prediction for the 2001-2008 period, that they would have done so taking into account the spread of results from model runs, as Gavin has described.
You said in your answer to Al Fin that, because modellers used a statistically inconvenient and apparently arbitrary selection of parameters for their runs (which adds to the spread due to weather noise), this made falsification including model run range variability impracticable. Therefore you would proceed, in effect, as if the model spread was zero, and just treat weather noise.
I understand about the statistical difficulty. However, the modellers are addressing a real issue. They have a lot of input data, all subject to some uncertainty. Ideally, they would do a designed experiment, sampling from the ranges in a factorial style, but so far it appears that that is just too large an enterprise. So they use this ad hoc range variation.
That can be criticised as giving an inadequate measure of the uncertainty. But it is a measure. Treating it as zero, as in your analysis, is no advance in accuracy.
Nick Stokes
I think both you and Steve Mosher have pointed to a statement in Sec 10.7.1 of the AR4 WG1 as a justification for your choice of .2 C/decade as the “prediction” to be falsified. The context there was a discussion of “committed” change, and it is true that they spoke of such rates of change without uncertainty ranges. That is bad (though I think context may be a defence). But it doesn’t help to insist that because they omitted uncertainty there, it can be taken to be zero.
dover_beach (Comment#2802) May 15th, 2008 at 1:22 am
This is increasingly becoming a new form of apologia. It is, at once, as captivating as it is disturbing.
anonymous (Comment#2803) May 15th, 2008 at 2:12 am
Lucia, I suggest that you only use reliable UAH/RSS satellite temp data for your calculations; then of course it’s gonna look even worse…. The GISS data is contaminated and not reliable (see Lampasas, TX for example), as are all the stations that A. Watts has reported as being in category 3-5 (not acceptable). The modelers cannot accept anything that does not go up because their careers and science are at stake.. it’s very simple.. my apologies if this offends some people but the future will bear this out… The latest RSS data is indeed dropping alarmingly, -1.20F, and it is doubtful it will go back over normal range soon for May, so J. Hansen will have a problem for May temps… LOL
lucia (Comment#2805) May 15th, 2008 at 4:09 am
Martin:
Further, if the VMPN is guaranteed to have 25% Norwegian men and those men have the same population mean as the Swedish men, the chances are you won’t be able to reject unless you get large N (which can be calculated) because the difference in population means amongst the V, M, P and N parts will persist.
This is pretty much the point I’m trying to get across. The real question people want answered is this:
Does the mean recommended by “models” correctly predict the mean for the “real” system? The answer is no. And using Gavin’s method to test, it will take forever to reject because the Swedes are so close in height to the Norwegians.
Basically, there are two different questions:
a) Do the Swedes’ heights fall inside the population of the models? Yes. They do.
b) Does the central tendency recommended by the panel of height-o-logists fall inside the range of uncertainty of the Swedes heights? No it doesn’t.
Analogously, we can ask:
a) Does the earth’s temperature fall inside the range of projections of models? Yes, it does. That’s what Gavin showed.
b) Does the central tendency recommended by the IPCC fall inside the range of uncertainty for the earth’s true trend? No it doesn’t. That’s what I discuss and show.
There are issues with my illustration of (b). These are associated with whether or not we know the uncertainty intervals etc.
But Gavin wants to use the uncertainty intervals for (a) to answer (b). They aren’t the same: They are never the same. Moreover, the uncertainty intervals for (a) are always larger than for (b) because they include the scatter due to spread in models. (This is the Vietnam to Norwegian height analogy.)
The fact is, the uncertainty intervals, the logic and the methods to get the answer to questions (a) and (b) are different.
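The difference between questions (a) and (b) can be sketched with made-up numbers. Everything below is hypothetical: the national mean heights, the six Swedish measurements, and the choice of a 2-sigma band for the model spread are assumptions for illustration, not the values behind Figure 1. The multiplier 2.571 is the tabulated two-sided 95% Student t value for 5 degrees of freedom.

```python
import math
import statistics

# Hypothetical "model" means in cm, one per nationality.
model_means_cm = [162.0, 170.0, 172.0, 180.0]  # Vietnamese, Maltese, Portuguese, Norwegian
# Hypothetical measurements of six Swedish men, in cm.
swede_sample_cm = [178.0, 181.0, 179.0, 182.0, 180.0, 180.0]
T_95_DF5 = 2.571  # two-sided 95% Student t value, 5 degrees of freedom

prediction = statistics.mean(model_means_cm)
model_spread = statistics.stdev(model_means_cm)

swede_mean = statistics.mean(swede_sample_cm)
swede_half_width = T_95_DF5 * statistics.stdev(swede_sample_cm) / math.sqrt(len(swede_sample_cm))

# (a) Does the Swedish mean fall inside the spread of the models?
swedes_inside_model_spread = abs(swede_mean - prediction) <= 2 * model_spread
# (b) Does the models' central tendency fall inside the Swedes' own interval?
prediction_inside_swede_interval = abs(prediction - swede_mean) <= swede_half_width
```

With these numbers the answer to (a) is yes and the answer to (b) is no, because the model spread is several times wider than the uncertainty in the Swedish mean.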
On the correct test of the VMPN model itself– I stick to discussing whether the final predictions are inside the range permitted by the weather because I am testing the final numbers disseminated to the public. I do this because I know testing actual models is more difficult.
The only numbers clearly communicated by the IPCC are the means, and occasionally the standard deviations. So, that’s all I use– overlaying assumptions such as a Gaussian distribution.
lucia (Comment#2806) May 15th, 2008 at 4:28 am
Len–
The Gaussian shows the standard deviation in the heights of Swedes, which doesn’t change with sample size.
There are 6 samples used to determine the mean. The standard error in the determination of the mean from 1 sample is the standard deviation in the height divided by the square root of (6-1).
The 95% confidence intervals are then obtained by multiplying by the “t” value of 2.44.
So, that is the correct Gaussian for the actual heights, and those are the correct confidence intervals for the uncertainty in the determination of the mean of the population based on six measurements. (Yes, I know it’s confusing.)
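The confidence-interval arithmetic described above can be sketched as follows, using six hypothetical height values. Dividing the population standard deviation by the square root of (n-1) gives the same standard error as dividing the sample standard deviation by the square root of n, which the sketch checks. The multiplier is left as a parameter and set to the 2.44 quoted in the comment; for reference, the tabulated two-sided 95% Student t value is about 2.57 for 5 degrees of freedom and about 2.45 for 6.

```python
import math
import statistics

def mean_confidence_interval(values, t_multiplier):
    """Interval for the population mean: sample mean +/- t * standard error."""
    n = len(values)
    mean = statistics.mean(values)
    std_error = statistics.stdev(values) / math.sqrt(n)
    half_width = t_multiplier * std_error
    return mean - half_width, mean + half_width

# Hypothetical heights in cm (not the values used for Figure 1).
heights_cm = [178.0, 181.0, 179.0, 182.0, 180.0, 180.0]
low, high = mean_confidence_interval(heights_cm, t_multiplier=2.44)

# Equivalent standard error from the population SD and sqrt(n - 1).
alt_std_error = statistics.pstdev(heights_cm) / math.sqrt(len(heights_cm) - 1)
```

The interval for the mean is much narrower than the Gaussian of individual heights, which is why the vertical bars and the curve in the figure look so different.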
lucia (Comment#2807) May 15th, 2008 at 4:36 am
Nick–
You said in your answer to Al Fin that, because modelers used a statistically inconvenient and apparently arbitrary selection of parameters for their runs (which adds to the spread due to weather noise), this made falsification including model run range variability impracticable. Therefore you would proceed, in effect, as if the model spread was zero, and just treat weather noise.
I’m not assuming the model spread is zero in the true earth weather variability. This isn’t a simplifying assumption. The “model” spread in the true earth weather variability is zero.
The earth isn’t a model, and it’s not one version of a model. We aren’t guessing at the physics: They are what they are.
The model spread is unrelated to the true weather noise of the earth. The true magnitude of weather variability existed before models were run, written or even dreamt of. It pre-exists computers and man.
If someone advances reasons why the uncertainty bands I am using are unrealistic, I accept that information and look at it. But the component of the model spread that is due to the variations in “physics” in the model is orthogonal to the weather variability.
I’ll whip out some graphs from the AR 4 to show that later today.
pliny (Comment#2808) May 15th, 2008 at 5:41 am
Lucia, let’s simplify it. Suppose the IPCC has one model, and believes it knows all the inputs accurately except for aerosols. However, there is always weather noise. It runs the model for a high estimate and a low estimate of aerosol, and says there is a high and low value of trend, and predicts accordingly for the decade. You then seek to falsify this against weather noise. Then, to be fair, you would have to say the estimate was OK if either value passed. Because although the physics “are what they are”, the uncertainty was acknowledged as part of the prediction. So you have to allow for both weather noise and “model variation”.
Now of course, there are many models and many inputs subject to error. And the uncertainty is acknowledged (see eg Sec 10.5.2, and Fig 10.1, with nearby text), and appeared as the spread in the temperature plot you posted in an earlier thread.
Nick Stokes
steven mosher (Comment#2809) May 15th, 2008 at 6:29 am
Another point that goes unnoticed is that some models hindcast better than others. But the less skillful models are not winnowed out, so you have “forecasts” from models of varying skill, which of course leads to a wide range of forecasted trends.
lucia (Comment#2810) May 15th, 2008 at 6:43 am
Pliny–
How does your example affect the empirical estimate of the weather variability associated with the true earth? It doesn’t.
The earth either was or was not screened by aerosols. In the real world, the effect of aerosols is a certain amount. We may not know that amount, but the earth and physics don’t care. The temperatures on the real earth evolve irrespective of our understanding.
So, your hypothetical only affects the uncertainty in our ability to predict.
But after predictions are made and the earth goes around the sun for a number of years, we can ask either of two questions:
1) Did the earth’s temperature fall inside the uncertainty bounds of models?
2) Did the central tendencies predicted by the IPCC fall within the weather variability?
The answer to 1 is yes. The earth’s temperature falls inside the uncertainty bounds of models. Gavin’s post answers this question. So, one might infer that is the question he prefers to answer. However, he doesn’t appear to state his question explicitly.
The answer to 2 appears to be “no”. The central tendencies predicted by the IPCC fall outside the range of weather variability for the real earth.
Why does this happen? Because the uncertainty intervals for true weather are smaller than the uncertainty intervals for the collection of models. And once we know the trend that happened, we can exclude trends we couldn’t exclude when we were predicting.
It just so happens we can exclude the central tendency in our current instance.
steven mosher (Comment#2811) May 15th, 2008 at 7:04 am
lucia march was adjusted downward as well, I think. have a look
pliny (Comment#2812) May 15th, 2008 at 7:29 am
But Lucia, this is where I don’t follow. In this example, the IPCC didn’t predict a central tendency. That’s your construct. They predicted a range depending on aerosols. As in the real case they predict a range depending on all sorts of things, expressed as a variability over model runs.
Now it’s true that the range is usually expressed as a centre point plus/minus something. And the centre point is often quoted as a summary value. This is common throughout practical science. And it even sometimes happens that people forget to quote the range, or errors. But you can’t “falsify” the prediction without reference to the range.
lucia (Comment#2813) May 15th, 2008 at 7:45 am
Pliny– The IPCC predicted a central tendency of 2 C/century for the first few decades of 2000. This is illustrated in this figure in the technical summary and chapter 10 of the report by WG1:
The AR4 also states quite clearly that predictions/projections during 2030 are not strongly affected by the specific scenario inside the range considered in the SRES.
Is your point that we shouldn’t expect this to apply because we are outside the SRES? If so… sure. That would be a physical explanation for the falsification. However, in that case, the flaw in the IPCC process is the selection of the SRES and not the models.
I focus on testing whether the projections/ predictions fall inside the range of trend consistent with what we have experienced on earth. The central tendency doesn’t.
I’m not delving into the issue of attribution. What’s the point of trying to figure out why the projections are wrong when we can’t even get agreement on whether they fall outside the range consistent with earth’s true temperature trend?
jmrSudbury (Comment#2814) May 15th, 2008 at 7:49 am
For the 2000-2003 period, the error bars may be too similar for all 4 scenarios, so a negative trend could be consistent. Beyond that period the lower error bar would have to be positive even for the orange Constant Composition Commitment scenario. But we do not qualify for the Constant Composition Commitment scenario. Really, we only qualify for the A2 — high emission scenario, so why should we consider the model runs for the other scenarios?
Steve. As of 2008/05/02 GISS had 2008 12 26 67 but is now 2008 13 26 60. I never knew that April could make January warmer!
Oh, and HadCRUT is in too: 2008/04 0.250 C
John M Reynolds
pliny (Comment#2815) May 15th, 2008 at 7:58 am
Lucia,
That Figure 10.1 caption is quite explicit - “Lines show the multi-model means, shading denotes the ±1 standard deviation range”. It’s just the mean of a range, not a “central tendency”.
The scenarios are not the issue. That is just a circumstance where the IPCC has chosen to identify a particularly important determining input and draw special attention to the consequence of a policy decision.
lucia (Comment#2816) May 15th, 2008 at 8:54 am
Pliny–
It’s just the mean of a range, not a “central tendency”.
The definition of “central tendency” is:
The term central tendency refers to the “middle” value or perhaps a typical value of the data, and is measured using the mean, median, or mode. Each of these measures is calculated differently, and the one that is best to use depends upon the situation.
(See http://www.quickmba.com/stats/centralten/ )
How do you use the term?
John M. Reynolds–
Oh yes. The past temperatures shift around.
I don’t remember specifically, but I seem to recall when I reran the regression in April, the March temperatures were warmer than April, but the April temperatures also changed compared to the value reported in April. (I can’t remember in which direction. Maybe I’ll need to report this as we go forward.)
steven mosher (Comment#2817) May 15th, 2008 at 8:59 am
Lucia comment 2810 Captures it PERFECTLY.
here is another way to put it. The observations since 2001 rule out a forced trend of .2C per decade at 95% confidence.
Or another way to look at it. The observations since 2001 indicate a trend of X plus or minus e.
Now, lets look at all the models and see which models fall within the envelope of the observations.
Some will, others won’t. Which should you trust going forward?
Don B (Comment#2818) May 15th, 2008 at 9:12 am
I understand what you and Roger Pielke, Jr. are trying to do, applying statistical tests which scientists appreciate.
However, to inform the general public and most policy makers who are not scientists, why not use a falsification test which is simple common sense?
The existing climate models say increasing atmospheric CO2 leads to higher temperatures, but the 1960’s and 1970’s in a period of rising CO2 were colder, not warmer, and the last decade since 1998 has been cooler as well. Three decades out of five is a 60% failure rate.
mz (Comment#2820) May 15th, 2008 at 9:40 am
But 8 years is enough to falsify a prediction for an underlying trend of 2C/century.
Really? I’d think it were possible for the 8 year trend to be downwards every now and then even if the century has a total 2C warming trend.
William Connolley plotted 5, 10 and 15 year trends on the existing temperature record as an example:
http://scienceblogs.com/stoat/.....trends.png
Of course it’s trivial to generate data that even has a trend of 10C/century and periods of -10C/decade if you want. You’re into physics and climatology territory when looking at if they are probable or even plausible (they’re not of course).
But really, one way to approach this that to me makes pretty good sense in a way:
Look at the various individual climate models and their projections, and count how many negative 8 year trend occurrences there would be on average (and 2 C positive century trends). This avoids the statistical smearing effect.
It’s been used for extreme weather predictions too, where averaging multiple models for a single projection and then counting the extremes from that average temperature is wrong, as the peaks are not at the same times and are smoothed out - you get better results by counting the number of extreme events in each model and then averaging the number of extreme events. Or at least in theory.
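The "count in each model, then average the counts" idea above can be sketched with synthetic runs. Nothing here uses real model output: the 20 runs, the 0.02 C/year trend (2C/century) and the 0.1 C noise level are hypothetical, and the noise is uncorrelated, which real model runs are not.

```python
import random

def ols_slope(values):
    """Least-squares slope of values against their index."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def count_negative_trends(series, window=8):
    """Number of overlapping windows whose fitted trend is below zero."""
    return sum(
        1
        for start in range(len(series) - window + 1)
        if ols_slope(series[start:start + window]) < 0
    )

# Synthetic ensemble: 20 hypothetical runs of 100 years each.
rng = random.Random(42)
runs = [[0.02 * year + rng.gauss(0.0, 0.1) for year in range(100)] for _ in range(20)]

# Count per run and then average, versus count in the ensemble-mean series.
mean_of_counts = sum(count_negative_trends(run) for run in runs) / len(runs)
ensemble_mean = [sum(year_values) / len(year_values) for year_values in zip(*runs)]
count_in_mean = count_negative_trends(ensemble_mean)
```

Averaging the runs first smooths out the year-to-year noise, so negative 8-year trends that are fairly common in individual runs nearly vanish from the ensemble mean. That is the smearing effect the comment describes.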
lucia (Comment#2821) May 15th, 2008 at 9:57 am
mz–
This is the answer I post every time someone brings up that graph:
WC showed that downtrends can occur during periods with a strong mean trend, but only after volcanoes erupt.
To apply the method you suggest and prove 8 year flat trends are consistent with underlying “up” trends of 2C/century you must:
1) Find periods when we have a proven 2C/century uptrend over a sustained period.
2) Find lots of 8 year flat trends.
3) Show they aren’t associated with any phenomena known to cause flat trends (volcanoes), and
4) Show there are enough of these flat trends.
Every single flat trend in WC’s image is associated with a volcanic eruption.
Duane Johnson (Comment#2823) May 15th, 2008 at 10:29 am
As I see it, Lucia’s approach of evaluating the central tendency of the IPCC predicted temperature trend against measured real world temperatures is the preferred approach. It provides an incentive for the “predictors” to do a better job, whether through the use of better models, better selection of models used for the predictions, or better interpretation of modeling results. Time will tell if the present prediction bias is confirmed.
In contrast, the Gavin approach, defended by John V and others, would appear to remove the incentive for improved development, selection, and application of the models, since to do so would increase the likelihood that the IPCC predictions can be shown to be biased, if in fact they are, as seems likely at present.
Darwin (Comment#2824) May 15th, 2008 at 10:43 am
In other words, how is the lack of warming over the last eight years explained in light of the increasing CO2? A physical theory requires a physical explanation and truly inquiring minds need to know. It’s the only way to get a more accurate representation of how the world really works. And that’s what we really care about, isn’t it? Getting it right as opposed to proving ourselves right.
John V (Comment#2826) May 15th, 2008 at 10:46 am
Test image before writing a full comment; feel free to delete…
Larry Bolz (Comment#2828) May 15th, 2008 at 11:10 am
This sums it up pretty well I think:
“the uncertainty intervals, the logic and the methods to get the answer to questions (a) and (b) are different.”
You can’t use the same things on something with large uncertainty levels and on something with small uncertainty levels.
“Some models hindcast better than others. But the less skillful models are not winnowed out, so you have “forecasts” from models of varying skill, which of course leads to a wide range of forecasted trends.”
Which have to be treated differently than reality is, with different uncertainty intervals, logic and methods.
PaulM (Comment#2829) May 15th, 2008 at 11:16 am
Some of the comments on this blog are just amazing. Nick/Pliny is attempting to deny that the IPCC made a prediction of 0.2 degrees per decade. The SPM says very clearly “For the next two decades a warming of about 0.2°C per decade is projected for a range of SRES emissions scenarios.” And John V is on at Lucia for not estimating uncertainties. John V, please explain why the 2500 expert scientists didn’t do this for their prediction of 0.2 degrees per decade?
A sequence of model simulations is shown in AR4 fig 10.5. It is hard to see, but I think you would have to look quite hard to find a 7-year downward trend in one of those, so I don’t believe Gavin’s claim of huge error bars.
mz, looking at past trends is misleading, because the alarmists say that warming is accelerating, so periods of no warming should get increasingly rare. Also, as Lucia has patiently explained many times, previous periods of no warming have been associated with major volcanic eruptions.
Larry Bolz (Comment#2830) May 15th, 2008 at 11:22 am
“In other words, how is the lack of warming over the last eight years explained in light of the increasing CO2? A physical theory requires a physical explanation and truly inquiring minds need to know.”
Yes, what event explains the disconnect between greenhouse gas levels, the surface numbers, and the satellite numbers recently?
John V (Comment#2831) May 15th, 2008 at 11:35 am
lucia,
I’m disappointed that you ignored my last comment. I’ll try again.
I understand your objection to using model variability to quantify the uncertainty in 7-year or 8-year trends. It’s true that bad models would increase the uncertainty and thereby make the models harder to falsify.
I can *not* understand why you completely ignore the uncertainty inherent in a 7-year trend. I can *not* understand how you take a 7-year trend which has obvious potential for observation bias relative to the underlying trend, find that it differs from the predicted underlying trend, and claim that the *prediction* is biased. The observation could be the source of the bias. How can you justify ignoring that possibility?
If you don’t want to use the models to estimate that uncertainty, then use actual observations. I spent an hour this morning doing this for GISTEMP and HadCRUT data. Here’s my analysis and results:
First we need an estimate of the underlying trend. I used a 22-year trailing trend (OLS) for each year.
Second we need to calculate the observed 7-year trend (OLS). This is the observation we would have for any given year. Here are plots of the 22-year and 7-year trends for GISTEMP:

To estimate the observation bias in the 7-year trends I subtract the 22-year trends. The plot below shows the bias for all years. The years that are unaffected by major volcanoes are marked. I determined the major volcano years using the historical forcings provided by Gavin that you used for Lumpy.
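For anyone who wants to repeat the steps described so far, here is a minimal sketch in Python. It is not the spreadsheet used for the plots: the function name `trailing_ols_trend` and the synthetic anomaly series are assumptions made for illustration, and real GISTEMP or HadCRUT values would have to be substituted for `anomaly`.

```python
import numpy as np

def trailing_ols_trend(values, window):
    """OLS slope fitted to the `window` points ending at each index.

    Entries that do not yet have a full window behind them are NaN.
    """
    values = np.asarray(values, dtype=float)
    slopes = np.full(values.shape, np.nan)
    t = np.arange(window, dtype=float)
    for end in range(window - 1, len(values)):
        segment = values[end - window + 1 : end + 1]
        slopes[end] = np.polyfit(t, segment, 1)[0]
    return slopes

# Synthetic annual anomalies (hypothetical, NOT real temperature data):
# a 0.02 C/yr trend plus independent noise.
rng = np.random.default_rng(0)
years = np.arange(1900, 2008)
anomaly = 0.02 * (years - years[0]) + rng.normal(0.0, 0.1, size=years.size)

long_trend = trailing_ols_trend(anomaly, 22)   # stand-in for the underlying trend
short_trend = trailing_ols_trend(anomaly, 7)   # the 7-year observation
deviation = (short_trend - long_trend) * 100.0  # converted to C/century
```

Years around major volcanic eruptions could then be masked out of `deviation` before any statistics are taken; which years to mask is a separate choice that this sketch does not make.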

Clearly the observation bias on a 7-year trend can be very large. It’s also interesting that there is some structure to the bias — it’s definitely not just white noise.
The last two plots are a histogram of the distribution of 7-year trend biases and the cumulative percentage of the same. Years around major volcanic eruptions are excluded:


Clearly the potential for bias at the 95% level is very large. Even at the 80% level (10% to 90%) the observational bias is as large as -1.8C/century to +3.0C/century. It’s hard to say much about an underlying trend of 2.0C/century with uncertainties like this.
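The 80% and 95% ranges quoted above are just central percentile intervals of the deviations. A rough sketch of that calculation follows, assuming a hypothetical helper `central_range` and made-up deviations in place of the real ones:

```python
import numpy as np

def central_range(samples, level):
    """Return the (low, high) percentiles enclosing the central `level` fraction."""
    tail = (1.0 - level) / 2.0 * 100.0
    low, high = np.percentile(samples, [tail, 100.0 - tail])
    return low, high

# Hypothetical deviations in C/century, standing in for the
# 7-year minus 22-year trend differences (volcano years already removed).
rng = np.random.default_rng(1)
deviations = rng.normal(0.0, 2.0, size=80)

low80, high80 = central_range(deviations, 0.80)  # 10th to 90th percentile
low95, high95 = central_range(deviations, 0.95)  # 2.5th to 97.5th percentile
```

One caveat: trailing windows computed for consecutive years overlap heavily, so the samples are not independent and the empirical percentiles are less certain than the sample count suggests.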
Finally, I repeated the above procedure using HadCRUT3 instead of GISTEMP data. For brevity I will skip the intermediate steps and get right to the histogram and cumulative percentage plots (again excluding years around major volcanic eruptions). The conclusions are the same:


To me this makes it clear that 7-year observations say very little about the underlying trend. Note that I have not used models in any way. The potential for bias in 7-year trends is very real. This potential for bias needs to be included in your falsification test.
Finally, given the strong structure of the 7-year bias I believe there is potential to eliminate much of it by compensating for volcanoes, ENSO, and the solar cycle. It would be very interesting to look at the current trend after compensating. Would it still “falsify” IPCC?
John V (Comment#2832) May 15th, 2008 at 12:17 pm
PaulM:
IPCC AR4 WG1 Figure 10.5 has lots of 7-year negative trends. Look a little harder. To help you see it more clearly, read Gavin’s post at RealClimate. It’s the same data.
As for the IPCC prediction of 0.2C/decade, that’s actually lucia’s interpretation of the IPCC prediction. It is not surprising that the IPCC did not estimate uncertainties on lucia’s interpretation.
—
lucia:
Did I link my images incorrectly? Thanks for checking.
lucia (Comment#2833) May 15th, 2008 at 12:44 pm
John– Not addressing your specific questions instantly is not the same as ignoring them.
As to your claim that the 2C/century is my interpretation, what is your interpretation? That they made no projection or prediction? That it’s 1 C/century? 3 C/century?
The AR4 describes this number in narrative in Chapter 10, tables and a figure in chapter 10 and the Technical summary.
John V (Comment#2835) May 15th, 2008 at 1:11 pm
lucia — “ignoring” was not supposed to be a heated word.
As I’ve said many times, my interpretation of 0.2C/decade is the underlying long-term trend.
Thanks for posting my images.
Can you please elaborate on why you exclude observational bias when comparing 7-year trends to the underlying trend? I understand and agree that estimating the bias from models is problematic. I realize that my method of using the 22-year trend as the underlying trend has problems. Surely there is a better method of incorporating this uncertainty than assuming it is zero (as you have done).
lucia (Comment#2836) May 15th, 2008 at 1:13 pm
JohnV:
Now to respond to 2831:
You complained that I did not respond to your previous comment, which was posted less than an hour before you complained I had not answered. FWIW, I happened to have been finishing a blog post, went to Roger’s blog to comment and then mowed the lawn. If you meant a different comment, either I was distracted and didn’t notice it, or… beats me!
The comment to which I did not respond appears to be a figure that tells me it describes something you call a bias in a measurement. There are few words.
1) How are you defining a bias in the figure in comment 2828? Do you mean the difference between the trend and the true mean? It appears you made a cumulative distribution. What sort of response were you hoping for?
2) You now show what you call a bias in your second figure in 2831. I would call those “deviations from the mean”. They aren’t biases.
This graph tells us nothing about bias because what you are calling bias is not a bias. A bias would exist if, after averaging over all the downtrends and all the uptrends calculated over 7 years, we did not get the same answer as for the full 22 years. (This, of course, assumes you do the averaging properly.)
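The distinction can be illustrated with a small synthetic experiment. This is only a sketch under stated assumptions: the true trend, the noise level and the use of independent (white) noise are all made up for illustration, and real weather noise is autocorrelated.

```python
import numpy as np

rng = np.random.default_rng(2)
true_trend = 0.02          # C/yr, assumed for the experiment
n_windows, window = 2000, 7
t = np.arange(window, dtype=float)

# Fit an OLS trend to many independent synthetic 7-year windows.
slopes = np.empty(n_windows)
for i in range(n_windows):
    series = true_trend * t + rng.normal(0.0, 0.1, size=window)
    slopes[i] = np.polyfit(t, series, 1)[0]

bias = slopes.mean() - true_trend   # systematic offset of the estimator
scatter = slopes.std(ddof=1)        # typical deviation of a single window
```

In this setup `scatter` is about as large as the trend itself while `bias` is close to zero: individual 7-year trends deviate a lot from the mean, yet averaged together they recover the underlying trend.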
Now that you have added words to your figures, I’ll look at it further and see what I get. I will be using 30-year averages, as that appears to be standard in climatology, and I’ll see if I get results similar to yours. I’m obviously not going to do it in 5 minutes. For one thing, I’m going to do laundry. For another, I’m going to go spray weeds. But mostly, I’ll be looking at my spreadsheet and checking what I get.
lucia (Comment#2837) May 15th, 2008 at 1:18 pm
John– Based on what I understand of the comment you posted, I don’t think you are using the word “bias” correctly. I think you actually mean what the modelers call “weather noise”. I don’t think I ignore this. The reason I went to Cochrane-Orcutt was to avoid the undersized estimates of the magnitude using OLS.
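For readers unfamiliar with the method, here is a textbook-style sketch of iterated Cochrane-Orcutt for a linear trend with AR(1) residuals. It is not the calculation behind the posts; the function name `cochrane_orcutt_trend`, the iteration count and the synthetic series are assumptions for illustration.

```python
import numpy as np

def cochrane_orcutt_trend(t, y, iterations=10):
    """Fit y = a + b*t + e with e[k] = rho*e[k-1] + u[k]. Returns (b, rho)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)   # plain OLS starting point
    rho = 0.0
    for _ in range(iterations):
        resid = y - (intercept + slope * t)
        # Lag-1 autocorrelation of the residuals.
        rho = np.dot(resid[1:], resid[:-1]) / np.dot(resid[:-1], resid[:-1])
        # Quasi-difference the data so the transformed errors are uncorrelated.
        y_star = y[1:] - rho * y[:-1]
        t_star = t[1:] - rho * t[:-1]
        design = np.column_stack([np.full(t_star.shape, 1.0 - rho), t_star])
        coef, *_ = np.linalg.lstsq(design, y_star, rcond=None)
        intercept, slope = coef
    return slope, rho

# Synthetic check with hypothetical values: trend 0.02 per step, AR(1) noise with rho = 0.6.
rng = np.random.default_rng(3)
n = 500
noise = np.zeros(n)
for k in range(1, n):
    noise[k] = 0.6 * noise[k - 1] + rng.normal(0.0, 0.1)
time = np.arange(n, dtype=float)
est_slope, est_rho = cochrane_orcutt_trend(time, 0.02 * time + noise)
```

The point of the transformation is that the standard error computed from the quasi-differenced regression accounts for the lag-1 correlation, which plain OLS ignores.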
But, I do think it’s important to see if there are any and all ways to estimate the uncertainty in the estimate of the weather noise. I’ll plan to look at the issues you raised more fully. As always, until I look at it, I can’t predict whether I will ultimately agree or disagree with your interpretation of choices (or whether we will fall somewhere in between.)
I already mentioned at Roger’s that you are a main person who brings up concrete issues. I take them seriously.
John V (Comment#2838) May 15th, 2008 at 1:32 pm
lucia,
There was a little misunderstanding about my use of the word “ignored”. I was referring to comment #2797 made at ~11pm last night. You had responded to many other comments but not mine. It’s no big deal. As you said, you probably just didn’t notice it.
Anyways…
In comment #2826 I was just testing image linking. Please ignore and/or delete it.
I probably did use the word bias incorrectly, but I hope my point still makes sense.
In comment #2797 I used 30 year trends and got essentially the same results. I look forward to your analysis using the same. My spreadsheet using 22-year trends is available here if you think it might be useful:
http://www.opentemp.org/_resul.....ia_7yr.xls
Regarding weather noise, I believe your test only includes high-frequency weather. Low-frequency weather is excluded from a 7-year trend in isolation. IMO you need to estimate the low-frequency weather and add it to your confidence intervals.
avfuktare vind (Comment#2840) May 15th, 2008 at 1:44 pm
Arthur Smith,
“Second point is, as I think John V is trying to get at here, and Gavin’s central point: when you look at the trends in the model runs, the range of trends is very large if you only look at 7 years, or 10 years. But when you increase the length of the run you are looking at to 20 years, 30 years, the standard deviation in the trend drops significantly. The IPCC trend and standard deviation as published is based on the model 20-30 year or more numbers, not on the range of 7 to 10 year trends. That’s the big problem: you’re not comparing apples to apples, in even a reasonable fashion.”
Apart from the fact that Lucia obviously compares apples to apples (predicted trend vs real trend), let’s bear in mind that 1) neither the IPCC nor Gavin has ever validated their models, and hence any indication of falsification is an immediate cause for rejection of the model, and 2) the IPCC models are ALSO falsified from many other perspectives, e.g.
a) heat does not accumulate in the climate system as predicted (semi-monotonic increase)
b) heat is not distributed in the climate system as predicted
c) the pseudo-average surface temperature the modelers are proud of replicating is biased as shown by McKitrick et al 2006, 2007, and thus the models replicate the wrong trend.
d) economic development does not follow the IPCC prognosis
etc.
For someone whose job it is to kill bad models, I wonder when enough is enough? How many falsifications do we need of an unverified and unvalidated model before she can rest in the sand? Any one of these problems with the model would have led me to reject it (including the original lack of verification and validation). Gavin is out of his mind if he thinks that the fact that he can find a subset of models with negative 7-year trends means that we should have confidence in the models.
Larry Bolz (Comment#2841) May 15th, 2008 at 2:13 pm
Let me get this straight. Adding more models increases the number of bad ones and so increases the uncertainty. Shortening the time period increases the uncertainty.
If that’s the case, what does doing both do, and how does that specifically relate to comparing a certain number of models over a certain time period to observations over that same time period? The definitions of everything are rather less than optimal.
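The time-period half of this can be made concrete. For equally spaced points with independent noise of standard deviation sigma, the standard deviation of an OLS slope is sigma * sqrt(12 / (n (n^2 - 1))). The sketch below just evaluates that formula; the noise level is a made-up value, and autocorrelated noise would make every number larger.

```python
import math

def ols_trend_std(sigma, n_points):
    """Std. dev. of an OLS slope for n equally spaced points with independent noise."""
    return sigma * math.sqrt(12.0 / (n_points * (n_points ** 2 - 1)))

sigma = 0.1  # C, hypothetical year-to-year noise level
for n_years in (7, 10, 22, 30):
    # Multiply by 100 to express the slope uncertainty in C/century.
    print(n_years, round(ols_trend_std(sigma, n_years) * 100.0, 3))
```

Because the uncertainty falls off roughly as n to the power -1.5, going from 7 to 30 years shrinks it by close to a factor of nine under these assumptions.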
“Also, as Lucia has patiently explained many times, previous periods of no warming have been associated with major volcanic eruptions.”
So what explains it this time?
lucia (Comment#2842) May 15th, 2008 at 2:14 pm
JohnV
Regarding weather noise, I believe your test only includes high-frequency weather. Low-frequency weather is excluded from a 7-year trend in isolation. IMO you need to estimate the low-frequency weather and add it to your confidence intervals.
This is what I was thinking when I referred to the PDO in my very first post with a falsification. Also, this is what I was alluding to over in comments at Roger’s blog.
My thoughts on the way to do it (hypothetically) are to
1) estimate the amount of energy at individual frequencies, and then
2) figure out how a certain amount of energy at a particular frequency affects the uncertainty in the least squares fit, and then
3) integrate the contribution to the uncertainty due to different frequencies over all frequencies.
Step 2 is done. I know how to do step 3 either numerically or analytically.
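A numerical sketch of steps 2 and 3 might look like the following. It is only one way to do it, not the analysis referred to above: the function names, the amplitudes and the periods are all hypothetical, and it assumes each frequency component has a random phase independent of the others, so the contributions add in quadrature.

```python
import numpy as np

def trend_from_sinusoid(amplitude, period, window, phase):
    """OLS slope fitted over `window` annual points of a pure sinusoid."""
    t = np.arange(window, dtype=float)
    signal = amplitude * np.sin(2.0 * np.pi * t / period + phase)
    return np.polyfit(t, signal, 1)[0]

def rms_trend_over_phase(amplitude, period, window, n_phase=360):
    """Step 2: RMS spurious trend from one frequency, averaged over phase."""
    phases = np.linspace(0.0, 2.0 * np.pi, n_phase, endpoint=False)
    slopes = np.array(
        [trend_from_sinusoid(amplitude, period, window, p) for p in phases]
    )
    return np.sqrt(np.mean(slopes ** 2))

def combined_trend_uncertainty(components, window):
    """Step 3: quadrature sum over (amplitude, period) pairs."""
    return np.sqrt(
        sum(rms_trend_over_phase(a, p, window) ** 2 for a, p in components)
    )

# Hypothetical (amplitude in C, period in years) pairs, for illustration only.
components = [(0.1, 3.0), (0.05, 11.0), (0.1, 60.0)]
total = combined_trend_uncertainty(components, window=7)
```

Step 1, estimating the amplitudes themselves, is the hard part and is not addressed here.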
Many of the questions I have been asking people about the likely amplitude associated with the PDO are related to this. The solar issue is similar.
The difficulty is figuring out what they may be, within some reasonable bounds.
I don’t like Gavin’s way– because he’s obviously added in the uncertainty due to modeling.
I’ve been trying to ponder how to make the exclusion choices right. I always like to decide before I see the results of the answer. But….. I thought of what I would consider “perfect” if I had infinite amounts of data. But, there are about 120 years. That’s at best 13 non-overlapping intervals of 7 year trends, which means, strictly speaking 13 independent samples. (Also, there are only four non-overlapping 30 year regions. I’m not sure that matters.)
I was thinking “perfection” means I need to exclude
a) every 7 year where the “down” forcing due to stratospheric forcing is more than some critical - low value. Having the volcano at the beginning or end of the 7 year period causes the maximum excursion, and it seems od


Reference (Comment#2768) May 14th, 2008 at 1:41 pm
Finally GISS released the April 2008 Global Near-Surface Anomaly - it’s + 0.41°C (a whopping decline of 0.26°C from March) - that’s a lot of energy flowing out of the near surface atmosphere, if it could be harnessed just think how much fossil fuel would be saved.