What is the “true weather noise”?
What is the “true earth” weather variability? Is it the full range of variability over all possible climate models? Does it matter for testing?
I am asking this because the answer has relevance to the questions I discussed yesterday in “Are the IPCC AR4 predictions falsified?”. I showed that indeed the IPCC AR4 predictions appear falsified and that Gavin’s method of testing their fidelity is inadequate. The reason is related to how we estimate the true variability of “weather noise” on the real earth, and the actual questions we ask about climate models.
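Yesterday, I emphasized that the question I am asking and attempting to answer is:
Does the IPCC AR4 forecast central tendency of 2 C/century fall within the range of trends consistent with the real earth?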
When I use the term “falsify”, I mean it in the sense that the answer to Q1 is “No, 2C/century central tendency forecast is not consistent with the trends observed on the real earth.”
In comments here and at other blogs (like Roger Pielke’s) many visitors often ponder this different question:
The two questions share similar words, but they are different questions. They have different answers. I believe the answers to the two questions are:
- Answer to Q1: The temperature trends of the earth do fall within the extremely wide range exhibited by climate models used to create IPCC AR4 predictions. This is what Gavin demonstrated at Real Climate. I don’t dispute this and never have.
Gavin gave an answer to an important academic question many ask when trying to compare and improve models.
- Answer to Q2: The central tendency of 2 C/century predicted by the IPCC AR4 does not fall within the relatively wide range consistent with the trend we have witnessed on the real earth. In my opinion, now that projections are published and real data are available, this question has important policy implications.
Does it matter which question we ask?
Of course it does! My opinion is we should be asking both and making sure people understand the difference between the two questions.
Global warming is real; man is exerting an influence on the climate. For that reason, I would like to have some idea about the probable magnitude of warming. Models give a huge range of magnitudes in the trend over 100 years. I think it’s important to try to assess whether models are predicting high or low, and whether the central tendencies predicted by the IPCC AR4 appear to fall inside the uncertainty bounds consistent with observations of the earth. For this reason, we must ask question 2.
Of course, to improve any current batch of models, modelers must also ask question 1. As a general rule, in the infant stage of development, models often don’t even achieve the weaker skill indicated by a successful answer to the academic question. I’ve assumed GCMs are past that infant stage. So, I never seriously doubted the answer to question 1. I think it’s laudable for Gavin to continue to verify that the predictive ability of models has not fallen so low as to fail this less stringent test.
So, I think answering question 1 is fine, provided we never forget to also ask and answer question 2. Question 2 represents a more stringent test and is extremely important from a policy perspective.
Why are the answers for the two questions different in the IPCC case?
The reason the answers to these questions differ is simple: the appropriate uncertainty range to answer the academic question is larger than that required to answer the second question. That is because the models predict a very, very, very wide range of trends around 2C/century.
In contrast, while the uncertainty intervals around the trend experienced on earth are large, the magnitude of these uncertainty intervals is proportional to the “true weather noise”. This magnitude is smaller than that for the full panoply of climate models for a simple reason: the “true weather noise” is weather noise on one specific “model”, the planet earth.
By comparison, the uncertainty intervals for predictions from a collection of models arise both from the “weather noise” in the models and from the scatter among the average behaviors of the individual models.
Interestingly, if the models all get the correct magnitude of “weather noise” relative to true earth weather, it is trivial to show that the variance in the population of trends across all models is
$$\sigma^2_{\text{models}} = \sigma^2_{\text{model-noise}} + \sigma^2_{\text{weather}}$$
where $\sigma^2_{\text{model-noise}}$ is the variance in the ensemble average trend predicted by the full collection of models, and $\sigma^2_{\text{weather}}$ is the variance due to “weather noise” in either one individual model run with one specific set of forcings or for the earth itself. (Running ensembles of the model and averaging is permitted; they simply must all be “the same” case.)
Because we are summing squares, we can see that $\sigma_{\text{models}}$, the variation in predictions across models or across forcing scenarios, is higher than that due to weather alone.
To determine whether or not 2C/century is consistent with observations of the recent earth’s temperature trends, we must use the smaller uncertainties associated with weather, $\sigma_{\text{weather}}$. Using the larger ones, $\sigma_{\text{models}}$, can result in an unnecessarily large number of “failing to falsify” diagnoses of models that predicted incorrectly.
These sorts of false negative errors can have harmful policy consequences. (To resort to analogy: just as false negative results on a Pap test can result in untreated cervical cancer, false negatives in “falsifying” IPCC projections are bad. False positives are also bad. So, we’d like to be fair in both regards.)
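The sum-of-squares relation described above can be checked with a small simulation. This is only a sketch under stated assumptions: the spreads `sd_model_noise` and `sd_weather` are hypothetical numbers chosen for illustration, not estimates from any real model archive, and both sources of scatter are assumed independent and Gaussian.

```python
import random
import statistics

def simulate_trend_variance(n_models=20000, sd_model_noise=0.15,
                            sd_weather=0.10, central=2.0, seed=0):
    """Simulate one trend per hypothetical model (C/century).

    Each trend = central value + a model-specific offset ("model noise")
    + an independent "weather noise" term. All numbers are illustrative.
    Returns (simulated variance across models, sum of the two variances).
    """
    rng = random.Random(seed)
    trends = []
    for _ in range(n_models):
        model_offset = rng.gauss(0.0, sd_model_noise)  # differs between models
        weather = rng.gauss(0.0, sd_weather)           # differs between runs
        trends.append(central + model_offset + weather)
    var_models = statistics.pvariance(trends)
    return var_models, sd_model_noise ** 2 + sd_weather ** 2

var_models, expected = simulate_trend_variance()
print(f"simulated variance across models: {var_models:.4f}")
print(f"model-noise variance + weather variance: {expected:.4f}")
print(f"weather-only variance: {0.10 ** 2:.4f}")
```

With independent terms the simulated spread across models should sit close to the sum of the two variances, and therefore above the weather-only variance, which is the point the argument relies on.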
How large is “model noise”?
But some may suggest this “climate model noise” is small compared to weather noise. So they might claim that $\sigma_{\text{models}}$ is approximately equal to $\sigma_{\text{weather}}$.
But I must then ask: Why would anyone think the “climate model noise” (i.e. variability across models) is small? If it were small, why would climate modelers bother to use multi-model ensembles? If all models predicted identical averaged outcomes under identical forcing scenarios, one would simply run multiple ensembles of one model!
Maybe my “what if” holds no force for you. Then take a look at this figure, which I modified from chapter 10 of the WG1 submission to the AR4:
Examining this figure, ask yourself:
- Do the different models predict the same average underlying increase in temperature over 100 years? This is the period where differences between model predictions are not strongly affected by “weather noise”. So, presumably, the differences are due to “climate model noise”, i.e. different average results due to the parameterizations used by different modeling teams.
- Do the different models predict the same average underlying increase in temperature over 30 years? While pondering that question, recall that 30 years has less “weather noise” than 100 years.
- There is a lot of overlap obscuring detail, but if we look at the edges, does it appear the models that predict larger increases over 30 years tend to predict higher increases over 7 years? And over 100?
- Do the cases with the low trends over seven years appear to have 2C/century trends over 30 years?
So, in short, are those flat trends we see over 7 year periods consistent with models that predict 2C/century over 30 years? Or, are the trends associated with lower rates of temperature increase over 30 years?
Or, to return to yesterday’s example: are individual Swedes usually taller than the average of Vietnamese, Maltese, Portuguese and Norwegians because Swedes tend to be tall?
To answer the question of whether the 2C/century prediction/projection by the IPCC falls within the 95% confidence intervals of the trend experienced on the real earth, we need the uncertainty intervals for the true weather noise. Using the larger uncertainty intervals that describe the variability of the model predictions makes sense if we ask the more academic question: Does the earth’s temperature trend fall inside the trends for all possible models (which include models that may not mimic the earth)?
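To make the form of this test concrete, here is a minimal sketch of checking whether a projected trend falls inside a 95% interval around an observed trend. It is not the analysis used in these posts: it assumes ordinary least squares with independent residuals and a normal approximation (z = 1.96), and the monthly anomalies are synthetic values generated only for illustration. Autocorrelated residuals would widen the interval.

```python
import math
import random

def ols_trend_interval(times, values, z=1.96):
    """Return (slope, lower, upper) for an ordinary least squares trend.

    Assumes independent residuals and a normal approximation for the
    interval; both are simplifying assumptions for this sketch.
    """
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    sse = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(times, values))
    slope_se = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return slope, slope - z * slope_se, slope + z * slope_se

def is_consistent(projected_trend, times, values):
    """True if the projected trend lies inside the interval around the observed trend."""
    _, lower, upper = ols_trend_interval(times, values)
    return lower <= projected_trend <= upper

# Synthetic example: 84 monthly anomalies (7 years), time measured in years.
rng = random.Random(1)
times = [i / 12 for i in range(84)]
anomalies = [0.005 * t + rng.gauss(0.0, 0.1) for t in times]  # made-up data

slope, lower, upper = ols_trend_interval(times, anomalies)
print(f"trend: {slope * 100:.2f} C/century, "
      f"interval: [{lower * 100:.2f}, {upper * 100:.2f}] C/century")
print("2 C/century inside interval:", is_consistent(0.02, times, anomalies))
```

The width of the interval here depends only on the scatter of the one observed series, which is the “weather noise” side of the comparison; substituting the spread across many models would widen it.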
Conclusion
Whether we are climate modelers or not, we all know which information we personally use to make decisions about which policies to support with regard to adaptation and mitigation. We all have the ability to decide which questions we wish to ask and have answered. In that regard, I am asking a specific category of questions. Other bloggers ask different questions.
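Our answers are different because we are asking different questions. With regard to statistical treatment, I perform my assessments using the uncertainty intervals based on the “true weather noise” of the earth. I do this because I am asking:
Does the IPCC AR4 forecast central tendency of 2 C/century fall within the range of trends consistent with the real earth?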
Others use larger uncertainty intervals based on both the “weather noise” and the “climate model noise”. They do this because they wish to ask a more academic question, which is important when working toward improving models. Those uncertainty bars are correct for a particular question. It’s just not the one I ask.
42 Responses to “What is the “true weather noise”?”
jmrSudbury May 15th, 2008 at 1:39 pm
Thank you for posting a large graph. Tired of squinting at small pixellated graphs, I went looking and found this link. The 5th frame has the graph on which Gavin (finally) drew error bars. The graph you have above is on frame six. I hope this helps you and your readers. — John M Reynolds
Martin Ringo May 15th, 2008 at 3:55 pm
Lucia,
Excellent point or distinction. There is the real world (= the “true model”) which we measure. Those measurements have some randomness, be it from the measurement errors or underlying randomness. We little humans make models of the real world — some simple, some complex — to explain parts of the real world. We take our models and make predictions or produce fitted series for the measured variables, e.g. global average temperature anomaly. If we have some methodological candor, we report the standard errors of our fitted or predicted lines. (Or if you like, the standard errors of the trends of our fitted lines.) That is our ESTIMATE of the real world noise. Those estimates will differ across models.
However, those estimates are different from the differences in the fitted lines or predictions or slopes, if you will. This is an important point. The differences in slopes is a measure of the uncertainty of our understanding, not the variation in temperature trends or such. Just to give you some numbers, I estimated 13 models of temperature trend for the post 2001 period for the models discussed in these pages and a few others (a pure Moving Average model and variations with GARCH(1,1) estimation, a technique often used for estimating volatility of financial returns and generally quite useful for time series where the variance of the underlying series may be changing as a function of time, a sort of autocorrelation of the variance). The average trend was 0.043 degrees C per DECADE with standard deviation amongst the 13 estimated trends of 0.077 degrees C per decade. If you use the same models for estimating the weather noise (expressed in the terms of a slope in degrees per decade), the value is 0.078.
So for time series models the variations of my 13 choices is about the same as the estimated weather noise (expressed in terms of slope). This also should give perspective on the 0.19 standard deviation of the climate models. Now to be fair, those models are making predictions and the variation might be expected to be a bit higher. But your “models predict a very, very, very wide range of trends” is maybe a couple “very”s too many, but it is the right idea.
If I get energetic, I will take the 13 models and estimate from 1980 to 2000 and then forecast for 2001-2008. I can then give you a standard deviation of the time series model forecasts which would be an “apples to apples” comparison with the 0.19 standard deviation.
lucia May 15th, 2008 at 4:02 pm
Martin, estimates of uncertainties in the trends based on the models would be great! That’s what I’ve been telling John I wanted to do in my not particularly good way. But for the purpose of the 2001-2008 period, I was going to get a SWAG (scientific wild ass guess) based on the GISS model runs with solar only forcing! (Unfortunately, the versions online are averaged.)
Evidently, one can get the IPCC model runs that Gavin used on line– but you need to register. Is that what you got?
Tom Fiddaman May 15th, 2008 at 4:13 pm
The information content of 7 years of data is the same regardless of how the question is asked. So, unless you can make some compelling statement about the statistical power of a particular procedure used to answer one or the other, I don’t see how you can assert that one test is more stringent than the other. In fact, the second question is quite misleading to the lay public, because it makes a definitive-sounding statement (IPCC models falsified!) with very low confidence (which the press will ignore).
Gavin’s approach is the traditional approach to falsification: evaluate the probability that a model (or models) can generate observed data, and reject when it’s too low. The failure to reject in this case is due to the low information content of the short term data. That doesn’t mean the method is bad; it means that we should seek more data or be patient.
Your question (Does the IPCC AR4 forecast central tendency of 2 C/century fall within the range of trends consistent with the real earth?) is valid, but the answer must be qualified, e.g., “No, 2C/century central tendency forecast is not consistent with the trends observed on the real earth SO FAR.” A negative answer does not automatically constitute falsification, because we have experienced only 7% of the IPCC forecast horizon. Thus the information content of the negative response is as low as the null response to the first question, only rephrased. If you could demonstrate that the IPCC predicts a constant 2C/century trend, then you might have a case, but I don’t think you can.
One way to do that would be to demonstrate that models have low endogenous trend variability, which you seem to be attempting here by partitioning noise into components. Eyeballing Figure 1 above, the differences in models due to trend are slow to emerge from the general noise in the first decade or so. That would suggest that, early in the simulations, sigma(weather) >> sigma(model-noise). That in turn implies that RC’s trend histogram, linked above, is a useful upper bound on trend variability, not unduly influenced by cross-model or parameter variation over short time spans. It might be possible to refine the RC figure by looking at single model ensembles (some exist in the AR4 CMIP3 archive as I recall), yielding narrower distributions. However, I suspect that it would not make much difference.
In any case, it cannot be true that you “perform my assessments using the uncertainty intervals based on the “true weather noise” of the earth.” No one has access to the true noise. It can be estimated various ways, e.g. with agnostic AR models (at the peril of ignoring forcings), with simple models (Schwartz), or with GCM ensembles. No matter which you choose, the measurement is assumption laden. If the endogenous weather trend has even a little autocorrelation (as one would expect given that the ocean has 1000x the heat capacity of the atmosphere, for example), then the 7% problem is quite debilitating to your argument.
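The autocorrelation concern can be made concrete with one widely used approximation: if residuals behave like an AR(1) process with lag-1 autocorrelation rho, the effective number of independent samples is roughly n(1 - rho)/(1 + rho). The sketch below is illustrative only; it assumes AR(1) residuals, and the helper names are hypothetical rather than taken from any analysis discussed here.

```python
import math

def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a series (e.g. regression residuals)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    numerator = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    denominator = sum(d * d for d in dev)
    return numerator / denominator

def effective_sample_size(n, rho):
    """Approximate number of independent samples under an AR(1) assumption."""
    return n * (1 - rho) / (1 + rho)

def standard_error_inflation(rho):
    """Approximate factor by which a white-noise standard error is understated."""
    return math.sqrt((1 + rho) / (1 - rho))

# Example: 84 monthly values with a hypothetical lag-1 autocorrelation of 0.5.
n_months, rho = 84, 0.5
print("effective samples:", effective_sample_size(n_months, rho))
print("standard error inflation:", round(standard_error_inflation(rho), 3))
```

Under this approximation even moderate autocorrelation cuts the effective sample substantially, which is why the choice of noise model matters so much for a 7 year record.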
Martin Ringo May 15th, 2008 at 4:38 pm
Lucia,
Alas, no. I would love to get — make that estimate myself — the uncertainties of the slopes of the climate models. Having read climate modelers discuss statistics for over a decade now, I simply don’t trust them to do what is a computationally intensive, probably quite expensive but otherwise straightforward exercise.
What I was referring to was time series models. One can forecast from them also. Indeed somebody once argued that there is little evidence that the climate models can do a better job of predicting than a linear model with the same exogenous variables as used in the climate models. Note that with time series models, unless there are large AR coefficient values for deeply lagged variables (e.g. t-48 for monthly data), the forecasted trend-slopes will be very similar to the estimated. {Time Series forecasting paradigm: AR models asymptotically approach their equilibrium values in the forecast, MA(p) models hit the equilibrium in p+1 steps.} So other than the first few wobbles in the forecasted series, one gets the estimated slope. Hence, the numbers I gave are a rough idea of the variation of time series forecasts from 2008 onward. I was going to do it for the same time period as the IPCC forecast/projections being discussed.
As a technical note: unless one understands the nature of how a particular model run/realization is made with respect to the modeling of the inherent uncertainty, then having a whole set of them doesn’t do a lot of good. This is a case where understanding the physics does one next to no good, because this is an issue of how one models a multivariate distribution. All the Monte Carlo does then is to integrate the distribution. But with time series it is easy to go wrong.
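The forecasting paradigm mentioned above can be illustrated with a short sketch. It assumes known coefficients and a known mean (in practice these are estimated), and the function names and example numbers are hypothetical. For a detrended series, the "equilibrium" is the mean; with a trend term it would be the trend line.

```python
def ar1_forecast(last_value, mean, phi, steps):
    """h-step forecasts of an AR(1) process: mean + phi**h * (last_value - mean).

    With |phi| < 1 the forecasts approach the mean asymptotically.
    """
    return [mean + phi ** h * (last_value - mean) for h in range(1, steps + 1)]

def ma_forecast(mean, thetas, recent_shocks, steps):
    """h-step forecasts of an MA(q) process
    x_t = mean + e_t + thetas[0]*e_{t-1} + ... + thetas[q-1]*e_{t-q}.

    recent_shocks[0] is the latest shock e_t, recent_shocks[1] is e_{t-1}, etc.
    Future shocks have expectation zero, so only coefficients that still
    multiply already-observed shocks contribute to the forecast.
    """
    q = len(thetas)
    forecasts = []
    for h in range(1, steps + 1):
        known = sum(thetas[j - 1] * recent_shocks[j - h] for j in range(h, q + 1))
        forecasts.append(mean + known)
    return forecasts

# AR(1): the gap to the mean shrinks by a factor phi each step, never exactly zero.
print(ar1_forecast(last_value=1.0, mean=0.0, phi=0.5, steps=5))

# MA(2): forecasts equal the mean from step 3 (q + 1) onward.
print(ma_forecast(mean=0.0, thetas=[0.5, 0.25], recent_shocks=[1.0, 2.0], steps=4))
```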
steven mosher May 15th, 2008 at 5:01 pm
Tom
Yes, with Gavin’s approach we should be more patient. We should wait 20-30 years.
Ask Hansen if he wants to wait 20-30 years?
He’ll say no.
He’ll say the models show that going above 450PPM is disaster. No error bars for him.
Tom Fiddaman May 15th, 2008 at 7:26 pm
“Indeed somebody once argued that there is little evidence that the climate models can do a better job of predicting than a linear model with the same exogenous variables as used in the climate models.”
I’ve tried this with simple models (Schwartz model that I think Lucia experimented with and higher order variants). The results aren’t quite as good as the envelope of AR4 models, for reasons I don’t know. However, I think it’s somewhat irrelevant. The point of GCMs is to develop an operational understanding of how things work. Adding the spatial dimension is essential for doing that, and also brings vastly more data to bear on the validation question - you get to look at regional and seasonal patterns, lapse rates, and a zillion other things. Those may not improve your ability to predict the global temperature time series, but they tell you a lot about whether you have your physics right.
Which brings me to my second point …
“Yes, with Gavin’s approach we should be more patient. We should wait 20-30 years. Ask Hansen if he wants to wait 20-30 years?”
Even if you want to be patient, you have to make a decision under uncertainty. It would be stupid to make that decision on the basis of a single 7yr experiment when you have reams of other data to consider.
steven mosher May 15th, 2008 at 8:27 pm
Tom.
Every decision is made under uncertainty. So the decision to turn food into fuel (ethanol) was a decision made under uncertainty. Luckily some people have figured out in a rather short time horizon the obvious fact that many yelled about. If you burn food for fuel, then people starve. (DUH. Burn uranium for fuel instead!)
It was brought home to me one day a year ago as I sat in the airport bar. A guy sat next to me. He ordered doubles. I knew at a glance he was from the midwest, as am I, so I struck up a conversation about the weather. He brightened. Talking about the weather: it’s a midwestern thing. We have four seasons. Anyway, when I asked him what he did he said he sold corrugated boxes.
No shame in that. The world needs pencils, pens, cardboard, paper, exotic sex toys, and climate science. Lots of needs.
Then I asked what his biggest challenge was. He said, “getting glue,” because the glue that is used to make corrugated boxes is derived from corn syrup. We chatted about Archer Daniels and how they allocated their corn syrup product (ok, boring crap). But it occurred to me that politicians were making decisions about the use of corn product (turn it into fuel) without a fart in the wind’s idea about how that decision would impact people.
Did I go Off topic? Tangent MAN!
Tom Fiddaman May 15th, 2008 at 9:05 pm
Steven - I actually quite agree with your tangent. Ethanol was a stupid decision made under uncertainty, without an appreciation of the uncertainty (or even really an appreciation of the more obvious certainties).
John V May 15th, 2008 at 9:13 pm
mosher — I’m with you on uranium vs corn. Nobody likes the taste of uranium anyways.
I’m also being swayed by the idea of liquid fluoride reactors. Google Nuclear Green and Charles Barton for more.
Gotta go.
KuhnKat May 16th, 2008 at 12:33 am
Tom,
“It would be stupid to make that decision on the basis of a single 7yr experiment when you have reams of other data to consider.”
Yes, we have about a million years of proxy data without a hint of catastrophe from too much CO2. We should wait 20 years to validate the models and learn a LOT more!!
Technically we could have 19 years of flat weather and make it all up in the last year. Of course, we have no observations to support this. Where is YOUR cut off??
Tell you what, let’s agree to wait and see whether Solar Cycle 24 has a max at least as high as 23’s before passing legislation!!! That is only another 5 years and if warming doesn’t resume till then we haven’t lost anything by waiting!!
I should also ask you what is going to be done about the natural increase in CO2 if warming resumes?? All we did was increase the RATE. If the AGW theory is right, we would eventually reach the same point with the natural additions!!! In other words, just cutting our CO2 emissions is not enough. If rising CO2 is an issue we need to take the NATURAL increase out of the atmosphere also. Do we reduce the biosphere or lock the excess CO2 up in calcium carbonate or some other compound???
Now, about those observations. The AGW physics requires the oceans to warm and the upper trop to warm faster than the lower trop. Care to take a shot at explaining why these indicators are negative along with no temp increase?? It doesn’t leave too many places to hide that extra energy that AGW is supposed to be hoarding. If there is no excess energy in the system, the last 20 years of warming are meaningless. Just more WEATHER NOISE!!!
Nick Stokes May 16th, 2008 at 1:59 am
Lucia,
Thanks for stating the questions clearly. Let me address the one you stated first (but sometimes referred to as the second). I think there is a problem with your use of the term “central tendency”. You said here that it is equivalent to a mean (tho I think a mean is a “measure of central tendency”). But you are not testing it as a mean; you are testing it as an instance. I think that the IPCC should not, and did not, make such a prediction.
An elementary illustration. I’m the IPCC, and asked to predict the future location of a mark on the tyre of a passing vehicle. The wheel radius is 1 m, and the speed, expected to be uniform, is 10 m/s. I can’t see the mark.
So I produce a graph with a line showing where the hub is, and a shaded region 1 m (scaled) wide. My prediction is that the mark will be found within that band at any future time. I make remarks like “the graph shows the longterm trend is 10 m/s”.
So after 1/10 sec, the onlookers, who can see the mark, say “but it went down and up, and has barely made any progress at all. Your prediction of a central tendency of 10 m/s is falsified”. And I say, but no, it is, as predicted by my model, within the shaded region.
Now back to AR4. The IPCC did show a lot about the models’ workings, and they did make a prediction (in that Fig 10.1) of exactly that form. It’s true that whoever wrote the section on committed change in the TS may have had and communicated the wrong idea, though the words are technically correct (“about”). And that can be criticised. But there is ample evidence of the nature of the IPCC prediction, e.g. in the plots in Ch 10 and the supplementary materials actually setting out the model mechanics.
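The tyre illustration above can be worked through numerically. This is a sketch under assumptions the comment does not specify: the mark is taken to start at the contact point with the road at t = 0, the wheel rolls without slipping, and the shaded band is taken as the hub position plus or minus one radius. Under those assumptions the mark follows a cycloid.

```python
import math

def hub_position(t, speed=10.0):
    """Horizontal position of the hub (m) at time t (s)."""
    return speed * t

def mark_position(t, speed=10.0, radius=1.0):
    """Horizontal position of a mark that starts at the road contact point.

    Rolling without slipping gives the cycloid x = v*t - r*sin(v*t/r).
    """
    angle = speed * t / radius
    return speed * t - radius * math.sin(angle)

def within_band(t, speed=10.0, radius=1.0):
    """True if the mark is within one radius of the hub horizontally."""
    return abs(mark_position(t, speed, radius) - hub_position(t, speed)) <= radius

t = 0.1  # seconds, as in the illustration
print(f"hub moved {hub_position(t):.3f} m, mark moved {mark_position(t):.3f} m")
print(f"apparent speed of mark: {mark_position(t) / t:.2f} m/s")
print("mark inside predicted band:", within_band(t))
```

Over a full revolution the mark and hub cover the same distance, so the short-window apparent speed says little about the long-term 10 m/s; whether that analogy carries over to 7 years of temperature data is the point under dispute in the thread.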