Yesterday, I described a time period during which I think we can safely agree the earth’s short term climate variability was uninfluenced by volcanic eruptions. Today, I’ll provide results for the variability of 7-year trends calculated based on a merge of HadCrut, GISS Land/Ocean and NOAA data for the earth’s global mean surface temperatures during that period.
The time period under analysis is that indicated by the red arrowed line in the figure to the left. The selection of this period is discussed here.
Unfortunately, this time period does not include a huge amount of data. So, I did an informal analysis along the lines suggested by JohnV in comments (but with modifications). I can answer questions in comments, but for the most part, I think simply reporting the broad-brush estimates is best. These are:
- I computed the trend for the global mean surface temperature for the full period consisting of 313 months. This was found to be m = 1.35 C/century using OLS and m = 1.33 C/century using Cochrane-Orcutt. The 95% uncertainty bounds computed using CO or OLS adjusted for the serial autocorrelation were ±0.32 C/century. The lag 1 autocorrelation for that period was found to be ρ = 0.53.
- I computed the temperature trends (m) using OLS for every 84 month period inside the “volcano dust free” window. Afterwards I computed the standard deviation of those trends (m). This resulted in a standard deviation of the trend of σm,84 = ±1.32 C/century. This calculated value is approximately 20% larger than the σm,84 = ±1.17 C/century I obtained based on quick CO fits using ρ = 0.53 for all cases, and suggests the CO 1-sigma uncertainties in the trend are too small by at least 20%. (Because the trend calculations involve overlapping times, the true standard deviation in the trends could be higher.) A rough sketch of this sort of calculation appears just after this list.
However, it is worth noting that 1.3 C/century is also quite a bit lower than the standard error of roughly ±2.1 C/century for the variability of 7-year trends suggested by Gavin at Real Climate.
- If I assume the lag 1 autocorrelation for the period is ρ1 = 0.53, and apply Cochrane-Orcutt (CO), I obtain σm,84 = ±1.37 C/century. This is also larger than expected based on quick CO fits using ρ = 0.53 for all cases, and suggests the CO 1-sigma uncertainties in the trend are somewhat too small.
- I applied C-O to 84 month periods, each separated by 2 years in time. (That is, the 84 month period centered on Jan 1924, the one centered on Jan 1926, etc. This provided 10 periods for computation. Bear in mind these overlap and are not statistically independent.) Then I assumed the trend of 1.34 C/century represented a constant underlying trend for that period, and tested to see how often this trend was rejected.
I found 1 false positive on the high side and one false positive on the low side. That is: Assuming the real trend during the analysis period was 1.34 C/century, I would have incorrectly falsified on the low side roughly 10% of the time and incorrectly falsified on the high side roughly 10% of the time. These two cases overlap, so both incorrect falsifications are triggered by the same stretch of weather. On the other hand, the non-falsifications also overlap, so given the small amount of non-overlapping data, a 20% incorrect falsification rate where I claimed 5% seems roughly right.
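For concreteness, here is a minimal sketch of the sort of calculation described above, assuming a merged monthly anomaly series as input. The variable names and the synthetic series are illustrative only, and the AR(1) adjustment shown is the simple effective-sample-size inflation of the OLS error, not a full Cochrane-Orcutt fit:

```python
import numpy as np

def ols_trend(y, dt_years=1.0 / 12.0):
    """OLS slope (C/century), AR(1)-inflated 1-sigma, and lag-1 autocorrelation of residuals."""
    t = np.arange(len(y)) * dt_years
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (slope * t + intercept)
    rho = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    # Naive OLS standard error of the slope, then a crude inflation for AR(1) residuals
    se = np.sqrt(np.sum(resid**2) / (len(y) - 2) / np.sum((t - t.mean())**2))
    se_ar1 = se * np.sqrt((1 + rho) / (1 - rho))
    return slope * 100.0, se_ar1 * 100.0, rho   # converted to per-century units

def rolling_trend_sd(y, window=84):
    """Standard deviation of trends over every overlapping `window`-month period."""
    trends = [ols_trend(y[i:i + window])[0] for i in range(len(y) - window + 1)]
    return np.std(trends, ddof=1)

# Illustrative use with a synthetic 313-month series (replace with the merged data):
rng = np.random.default_rng(0)
anom = (0.0135 / 12.0) * np.arange(313) + rng.normal(0.0, 0.1, 313)
print(ols_trend(anom))          # (trend C/century, adjusted 1-sigma, lag-1 rho)
print(rolling_trend_sd(anom))   # spread of 84-month trends, C/century
```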
This collection of results suggests the variability of computed 7-year trends of weather during periods with no significant volcanic activity has a 1-sigma spread of roughly 1.3 C/century to 1.5 C/century. This is larger than the 1.0 C/century to 1.1 C/century I have been obtaining using CO fits to data. It is smaller than the 2.1 C/century suggested by Gavin based on the spread of trends predicted by GCMs.
Based on rather crude inspection, it appears that using the method I have been using in the past, the uncertainty intervals said to correspond to 95% uncertainty intervals for rejecting a mean of 2C/century may actually correspond to roughly 80% confidence intervals, which is the level the IPCC refers to as “High Confidence” (See page 22 of the Technical Summary by WGI for the AR4).
Going forward, I’m looking for better confidence intervals. 🙂
I’m planning to look in two directions both of which readers and I have been discussing in comments. These are:
- Try to estimate the amount of “weather noise” in cycles with longer periods, and estimate the uncertainty in the short time period trends due to this factor. If anyone is interested in estimating the spectral characteristics of weather noise in GCM data, I think I know how to use that to estimate the amount of uncertainty in a trend estimate due to the short time span.
- Try to determine a better method to estimate the uncertainty for systems with significant autocorrelation in the lagged residuals. I’m discussing this with Marty Ringo.
References so I remember they exist.
These references have nothing to do with the post. But I want to remember they exist as I discuss improvements.
Thejll & Schmidtt 2005, JGR Vol 110.
Trends in climatological series.
lucia,
I’ve been busy with real work since taking an extra long weekend (Canadian Victoria Day weekend plus a brief hint of summer kept me away from my computer).
My analysis using OLS on annual GISTEMP data and excluding major volcanoes resulted in a standard deviation of ~2.5 C/century on 7-year trends. Do you have any ideas about why our results are so different?
Lucia – by lag-1 autocorrelation are you talking about the autocorrelation between temperature anomalies at a 1-year separation, or 1-month? Sorry I’m not completely familiar with some of the terminology here…
Anyway, I ask because when I ran the autocorrelation for the linearly de-trended GISS (full annual average data series, doing Schwartz’s time constant analysis) the delta-time 1-year autocorrelation was 0.61, more than your 0.53. For HadCRUT3 it was 0.737, and the 2-year autocorrelation was 0.564. I’m wondering how your uncertainties are affected by using a higher value than your 0.53 number (where did that come from, anyway?)
JohnV– No. I don’t really know why they are different. If I use OLS on this set of data, I get about the same standard deviation as with CO.
The only thing I can think of is that we made different choices on exclusion and the issue of the Hekla and Hemm…. volcano eruptions. Even leaving in Hekla, and making a patch up to Agung, I still didn’t get 2.5 C/century though. I got about 1.7 C/century– with averaged data from the 3 sources.
I know when you were discussing things in comments, you were talking about taking out individual years, filling in missing years, doing comparisons to running 22 year trends (with fixed-up missing years?). I didn’t like the idea of patching in that way in general, so I never tried to fully understand that or implement it. I just found the longest (only) set of contiguous years that unambiguously have no eruptions and used that.
When I look at periods with volcano eruptions, the trends are all over the place. In fact, in those periods, the variability is about 3C/century to 5C/century. (There are two big regions– I don’t remember which had what). Of course we understand why.
If the volcano erupts at the beginning of a period, the temperature drops. Then, it recovers. The “veil” lingers about a year or two. So, if we are talking centered averages, the 7-year trends are corrupted for at least 3.5 years before the eruption and usually 5.5 after. (Hekla and that other weird volcano could be worse, since they seemed to have blown and then let off stuff slowly for a long time.)
There might be no dust, but isn’t there still some volcano “in the pipeline” (according to models; I obviously don’t believe it’s actually there)? I seem to recall that the full recovery from an eruption, in models, can take many years, even decades (of course, empirically, the volcano does what it does in a much shorter amount of time 😉 ) so that there might still be some Pinatubo hanging around. In fact, that’s kind of the only way that “attribution studies” tend to get a strong negative “natural” trend in the last 30 years, in spite of an at most slightly declining TSI, because the effects of El Chichón and Pinatubo actually overlap slightly.
Arthur— They are 1-month lags, and they are for the residuals in a fit.
I get much bigger autocorrelations in periods with volcanic activity. (This can actually be explained, but I need to go meet my sister to go to the Chicago symphony. Vivaldi! 🙂 )
So, I figure I get lower ones during periods of low volcanic activity because I’ve specifically screened out a big component of the “white noise” that would be necessary for Schwartz. (Or, possibly, that screws up Schwartz, because these volcanos are big excursions and Schwartz assumes white noise for the forcing.)
On the uncertainty estimates: The calculated uncertainties in the trend are larger if the autocorrelation is thought to be larger. So, if the CO fit took the autocorrelation from all of time, my uncertainty estimates would pop right up near the correct value. The problem is: Is that ρ really right for the current period? After all…. no volcanos!
I don’t actually know.
lucia,
I did not fill-in any missing years. If I understand your procedure correctly, my procedure was basically the same. I excluded all trends that included years with any major volcanoes (by my loose definition). I calculated a moving “underlying” trend because the underlying trend was not constant over the years I looked at (although your constant underlying trend should *increase* the variability).
I’ll have a look at my spreadsheet and try to reproduce your results.
Ok, I had a look at my spreadsheet, fixed a minor error, and have some results.
First the minor error:
I was using trailing trends instead of centred trends. Centred trends make more sense when comparing the 7-year trend to the “underlying” 22-year trend. This made very little difference in the results, but it is more correct now.
I computed the standard deviation of the difference between the 7-year trend and the underlying trend a few different ways.
1) Using trends centred on all years from 1891 to 1997 with the underlying trend calculated each year;
2) Same as #1, but restricted to trends centred on 1925 to 1943 (my original method);
3) Same as #2, but using a constant underlying trend (OLS slope from 1925 to 1943) (your method);
Standard deviations using GISTEMP:
1) 1.97 C/century
2) 1.28 C/century
3) 1.27 C/century
Standard deviations using HadCRUT:
1) 2.37 C/century
2) 1.90 C/century
3) 1.98 C/century
Standard deviations using average of GISTEMP and HadCRUT:
1) 2.13 C/century
2) 1.53 C/century
3) 1.58 C/century
FWIW, our results seem to agree when we use the same 18-year period.
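For readers following along, here is a minimal sketch of roughly what method #3 above describes (a reconstruction with a synthetic series, not the actual spreadsheet): centred 7-year trends on annual data compared against a constant underlying trend, with the standard deviation of the differences.

```python
import numpy as np

def annual_trend(years, temps):
    """OLS slope in deg C per century over the given annual values."""
    return np.polyfit(years, temps, 1)[0] * 100.0

def method3_sd(years, temps, start=1925, end=1943, half_window=3):
    """SD of (centred 7-year trend minus constant underlying trend) for centres start..end."""
    mask = (years >= start) & (years <= end)
    underlying = annual_trend(years[mask], temps[mask])      # constant underlying trend
    diffs = []
    for centre in range(start, end + 1):
        w = (years >= centre - half_window) & (years <= centre + half_window)
        diffs.append(annual_trend(years[w], temps[w]) - underlying)
    return np.std(diffs, ddof=1)

# Illustrative use with a synthetic annual series (replace with GISTEMP/HadCRUT annual means):
years = np.arange(1880, 2008)
temps = 0.006 * (years - 1880) + np.random.default_rng(1).normal(0.0, 0.1, len(years))
print(method3_sd(years, temps))
```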
Lucia,
I find it frustrating that you keep conflating variation of trends as calculated over subsets of a time series (what you do) with variation over a selection of model results (what Gavin does). They are totally different, and there is no reason why your 1.3 should compare with his 2.1. Different models could predict a wide range of results even if the time series showed trends with little variation. I would agree that Gavin shouldn’t have overlaid a normal profile on a histogram where the variation can’t be considered a result of random sampling.
I don’t like the IPCC custom of attaching meaningless percentages to their statements of confidence about the science. I don’t think they mean anything, but they certainly aren’t statements about tail probabilities. So I don’t think that on that basis you can claim 80% as “high confidence” statistically.
JohnV– I averaged GISTEMP, NOAA and HadCrut. I never looked at anything individually. Wow, HadCrut’s pretty variable!
If you go forward in time, you’ll see that the issue of Hekla/Unpronouncable Russian volcano makes a difference.
Since they don’t appear on the Sato optical depth graphs, I tried to do a bit of checking to see what people said about those. I found Robock’s 2000 paper– but other than that, I don’t know what the “thoughts” are on Hekla & the Russian volcano. All I know is they happened, so by strict definition of excluding stratospheric eruptions, those years are out.
Nick–
I’m not conflating.
However, Gavin appears to suggest the 2.1 C/century results from “weather noise”. Or at least, I think he gives this impression when he writes the following:
(Italics mine. Also note that in the previous paragraphs, he spent a lot of time discussing “weather noise”)
My point in making the comparison is to correct the false impression I think people will likely have developed from Gavin’s post. Gavin’s 2.1C/century value is not the weather noise– and that has been my point several times.
I of course, will periodically compare the two with the goal of pointing out the two are different, both in meaning and magnitude.
I would also not call 80% “high confidence”, but it’s the terminology the IPCC uses, to, in their minds, help readers understand something. So, it appears right now, that using their terminology, we might have “high confidence” the 2C/century is incorrect.
That said, we still do need to work on my uncertainty intervals.
Lucia,
Gavin deals with that issue here, and contrasts “weather noise” with between-model variability.
and
This is the link to Nick’s Gavin reference.
One point that may not be obvious to some. Digital software systems are completely deterministic. The same software system with the same data will always produce the same result. This means that any variability in the output must result from internal random (non-deterministic) values in the system (or data). Put another way, all variability in the output is completely determined by introduced random values.
Gavin’s claim that “The standard trick is to look at the ensemble of model runs. If each run has different, uncorrelated weather, then averaging over the different simulations (the ensemble mean) gives an estimate of the underlying forced change.” essentially says that we get weather by introducing random values.
The suggestion that model ensemble output variability is somehow a measure of weather variability is false. It is merely a measure of the random variability introduced into the model run.
Nick,
I think, in words, Gavin explicitly describes two things:
a) Describes the spread of the ensemble means over “N” different models. (This is the spread reported in the IPCC graphs.)
b) Describes the spread of the weather in models or on earth. Based on what I showed above, during non-volcano periods, the variability in 7-year trends appears to be roughly ±1.3 C/century to 1.5 C/century.
He doesn’t describe
c) The spread of all weather over all models, which is due to the combination of (a) and (b). It is larger than the spread of weather in individual models. Based on Gavin’s post, this appears to be 2.1 C/century.
Gavin’s graph shows (c); his text gives readers the impression that he is showing them the magnitude of the “weather noise”, (b). That’s Gavin’s conflation.
The paragraph you quoted explains that what’s in his graph (which is “c”) is not the variation across model’s ensemble averages (i.e. “a”.)
That’s true. The graph Gavin shows is not “a”.
But I have to ask you something. Has anyone ever suggested the variation of ensemble averages of climate models (a) is the variation in weather (b) or the variation in all weather across all models (c)? I haven’t. As far as I’m aware, no one has.
So what precisely is the point of Gavin’s little lecture about the difference? I think Gavin decided to create a strawman and argue against it. 🙂
It is my understanding that the mathematical models, numerical solution methods, and application procedures for GCMs are not compatible with, or not intended to be applied to, calculation of weather. Certainly, NWP models/methods/codes/applications and those for GCMs differ in many aspects. Each contains accounting of the important physical phenomena and processes relative to the intended application areas. And while some of these are the same, many important ones are different or not even included in the other. So, I remain somewhat confused by the characterization that ‘climate is the average of weather’ and my understanding that GCMs do not calculate the weather.
If my understanding is correct, then the ‘variability’ seen in GCM calculations is climate variability, not weather variability. Climate models that do not calculate the weather cannot produce weather variability. Or can they? Climate models calculate climate variability.
Additionally, I will go so far as to say that it has yet to be shown that the variability seen in GCM calculations is not in fact a direct result of artifacts of the models and numerical solution methods used in the code. And these have the potential to be unrelated to any physical phenomena and processes of either the weather or the climate. Verification of the numerical solution methods would provide important missing information relative to understanding the source of the observed variability.
The observed variability is thus either climate variability or numerical variability, not weather variability.
This situation seems to be yet an additional case of adaption of an hypothesis by osmosis in contrast to carrying out the work necessary to determine the actual state of affairs.
Phil_B
I have no difficulties with the ensemble average idea for these runs. Engineers do it in turbulent flows all the time and have been since at least the time of Osborne Reynolds. (I don’t think he introduced this idea to flows either. But for all I know, he did.)
You don’t introduce random values. They actually just happen on their own.
What happens in turbulent flows (and many non-linear systems) is that the simple solutions are unstable to small perturbations. So, very small disturbances added to a flow solution will make the solution diverge. There are a whole bunch of ways people discuss this issue: chaos, instability, turbulence. Each has its advantages.
One thing that can happen in a computation is that if you change the value of 1 number in 1 grid cell in your computation, by say, rounding to 6 significant figures from 7, you will eventually have a different solution for the flow.
The two solutions are called “realizations”. If you run a lot of them, you can average over them and find average properties for the solution for that particular set of equations.
We do this in a laboratory too. We turn on the test section, run the experiment: Realization 1. Turn everything off. Go to lunch. Turn it on the next day. Run it again: Realization 2.
Each experiment will give slightly different results, but an average will exist.
So, the idea of averaging over model runs, realizations or experiments is fine. It’s a routine idea used across many, many fields, and not even slightly unique to climate science. It’s discussed in undergraduate fluid mechanics and heat transfer courses.
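A toy illustration of this point about realizations, using the chaotic logistic map rather than a flow solver (purely illustrative; none of this is from a climate model): two runs that differ only by a rounding-scale perturbation eventually diverge, yet ensemble statistics over many perturbed runs are stable.

```python
import numpy as np

def logistic_run(x0, r=3.9, n=200):
    """Iterate the chaotic logistic map from initial condition x0."""
    x = np.empty(n)
    x[0] = x0
    for i in range(1, n):
        x[i] = r * x[i - 1] * (1.0 - x[i - 1])
    return x

a = logistic_run(0.5000000)             # "exact" initial condition
b = logistic_run(0.5000001)             # perturbed in the 7th significant figure
print(np.argmax(np.abs(a - b) > 0.1))   # step at which the two realizations visibly diverge

# Ensemble statistics are stable even though individual realizations differ:
rng = np.random.default_rng(2)
ensemble = np.array([logistic_run(0.5 + 1e-7 * rng.standard_normal()) for _ in range(500)])
print(ensemble[:, 100:].mean(), ensemble[:, 100:].std())
```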
Lucia,
I’m not sure when you ask “has anyone…” whether you mean characters in this discussion, or the world at large. As to the latter, I just don’t know. As to the former, I believe that weather noise (simulated) is a component of model variation, and I think everyone is saying that. What I understand from Gavin is that models actually produce weather-like variation, (with, as you say to Philip_B, minimal prompting), and it generally underestimates real weather noise. If you ran just one model many times with the same parameters, you could perhaps identify the variability with weather noise. But when you look at an ensemble of models, you bring in different parameters, different treatments of various physics (eg solar cycles) etc. You for example used the GISS drivers for Lumpy; I’m sure other models use sets with noticeably different numbers. All these add up to a great deal more variation than just weather.
It seems to me that by making this selection, the IPCC invites this variation, as a rough measure of the dependence on variances of numerous inputs. It seems to me that (coming back to an earlier gripe) that should be the reference variance when testing their projections. And it means (and this is minor in comparison) that one should not expect that between model variation to match weather variation deduced from a time series.
Incidentally, a large part of my own experience here is the computation of turbulent flows, and the treatment of ensembles in that context.
Nick–
The reason I ask if “anyone” has confused the variation over models is that Gavin’s post is worded in a way that suggests he is rebutting “someone”, and in particular a blogger. He doesn’t say whom. Presumably, whoever he is rebutting must have made this mistake. I know of no blogger who has made that particular error he advises people not to make.
Yes. This is why Gavin’s 2.1C/century is not weather noise. I think we agree on that.
However, I think the way Gavin writes his post, the words he uses, suggests to readers the 2.1C/century is weather noise. You and I know that’s not what it is. But I think Gavin’s prose leads his readers to think so by not discussing all three possible things.
But we can disagree on what Gavin’s prose seems to say. Nevertheless, I will periodically remind readers that the 2.1C/century that Gavin shows is a) conceptually not the weather noise and b) is much larger than the true, earth weather noise for periods without volcanic activity.
I don’t consider this “conflating” the two. Conflating the two means saying they are the same. I am saying flat out that the two are entirely different things.
If I understand what you are saying then we agree.
I use precisely that variance when comparing to the trends consistent with actual weather. However, I limit my comparisons to those metrics the IPCC themselves provide. The IPCC provides a) the central tendency and b) the variance– by way of hazy bands on some graphs.
I think we’ve discussed this before. I recall you thought I should get the entire probability density function for average climate trends for models. I think that would be an interesting academic exercise to dig in, and try to create the full PDF the IPCC did not bother to publish or disseminate to the public. However, as the IPCC does not provide that in the document, I don’t consider doing that to be a test of what the IPCC tells people and policy makers. So, I test only what they communicate.
I found an interesting graph at Tamino’s:
[Tamino’s graph of 10-year trends compared against a long-term trend]
I have no idea where Tamino is going with this graph.
As you know, I would decree falsification of a “mean” as occurring when the mean does not fall within the uncertainty bands for the 7 year trend. Here Tamino did this for 10 year trends, compared to a long term trend.
As we see, we get two falsifications in 25 years using either data set. This is about the same rate of “false” falsification I got for the “volcano-free” period.
However, what’s interesting in Tamino’s graphs is these are likely “true” falsification. The reason is, the hypothesis tested is: Does the “expected long term mean” fall within the 10 year trend?
In the case of periods with volcanic eruptions, the answer is no. In fact, when volcanos erupt, we consistently expect the short term trend to deviate from the expected long term mean!
If we ran 100 runs of a single climate model to cover the time from 1975 to now, after averaging over the ensemble, we should expect many, many 10-year means to deviate from the 25-year mean.
In contrast, during the current period, we have had no volcanic eruptions. So, discrepancies between 7-year trends and long term predictions ought to fall in the range consistent with variations of earth’s weather system.
“One thing that can happen in a computation is that if you change the value of 1 number in 1 grid cell in your computation, by say, rounding to 6 significant figures from 7, you will eventually have a different solution for the flow.”
Well that’s a software change (or possibly a hardware implementation change).
Climate data in the past has digital values and doesn’t change. There is no new climate data at a particular point in time. Rerunning the model and producing a new result can only result from a changed model, changed input data (i.e. a changed climate history), or the input of random values (and your example is one of inputting a random variable – the rounding method).
Were climate models using non-digital climate measurements (or were we able to generate a new climate history) then I would agree with you, but I would be very surprised if this were the case.
And from wikipedia,
In probability and statistics, realization, or observed value, of a random variable is the value that is actually observed (what actually happened). The random variable itself should be thought of as the process how the observation comes about. Statistical quantities computed from realizations are often called “empirical”, as in empirical distribution function, empirical probability, or the empirical definition of sample.
If this comes across as combative, then that’s not my intent. I like your blog a lot. You are addressing important issues.
Phil_B:
I don’t think you come across as combative.
Yes, intentionally flipping a bit would be a software change. I’m sure the modelers do actually vary the input file. It’s justified because they don’t actually know the initial conditions.
But actually… you don’t have to change the input file with these sorts of code. I had a friend do a study of a hydrodynamic instability where heavy fluid rested over less dense fluid. Everyone knows that eventually such a system will flip. So, he set it up with the input file set “perfectly” with the pressures, surfaces and everything “just right” to balance forces.
Then, he just let the system run over night. A bit flipped, that triggered the overturn. He did it a couple of times. Different bits flipped. He was all excited.
We all laughed at him and pointed out he could save run time by just putting in a random number generator near the surface and then getting the ensemble that way.
(This was to try to understand the full range of what might happen in a million gallon tank full of radioactive waste.)
Lucia,
“Nevertheless, I will periodically remind readers…”
It’s a deal 🙂 And I’ll take back any mention of conflation.
On weather/model falsification, I liked Roger’s diagram using your means, although not his interpretation.
lucia,
Gavin’s estimate of the weather noise from models has issues, but your estimate of the weather noise has issues as well.
I think it’s important to remember that you only have ~18 years worth of overlapping 7-year trends. The “weather noise” on 7-year trends from this small sample seems to be in the range of 1.3 to 1.5 C/century, but there is a lot of uncertainty in that estimate.
The potential major influences on 7-year trends that we have identified are volcanoes, ENSO, and the solar cycle. Volcanoes have been removed by selecting a small date range. ENSO and the solar cycle remain. With only 18 years of 7-year trends it is unlikely that the full range of ENSO and solar cycle combinations have been sampled.
Another issue that needs to be considered is how to combine your different uncertainty estimates. The uncertainty in the current 7-year trend is due to “high-frequency” (<~7 years) weather noise. The uncertainty in the volcano-free 7-year trends is due to “mid-frequency” (<~18 years) weather noise. Are the two sources of uncertainty independent? If no, then why? If yes, their variances need to be added:
Mid-Frequency: SD = ~1.4 C/century (no-volcano trends)
High-Frequency: SD = ~1.1 C/century (current trend)
Total: SD = sqrt(1.1^2 + 1.4^2) = 1.8 C/century
(I’m not suggesting that this is the number to compare to Gavin’s. AFAIK, he did not look at the uncertainty in the current trend).
JohnV–
Sure. My estimate of the uncertainty has issues as well! That’s one of the major points of this blog post.
From an empirical point of view, the small sample size for the “no volcano” region makes it very difficult to estimate the variability conditioned on no volcanos based on data. But we do conclude the Cochrane-Orcutt fits do seem to give uncertainties that are between 20% and 50% too small. But even the range of the uncertainty in the uncertainty is uncertain. And we don’t know the attribution for this. Is the major reason for the smallish error bars the statistical method (as in, these would be too small even if we had 100 years of data)? Or is the major difficulty weather noise at longer periods? Or both. 🙂
That’s why I’m trying to think of other ways to estimate the two ranges.
Martin Ringo has the paper that Lee and Lund cited discussing the correction Tamino discussed (and called an ‘exact solution’). The correction is suggested based on an empirical fit created by running monte carlo simulations for a specific problem that had known AR(1) and AR(2) values. Looking at the properties, it’s clearly not general for AR(1) noise (and is obviously wrong in the limits of ρ=1 and ρ=0, i.e. white noise).
So that “fix” isn’t going to work in general. So, Marty is trying to look through the literature to find other stuff.
I’m trying to think of ways to consider the longer periodicity. (And we’ve been discussing that in comments for ages now. So this isn’t new.)
On the Gavin front: No Gavin did not look at the uncertainty in the current trend. I’ve said that many times in comments and in an actual top level blog post.
Gavin looked at an entirely different question. It is unfortunate he worded his post to appear to be rebutting a post examining a hypothesis he was not addressing. It is even more unfortunate that he called the 2.1C/century “weather noise” in the estimation of a 7 year trend and ascribed it to 7 year periods with no volcanic activity. It clearly is no such thing.
However, the fact that he did these things means that I need to put my values in context of his.
On “weather noise” in climate models, Kevin Trenberth says that it is not there:
http://blogs.nature.com/climatefeedback/2007/06/predictions_of_climate.html
If there is no ENSO cycle or PDO, how can a model spread reflect uncertainties in 7 year trends that result from start/end points in that 7-year period at different points in the ENSO cycle?
The spread in climate model realizations over a 7-year period is therefore not reflective of “weather noise,” but probably reflective of the significance of various choices made by the modeler in the run. Trenberth would seem to support this when he says:
lucia,
IMO, you and Gavin have both estimated the weather noise on 7-year trends. His estimate had problems because it lumped inter-model variability in with the weather noise (which increases the uncertainty by an unknown amount). Yours has problems because of the small number of non-volcanic years available (which is unavoidable but probably decreases the uncertainty by an unknown amount). I suspect that the true weather noise lies somewhere between the two estimates.
Unfortunately, the weather noise is likely large enough that it’s hard to say much of anything about 7-year trends.
I know I’ve been saying this for a while, but I hope to find time this weekend to “correct” the temperature history for volcanic, solar, and ENSO effects. With these effects removed the uncertainty on 7-year trends should be smaller.
Roger,
If the variation in the models in Gavin’s graph doesn’t even include “model weather”, then the ensemble average between models varies even more than I imagined.
lucia, Roger:
The models do not predict the timing of ENSO, PDO, AMO, etc. This does not mean that they do not arise in the models. That’s the whole point. That’s why a collection of 55 model runs of 2001-2007 is similar to 55 distinct 7-year time periods.
The quote does not say that “there is no ENSO cycle or PDO”. It says that the timing is not accurately predicted. It’s a well-known and well-accepted limitation of the models.
John V-
The IPCC says in Figure 8.13 that the models have a hard time with ENSO, getting better but still problematic. The IPCC says “Most, but not all, AOGCMs produce ENSO variability that occurs on time scales considerably faster than observed”.
Lucky for us the Figure highlights 7 years — if the frequency of ENSO events is too fast this will increase the variability over 7 years.
The IPCC also says (in Exec Sum to Ch. 8):
“As a result of steady progress, some AOGCMs can now simulate important aspects of the El Niño-Southern Oscillation (ENSO). Simulation of the Madden-Julian Oscillation (MJO) remains unsatisfactory.”
http://www.ipcc.ch/pdf/assessment-report/ar4/wg1/ar4-wg1-chapter8.pdf
If only some of the 55 realizations can reflect important aspects of ENSO, and ENSO matters on 7-year time scales, shouldn’t those of the 55 that cannot replicate ENSO events be tossed out?
JohnV
I see no reason to expect the small number of data points decreases the estimate of the uncertainty for the test I am doing below the range I gave. I also don’t see why it would increase it either.
I did a Chi-squared test to estimate the uncertainty in the estimate of the variance based on the sample. For the 82 degrees of freedom for the CO fit, we get an estimate that says the 1-sigma uncertainty in the trend could be 0.9 C/century to 1.62 C/century. The best estimate was supposedly 1.2 C/century based on 84 monthly data points.
In contrast, the empirical values computed from the variance of the OLS and CO fits were 1.37 C/century and 1.32 C/century. Due to the overlap in data and the square-root of (N-1) issues, I can come up with reasons to bump those up to 1.5. But notice this: those empirical values, based on between 5 and 9 independent samples, fall inside the 95% uncertainty range of 0.9 to 1.62 C/century.
If I hugged error bars the way those supporting the models do, I could easily say, “There is no evidence the CO uncertainty bands are too small”!
Or I could say “The historic data are not inconsistent with the CO error bars.” Or, I could even say “The historic data are consistent with the CO error bars.”
But the fact is: The empirical ones look larger. And I falsified 20% of the time, rather than 5% during a period when there are no volcanos. So, I suspect that, given the balance of the evidence, the standard deviation of 7 year trends during a period with a near constant trend is about 1.5 C/century!
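For anyone who wants to see the mechanics, here is a minimal sketch of the textbook chi-squared interval for an estimated standard deviation (a reconstruction with assumed inputs; the exact interval depends on how the effective degrees of freedom are counted, so it is not claimed to reproduce the 0.9 to 1.62 range above):

```python
import numpy as np
from scipy.stats import chi2

def sd_confidence_interval(sd_hat, dof, level=0.95):
    """Interval for a true standard deviation given a sample estimate and degrees of freedom."""
    alpha = 1.0 - level
    lower = sd_hat * np.sqrt(dof / chi2.ppf(1.0 - alpha / 2.0, dof))
    upper = sd_hat * np.sqrt(dof / chi2.ppf(alpha / 2.0, dof))
    return lower, upper

# Illustrative: a 1.2 C/century trend uncertainty with 82 nominal degrees of freedom.
# With autocorrelated residuals the effective dof is smaller, which widens the interval.
print(sd_confidence_interval(1.2, 82))
```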
Let me point out the obvious — if issues of falsification (or not) hinge on debates over fractions of a degree in the uncertainties in either models or observations, then the comparison can in no way (other than the pedantically technical) be said to be “consistent with” one another (in any meaningful way).
Perhaps we need a new phrase — “robustly consistent with” meaning irrespective of the various arguments on uncertainties . . . 😉
Roger,
It is very difficult to call the current trend “consistent with” a central tendency of 2C/century — at least not the way I use the term “consistent with” casually. When I speak English, I don’t consider excursions that happen less than 1/3rd of the time “consistent with” something. And this looks as if, assuming the largest uncertainty bounds I can justify, the true central tendency is less than 2C/century– with probability 90%. Note the jump from 80% to 90%. That’s because there is also a 10% chance the central tendency is much, much lower than the mean for the current data.
At best, one could say “not falsified” or “not proven”.
John is trying to suggest the uncertainty intervals could be even larger. But the data describing variability of 7 year trends doesn’t seem to be saying that.
I think it’s worth figuring out a method to get better uncertainty intervals. Otherwise, I’m pretty sure that if we get 1.1C/century over 20 years, we are still going to be arguing over uncertainty intervals!
lucia,
I’m all for getting *better* uncertainty intervals. That’s been my goal the whole time. I’ve been pushing for finding ways to make the uncertainty smaller by correcting for known cycles.
To me it seems intuitively obvious that the range of 7-year trends observed over an 18-year period will be smaller than the range over a much larger period. It is well known that small samples tend to under-estimate the true uncertainty. Of course intuition is no match against statistics, so I did a simple Monte Carlo test:
First I generated 340 years worth of monthly pseudo-temperatures. These are generated using the sum of three sinusoids:
1) 11-year cycle with an amplitude of 0.04C
2) 3.1-year cycle with an amplitude of 0.05C (part 1 of pseudo-ENSO)
3) 4.3-year cycle with an amplitude of 0.05C (part 2 of pseudo-ENSO)
If I look at the 7-year trends over the entire 340 year period I get a standard deviation of 1.13 C/century. This is the true “weather noise” for my pseudo-temperatures.
Next I calculated the standard deviation of 7-year trends over every 18-year period. The mean of these standard deviations was 1.02 C/century. That is, when only 18 years of data is available, the standard deviation of the 7-year trends is under-estimated by about 11%.
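A minimal sketch along the lines of the Monte Carlo test described above (a reconstruction; the periods and amplitudes are the ones listed, while the phases, sampling and window stepping are assumptions, so it will not reproduce the exact numbers):

```python
import numpy as np

months = 340 * 12
t = np.arange(months) / 12.0                      # time in years
pseudo = (0.04 * np.sin(2 * np.pi * t / 11.0) +   # 11-year cycle
          0.05 * np.sin(2 * np.pi * t / 3.1) +    # pseudo-ENSO, part 1
          0.05 * np.sin(2 * np.pi * t / 4.3))     # pseudo-ENSO, part 2

def trend(y, tt):
    """OLS slope in deg C per century."""
    return np.polyfit(tt, y, 1)[0] * 100.0

# "True" spread of 7-year (84-month) trends over the full 340 years
all_trends = [trend(pseudo[i:i + 84], t[i:i + 84]) for i in range(months - 84 + 1)]
print(np.std(all_trends, ddof=1))

# Average spread of 7-year trends seen within each 18-year (216-month) window
window_sds = []
for start in range(0, months - 216 + 1, 12):
    sds = [trend(pseudo[j:j + 84], t[j:j + 84]) for j in range(start, start + 216 - 84 + 1)]
    window_sds.append(np.std(sds, ddof=1))
print(np.mean(window_sds))
```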
—
lucia, you are now moving the goal posts from statistically significant “falsification” to being inconsistent according to everyday usage of the word. I hope you will update your monthly “falsification” to show that the falsification does not hold up using your improved estimate of weather noise.
I agree with you that “not falsified” alone is not a very high standard. If the trend is low, there must be a reason. Based on your results, the current 7-year trend (from 2001) is a 10% event. Should this be expected given our knowledge of known cycles? The cycles that we’ve looked at (ENSO and solar) have both trended down since 2001. The solar cycle peaked in early 2001 and is currently at a minimum. This timing happens once every 11 years. Second, ENSO has gone from roughly neutral to La Nina.
Odds of solar cycle timing: ~1/11 = 9%
Odds of ENSO timing: ~1/2 = 50%
Odds of combination: 50% x 9% = 4.5%
Given the timing of solar and ENSO, and particularly the combination of the two, is it *really* unexpected to be on the low side of 7-year trends?
—
Roger,
I disagree but I’ll let you have the last word on the models.
This thread is about the empirical weather noise.
John V-
Over the past 25 years there have been 4 or 5 La Nina events (depending on how one counts), so those odds are 20-25%, not 50%.
The problem with this exercise is that “weather noise” is not independent of the forced component of human-caused climate change. Some argue that variability in ENSO, PDO and other climate modes will also vary with forcing (usually to increasing variability), so historical estimates of variability are not apt to accurately represent weather variability in the context of a forced signal (trend).
John, if it is larger uncertainties you are looking for, I can give you all sorts of reasons for them — both in weather noise and modeled results. And I would agree that the larger the uncertainties the more likely of an overlap between distributions of predictions and observations, and if overlap alone is your goal then great. But if useful information is what you want, the larger the uncertainties the less useful the information.
Now it may be the case that uncertainties are so large that nothing else can be said (Urs Neu has made this argument on our blog). If that is the case, then we should not pretend that the uncertainties can be reduced. If so then modelers should say so and not represent their models in any other way.
Roger, as I have said many times (including a few minutes ago), larger uncertainties are *not* my goal. My goal is *better* uncertainties. I have been proposing and pushing for ways of making the uncertainty smaller. I hope we can agree that non-standard interpretation of statistical terms is not a way of reducing uncertainty.
The problem is that without some sort of data processing the uncertainties on 7-year trends are very large. That’s an unfortunate fact. Of course, as the trend period gets longer the uncertainty gets smaller.
If the transition from neutral to La Nina is really only a 25% event, then an observed trend on the low end of expectations is even less surprising. I stuck with 50% odds to make my statement less controversial. That is, half the time ENSO will trend up over 11 years and half the time it will trend down.
John V.-
If you want “better” uncertainties, then you will recognize that
(a) The effects of the solar cycle on global temperatures are contested, meaning that there are multiple legitimate views on it. So you will need to consider this not as an additive factor, but as a case A (with) and a case B (without).
(b) There are, as Kevin Trenberth likes to say, many “flavors” of ENSO events with different implications for global temperatures; they are not as cleanly sinusoidal as you suggest. Further, the temporal pattern of ENSO events is not ABABAB with a reliable return period. The 1980s and 1990s were characterized by a preponderance of warm events. So just like with volcanoes, you’d need to make sure that your test period is representative of the observational period.
In (a) and (b) above choices have to be made; there is no single unambiguous answer. The uncertainties themselves have uncertainties. They cannot be resolved. The uncertainty range that you and Lucia are debating is probably 1.3 +/- 0.3, and that is all you can say.
Are the predictions falsified based on this? I don’t know, do you want them to be? 😉
Could they be? Sure.
Could they not be? Sure.
That is all you can say.
I’m aware of this. See my earlier response that mentioned the square root of N-1 issue and mentioned the number of samples:
1.02 C/century * sqrt(5.5/4.5) = 1.13.
I already included that in my range. That’s how I got from 1.3 for my short sample to 1.5 C/century.
1.35 * sqrt(5.5/4.5) ~ 1.5 C/century
That’s why my post says 1.5 C/century as the upper range instead of 1.32 or 1.37 measured! 🙂
So… now you’ve resolved the correct thing is sqrt(5.5/4.5) instead of sqrt(11/10)! I wasn’t sure if which was the better factor, because I wasn’t sure which better measured the lack of independence due to overlap.
So…. I still see no reason for more than 1.5.
I’m not sure what you mean by moving the goal post?
What do you consider the everyday use of that word? By my everyday use of the word, the two are inconsistent. I consider something that was predicted but which seems far off the mark, and where the deviation could happen by chance only 1 time in 10, to be “not consistent”.
By IPCC terminology, the current IPCC projection is inconsistent with the data to “High confidence”.
But yes, by my general rule of thumb, I don’t consider it falsified as a null hypothesis when it could happen 1 time out of 10. I do when it could only happen 1 time out of 20.
Right now, I can’t say it’s falsified, but I can say it appears inconsistent. How is this a problem?
JohnV–
If you have ideas for “correcting” data to get better uncertainties, you will likely have to do that at your blog. I consider examining the solar cycle to see if it could explain the outlier to be a back-of-the-envelope calculation. I think similar things of “correcting” for ENSO etc.
When you suggest a cycle, I’m always willing to look at the possible order of magnitude of the effect, but I don’t think anyone can justify smaller uncertainty intervals after correcting for these effects. The more you “correct” the data for cycles, the more difficult it is to estimate the uncertainty intervals, because you need an estimate of the uncertainty of your correction. (These will always be high.)
But if you think you can correct for cycles, and estimate the uncertainty intervals after doing so, you are free to do whatever correction you wish and present them to the world, here or elsewhere. Maybe if you did them, and documented them, I would become a convert to the theory of correcting data for cycles to get better, smaller uncertainties.
But right now, I’d prefer to just know the correct magnitude of the uncertainties given the simpler hypothesis test I am applying and the data I am using.
lucia and Roger:
I hope I did not give the impression that my “pseudo-temperature” was supposed to be anything more than a very rough test for the effect of a short sample period on the uncertainty of 7-year trends. There’s nothing conclusive in my little experiment. I realize that neither ENSO nor the solar cycle are purely sinusoidal.
lucia, I must’ve missed the comment where you explained how you got from 1.3 to 1.5 C/century. I’m glad we agree that the raw uncertainty based on 18-years of data is likely too small (and that you’ve already corrected it).
The magnitude of the solar cycle is indeed not well known. I have seen studies that range from 0.06C to 0.18C (peak-to-trough). Empirical results seem to cluster around 0.1C peak-to-trough at the surface. ENSO and volcanic effects are also not particularly well known. In all cases, I believe it is possible to correct for these known effects (at least partially) and reduce the uncertainty in the trends. I’ll see what I can do.
For me, weather is what I see in my backyard on a day-to-day basis. Climate is said to be a 20-to-30 year average over the entire planet. The processes being discussed in this thread occur on about a 7-to-10 year basis and affect most of the planet. Both the temporal and spatial scales correspond to climate. They are climate variability, not weather variability.
An accurate nomenclature is important. (Several recent blog discussions strongly reinforce this concept.)
As an aside. GCMs do not resolve weather. GCM calculations cannot introduce weather noise. GCM calculations can introduce noise of a so-far unidentified nature.
Lucia says:
Even if we assume these cycles exist we cannot assume that they cancel out over the medium and long term. Asymmetric cyclical ENSOs or solar cycles could introduce a trend into the data. John V’s estimates all assume that the cycles are symmetric.
Raven-that is a good point, and it occurs to me that there is more than one index for ENSO. Which makes accounting for it tricky. If you use the 3.4 region, I think that the long term trend is negative. If you use SOI, the long term trend is “positive” (actually negative-the sign is backwards)…I don’t know about ONI. But how much “masking of” or “attenuation of” (as Josh Willis puts it) global warming ENSO has done in the last three decades depends on the data you wish to use. Tricky business…
Lucia-in accounting for ENSO, what index do you use?
Andrew– I used the MEI.
I consider “accounting” for these periodic effects back-of-the-envelope calculations to see if we can explain the outlier based on known weather cycles. As it happened, when we used MEI, it did account for part of the negative trend– but it reduced the uncertainty intervals in a way that retained a falsification.
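A minimal sketch of the kind of back-of-the-envelope ENSO adjustment being discussed (an illustration only, not the calculation referred to above): regress the monthly anomalies on a lagged ENSO index such as MEI, remove the fitted ENSO component, and recompute the trend. The lag and the input series are assumptions.

```python
import numpy as np

def enso_adjusted_trend(anom, enso_index, lag_months=3):
    """Regress anomalies on a lagged ENSO index, remove that component, and refit the trend."""
    y = anom[lag_months:]
    x = enso_index[:len(enso_index) - lag_months]
    t = np.arange(len(y)) / 12.0                       # time in years
    # Joint fit: anomaly = a*t + b*ENSO + c
    A = np.column_stack([t, x, np.ones_like(t)])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    raw_trend = np.polyfit(t, y, 1)[0] * 100.0         # C/century, unadjusted
    adj_trend = coeffs[0] * 100.0                      # C/century, with ENSO term removed
    return raw_trend, adj_trend

# Illustrative use with synthetic series (replace with merged anomalies and the MEI):
rng = np.random.default_rng(3)
n = 84
mei = rng.normal(0.0, 1.0, n)
anom = (0.02 / 12.0) * np.arange(n) + 0.08 * np.roll(mei, 3) + rng.normal(0.0, 0.05, n)
print(enso_adjusted_trend(anom, mei))
```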
Does the following make sense?
If the climate models were perfect, then the weather noise would equal the variation in models – i.e. the amount of variation due to small perturbations in initial conditions.
So if the models are not perfect, what differences would there be between the model variation, and the true weather noise?
I think:
Variations either up or down in so far as the model doesn’t capture the actual physics correctly. For instance if Lindzen’s proposed Iris effect is real, then the model’s failure to capture what would be an effect causing damping on weather variation would lead them to overestimate variation.
Other variations due to noise introduced from outside factors. For instance if there is noise in the TSI forcing, but the model assumes a smooth TSI input this would mean more weather noise than the model variation.
So I would expect it to be more likely but not guaranteed that the true weather noise would be greater than the modelled weather noise.
You have measured the weather noise to be less than the modelled weather noise. If this could be established with enough statistical confidence, could this be telling us something interesting about the models?
Michael, I have proposed that one estimate of weather noise would be to take the magnitude of year to year changes-if this is the way to go, you get a really huge figure of almost .4 degrees C as the max. That is almost as large as the trend, which by itself is interesting. I’m pretty sure that no model has weather noise that large. But I’m really wondering what other people think of this method-is it a good way to go? Is it wise to use the maximum value from this approach? My logic is that year to year changes would have little to do with changes in external forcings, and thus could be regarded as internal-which I presume is what people mean by “weather noise” (another question-is that my misunderstanding, or is that basically correct?). On the one hand, this would make assessing models very difficult, as this is a huge level of noise. But on the other hand, it does cast doubt on our ability to attribute “most” of any trend to a particular external cause.
Dan Hughes wrote
I think nobody is getting that, Dan.
When one integrates over space, one takes on board spatial autocorrelations.
When one integrates over time, one takes on board temporal autocorrelations.
When one does both, one obtains an infamous mixture of both.
During those past 5 years I have been trying to figure out what such a monstrosity like the “global average surface temperature” could possibly mean physically.
I still don’t know.
Nobody cares about spatial autocorrelations even if Steve McI made a post with some very perturbing results about it some months ago.
To my knowledge there is no serious study dealing with statistics of series both spatially and temporally autocorrelated.
The only thing that seems rather sure to me is that, indeed, whatever variability the models show for this hybrid parameter has nothing to do with weather and very little with its variability.
Actually it seems to me impossible to compare model outputs with reality because the reality does true weather spatio-temporal averages (so it represents a kind of average “weather”, whatever it might mean) while the models don’t and never did.
If anything, because of their coarse scale and approximated/smoothed physics, the models exhibit a variability much, much smaller than the true weather variability.
Another data point: the standard deviation of the 7-year trends in the 100-year control run shown in Hansen et al.’s 1988 JGR forecast paper is about 1.3C/century.
He’s not saying it’s not there; he’s just saying that it’s not the same realization as the real weather, in part because the models aren’t initialized to the real state of the system.
This is true more generally; i.e. even if you forget about cyclical dynamics, models won’t tell you anything too specific about a particular 7-year interval unless they are initialized for that interval – which they are not, as Trenberth pointed out. By looking at an ensemble from a single model (to avoid the intermodel variation issue), one can at least calculate a distribution of trends given forcings over the interval. Presumably the distribution of possible trends would be narrower if one could refine the initial state for a particular forecast, but no one has done that to my knowledge.
If the runs are all from the same model and parameterization, the distribution would be contingent on the model’s innate unforced variability (i.e. weather noise), which in turn would depend on various choices. If you look at absolute temperatures, a lot of the variability is likely intermodel differences in state that could be fairly persistent. Looking at the trend subtracts out some of that, but introduces lots of other problems.
This is certainly possible, though it sounds like a 2nd order effect. It would seem to favor models over history as a source of estimates of intrinsic variability.
What you want is likelihood instead of overlap. Consider the normal distribution, for which the likelihood of an observation is
L = 1/(sqrt(2*pi)*sigma) * exp(-(x-mean)^2/(2*sigma^2))
Increasing the uncertainty (sigma) broadens the distribution, raising the exp() term, but lowers the prefactor everywhere due to sigma’s presence in the denominator. If you evaluate forecasts by likelihood it puts the modelers’ incentives in the right place, i.e. the optimal forecast maximizes L, which is expected to occur when sigma is neither over- nor understated. Then there’s no incentive to cheat by making sigma uselessly wide; if it’s wide, it’s because of genuine uncertainty, which is not useless information.
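A minimal numerical illustration of that point, with assumed numbers throughout: score a set of observations drawn with a true spread of 0.1 under forecasts that differ only in the stated sigma. The average log-likelihood peaks near the true sigma, so neither overstating nor understating the uncertainty pays.

```python
import numpy as np

rng = np.random.default_rng(4)
true_sigma = 0.1
obs = rng.normal(0.0, true_sigma, 1000)      # observations scattered around the forecast mean

def mean_log_likelihood(x, mean, sigma):
    """Average Gaussian log-likelihood of observations x under a forecast (mean, sigma)."""
    return np.mean(-0.5 * np.log(2 * np.pi * sigma**2) - (x - mean)**2 / (2 * sigma**2))

for sigma in [0.05, 0.1, 0.2, 0.5]:
    print(sigma, mean_log_likelihood(obs, 0.0, sigma))
# The score peaks near sigma = 0.1; an uninformatively wide sigma scores worse.
```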
Tom– Did Hansen give that specific statistic in the paper? What a coincidence! (I have the paper – I’ll have to look that up!) But that’s interesting because it’s so close to what I’m getting.
I think I agree with your interpretation of what Kevin Trenberth says. Models’ initial conditions vary, and for a variety of reasons. I was peeking at the LLNL page with the archive of IPCC runs today. Even aside from the spin up, etc., when looking toward predicting post-2000 temperatures it appears some were initialized with pre-2000 forcings that included volcanos/solar etc. Some did not. The AR4 document says the modelers were allowed some latitude on loadings of certain ghgs even within a scenario. So, even when a scenario is called “X”, the runs share a commonality in forcing– but they aren’t all the same.
So, taken all together, one needs to really look to figure out how anything in those runs corresponds to “weather” variability. (Interestingly, the IPCC itself does not discuss whether the individual runs are in any way, shape or form, comparable to individual realizations of weather even for the forcings applied!)
Lucia – I scanned the graph and digitized the points. Then I computed all the 7yr trends and computed the SD. Take it with a grain of salt; I haven’t done QC – recentering points at year intervals and checking for missing values – but I can’t imagine the results will change when I do.
I’ve looked at the PCMDI archive too. Some of the models (GISS E for example) have large ensembles for the 20thC so it should be possible to get at model weather without dealing with the intermodel issue. The problem I ran into is just that everything is gridded, so it would take ages to get all the data and compute means. I think RC already did that for the graphic here. I made a request in comments (#367) but it probably got lost in the shuffle.
Tom–
I got a login yesterday. I noticed everything was gridded too.
I think Gavin did download and plot to make that graph. But, I disagree with his interpretation that all that scatter is “weather noise”. Maybe you can re-request the text file showing data in that graph. I disagree with Gavin’s interpretation– or at least can’t agree with it based on what he actually wrote and showed.
I might have jumped into comments there, but I wasn’t pinged. By the time I was ready to insert a comment into those moderated threads, Jerry Browning had started in on his hobby horse. I’m afraid I made myself a personal promise to just not get into the whole “ill posed”, “unphysical dissipation” thing again.
As for the data: Gavin is often pretty open with data files, so he might make his post-processed data available.
I agree that it combines 2 sources of noise. I’d be happy to request. There’s no contact info at RC anymore – I hesitate to email Gavin at NASA, but that would seem to be the only option. Do you know an alternative?
I’ve emailed Gavin in the past. He even answers.
I know he prefers requests about the posts at RC. But he knows he misses some– there are a lot of comments over there. I wouldn’t be at all surprised if your request got lost amongst all the discussion of “unphysical viscosity”!
For what it is worth, there was a stratospheric volcanic eruption in 1928 (Paluweh, most likely). Also in 1932 5 volcanoes erupted simultaneously in Chile, but any stratospheric aerosols were confined to the Southern Hemisphere.
I heard back from Gavin – unfortunately, the data behind the plot isn’t his, and the owners aren’t ready to publish yet. I’ve found another source though and will share when I can.
Tom, if you find another set, great! 🙂
KhuhKat–
When I use the term “bad data”, I mean it can be bad for any reason. The data in a record may now be bad because people didn’t trust it, adjusted it incorrectly, and turned originally correct measurements into bad data. The data may have been untrustworthy in the first place.
But, regardless, if the dips or rises were due to changes in measurement techniques– from buckets to inlets back to buckets– and that problem wasn’t recognized at the time, then the data are unreliable in the sense that we can’t accurately determine the true changes in the SST.
That’s bad data.