Developing Estimates for Uncertainty Intervals

As most readers know, I am very interested in comparing IPCC projections and/or predictions against observations of weather. Many questions come up during the course of discussing the comparison between the IPCC AR4 predictions/projections of future temperatures and the data. Sometimes, due to the nature of comments, the questions are not entirely clear to me. Still, I feel the need to provide an answer, and my motivation is primarily to identify the precise nature of the disagreement over substance. Today, I’ll be addressing a postscript Gavin tagged on at the tail end of comments yesterday.

While discussing how to estimate the uncertainty in estimating the trend in GMST, Gavin said:

PS. do not confuse the distribution of a random variable (the trend) with the uncertainty in defining the trend in a single realization. They are not the same. Try some monte carlo experiments with synthetic AR(1)+trend time series to see.

I have to admit this left me scratching my head because:
a) I have done monte carlo experiments on synthetic AR(1) + trend time series and
b) the results of those monte carlo simulations show that “the uncertainty in defining the trend in a single realization” is a very good estimate of “the distribution of a random variable (the trend)”.

I found this result unsurprising because my undergraduate textbook “Advanced Engineering Mathematics” by Erwin Kreyszig sure seems to suggest that the point of estimating “the uncertainty in defining the trend in a single realization” is precisely to obtain the best estimate of the standard deviation of the random variable (the trend). The econometrics articles seem to say the same thing.

So, today, I’m going to show one simple case, which happens to be one with no trend that I had already done for reasons unrelated to Gavin’s “postscript”. (I’ve done a few quick ones with trends; I’ll be doing more later. If someone wants a specific one, I’ll re-organize the order I’m doing things, and provide a specific case.)

I think I will manage to show that the uncertainty in defining a trend in a single realization is at least supposed to be the best estimate of “the distribution of a random variable (the trend)”. This is so well known, that I suspect I will discover that Gavin meant something a bit different– and with luck we may eventually figure out why we see the estimate of uncertainty intervals so differently.

What I calculated

  1. I wrote a script to generate a time series with AR(1) noise. I picked a slope of mtrue=0 C/century, and an autocorrelation of ρ=0.8.
  2. I used it to create a single time series with 84 data points and computed the best estimate of the trend using Cochrane-Orcutt. C-O was applied assuming the autocorrelation is known. (This is different from what I do when I test data. I’ll be discussing this when I’m finally done with a whole bunch of different runs.) So, this gives me an estimate of the trend mi.
  3. I used C-O to estimate the standard error in the estimate of the trend. If you use LINEST in Excel, this is denoted sm. I’ll add an “i” to denote this is for case i.
  4. I repeated this 10,000 times. So, I end up with 10,000 pairs mi, sm,i. (A sketch of the procedure appears below.)
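
For readers who want to poke at this themselves, here is a minimal sketch of the procedure in Python. This is a reconstruction, not my actual script: the innovation scale is arbitrary, and Cochrane-Orcutt with a known autocorrelation is implemented as simple quasi-differencing. Only the agreement between the two estimates of the spread matters, not the absolute numbers.

import numpy as np

rng = np.random.default_rng(0)
n, rho, n_runs = 84, 0.8, 10_000
t = np.arange(n, dtype=float)

def ar1_noise(n, rho, rng):
    # AR(1) noise with unit innovations, started from the stationary distribution
    e = np.empty(n)
    e[0] = rng.normal(scale=1.0 / np.sqrt(1.0 - rho**2))
    for i in range(1, n):
        e[i] = rho * e[i - 1] + rng.normal()
    return e

def co_known_rho(y, x, rho):
    # Cochrane-Orcutt with rho known: quasi-difference, then OLS slope and its standard error
    ys = y[1:] - rho * y[:-1]
    xs = x[1:] - rho * x[:-1]
    xbar, ybar = xs.mean(), ys.mean()
    sxx = np.sum((xs - xbar) ** 2)
    m = np.sum((xs - xbar) * (ys - ybar)) / sxx
    resid = ys - ybar - m * (xs - xbar)
    s2 = np.sum(resid**2) / (len(ys) - 2)
    return m, np.sqrt(s2 / sxx)

m_i = np.empty(n_runs)
s_i = np.empty(n_runs)
for k in range(n_runs):
    y = 0.0 * t + ar1_noise(n, rho, rng)   # true trend m_true = 0
    m_i[k], s_i[k] = co_known_rho(y, t, rho)

print("std deviation of the 10,000 trends:", m_i.std(ddof=1))
print("average of the 10,000 s_m values  :", s_i.mean())
print("fraction rejecting m=0 at 95%     :", np.mean(np.abs(m_i / s_i) > 1.99))   # t ~ 1.99 for ~81 df

If things behave the way the theory described next says they should, the first two printed numbers land very close to each other, and the third hovers near 5%.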

Statistical theory says that if I’m applying Cochrane-Orcutt to a time series with a known trend plus AR(1) noise, then:

  1. Individual trends may not be zero, but on average, the sample trend will be equal to the true trend. Since mtrue is zero, this means that if I do a whole bunch of tests, the average result for m will be 0.
  2. If I repeat the experiment an infinite number of times, the trends will have some distribution, which is often characterized by the standard deviation σ. I can estimate σ by running the experiment many times (say 10,000) and computing the standard deviation of all the trends, “m”.
  3. On average over all realizations, the standard error sm computed within a single realization will be equal to σ. (This is summarized in symbols below.)
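
Roughly, in symbols (my shorthand for the three statements above, not a formal theorem):

E[\hat{m}] = m_{\mathrm{true}}, \qquad \sigma = \sqrt{\operatorname{Var}(\hat{m})}, \qquad E[s_m] \approx \sigma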

In other words, the theory claims

“the distribution of a random variable (the trend)” == “the uncertainty in defining the trend in a single realisation”

At least, the statement above is supposed to hold in some average sense. (And of course, it must be worded in some truly proper manner, which no one but statisticians truly understands.)

So, anyway, as I said, I did a bunch of computations. Here is the result in graphical form:

[Figure: histogram of the 10,000 computed trends.]

The histogram shows the distribution of all 10,000 trends. As expected, the average is very near zero (0.0037). To give the numbers units, pretend these are temperatures and that each of the 84 data points is spaced by 1 year. Using “stdev” in Excel, the standard deviation of all 10,000 computed trends was σ = 0.1641.

The average of all 10,000 sm computed was 0.1644. That’s pretty darn close to 0.1641! Sure looks like the estimate of the uncertainty in the trend, sm, computed using CO is darn close to the standard deviation of the trends computed using CO.

What about those Falsifications?

For what it’s worth, I also tested to see how often Cochrane-Orcutt would reject m=0 (the true trend), if I used a confidence of 95%. (This would be an incorrect falsification.) The theory says I should incorrectly falsify 5% of the time under this circumstance.

So, how often did I falsify? 4.49% of the time. So, oddly enough CO falsified less often than it’s supposed to.

I assume the reason I falsified at this slightly-too-low rate was mostly the finite sample size of 10,000. Since falsifications should be binomially distributed, I estimate I should have falsified 5% ± 0.4% of the time. So, in principle, I did falsify a bit too few times– but it’s not all that much of an outlier.
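
For anyone checking that ±0.4%: with the nominal rate p = 0.05 and N = 10,000 trials, the binomial standard error works out to roughly

\mathrm{SE} = \sqrt{p(1-p)/N} = \sqrt{0.05 \times 0.95 / 10000} \approx 0.0022, \qquad 2\,\mathrm{SE} \approx 0.4\%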

So, with regard to this:

PS. do not confuse the distribution of a random variable (the trend) with the uncertainty in defining the trend in a single realisation. They are not the same. Try some monte carlo experiments with synthetic AR(1)+trend time series to see.

Using AR(1) plus a zero trend shows that the sample standard error in the trend is, on average, equal to the standard deviation of the random variable (the trend), provided we apply the correct statistical method to estimate the trend and compute the standard error.

So, Gavin must not mean what I understand him to be saying. Because, as I understand those words, Cochrane-Orcutt applied to a single realization does provide an estimate of the distribution of all possible trends in the population for a particular AR(1) process!

But you want to know what this means for the falsifications, right?

With regard to the falsification of the IPCC results, this still leaves open a bunch of questions. These include: Are the data AR(1)? Probably not exactly. If they aren’t AR(1), are my uncertainty intervals too large? Or too small? And how do I estimate the proper size?

Oddly enough, the reason I had previously done the analysis here was to look at something a bit different. I am ramping up to figure out what the size of the error bars should be if the weather data are the sum of AR(1) (for the weather itself) and white noise (for the measurement errors).

I’m looking at AR(1) plus white noise because it’s the simplest increment in complication over pure AR(1) noise. Neither ordinary least squares nor Cochrane-Orcutt works perfectly for this. And… I’m curious about the issue, so I’ve been running cases! My original plan was to get those done, show results and use the new uncertainty intervals. But, since the “ps” came up in comments, I thought I’d show this preliminary result now!
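
If you want to play along at home, a series of that type could be generated along these lines (a sketch only; the AR(1) and white-noise scales below are illustrative placeholders, not values fit to any data set):

import numpy as np

rng = np.random.default_rng(1)
n, rho = 84, 0.8
sigma_ar, sigma_meas = 1.0, 0.5      # illustrative scales, not fitted values

weather = np.empty(n)
weather[0] = rng.normal(scale=sigma_ar / np.sqrt(1.0 - rho**2))
for i in range(1, n):
    weather[i] = rho * weather[i - 1] + rng.normal(scale=sigma_ar)

series = weather + rng.normal(scale=sigma_meas, size=n)   # AR(1) "weather" plus white measurement noise

The sum is no longer AR(1) (it behaves like an ARMA(1,1) process), which is exactly why neither OLS nor Cochrane-Orcutt handles it perfectly.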

49 thoughts on “Developing Estimates for Uncertainty Intervals”

  1. I think you missed my point. Plot the distribution of the s[m,i] and calculate the likelihood of one single realisation giving you an s which is close to the mean of all s (i.e. 0.16). You only have one calculation in the real world, not 10,000 and so you do not know what the mean ‘s’ is. Additionally, you are not allowing for any uncertainty in what C-O gives you for p in a short time series – that should widen the spread even further.

    However, for the case that is closer to what I did with the models, I calculated 9000 7-year AR(1) time-series with an underlying trend=2 degC/century and with p=0.1. The distribution of the resulting OLS trends is N(2.0,2.2), but the range of the s was from 0.16 to 3.8 (mean was 1.7). The chances of getting within 25% of the s.d. in the trend distribution (i.e. an s between 1.65 and 2.75) is just under 50%. The chance of getting something less than half as big, is ~15%. Therefore there is a significant uncertainty in what the real s is given only one measurement of it.

    You can calculate the ‘falsification’ rate as you define it though: the fraction of cases where m+2*s < 2. … This turns out to be 5%. So you aren’t doing too badly. 🙂

    However, and this is the real point, the distribution I calculated for the models is completely consistent with this simple model, and you cannot claim that my calculations for the conditional probability for a 0.2 degC/decade trend over the next couple of decades are spurious. Thus while your test is valid as far as it goes, it doesn’t tell you what to expect, and it certainly doesn’t falsify the IPCC projections (which are the ensemble of the models and are quite consistent with the uncertainties that one can vaguely discern in figure TS.32). The fact remains that the current trends are all within the 95% envelope of the model runs, and these short term trends have very little information about the longer term trends. Thus they are, in any useful sense, meaningless.

    To be specific, for the 5% cases which would fail your test, their distribution of trends after 30 years (same recipe) is N(1.9,0.2), compared to N(2,0.2) for the full set. That is, a falsification now, has little to no information about the long term, which IPCC, quite rightly, highlights.

  2. Gavin–

    The fact remains that the current trends are all within the 95% envelope of the model runs, …

    Gavin, this is where you and I simply don’t see eye to eye. I think the fact that the current trends fall within 95% of the model runs answers a different question from the one I’m asking and answering.

    It is entirely possible for the central tendency of the models to fall outside the range for the earth data and, at the same time, for the data to fall inside the range of model runs.

    All you need to do is add a few crappy models and/or a few models with ridiculous amounts of “weather noise” such that you spread the model range from minus infinity to plus infinity. Then, after doing that, include too many on the high side. In this case, everything falls inside the 95% confidence intervals for the models, but the average for the models remains too high. (I know what I am describing is a reductio ad absurdum. But seriously, get some paper out and draw a tight distribution for the earth’s “weather” and a ridiculously wide spread for the models. I know no one intends to do this. But it’s something that can happen unintentionally.)

    You may not quite “get” my discussion of “Are Swedes Tall?”, but rest assured that plenty of other people understand this idea. If you are going to counter argue it, you need to understand that this is what people are suggesting is wrong with your idea of using the model variances. (I’m really not the only one saying this. )

    Anyway, if things go wrong in just the wrong way, the earth data falls inside the range of the models, BUT the central tendency of the models is inconsistent with the real data from the earth!

    I think you missed my point. Plot the distribution of the s[m,i] and calculate the likelihood of one single realisation giving you an s which is close to the mean of all s (i.e. 0.16). You only have one calculation in the real world, not 10,000 and so you do not know what the mean ‘s’ is.

    Ok… So, what is the point of calculating the uncertainty in Sm? Of course there is finite uncertainty in Sm. But is Sm a quantity of *primary* interest? I think not!

    I, and many others, want to determine the likelihood that the trend m = 2C/century, the trend projected by the IPCC to apply right now, is correct. That’s what I’m testing.
    So, we estimate the mean trend and the uncertainty in the mean trend. Then, we determine the likelihood, given the data, that 2C/century is consistent with the data.

    There are standard methods to find the uncertainty in the trend, m. I’m using two. Sure, there is uncertainty in “Sm” (the estimate for σ). But the standard methods account for the uncertainty in determining σ when estimating the uncertainty in “m”!

    That’s what the “t” test does. That’s why we look up the “t”.

    What’s-his-name, “Student”, came up with these methods to estimate the uncertainty in the mean value in a way that incorporates all those other uncertainties. We don’t need to know or explicitly calculate the uncertainty in Sm to do the t-test! (And thank heavens we don’t!)
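
    For readers following along, the standard recipe being described amounts to quoting the interval

    \hat{m} \pm t_{1-\alpha/2,\,\nu}\, s_m

    with ν the residual degrees of freedom (roughly n - 2 for a simple trend fit); the t table, rather than a normal table, is what folds the uncertainty in s_m itself into the interval.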

    However, to indulge you, and since the value is already calculated in my spreadsheet… For this particular problem, the standard deviation of the 10,000 sm calculated using Cochrane-Orcutt is… ta ta ta da… ± 0.0129

    So, based on the CO fit, we get sm = 0.1644 ± 0.0129 with the range expressing the standard error in Sm. In other words: the uncertainty is small.

    Tomorrow, I’ll do this for an AR(1) process with the autocorrelation and standard error in residuals for OLS that match the data (not your models) and tell you what I get. (But, I’ve done enough of these that I can tell you right now that I get roughly 6% rejections when I claim I’m getting 5%. So, we are talking about a 1-2% difference in the size of the uncertainty intervals.)

    Additionally, you are not allowing for any uncertainty in what C-O gives you for p in a short time series – that should widen the spread even further.

    I said that in my post. 🙂

    I can discuss that if you like. But for the properties of this particular set of observations it doesn’t make anywhere near the difference you suggest. (There are other things that might, but I don’t know if those things make the uncertainty intervals larger or smaller. We’ve discussed them here. I’m trying to explore those. And, possibly if I got a hold of the individual time series runs used by the IPCC or for Model E, I could figure something out, and at least explain what I did.)

    But basically: As far as I can tell, given the properties of the data we actually have, the much larger uncertainty intervals don’t spring up from the need to use iterative CO.

    They come from the possibility (or probability) that the data aren’t AR(1). (This is an issue I’ve been trying to figure out. There is an avenue I could explore if I had access to the time series of projections in the IPCC. But, the site I found has– as far as I can tell– gridded data, which is a pain in the neck!)

    The distribution of the resulting OLS trends is N(2.0,2.2), but the range of the s was from 0.16 to 3.8 (mean was 1.7).

    Hmmm… You may need to tell me the specifics of your generated AR process: the amount of “weather noise” you included and the autocorrelations you assumed. Because, if memory serves me correctly, when I run cases that match the properties of the data I get means that converge to the mean of 2.0 much more quickly. (Of course this could also be my poor memory.)

    Tomorrow, I’ll run these with an AR(1) process that a) gives the correct standard error of the residuals for a fit and b) has the lag-1 autocorrelation that matches the data (not the models).

    If you email me the values you use for AR(1) and standard errors in residuals, I’ll run those too. Then we can compare.

  3. Ohh… for readers who might want to know when comments arrive on a particular post, you can subscribe to comments. Look near the “post” button to find the option to subscribe to a comment thread. If you click, all responses to your comments will arrive in your inbox. (The difficulty is you’ll get everyone’s comments on a thread. I might need to look for a plugin to let people subscribe to answers from targeted individuals only. )

  4. I think the fact that the current trends fall within 95% of the model runs answers a different question from the one I’m asking and answering.

    I think this is a key issue. I think that there are two different questions with different answers, each with a different implication:

    Gavin’s question – ‘do the observations lie within the range predicted by the models’ – yes.
    My understanding is that a key assumption of IPCC etc is that the actual behaviour of the climate will lie somewhere in the range of behaviours predicted by the climate models. If we can show that this has held true for the data we have to date, then our confidence that this will hold in the future is reinforced. If we can find a significant difference between the range of modelled behaviour and actual climate behaviour then we have little grounds for confidence that actual behaviour will lie within the modelled range in the future.
    The implication of the fact that Gavin’s question is answered yes is that we can be confident that the future climate will remain within the bounds predicted by the models.

    Lucia’s question: ‘Do the observations match the central trend predicted by the models if we assume AR1 noise is the only possible source of difference?’ – no.
    The ‘interesting’ implication of this is the idea that we may be able to rule out the central trend (and higher trends) predicted by the models. However, to do this we would need to understand the behaviour of the models better. If the answer to Gavin’s question really is yes, there are a significant number of model runs that are reasonably close to the actual behaviour of the climate since 2001. How do these models behave for the rest of the model run? Do they all run with a lower trend for the full prediction period? If so, then the data we have gathered since 2001 can probably be used to rule out some of the upper range predicted by the models. Or are they just as likely to have a high trend over the full prediction period as any other model run? In which case we would not be able to rule out any of the range of model predictions.

    Another possible implication is that something other than AR1 noise is the difference, something that is not taken into account in the models. If we can identify what this ‘something else’ is, then we can decide whether/how the model prediction or Lucia’s validation method should be altered to take this ‘something else’ into account. From my climate knowledge, I think solar influence would be the leading candidate for this ‘something else’.

  5. Michael:

    The ‘interesting’ implication of this is the idea that we may be able to rule out the central trend (and higher trends) predicted by the models. However to do this we would need to understand the behaviour of the models better

    Yep.

    But of course, the issue could also be the scenario forcings mismatching the actual current or past forcings. If the problem is the SRES, then we can’t throw out any particular GCM, because that wouldn’t be where the problem is. The problem would lie in our inability to estimate GHG’s, aerosols etc. That’s still a modeling problem because people do come up with predictive models to estimate future economic activity, emissions etc. But it’s not a GCM problem.

    The other issues that we are seeing here are, I think:

    * many readers seeking guidance really do want to know whether the central tendencies — communicated by the really heavy lines in the figures — are likely to track well, or whether they might be biased high or low.

    * it makes sense to include the error bars based on the actual earth observations on these graphs. Estimating the uncertainty associated with observed trends and data is routine in other fields. It is a much stricter way to test the validity of model predictions. The absence of this sort of information in IPCC reports or journal articles related to climatology is one of the reasons for the violent arguments over whether or not climate models have been validated against data.

    To many people, true validation requires treating the observations as primary when comparing the predictions to the experiments. It’s true we have only one real earth and many models. But there are statistical methods to estimate uncertainties in determining the average trend for the real earth based on time series of observations. But these often seem to be given short shrift in documents comparing model predictions to data!

  6. The models don’t ‘have ridiculous amounts’ of weather, and the answers you get even if you screen out outliers are the same as if you don’t (though if you want a different screening than I did in a previous comment let me know). A simple AR(1)(p=0.1)(sigma=0.1)+trend has exactly the same spread for instance.

    As for the different questions, you are correct we appear to be asking different ones. However, the one that everyone really wants to know is what is going to happen in the future. I am therefore using the past behaviour to see if that affects the projections in the future. Your question is purely about the past, yet you frame it as being about ‘falsifying IPCC projections’ (and in particular the multi-decadal trend). Hence the confusion, and hence my attempt to answer the actual question that many people think you are asking.

    Look, the fact is that the IPCC projections on short timescales have very little explanatory power (for 7 years, N(2,2), remember), therefore there is very little point in getting excited about short term deviations from the mean, let alone updating all analyses every month thinking that something will change. That will get awfully dull.

  7. Gavin,

    … therefore there is very little point in getting excited about short term deviations from the mean, let alone updating all analyses every month thinking that something will change.

    While I agree that there is little point getting excited about short term deviations, I see nothing wrong with updating her analysis every month thinking something will change. I mean, if a short term deviation doesn’t change in the short term, it is not a short term deviation.

    This may be a dumb question, but when does a period of time stop being short term? Or to phrase it another way, how long does this short term deviation have to last before it is significant?

  8. Lucia – the problem can’t be the SRES because there is almost no variation in GHG levels through the present time among the different economic models – at least as I understand it. It takes decades for the different emissions levels to make a significant difference in the accumulated GHG levels.

  9. Gavin,

    However, the one that everyone really wants to know is what is going to happen in the future.

    Who is “everyone”?

    While I recognize the question you ask is useful, I have never met anyone other than climate modelers who asks the question you think “everyone” asks. In any case, I think “everyone” has more than one question.

    I’m mostly grounded in empiricism, so when I see a model, I ask: does the average of the models fall in the range consistent with the data? (Let me tell you, if 10 models predicted the lift on an airfoil and the average was biased to the extent of being inconsistent with data, I’d want to know this.)

    I may not be “everyone” but I want to know both what will happen in the future, and whether or not the mean of the current prediction for the future lies on the high or low side compared to real earth data. The answer to the second is: The mean predictions in the AR4 are on the high side compared to earth data and the deviation is sufficiently large that the mean lies outside the range consistent with the earth data.

    Hence the confusion, and hence my attempt to answer the actual question that many people think you are asking.

    You seem concerned about other people being confused or about the way I frame the question I ask. Your concern seems to center around my use of the words “IPCC projections” and people thinking I must be answering some other question.

    I have consistently stated the questions I ask and test. I state what I have falsified. The falsification applies to the 2C/century, which is the central tendency of the IPCC projection. Projection is the word the IPCC uses for whatever this “thing” is. What other way is there to “frame” what I find? Pretend I just picked 2C/century for no reason at all? Suggest the IPCC picks inappropriate words? What?

    As for answering the question many people think I am asking, who are these people who think I’m answering the question you reframe? And what makes you think you know what question they think I’m asking? What makes you think they know what question you are answering?

    Your blog posts leave the question you are testing unstated. You certainly state your answer with great fervor, riddling the post with words like “bogus” for alternate answers to the unstated question.

    In my opinion, your failure to explicitly state the question, your suggestion that “others” have given different answers to your unstated question, and your failure to distinguish the question you answer from the one I ask are among the main reasons your “many” might believe we are examining the same question.

    As it happens, I have taken active steps to overcome the possible confusion. You are likely unaware of the extent to which I emphasize the existence of different questions. Heck, I devoted a blog post to explain the difference. See this post written in May where I state:

    Yesterday, I emphasized that the question I am asking and attempting to answer is:

    Q1: Does the IPCC AR4 forecast central tendency of 2 C/century fall within the range of trends consistent with the real earth?

    When I use the term “falsify”, I mean it in the sense that the answer to Q1 is “No, 2C/century central tendency forecast is not consistent with the trends observed on the real earth.”

    In comments here and at other blogs (like Roger Pielke’s) many visitors often ponder this different question:

    Q2: Does the temperature trend experienced by the real earth fall within the range of all trends exhibited by all models used to create the IPCC prediction?

    The two questions share similar words, but they are different questions. They have different answers. I believe the answers to the two questions are:

    The blog post continues.

    If you think there is confusion somewhere out there or among “many” readers of your blog, might I suggest a simple remedy? When you detect this confusion, you can explain that we are asking different questions and explain what they are.

    Look, the fact is that the IPCC projections on short timescales have very little explanatory power, (for 7 years N(2,2) remember)
    Evidently not! Yet, despite this, the IPCC still shows a big dark continuous line indicating the mean trend starting from year 2000 in the technical summary and various other places, and with relatively narrow error bounds! 🙂

    In my opinion, one of the difficulties is that you and others communicate this lack of explanatory power as being due to weather noise for the real, honest-to-goodness earth. This idea is communicated in full snark mode (e.g. “(weather!)”), implying that those who doubt the tracking ability of the models are confusing “(weather!)” with climate.

    Yet, if we look at the data and compare the difference between my results and yours, it appears the problem is not the real earth weather noise. The problem appears to be that the variability in 8 year trends predicted by the models is larger than that consistent with real earth weather noise. The other problem is that you wish to avoid testing the trend using the measured variability associated with true earth weather.

    Absent volcano eruptions, right now, it appears the lack of explanatory power of IPCC projections over short time scales is likely due to the limitations of the models’ abilities and/or the forcings in the scenarios — not the real earth weather noise!

    On screening of the model predictions:
    Why are you suggesting screening models to me? Screening models is useful if you are comparing models to models. It may turn out to be necessary to obtain reliable predictions. But my analysis is based on determining what could be consistent with real earth data. How will screening the model runs affect the magnitude of trends consistent with the actual earth’s data?

    If you screened models and still came up with 2C/century as the best projection of the screened models, 2C/century would still be inconsistent with the current range of trends with empirical support. So… maybe screening gives us a hint about the cause of the bias in the model mean projection. Maybe it’s a common factor: the SRES were biased. Maybe there is an “unknown unknown” that is affecting models for the current period, but not previous ones. (Needless to say, we can’t know what the unknown unknown is!) Screening might tell us many things.

    But unless screening results in a different trend– one lower than 2C/century right now– I would continue to say the model predictions for the current period are inconsistent with the earth data!

    FWIW: There are some tests that I’ve thought of that I could run compared to individual model realizations. Steve Moscher dropped a link to the IPCC trove of what appear to be time series. I’ve asked for access. If I get it, I’ll run some tests that make sense to me, and then explain what I find. I pretty much blog everything I find, as there seem to be people interested in any and every particular question and result. Who knows? Should I get a hold of the time series, I may report what I find when I compare models to models or models to the average. I’m a bit curious myself.

    Meanwhile, I’ll run AR(1)(p=0.1)(sigma=0.1)+trend sometime this week. I’m not sure I can do it today. But I’ll report what I get for various things. 🙂

  10. Arthur Smith–
    Yes. I understand that the AR4 predictions found little difference between the SRES. I agree that should be an indication the problem is likely not the SRES.

    However, it’s not entirely impossible, depending on how well the SRES forcings matched reality before they diverged. The difficulty with systems with large response times is that getting off track a bit in the past can result in incorrect amounts of heat “in the pipeline”. But, of course, teasing this out would be difficult.

    I want to make it clear that I’m not in the “models are useless” camp. I think they are useful. But I think they are more useful if we examine them critically and admit when there is evidence they are off track.

    I realize that Gavin disputes they are off track. But, so far, neither one of us has convinced the other! 🙂

  11. Raphael quoted:

    therefore there is very little point in getting excited about short term deviations from the mean, let alone updating all analyses every month thinking that something will change

    I sort of didn’t note this. Who’s “getting excited”? Who says the point of a blog post has to be larger than “very little”? Who is expecting anything to change?

    After each post, my readers ask slightly different questions. Of course I update data. But each month I try to address analysis issues my commenters bring up or I address issues other bloggers bring up. Some people want to know if things change based on data records. Some want to know the effect of ENSO. Some want to know if a seeming counter argument elsewhere affects the results. Some want to know whether the fact that we can’t exclude no warming means anything (it doesn’t. That’s why one set of analyses includes a discussion of type two or “beta” error.)

    Believe it or not, occasionally analyses are simply updates, but sometimes the methods change.

  12. Lucia,

    I think your rant at Gavin was a little misdirected. You quoted him saying that “everyone really wants to know is what is going to happen in the future” and replied that you have “never met anyone other than climate modelers who asks the question”.

    Ummm, are you suggesting that only climate modellers want to know what’s going to happen in the future?

    Correct me if I’m wrong, but I think your actual issue is with treating the models as primary and testing if the real-world falls in the envelope of model results (rather than the other way around).

    I think you’re probably frustrated and annoyed. I’ve been there. It doesn’t help the discussion.

  13. JohnV:
    Using the quote there is a bit confusing. :)

    But pause a little and ask yourself this: Are the data comparisons gavin is suggesting remotely suited to answering that? The answer is no.

    The statistical arguments gavin is advancing in support of his models are suited to answering this question: Does the earth’s weather fall within the range consistent with the models?

    The one my analysis answers is: Does the model average fall within the range consistent with the earth’s data?

    Both questions are posed by people who want to judge how to weigh the predictions of climate models.

    If gavin began his post explaining that the models aren’t falsified by stating that the purpose of his statistical analysis is to answer the question “What will happen in the future?”, I suspect people would be quite puzzled. :)

  14. Lucia
    Just want to let you know how much I enjoy your analysis. This is a most important issue and is THE critical ‘Question’ that needs to be answered. You clearly enjoy the statistical aspect of this issue and do not, IMO, allow your personal feelings to get in the way of your work. It’s scary that even in statistics we need to be careful with the credibility of the information we are using to perform the experiments.
    Keep up the good work.

  15. Lucia, I think Gavin knows only too well that the question you are asking is different from the question he is asking and that it is a valid – albeit inconvenient – one. But instead of – like you – admitting that his opponent asks a different but valid question, he keeps pretending that the answer to _his_ question – that you agree on – is also answering _your_ question.
    Why doesn’t he, when he is so interested in what you are doing here, try to restate your position using his own words in order to try to establish some understanding? It is like he prefers to “muddy the waters” here. Odd.
    In fact, Gavin’s last post, where his main concern seems to be “what many people think” and what does or does not “excite” them, made me doubt that his main interest is to discuss the validity of your falsification of a 0.2C/decade trend based on temperature measurements…

  16. I’ll nominate myself as a member of the ‘everyone’ who wants to know what will happen in the future.

  17. Let me try to break it down to the essentials. Imagine a climate timeseries that consisted of a trend and a decadal sine wave:

    T= 0.02*(year-year0) + 0.1*sin(2*pi*(year-year0)/10 + theta)

    where theta is a random phase. The trend is, well, the trend, and the sine wave is supposed to be the weather. This isn’t realistic of course, but it serves to make the point. Every ten years there is a period of negative trend, and in fact there is a seven year period with a significantly negative trend.

    Now let’s make another leap and imagine that the models all have the same trend, but because they are not initialised with real world data their ‘weather’ is random – that is, ‘theta’ is a uniformly distributed random variable. Since each model is at a random part of the weather cycle (and here it really is a cycle), the short term 7 year trends range from -1.7 deg C/century to 5.7 deg C/century (in one test), i.e. N(2.,2.8), while the thirty year trends are much clearer, N(2.,0.16). Each of the models has exactly the same magnitude of weather, and yet the spread of their predicted trends is large. Yet the 7 year trend in any one model has no information about the 30 year trend. The model projections at the 30 year timescale are very robust despite their central tendency being ‘falsified’ for a short time. Again, the envelope of model projections easily encompasses the actual trend and the short term trends are meaningless for judging the validity of the long term trend.
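
    A minimal numerical version of this toy model might look like the following (monthly sampling and the number of random phases are my assumptions; the exact numbers will differ somewhat from the ones quoted above):

import numpy as np

rng = np.random.default_rng(2)
months = np.arange(0, 30, 1.0 / 12.0)           # 30 years of monthly time, in years

def toy_series(theta, t):
    # trend of 0.02 deg C/yr plus a decadal sine of amplitude 0.1 and random phase theta
    return 0.02 * t + 0.1 * np.sin(2.0 * np.pi * t / 10.0 + theta)

def ols_trend(t, y):
    return np.polyfit(t, y, 1)[0]                # slope in deg C per year

trends7, trends30 = [], []
for theta in rng.uniform(0.0, 2.0 * np.pi, 5000):
    y = toy_series(theta, months)
    trends7.append(100.0 * ols_trend(months[:84], y[:84]))    # first 7 years, deg C/century
    trends30.append(100.0 * ols_trend(months, y))             # full 30 years, deg C/century

print("7-year trends : mean %.2f, sd %.2f (deg C/century)" % (np.mean(trends7), np.std(trends7)))
print("30-year trends: mean %.2f, sd %.2f (deg C/century)" % (np.mean(trends30), np.std(trends30)))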

    The basic point is that there is no expected constraint on the climate to always be within some factor x of the long term trend. Your ‘falsification’ is the falsification of a strawman argument.

  18. Gavin–
    a) We’ve discussed the single sine wave issue here. See Can ENSO really explain away “the problem”? What about the PDO? It appears you are now trying to persuade me that I have to wait more than 10 years because 100% of “weather noise” is contained in a 10 year cycle?!

    We know all of the weather noise is not lumped into one 10 year cycle. Quite a bit is in the shorter time scales. There is not enough energy in ENSO either, by i) using the sine wave idea (advanced long ago by Atmoz), ii) estimating the energy based on the correlation between the anomalies and the MEI, or iii) using the correction you posted at Real Climate.

    I have, for what it’s worth, done the integration over all theta, as it has a closed form solution. It’s a calculation similar to one assigned to grad students in any introductory turbulence class and/or junior/senior year experimental methods class.

    Unless you can suggest how much energy is in the larger scales, so we can compare that to how much energy is in “red noise”, you can’t save the virtue of your models using that sine wave.

    If you want to do the sine wave thing, find the energy spectral density function for the weather noise. Then it will be perfectly possible to do a closed form analysis to see how far off Cochrane-Orcutt is when we estimate the uncertainty bands!

    (Doing this is why I have been wanting the individual trajectories of control runs. I think Dan Hughes has the software and may get the spectra for me. The results would still depend on the models being right in some sense — but your path ain’t workin’!)

    b) I’m puzzled by your use of strawman. I don’t know what strawman argument you are suggesting I am first creating out of straw and then jousting to show the strength of my claim. The IPCC projections actually exist, and I am falsifying them.

    c) I’m doing your AR(1) process with a lag 1 autocorrelation for yearly residuals of 0.1. This corresponds to a lag 1 autocorrelation of 0.82 for monthly residuals. I’m checking my numbers, and making sure I match everything yearly with monthly. When I run the exact match, you aren’t going to be happy with the results! I’ll be documenting in a bit, but basically… (I’ll send you a graph.)

  19. If I get what Gavin is saying, you have a climate signal (a trend), a weather signal (some quasi-periodic unforced internal variability) and then noise. And the magnitude of the weather signal is such that on a short term basis it can swamp the climate signal. Is that about right, Gavin?

    A couple of other things. First, thanks for dropping by and spending time explaining your position. Second, in AR4 the “projections” were made for 20 year periods. For example, 2011-2030 was “projected” to be .64C to .69C warmer than 1980-1999, showing a relative insensitivity to SRES over the first three decades after 2000. I was curious (the text in AR4 was a bit opaque) why folks didn’t do a 30 year period from 2001-2030? Also, perhaps unrelated to all this, when you speak of .6C of warming being “in the pipeline”, does that warming come out at a constant rate, or am I putting too much weight on the metaphor of warming in the pipeline?

  20. Lucia, I may have a link to control run data. Somewhere over on CA I posted on it. I’ll go hunting. Also, gavin did a nice post on RC a while back on IPCC data availability. It may be a different site than the one I gave you. I will double check for you.

  21. Gavin,
    I had written the script to do the monte carlo some time ago. One of the things I specifically wanted to look at was the distribution of autocorrelations.

    The AR(1) noise you suggest with a lag 1 autocorrelation of 0.1 corresponds more or less to a monthly case with a lag 1 autocorrelation of 0.82 (or 0.88 if I account for the effect of averaging over the month and/or year, as all good experimentalists are trained to do. Otherwise, they risk the peril of misunderstanding their measurements.)
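
    (One simple version of that correspondence, before any adjustment for averaging, treats the annual lag-1 autocorrelation as the monthly process’s lag-12 autocorrelation:

    \rho_{\mathrm{annual}} \approx \rho_{\mathrm{monthly}}^{12}, \qquad 0.82^{12} \approx 0.09 \approx 0.1

    The averaging adjustment mentioned above would shift this somewhat.)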

    As you know, the distribution of the lag-1 autocorrelations is unaffected by the trend or the “weather noise” in these monte-carlo tests. I had already run the p=0.8 case.

    What do I find? If the earth’s temperature were AR(1) with a monthly lag one autocorrelation of 0.8 (roughly what your suggestion implies), the probability of an observed autocorrelation of 0.47 or lower is 0.5%. Needless to say, if we assume the data should be AR(1), we would reject (i.e. falsify) your suggested autocorrelation when comparing to data. (Needless to say, the higher 0.82 and 0.88 also will be rejected, and more strongly, should I run these.)

    So, this is consistent with what I have been saying all along: We can’t simply assume the characteristics of the “model weather” match those of real earth weather. (There are many possible reasons for this. I won’t get into that. James Annan got a bit snippy when he read a comment of mine, suggesting that because you smooth the forcings, you can’t really decree Schwartz’s idea about the shape of the correlogram for a planet that experiences white noise forcing. But, once again, I will point out: The forcings on the earth do have some non-smooth components not included in models. Hypothetically, this could make a difference. Or, at least, one needs to make a scaling argument for why they should not.)

    There is some hope for the models overcoming the verdict of falsification on at least three fronts:

    Front 1: Notice that UAH only has a *one* tailed falsification, while the rejections for the other data sets are two tailed. This may not seem like a big deal, but here’s the thing: The lag 1 autocorrelations for the monthly data are
    NOAA: 0.287 (p=0.8 falsifies at 5%, two tailed)
    Hadley: 0.402 (p=0.8 falsifies at 5%, two tailed)
    RSS: 0.409 (p=0.8 falsifies at 5%, two tailed)
    GISS: 0.427 (p=0.8 falsifies at 5%, two tailed)
    UAH: 0.578 (p=0.8 falsifies, but ONLY one tailed)

    So, you could insist I use only two tailed tests, and argue that because UAH falsifies only with a one tailed test, one out of five services fails to falsify. But, of course, one might argue that 4 out of 5 falsifications is a “robust” falsification. (And one would also go run the cases for p=0.88, likely kicking UAH into two-tailed falsification range.)

    Front 2: Suggest the measurements since 2001 contain quite a bit of measurement noise. 🙂

    I’m planning to explore the effect of the measurement noise on the distribution of the autocorrelations. In fact, that’s why I was doing this in the background. However, I haven’t run those yet. It may be that if there is enough white noise, p=0.8 for monthly won’t falsify on this count– but that remains to be checked.

    Front 3: Reread my response to your sine wave example.

    If, instead of assuming AR(1), we look at the spectral energy density functions for the control runs, we can do something I’m not even going to try to explain in comments. (Those who understand may have guessed already. Those who don’t won’t understand the comment!)

  22. Thanks Steven!

    I think I’m going to try to get Dan Hughes (or some other volunteer) to do the transforms for the spectra. I rarely do those, and I want to get the closed form solutions for estimating the uncertainty due to not capturing longer periodicity stuff given knowledge of the spectrum, and see what we learn by running regressions on the projections for 2000 – now!

    I need to get someone to do the spectra, or I don’t have enough time. 🙂

  23. You are welcome.

    See this thread

    http://www.climateaudit.org/?p=3086

    Comment 139 by samU links to some interesting control run stuff.

    If you read on and look at my comments you will find that for ATTRIBUTION studies, they did, in fact, screen the models. And they screened on drift in the control runs.

    So, when more precision was required – for attribution studies – a screening was applied.

    Odd: when you do an attribution study you screen the models to throw out the misbehaving children, but when you do a projection you don’t screen. Odd, nothing more.

  24. The IPCC projections actually exist, and I am falsifying them.

    This is just wrong. The IPCC projections for the 7 year period are given by the spread of the model runs which I have shown are ~N(2,2). Since you are not using any information about the spread of the model results in any of your calculations, your ability to ‘falsify the IPCC projections’ is zero. Don’t you think it would have made a difference if the projection was N(2,0.001) vs. N(2,2)? Now, you will come back and say something about the central tendency – however, in the statement you made above, there is no reference to that at all, and in no IPCC document is there an explicit statement that the central tendency is the ‘projection’ for all time scales. People are rightly criticising your choice of language because you are being very loose with what you say, and in ways that could not have been better designed to confuse.

    As for how good the ‘model weather’ is, that is a reasonable issue. But there are enough model simulations for you to screen them in any way you choose. I suggested some possible screens previously, but since that gave exactly the same answer, it wasn’t clear what was gained. But try again, set a relevant bar for the models to pass and we can see what difference it makes to the remaining projections.

    PS. All IPCC model output can be downloaded at http://climexp.knmi.nl/

  25. If the argument Gavin is making is based on weather cycles, what about those that are say 60 or 100 or even longer? The 20th century rise in temperature could just as easily be part of the upslope of say a 400 year cycle. Without knowing what those cycles are I can’t see where you can make one argument without accepting the possibility of the other. But then that would make climate models about as useful as stock market predictors.

  26. Gavin:

    This is just wrong. The IPCC projections for the 7 year period are given by the spread of the model runs which I have shown are ~N(2,2).

    No, gavin, it’s not wrong. The central tendency of the projections falls outside the range consistent with real weather.

    Since you are not using any information about the spread of the model results in any of your calculations, your ability to ‘falsify the IPCC projections’ is zero. Don’t you think it would have made a difference if the projection was N(2,0.001) vs. N(2,2)?

    (I made the central tendency bold for the sake of readers unfamiliar with the notation.)

    The answer depends on which question one asks about the models.

    If one asks “Question 1”:

    Does the central tendency 2 C/century (in bold) fall inside the range consistent with the earth’s weather since 2001?

    the answer to your question is:
    “It makes absolutely no difference whether the IPCC gave a tight range or a wide range for their projections. The only thing that matters is the projection for the central tendency, which is 2C/century.”

    This is the question I ask. It is the question I test. It is a question I consider important, and evidently other people also think so.

    If one asks “Question 2”:
    “Does the best fit trend experienced since 2001 fall inside the range projected by the IPCC?”
    Then the model variability for the best fit trend does matter. If the models’ scatter for this were ±0.001 C/century, you would say the earth’s trend does not fall inside the model scatter. If the models’ scatter were ±2 C/century, then you’d say the earth’s trend does fall inside the model scatter.

    This is not the question I test.

    I have inferred you only ask Q2, and only wish to test Q2. Perpetually insisting one must use the ±2C/century variation in 8 year trends predicted by the models makes sense only if you wish to address Q2, and refuse to consider Q1.

    But you know, gavin, that’s not really going to work. Because, whether one is a climatologist or a cosmetologist, and whether one understands statistics or not, it is perfectly easy for people to understand that Question 1 and Question 2 are entirely different. I’m surprised you seem to not understand this.

    On screening models: I already explained that the standard deviation of 0.001 or 2 makes no difference whatsoever when doing the analysis to find the answer to Question 1 above. None.

    However, if you feel some need to screen, go ahead. If, after screening, the central tendency projected by the screened models still doesn’t fall inside the range consistent with the weather, that means the screened models fail the hypothesis test that answers Question 1 above.

    I’m puzzled that you don’t understand this. But, in my estimation, when you say screening makes no difference, that suggests all the models fail the sorts of empirically based tests I apply!

    however, in the statement you made above, there is no reference to that at all,…

    Are you complaining I didn’t put “central tendency” in this?

    As most readers know, I am very interested in comparing IPCC projections and/or predictions against observations of weather.

    I’m not sure what your gripe is. As my readers know, I am very interested in comparing more than just the central tendency of the trend. So why would I include “the central tendency” in that particular sentence? My readers and I have been discussing other things to test. The test of the central tendency was an initial one, and it has gotten attention, but that doesn’t mean it’s my only interest.

    So, I don’t know why you think I need to say “central tendency” in this post.

    Thanks for the link! I got a login from the place Steven suggested yesterday. 🙂

    On the issue of the autocorrelation: I want to be sure to use the precisely correct values:
    Does p=0.1 refer to the lag 1 autocorrelation for annual averaged 1 year temperature anomalies? (As opposed to instantaneous or something else.)

    Does sigma=0.1 mean 0.1 C for annual averaged temperatures?

    I always run my tests on monthly values, so I’ll be showing results based on monthly data.

    I assume the answers are “yes” and “yes”, but the answers do affect precisely which graphs etc. I create. Also, are the numbers rounded? Because I can just as easily do sigma=1.1 or 0.9 if that’s the value you believe in.

  27. And they screened on drift in the control runs.

    Well… slow drift up and down would certainly do unwonderful things to the accuracy of lag-1 autocorrelations! That means it would do unwonderful things to the ability of the models to get the spread in the 8 year trends to match that for the data.

    In fact, model drift would be the sort of thing that prevents models from giving proper short term projections. But… model drift has nothing to do with real earth weather. It doesn’t affect our ability to see whether the behavior of the average of the group of models matches the behavior of the earth. (And I mean, model drift doesn’t affect our ability to test any measurable behavior. The fact that I have been mostly trying to look at central tendencies of the AR4 projections doesn’t mean other statements don’t apply more generally! 🙂 )

  28. lucia,

    Question 1 is a valid question, but I and others have issues with how you define the real-world weather noise (as you already know). Your first and primary estimate of weather noise is using the OLS or C-O trend sd on a single 7-year trend. As we have tried to explain many times and in many ways, this is not a valid approach. Why? I’m glad you asked. Primarily because it excludes the low frequency component of weather noise (longer than ~7 years).

    To put it clearly — there is no reason to expect the uncertainty in a single 7-year trend to be related to the uncertainty between multiple 7-year trends. The former includes only high-frequency weather noise with a period significantly less than 7 years. The latter includes all sources of weather noise. They are different.

    I tried to explain this a couple of months ago, but we got all tied up in my incorrect use of the word “bias”:

    http://rankexploits.com/musings/2008/ipcc-projections-do-falsify-or-are-swedes-tall/all-comments/#comment-2797
    http://rankexploits.com/musings/2008/ipcc-projections-do-falsify-or-are-swedes-tall/all-comments/#comment-2831

    Basically, I looked at a couple of methods for estimating the weather noise on 7-year trends — models and measurements. (I also talked about compensating for weather noise, but let’s not get back into that right now). For measurements, I looked at years without major volcanoes and calculated the difference between the 7-year trend and the underlying trend. The standard deviation was huge — on the order of 2C/century. You had some complaints about the number of independent samples. The complaints were probably valid.

    However, you then proceeded to do a similar analysis but restricted the number of data points to a very small number. Your computed standard deviation using this tiny sample was much smaller. For some reason you seem more comfortable with your results using a tiny number of samples.

  29. Lucia,

    Beware the wordsmiths. One of the goals of political speak is to imply some fact without saying it explicitly.

    The first day that military troops were in Afghanistan, there were two operations conducted. During a press briefing in the following days, one of the reporters asked about rumors of casualties in Afghanistan. The person conducting the briefing said “In (this operation), there were no casualties.” When the question was asked again, the same “In (this operation),” was used prior to the answer. On the face, that answer is true, that operation had no casualties. You may even remember seeing the video of the US troops capturing their objective without opposition. But the other operation… wasn’t so pleasant.

    Explicitly, he never said there weren’t casualties in Afghanistan that day. Though by answering a general question with a specific item, he certainly implied a lack of casualties that day.

    So long as that inference supports the desired conclusions (no casualties), nothing more need be said. However, if that inference were used to support an undesired conclusion (the people were lied to), then what was explicitly said could be used to redirect the scrutiny.

    If I were a policy maker and looked at the graph, I would likely infer that the black line was a projection. So long as that inference was used to support desired conclusions, nothing more need be said. If it wasn’t, then the wordsmith can fall back on what was (or wasn’t) explicitly said.

  30. John V:

    Question 1 is a valid question, but I and others have issues with how you define the real-world weather noise (as you already know).

    Yep. I agree that some of those (like you) who recognize the difference between Q1 and Q2 disagree with how one should define real-world weather noise, or, I think more precisely, how to estimate the uncertainties.

    But this disagreement is of a different nature than Gavin’s.

    As for the low frequency component: I have agreed for all we know there may be a lot of energy in there. However, someone who thinks there is a lot of energy in those low frequency components needs to do some work to show it. I’ve done various suggested tests, and the theories of loads of low frequency energy don’t really pan out.

    a) Comparison to the previous volcano free periods suggests the method I’m using pretty much works.
    b) The back of the envelope test for solar suggests that this could nudge the trend into the “not-falsified” region. But only if we feel very certain the effect of the solar cycle on the GMST is of the strength mentioned in the IPCC documents (but disputed by many) AND I set both the phase and periodicity of the cycle at “just the right point” to maximize the effect on the trend.

    If I shift the phase slightly (within bounds consistent with measurements of the solar cycle), or shift the periodicity, or lower the magnitude of the effect, the solar cycle can’t explain this!

    On this:

    The former includes only high-frequency weather noise with a period significantly less than 7 years. The latter includes all sources of weather noise. They are different.

    You are partly correct. But have you looked at the distribution of energy in red noise? Relative to white noise, the assumption of AR(1) noise already partly accounts for this issue. It assumes that more energy is in the slowly varying components; that is why, given the same total amount of inter-annual variation, accounting for the lag-1 autocorrelation widens the computed uncertainty intervals.

    So, I have accounted for the fact that more energy exists in slowly varying components than in more rapidly varying ones.
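    For anyone who wants to see that widening directly, here is a rough sketch in Python (not my script; the noise level, series length and rho values are purely illustrative): it generates series with the same total variance but different lag-1 autocorrelation and compares the spread of the fitted trends.

    import numpy as np

    rng = np.random.default_rng(0)

    def ar1_series(n, rho, sigma_total, rng):
        """AR(1) noise scaled so the total (stationary) variance is sigma_total**2."""
        x = np.empty(n)
        x[0] = rng.normal(0.0, sigma_total)
        innov_sd = sigma_total * np.sqrt(1.0 - rho**2)
        for t in range(1, n):
            x[t] = rho * x[t - 1] + rng.normal(0.0, innov_sd)
        return x

    def ols_slope(y):
        """Ordinary least squares trend (per time step)."""
        t = np.arange(len(y))
        return np.polyfit(t, y, 1)[0]

    n_months, sigma, n_runs = 84, 0.1, 5000   # 7 years of monthly data; 0.1 C is illustrative
    for rho in (0.0, 0.8):                    # white noise vs. strongly red noise
        slopes = [ols_slope(ar1_series(n_months, rho, sigma, rng)) for _ in range(n_runs)]
        print(f"rho = {rho}: std dev of fitted trends = {np.std(slopes):.5f} C/month")

    With the total variance held fixed, the red-noise case spreads the fitted trends out by very roughly a factor of three relative to white noise. That is the widening I mean.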

    The question that remains open is: if we had a real spectrum (during a non-volcano period), would it show more or less energy in the slowly varying components than AR(1) noise with the lag-1 autocorrelation we see in the observations?

    This is an open question– but it’s entirely different from Gavin’s issue. He wants to insist on using model variances.

    I am comfortable with these results until someone can actually give some concrete reason why the real earth's weather noise really is larger than these show. Currently, people handwave a suggestion, I check it, and they never pan out.

    I’m looking to see whether we can get anything out of the single model runs. Who knows? Maybe it will show something that will make me think my uncertainty intervals should widen. But so far, no, I don’t think so.

  31. lucia,

    We both did comparisons to previous “volcano-free” periods. I excluded only years with major volcanic eruptions, thinking that it’s better to have more data even if some of it is partially contaminated. You excluded years with much smaller volcanic eruptions, as well as years around the “bucket problem”. You were looking for pristine data even if that meant having very few data points. There are pitfalls with both of our approaches.

    The problem is that we got very different results. IIRC, the standard deviation of 7-year trends about the underlying trend (sd7u) using the different approaches was:

    Using all years (including major volcanoes):
    sd7u ~ 2.1 C/century

    Excluding only major volcanoes:
    sd7u ~ 1.9 C/century

    Excluding major and minor volcanoes plus bucket years:
    sd7u ~ 0.5 C/century

    For some reason, major volcanoes have a small effect on sd7u but minor volcanoes have a large effect. It might be true that minor volcanoes are more important than major volcanoes. Alternatively, it could be that your result for sd7u is very uncertain because of the small number of independent samples.
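    For anyone who wants to check numbers like these themselves, a rough sketch of the calculation (not my exact code; the overlapping windows and the use of the full-period trend as the "underlying" trend are simplifications) might look like this in Python:

    import numpy as np

    def trend_c_per_century(years, temps):
        """OLS trend in C/century over the given years."""
        return np.polyfit(years, temps, 1)[0] * 100.0

    def sd7u(years, temps, window=7):
        """Std dev of 7-year trends about the full-period ("underlying") trend.

        Overlapping windows are used for simplicity; they are not independent
        samples, which is exactly the small-sample issue under discussion.
        Excluded years (volcanoes, bucket years) should be handled by splitting
        the record into contiguous segments before calling this.
        """
        years = np.asarray(years, dtype=float)
        temps = np.asarray(temps, dtype=float)
        underlying = trend_c_per_century(years, temps)
        short = [trend_c_per_century(years[i:i + window], temps[i:i + window])
                 for i in range(len(years) - window + 1)]
        return float(np.std(np.asarray(short) - underlying))

    Fed annual GISTEMP or HadCRUT anomalies, this should give numbers of the same order as those quoted above.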

  32. John– Why wouldn't one exclude the trends from the bucket years? Immediately after we discussed this, Nature ran an article explaining the problem with the adjustments. All the bucket years are outliers compared to the rest. When I tested the CO and OLS methods for trends, all the outliers were in the bucket years, and specifically the ones reported to be dubious.

    I know you disagree with my decision to throw those out due to the reported uncertainty in the data. Beyond that, the type of uncertainty involved is the type that would quite specifically affect trend calculations. It's not scatter; it's a sudden systematic shift!

    Obviously, you are perfectly welcome to include that data when you compute things. But, equally obviously, I don't need to agree with your decision to include the bad data!

    On the variability: I agree that the result for the variability of 8-year trends based on the previous period involves a short span (the 20s and 30s). But it a) confirms the current small variability obtained by standard tests, b) we have a phenomenological argument to explain why we would expect the variability of 8-year trends to be smaller when there are no major volcanic eruptions, and c) while my uncertainty might be too small, the intervals Gavin suggests are most definitely too large. His uncertainty intervals are larger than one could justify based on the variability of 8-year trends in the entire thermometer record, including volcanos, ordinary measurement uncertainties and the "bucket" period.

    Other equally short periods with volcanic eruptions show much larger variability in 8-year trends than the periods with no volcanic eruptions. Though the statistical precision of the computations is less than one might wish, the fact remains that this is the result we get.

    So, yes, I’m comfortable with the idea that the trend fit is giving me pretty decent uncertainty intervals. I’m am exploring other things, but the results I’m getting are results from some standard methods.

  33. If the goal is to check that the data fall within the range of the calculated numbers, what is used to eliminate models that are clearly out to lunch? It seems to me that some kind of discriminator is needed, otherwise I can always devise a model that will ensure that the data fall within the model range.

  34. lucia,

    The “bucket years” represent one small difference between our analyses. I am fine with discarding them in the same way that years with major volcanoes are discarded. That was not the main point of my comment.

    For the sake of completeness, I removed the years 1945-1955 inclusive and re-calculated sd7u (as defined above):

    Including all years (1901-2007):
    GISTEMP: sd7u = 2.07 C/century
    HadCRUT: sd7u = 2.53 C/century

    Excluding major volcanoes (my analysis):
    GISTEMP: sd7u = 1.88 C/century
    HadCRUT: sd7u = 2.55 C/century

    Excluding major volcanoes and 1945-1955 (my analysis):
    GISTEMP: sd7u = 1.61 C/century
    HadCRUT: sd7u = 2.09 C/century

    Your result: sd7u ~ 0.6 C/century

    In summary, excluding major volcanoes and the "bucket years" reduces sd7u for GISTEMP and HadCRUT by less than 23%. Also excluding minor volcanoes reduces sd7u by an additional 60% or more. Does that not seem odd to you?

  35. John–

    In my view, when I wish to exclude volcanos, I don't want the dust veil to be either building up or clearing, and I selected my criterion accordingly. I posted a chart showing my criterion and the years. I chose the criterion before I did the calculation, and I posted the discussion of my reasoning before doing it.

    Presumably, if we got different answers for the variability, it's because our criteria for excluding periods are different. I don't recall the specifics of your method of excluding volcanos, and it's more difficult to find a discussion in comments than in a blog post.

    If you could describe the basis for your criterion clearly, and why you picked it, that would help. You can do it in comments, but if you want other people to find it, posting it at your blog, possibly with illustrations, would help us all.

  36. lucia,

    I believe my approach was fairly well explained (for a blog comment) in the comments I linked above. One day I might have time to start my own blog, but for now I’m content to comment on yours. 🙂

    Our criteria for excluding years and methods of calculation are quite similar. Like you, I excluded the years before and after volcanic eruptions. The only real differences are the thresholds we used for excluding volcano years (and thus the number of years remaining in the analysis). I did not originally exclude the “bucket years” because they were not yet known to be an issue, but I am comfortable with excluding them now.

    If the difference between our computed variabilities is interesting to you, you may want to consider a little experiment. Try increasing the volcanic forcing threshold used to eliminate volcano-affected years.

    Currently you have maybe 3 independent 7-year trends from which to define the trend variability (~20 years of data divided into 7-year trends). It's difficult to trust the standard deviation of only 3 independent samples, particularly since it is so different from other results.

    Coincidentally, if you were to scale the standard deviation using the standard method for a small sample size, you would get

    sd7u,scaled = 0.6 * sqrt(30) / sqrt(3) = 1.9 C/century

    …which is very close to my measurement-based result and Gavin’s model-based result.

    Of course, neither of us wants to make the uncertainty larger. A better approach might be to increase the number of samples by slightly increasing your volcanic forcing threshold. The trade-off between statistical uncertainty due to small sample size and physical uncertainty due to small volcanic forcing may be worthwhile.
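    Just to illustrate how unstable a standard deviation estimated from only ~3 independent samples can be, here is a small sketch (purely illustrative numbers, in units of the true standard deviation, nothing to do with either dataset):

    import numpy as np

    rng = np.random.default_rng(1)
    true_sd = 1.0       # work in units of the true standard deviation
    n_runs = 20000      # number of repeated experiments

    for n_samples in (3, 10, 15):
        sds = [np.std(rng.normal(0.0, true_sd, n_samples), ddof=1) for _ in range(n_runs)]
        lo, hi = np.percentile(sds, [2.5, 97.5])
        print(f"n = {n_samples:2d}: 95% of sample std devs fall in [{lo:.2f}, {hi:.2f}] x true value")

    With only 3 samples, the estimated standard deviation can easily come out at a small fraction of the true value.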

  37. John–
    But I now know more about the ability of the trend fits to get the correct standard deviation for the trends. I ran the Monte Carlo. If samples are drawn from the same population, the method of getting the variability in the trend from the 89 points works.
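    For anyone who wants to reproduce that kind of check, a stripped-down version looks something like this (Cochrane-Orcutt with the autocorrelation treated as known, zero true trend, and illustrative values for the noise level and rho, so it is not my exact script):

    import numpy as np

    rng = np.random.default_rng(2)

    def ar1_noise(n, rho, innov_sd, rng):
        """AR(1) noise with known rho, started from its stationary distribution."""
        e = np.empty(n)
        e[0] = rng.normal(0.0, innov_sd / np.sqrt(1.0 - rho**2))
        for t in range(1, n):
            e[t] = rho * e[t - 1] + rng.normal(0.0, innov_sd)
        return e

    def co_slope_and_se(y, rho):
        """Trend and its standard error from Cochrane-Orcutt with rho known."""
        n = len(y)
        t = np.arange(n, dtype=float)
        ys = y[1:] - rho * y[:-1]            # quasi-differenced data
        ts = t[1:] - rho * t[:-1]            # quasi-differenced time index
        X = np.column_stack([np.ones(n - 1), ts])
        beta = np.linalg.lstsq(X, ys, rcond=None)[0]
        resid = ys - X @ beta
        s2 = resid @ resid / (len(ys) - 2)   # residual variance, 2 fitted parameters
        se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
        return beta[1], se

    n, rho, n_runs = 89, 0.8, 2000           # 89 points as above; rho and noise level illustrative
    slopes, ses = [], []
    for _ in range(n_runs):
        y = ar1_noise(n, rho, 0.1, rng)      # true trend = 0, so the series is pure noise
        m, se = co_slope_and_se(y, rho)
        slopes.append(m)
        ses.append(se)

    print("std dev of trends across realizations:", np.std(slopes))
    print("average single-realization std error: ", np.mean(ses))

    The two printed numbers come out very close to each other, which is what I mean by "the method works" when the samples really are drawn from the same population.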

    Changing the volcano threshold doesn't decrease the uncertainty. If you loosen the criterion for "volcano-free" to include periods when the dust veil is still clearing, you get more variability. That's why I take "volcano-free" periods to be those when the dust veil has cleared, so we don't have any significant rate of change in the forcing due to volcanos.

    That was the point of finding a volcano-free period! It is unfortunate that it is short, but this can't be fixed by admitting into the sample years when the forcing due to volcanic activity is changing!

  38. John and Lucia,

    why not try doing the analysis on land only data and kick the bucket problem.

  39. lucia:
    As I said, there is a trade-off between physical certainty (by eliminating all volcanoes) and statistical certainty (by increasing the number of samples). As I recall, you dug pretty deep to find volcanoes that are not even on most lists. Since you are more comfortable using a small number of pristine samples, you should at least indicate the uncertainty in sd7u due to the small sample size.

    ===
    steven mosher:
    I think Lucia and I are in agreement about removing the “bucket years” from the analysis. IMO, switching to land-only would cost more in areal coverage than would be gained in temporal coverage.

  40. Steve–
    I think that’s a good idea. I’ll do the land only later, using the years I used before the whole Bucket thing came along. Sure we lose spatial correlation, but we have another test with more years. I guess, I’ll think about JohnV’s idea that we can’t use the land issue in the meantime.

    I won’t do it right away because I’m looking at the AR process Gavin suggested I look at. He provides statistics for lag-1 correlation and monthly averages that I think are based on annual average data. I use monthly data for my statistics, so I needed to do a few close form integrals, check them etc.

    If the properties of the monthly data don't reject it, I need to admit that. The first swipe, neglecting the issue of averaging, causes the properties he described to be excluded by a wide margin. Specifically: the hypothesis that rho=0.1 applies to annual averages was so far out of whack (i.e. falsified compared to data) that it appeared I could simply exclude it.

    And if that’s out of whack in the models, then in my view, we should definitely prefer the observed autocorrelation in measured values. (In my view, there isn’t even question on that. In fact, in my view, the observations have precedence even if rho=0.1 survives. But that’s Gavin’s argument, and so to be fair, if rho=0.1 doesn’t falsify, I’m happy to report how things look under that assumptions.)

    But after I looked at the quick values, I examined the effect of averaging on the numbers more carefully, and it might just nudge the rho=0.1 autocorrelation for annual averages into the not-falsified category. (It's close enough that I need to get a whole bunch of numbers just right.)
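    A quick way to sanity check the averaging effect, before grinding through the closed-form integrals, is a short simulation along these lines (the monthly rho here is only a placeholder, not a value I'm testing):

    import numpy as np

    rng = np.random.default_rng(3)

    def lag1_autocorr(x):
        """Sample lag-1 autocorrelation."""
        x = np.asarray(x, dtype=float) - np.mean(x)
        return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

    def monthly_ar1(n_months, rho, rng):
        """Monthly AR(1) noise with unit innovations."""
        e = np.empty(n_months)
        e[0] = rng.normal()
        for t in range(1, n_months):
            e[t] = rho * e[t - 1] + rng.normal()
        return e

    n_years, rho_monthly, n_runs = 100, 0.6, 500   # placeholder values
    annual_rhos = []
    for _ in range(n_runs):
        m = monthly_ar1(12 * n_years, rho_monthly, rng)
        annual = m.reshape(n_years, 12).mean(axis=1)   # annual averages of the monthly series
        annual_rhos.append(lag1_autocorr(annual))

    print(f"monthly rho = {rho_monthly} -> lag-1 rho of annual averages ~ {np.mean(annual_rhos):.2f}")

    The point is simply that the lag-1 number for annual averages is not the monthly number; the averaging matters, and that's what I need to pin down before saying whether rho=0.1 is falsified.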

    Documenting will require about 3 truly boring posts so people can see where the numbers come from.

    John–
    The purpose of the exercise you initially proposed was to remove the physical uncertainty. Retaining the physical uncertainty to get a precise statistical answer on a problem that contains the physical uncertainty is, in my opinion, pointless.

    I have discussed the uncertainty due to sample size in blog posts. It exists. But, right now, qualitatively, the data support the idea that volcanic eruptions introduce variability in eight-year trends. The models suggest precisely the same thing, and so do simple physical arguments. So, based on phenomenology, we actually understand why we would expect extra variability in 8-year trends if we include years when we can see, by simple inspection of graphs of forcing, that the forcing due to aerosols is changing rapidly (relative to other forcings).

    The purpose of the exercise is precisely to exclude those years. Including them to increase the number of samples is like wanting to determine the average weight of cats, discovering you only have access to 20 cats, and adding 10 dogs to the group to get the sample size up. Sure, the sample size is now 30. But the average weight per animal is no longer the average weight of cats!

  41. lucia,

    Of course volcanic eruptions have an effect on 8-year trends. I’m not saying otherwise. I am not suggesting that all years with volcanic eruptions should be included. You chose a non-zero threshold for volcanic forcing. I chose a higher threshold. I’m only suggesting that the right threshold may be between our initial choices. Like any experiment, it’s all about trade-offs.

    It’s very odd to me that minor volcanoes have a much larger effect than major volcanoes. Is that not odd to you? Do the models and simple physical arguments suggest precisely the same thing?

  42. John–

    Do you mean the two late volcanos? First, they aren't "minor" eruptions. They are classed as major eruptions that went stratospheric. Second, the first stratospheric volcano to explode after a period of little activity might well make a more distinct difference than one that erupts when there is already a dust veil clearing. When one erupts after another, it takes a while for the veil to form; if the other one is clearing at the same time, the profile of the "forcing" doesn't change as rapidly. Third– did you see the temporal profile of the two late volcanos? One of them had a lower "peak" veil, but kept erupting for a long time. They weren't measuring optical depth at the time.

    So, no, I would not be exactly stunned if those eruptions made a large difference relative to their peak eruptive power.

    Also, did you look at where those fall compared to the bucket/jet-inlet confusion? And compare it to the full range of years that are now in the region of "confusion", due to the various changes in methods of adjustment all the way from after the war to about the 60s? There were shifts from buckets to jet inlets, then back to buckets, then back to jet inlets, and a slowly progressive (and possibly erratic) change over time. It's around the time of those two eruptions!

    So… no. Not surprised.

  43. lucia,

    No — I do not mean the two late volcanoes. My volcanic threshold was based on the estimated forcing from the forcing data file you used for Lumpy (the one from Gavin).

    Just to re-iterate what surprises me (using GISTEMP):

    All years included (1901-2007):
    sd7u = 2.07 C/century
    107 years of data (~15 independent samples)

    Excluding major volcanoes:
    sd7u = 1.88 C/century
    78 years of data (~11 independent samples)

    Also excluding 1945-1955:
    sd7u = 1.61 C/century
    67 years of data (~10 independent samples)

    Also excluding minor volcanoes (your analysis):
    sd7u = 0.6 C/century
    ~20 years of data (~3 independent samples)

    The largest volcanoes increase sd7u from 1.88 to 2.07 C/century, but the smallest volcanoes increase it from 0.6 to 1.6 C/century.

    I’ll drop this for now because I have other things to work on. I will try to find the time to update and explain my analysis better. Maybe I’ll even do it on my own site.

  44. Lucia, just a follow-up observation in this context: we've observed that the monthly AR1 autocorrelations in the tropics are about 0.87 for MSU and RSS data.

  45. SteveM–
    Ok. That makes sense. I haven’t looked at the tropics specifically, but I get higher autocorrelations for MSU and RSS data generally.
