Carrot Eater’s Challenge: Rate of Rejections when applied to simulations pt. 1.

Note: I noticed a bug, so this post has been modified; where the numbers changed, I call out the original and corrected values below. In the conclusions, I simply deleted a paragraph. End note.

Last night, Carrot Eater suggested a challenge to test whether the method I discussed in yesterday’s post rejects the null more frequently than claimed when applied specifically to compare trends from climate model simulations to the known average corresponding to that batch of models, using years from 2001 to the present. This is an intelligent challenge for many reasons, and I’d never done this specific test. So, I agreed to do it. Because yesterday’s post discussed two methods of testing a projection, I’m going to answer Carrot’s question in two posts.

In today’s post, I’ll focus on the simpler analysis: the one that compares a ‘point’ type prediction for the trend to the uncertainty intervals estimated for a single observed time series, to see whether the prediction falls inside them. That is, I will demonstrate the false positive rate of rejecting any particular claimed trend– for example “0.2 C/decade”– that some group or other claims is associated with the single noisy observation.

In a future post, I’ll extend the analysis to account for the possibility that the projected trend was not a point value, but expressed in some probabilistic form. That is: I will test whether a projection based on the multi-model mean of a large population, estimated from a finite sample of that larger population, is consistent with a single observation. (That post was originally planned for Monday or Tuesday; I’m now going to skip it.)

Sub-dividing this way will permit me to isolate the steps and assumptions in the analysis so that people can see where each individual assumption applies. It will also permit each individual post to be of finite length.

In today’s post, I am going to present an informal test of Carrot Eater’s hypothesis, which is stated below:

I think that if she took each of the 55 or 58 or whatever runs individually, and repeated the graphic above, the 0.2 orange line would sometimes fall outside the dashed yellow lines. How many times? Well, to make it interesting, I’ll say more than 5% of the individual runs.

I’d also be interested to see this for different time periods – if Lucia started doing this in 2006, then 2001-2006, as well as 2001-2010.

Why do I care? If Lucia says that more than 5% of individual model runs are inconsistent with the models, then there’s something not quite right about the test.

(Note, highlighting mine.)

To test Carrot Eater’s hypothesis, I will test the null hypothesis that the linear trend for an individual model run is equal to 0.2 C/decade, which represents the nominal trend corresponding to the projections for warming during the first part of this century indicated in the AR4. I will use the exact analytical steps used to create the graph in yesterday’s post, with the sole difference being that I will substitute the time series from a simulation run for the time series corresponding to observational data.

I’ll then repeat this test for 55 individual model simulations and report the “false positive” rate when my test is applied to a projection expressed as a “single point trend” like 0.2 C/decade. (Before proceeding, I want to remind the reader that there are several issues in dispute; also, people ask questions about the meaning of two different types of graphs that I periodically present. To clarify the separate issues, I plan to write two posts, isolating the issues.)

I will now explain and perform the analysis required to fulfill Carrot’s challenge.

Plug and Chug

I will now take a single run from ukmo_hadcm3 and compute the least squares trend; these data are shown in fuchsia below. The least squares trend of m=0.037 C/year computed using EXCEL’s graphical package is shown with a black dashed line; the identical value computed using LINEST is shown with a solid yellow line:

Figure 1: Temperature anomalies from one simulation of a climate model.

To estimate the uncertainties, I first compute the “standard error for the mean trend”, sm,white, under the assumption that the residuals are white noise. This method is described in numerous undergraduate textbooks, and is also available pre-coded in EXCEL. The numerical value obtained is sm,white=0.004 C/year.

When the residuals are Gaussian white noise, and the residuals from linear fits to any two separate runs of the model are statistically independent of each other, the square of this standard error, sm,white, is an unbiased estimate of the variance of the least squares trends we would obtain if we generated “N” independent samples of “weather” by running ukmo_hadcm3 and computed a trend for each of these N runs, letting N get infinitely large.

However, if (a) the residuals are not Gaussian white noise or (b) the residuals from run “i” are correlated with the residuals from run “j”, the computed standard error will not represent the variance in the population of least squares trends over many repetitions. So we must test whether these assumptions hold.

We won’t worry about the cross-correlation between series today; it’s not particularly important for tests of trends from 2001-2008. However, inspecting the residuals from climate models and earth observations, we note that the residuals exhibit large temporal auto-correlations. This means the residuals are not “white”.

It has been suggested that an ARMA(1,1) model is appropriate to use as a statistical model for the residuals; for the Carrot Eater challenge, it suffices to say that I used this assumption when applying the method he wishes me to test. I discussed the empirical support for this statistical model and how I estimate the “standard error for the mean trend” when this assumption applies in two blog posts. You’ll find the basic formulas here (the key formula is number 4) and a tweak I added after running some monte-carlo discussed here.

When plugging and chugging, my method of estimating the uncertainty intervals amounts to computing the ratio of the number of degrees of freedom in the linear fit to the time series to the effective number of degrees of freedom, Neffective, and then substituting Neffective for N when computing the standard error in the mean:

(4)   N/Neffective = [1 + (2α – 1)φ] / (1 – φ)

When the magnitudes of α and φ are known a priori, as they would be if the data tested were generated synthetically, this substitution results in appropriately sized uncertainty intervals. When used in a t-test, we obtain the rate of false positives intended by the test.

However, when presented with data, we do not know α and φ a priori, so they must be estimated. Ideally, the method of estimating these would return the precisely correct values; but this is simply not possible with finite samples of data. I estimate the parameters α and φ as discussed here. My goal was to use a method that is asymptotically correct as the time series becomes infinitely long, that, if errors are to be made, returns uncertainty intervals that are too large, and that is fairly simple to implement. The method I devised is:

  • φ is computed as the maximum of a) the measured correlation at lag 1, b) the ratio of the second to the first lagged correlation, or c) the ratio of the third to the second lagged correlation. Note, however, that if φ>1, I replace φ with 1, and if φ<0, I replace φ with zero. The first is required to prevent the statistical model from suggesting correlation coefficients may ever exceed 1. Note that if we had an infinite sample, we would expect all three methods (a), (b) and (c) to return the precisely correct value of φ. However, when presented with a finite sample, all three methods return estimates of φ that are biased low; if any one of the three were used alone to estimate φ, the result would be hypothesis tests that return too many false positives. Taking the maximum of the three possible values corrects for this.
  • α is computed as the ratio of the measured first lagged correlation to φ.

Using this method, I found sm,ARMA=0.009 C/year and a reduced number of degrees of freedom of 27.535. For a two-tailed test at 95% confidence, this results in a critical value of 2.052 for the t-test, and the 95% confidence intervals are dm±,arma=±0.018 C/year. Upper and lower confidence intervals were computed by adding this uncertainty to the least squares trend of 0.037 C/year and are indicated graphically with dashed yellow lines in the figure reproduced below:

Figure 1 (reproduced): Temperature anomalies from one simulation of a climate model.

Hypothesis test

We now return to the hypothesis test, to test a null hypothesis that this run of ukmo_hadcm3 is consistent with a trend of 0.2 C/decade, but differs owing to “weather noise.” To test this, I compare 0.2 C/decade to the upper and lower confidence intervals, and note that it falls inside them. Graphically, the trend of 0.2 C/decade is indicated by the orange line; it falls inside the dashed yellow lines indicating the confidence intervals.

This means that, based on the assumptions of this test, the trend of 0.37 C/decade is not inconsistent with a hypothesis that the true trend is 0.2 C/decade. So, if 0.2 C/decade were our null hypothesis, we’d continue to accept it as true.
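
For readers who want to follow the plug-and-chug numerically, here is a minimal sketch of the single-run calculation described above, written in Python rather than EXCEL. It is not the spreadsheet I actually used: the function and variable names are mine, and the φ and α estimates simply follow the bulleted recipe above.

```python
# Minimal sketch (not the original spreadsheet): least squares trend,
# ARMA(1,1)-corrected standard error via formula (4), and a two-tailed
# t-test of a claimed trend such as 0.2 C/decade (0.02 C/year).
import numpy as np
from scipy import stats

def lagged_corr(resid, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    r = resid - resid.mean()
    return np.sum(r[lag:] * r[:-lag]) / np.sum(r * r)

def trend_test(anoms_monthly, claimed_trend=0.02):
    """anoms_monthly: monthly anomalies (C); claimed_trend in C/year."""
    t = np.arange(len(anoms_monthly)) / 12.0              # time in years
    X = np.column_stack([np.ones_like(t), t])             # intercept + slope
    beta, *_ = np.linalg.lstsq(X, anoms_monthly, rcond=None)
    slope = beta[1]                                       # C/year
    resid = anoms_monthly - X @ beta

    # Standard error of the slope under the white-noise assumption.
    dof = len(t) - 2
    s2 = np.sum(resid**2) / dof
    sm_white = np.sqrt(s2 / np.sum((t - t.mean())**2))

    # Estimate phi as the max of three candidates (clipped to [0, 1)),
    # then alpha from the lag-1 correlation, per the bullets above.
    rho1, rho2, rho3 = (lagged_corr(resid, k) for k in (1, 2, 3))
    phi = max(rho1, rho2 / rho1, rho3 / rho2)
    phi = min(max(phi, 0.0), 0.999)   # clipping just below 1 avoids dividing by zero
    alpha = rho1 / phi if phi > 0 else 0.0

    # Formula (4): ratio of nominal to effective degrees of freedom.
    ratio = (1.0 + (2.0 * alpha - 1.0) * phi) / (1.0 - phi)
    n_eff = dof / ratio
    sm_arma = sm_white * np.sqrt(ratio)

    t_crit = stats.t.ppf(0.975, n_eff)                    # two-tailed, 95%
    half_width = t_crit * sm_arma
    reject = abs(slope - claimed_trend) > half_width
    return slope, half_width, reject
```

The numbers quoted above (sm,ARMA=0.009 C/year, 27.535 degrees of freedom, ±0.018 C/year) come from the spreadsheet version of this calculation, not from this sketch.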

How about the other 55 model runs

Recall that Carrot Eater’s hypothesis did not relate to only one run. His hypothesis was that if I applied this test to all 55 model runs in my ensemble, I would reject the multi-model mean at a rate greater than 5%.

To test this, I repeated the test using all 55 model runs. To create a graph where the viewer only needs to see whether the whiskers span “0”, I subtracted the trend corresponding to the null hypothesis (i.e. 0.02 C/year) from the trend and from the upper and lower uncertainty intervals for each individual simulation, and plotted those bars and whiskers. This is shown below:


Figure 2: Hypothesis test of 0.2 C/decade applied to 55 simulations. Image replaced March 20.

To interpret this graph: when 0 C/year does not fall inside the bar-and-whisker uncertainty intervals, we should reject the null hypothesis. In the original version of this post, applying my method to the simulations with a test trend of 0.02 C/year appeared to result in only 1.8% rejections– less than the 5% the test is permitted to reject– suggesting the test rejected too infrequently rather than too often. After fixing the bug noted at the top of this post, the null hypothesis is rejected in 13% of the cases.
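
For concreteness, the bookkeeping behind Figure 2 amounts to something like the following, continuing the sketch shown after the hypothesis test above. Here load_model_runs() is a hypothetical placeholder for reading the 55 simulated anomaly series; it is not a function I actually have.

```python
# Hypothetical usage of trend_test() from the sketch above: apply the same
# test to every simulated series and report the rejection rate.
import numpy as np

runs = load_model_runs()        # placeholder: list of 55 monthly anomaly arrays
results = [trend_test(r, claimed_trend=0.02) for r in runs]    # 0.2 C/decade
d_star = [slope - 0.02 for slope, _, _ in results]             # whisker centers
rejection_rate = np.mean([reject for _, _, reject in results])
print(f"rejected {100 * rejection_rate:.1f}% of runs")
```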

Before I complete this post, I recognize that some will note that I did not test the specific point value of the trend corresponding to the multi-model mean for the 55 simulations. The multi-model mean for this period happens to be somewhat larger than 0.2C/decade. The test was repeated for that specific value below:

Figure 3: Test of multi-model mean trend. Image replaced March 20.

Before the bug fix, I reported a false positive rate of 1.8% for this test, below the 5% upper bound one claims when rejecting a null at 95% confidence. With the corrected spreadsheet, applying the test to compare the individual simulation trends to the multi-model mean trend for the batch results in a lower rejection rate than for 0.2 C/decade– roughly 9%.

Why is this lower? Because in this case, we are testing a ‘null’ hypothesis that is closer to the sample mean trend for the models in the group. Why isn’t the rejection rate 5% or less? There are two possible reasons:

  1. The assumption that the residuals of linear fits are ARMA(1,1) may be a poor one for at least some models.
  2. Although the multi-model mean trend for this sample is 0.24 C/decade, this does not mean that the model mean trend for any individual model is 0.24 C/decade. Some models may have higher or lower trends; so a portion of the rejections may be false positives (which should occur less than 5% of the time) and a portion consists of true positives. The sum can be larger than 5%. How much larger? That depends on the actual spread in model mean trends, which we do not know.

To address question 2, I’ll be posting an analysis to determine whether the trends from the collection of 22 models are drawn from a collection with equal median trends. The results will show that the trends for the 22 models are not drawn from a collection with equal medians. This means that we should expect the rate of rejection for Carrot’s test to exceed 5% even if the hypothesis test is properly constituted.
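
As comes up in the comments below, the test in question is a Kruskal-Wallis type test. A bare-bones sketch, with made-up numbers standing in for the per-run trends grouped by model:

```python
# Sketch of the planned check: are per-run trends from different models drawn
# from populations with a common median? (Kruskal-Wallis H-test from scipy.)
from scipy.stats import kruskal

trends_by_model = {            # hypothetical per-run trends (C/year) by model
    "model_A": [0.031, 0.044, 0.018],
    "model_B": [0.052, 0.061],
    "model_C": [0.009, 0.015, 0.022, 0.027],
}
H, p = kruskal(*trends_by_model.values())
print(f"H = {H:.2f}, p = {p:.3f}")    # a small p suggests the medians differ
```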

Summary

Carrot Eater speculated my hypothesis test would exhibit too many false rejections. That is, when I claimed the method returned 5% or fewer false positives, it actually returned more. It did. However, this could be explained by showing that the runs from each model are drawn from models manifesting different “underlying mean” trends. In a follow-on post, I’ll explain a test that diagnoses whether runs from each of 22 models share a common median trend.

Spreadsheet:Anoms

196 thoughts on “Carrot Eater’s Challenge: Rate of Rejections when applied to simulations pt. 1.”

  1. Thanks for spending time following up on my suggestion. Thank you also for fleshing out the method a bit. I hope you will have found the exercise useful.

    So at a glance, I was wrong. Such is life.

    Some of these uncertainty intervals are rather wider than I was expecting, when I made my hunch based off looking at the RC histogram. That’s certainly one way to un-do a hunch. But so long as these uncertainty intervals are computed in a comparable way to what you do with the instrument record, I suppose it’s a fair test of the test.

    I’m a little puzzled by Fig 2 and Fig 3. Why are so many of the points above the red zero line? Shouldn’t they be distributed around the line a bit more evenly? I must be missing something.

    While you’ve got the numbers sitting there in Fig 2, can perhaps we see them expressed as a histogram of the trends for this period, a la the RC histogram? Forget the uncertainty bars for that. Just curious how tight it is, over this time period. Meaning, neglecting the uncertainty, I’d like an easier visual of the model variability.

  2. I’m a little puzzled by Fig 2 and Fig 3. Why are so many of the points above the red zero line? Shouldn’t they be distributed around the line a bit more evenly? I must be missing something.

    ‘Cuz I blundered and forgot to subtract 2 for the figure. (The rejections are based on d*– so that’s ok.)

    I’m uploading new graphs right now. (I noticed just after I put the graph in comments.)

  3. I hope you will have found the exercise useful.
    Actually… yes. It is useful. I’d looked at things other ways, which made me 99.99% certain I knew the answer I would get to your question. But doing it this way will probably make the main point clearer to people whose first assumption is that my method must reject too often. Your suggested test is designed to reveal problems if any one of a number of things goes wrong. I was thinking of showing graphs with synthetic data for each case and it just gets… long…

    While you’ve got the numbers sitting there in Fig 2, can perhaps we see them expressed as a histogram of the trends for this period, a la the RC histogram

    The bars can be turned into an RC histogram. It would look pretty much the same. I have no argument with their histogram– just with what it means we must do if we want to test whether or not the multi-model mean is biased.

    One reason the estimate of the st. dev from a time series can differ from the histogram for the RC graph is that the spread of the RC graph is the combination of “weather noise” and “difference in trends from individual models”.

    Also, the reason the spread for the RC graph can be larger than for the value estimated based on earth’s weather is that the model weather could, hypothetically be larger or smaller than the earth’s weather.

  4. One reason the estimate of the st. dev from a time series can differ from the histogram for the RC graph is that the spread of the RC graph is the combination of “weather noise” and “difference in trends from individual models”.

    To separate out those two, you’d need more runs from each model. That said, I don’t know if you can tell a high sensitivity model from a low sensitivity model, over 8 years. You would see which models are noisy, and which aren’t.

    If the histogram looks about the same, then it really does seem to come down to the uncertainty intervals you put on each run. If those are wide enough, your test won’t reject. And this is exactly where I guessed wrong, I think. The intervals for the GISS data are such that it sometimes rejects; but not necessarily for all the model runs.

    Could you append GISS data to the bar/whisker graphs?

  5. Carrot–
    By the way, this is the result of the “carrot test” if I assume the noise is AR(1) i.e. “red” and use the red correction recommended in Lee & Lund (which I started using very early on.)

    There are good reasons to believe the earth’s noise is not “red”. But, oddly, the method happens to return close to the correct level of false positives for model simulations. (I believe it does so for the wrong reason. The “weather noise” in some models does look more-or-less ARMA(1,1). For those models assuming AR(1) can make error bars that are too small. But some of the models have truly hilarious “weather noise”, and for those cases, both AR1 and ARMA make error bars that are too large. But if we apply “the carrot test” using AR1 and testing at 10 years, this doesn’t look too bad!)

  6. To separate out those two, you’d need more runs from each model.

    Technically you would need many runs to detect the issue if it was small. But… but one of the models with ridiculous noise has about 5 runs. And… and… you’d have to see it. It’s amazing.

    Blog climate wars being what they are, some of my readers would love me to say the model with insane weather is Gavin’s. But it’s not. 🙂 I haven’t noticed anything obviously peculiar about model E. I’ll find the really wild model for you. (Carrick wants to know too.)

    Could you append GISS data to the bar/whisker graphs?

    Sure. I’m off to the gym so I don’t get fatter than I already am. I’ll do it afterwards.

    That said, I don’t know if you can tell a high sensitivity model from a low sensitivity model, over 8 years.

    I don’t think you can either.

    In fact, what the test cannot reveal is the cause of the discrepancy. Is it because modelers left the 11 year solar cycle out of their projections? (Even those who included it in the 20th century forcings did that. It was an odd decision because even if they couldn’t predict the long solar minimum, they could at least have had the forcing drop for a little while instead of having the sun stuck at a solar max. Assuming they did even that– and I’m not sure. It might just be stuck at “average” in some runs.) Is the discrepancy due to the Asian brown cloud? Models have too high sensitivity? Wrong time constant for the oceans? Showing the correct trend in the 20th century for the wrong reason?

    Lots of people assume I suggest the reason for the discrepancy is model are too sensitive. That may not be so and I don’t claim it. I don’t know the reason.

  7. Right, there’s no way you can make any statements from this exercise, with regards to exact forcings or sensitivity.

    I’ve been working with the simplifying assumption that the forcings are good enough, and sensitivity won’t show up this quickly, so it’s just how much weather the model generates.

    Mainly, I’m curious to see the plots for the runs with the huge uncertainty intervals, since I’m pretty sure it’s the uncertainty intervals that did me in.

    Basically, from the histogram, I’d have said that the observed trend falls well within the distribution of model trends. Which it probably does; it’s just that the interval on GISS is perhaps narrower than the intervals on the further outlying model runs.

  8. The reason they dont throw retarded models out seems largely political. Rules of the IPCC seem to imply that there is a democracy of models ( phrase in the climategate files) I’m not sure whether this means:

    1. If a member state submits results they must be accepted.
    2. All runs from models are given equal weight regardless of
    hindcast skill.
    3. We know that in attribution studies they do impose certain
    restrictions on using models ( drift restrictions)

    the other issue I have is averaging all the runs together and calling the N = 55.

    A bad model with a low run time could contribute all sorts of
    bogus data to the pool. Some of the models ( I recall) just submitted one run. I think it makes way more sense for member nations to just adopt the best performing models and get more runs with the same models.

  9. Carrot

    Basically, from the histogram, I’d have said that the observed trend falls well within the distribution of model trends.

    It probably does. And if all we want to find out is whether the trend fell inside the full range of all weather in all models, that graph shows us that. That’s not the question I ask. I ask a question: is the multi-model mean biased? (I ask this because the IPCC uses that for projections. )

    If we wish to detect bias in the multi-model mean there is a big difficulty with using the observation that the observed trend falls inside the range of all weather in all models as evidence of anything. The reason is that when you do that, you implicitly assume:

    That the distribution in trends across models actually represents the range of trends that would arise as a result of the earth’s internal variability.

    But that assumption is very unlikely to be correct because
    a) The models have different trends over the long haul. This is a fact and it is not disputed. These different trends contribute to the spread of the distribution of trends in “all weather in all models”. The triangle inequality assures us that the existence of this non-weather feature of the collection of models will tend to make the distribution of trends wider than what we expect for the earth’s weather. That is to say, if the models all got the correct “internal variability”, but had different trends, we would expect the spread in Gavin’s graph to be wider than we would see on earth. (It’s worth noting there was a recent paper that actually discusses the ratio of the spread due to model differences to the spread due to “weather”. I’d have to dig it up. But the fact is, the spread in that histogram is not due solely to internal variability even in models. There is a contribution due to the very real difference in mean trends across models. A small numerical illustration appears after point (c) below.)

    b) Even if every model predicted the same trend, the internal variability of the models can differ from that of the earth. It could be higher or lower. But as long as the internal variability (weather) of the models does not match that of the earth, that spread of trends in that graph will differ from the spread you expect for “weather”.

    c) Meanwhile, we can actually estimate the spread of trends consistent with the earth’s “weather” based on the observations.
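
    One way to make point (a) concrete is the law of total variance: the spread of trends pooled across all runs combines the within-model “weather” spread with the spread of the model-mean trends. A tiny numerical illustration (all numbers made up):

    ```python
    # Law of total variance: pooled variance of trends = mean within-model
    # variance + variance of the model-mean trends. Values are illustrative.
    import numpy as np

    rng = np.random.default_rng(1)
    model_means = [0.015, 0.020, 0.030]            # C/year, differing by model
    runs = [m + rng.normal(0.0, 0.008, size=4) for m in model_means]

    pooled = np.concatenate(runs)
    within = np.mean([np.var(r) for r in runs])    # "weather" spread
    between = np.var([np.mean(r) for r in runs])   # spread of model means
    print(np.var(pooled), within + between)        # the two agree
    ```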

    Mainly, I’m curious to see the plots for the runs with the huge uncertainty intervals, since I’m pretty sure it’s the uncertainty intervals that did me in.

    I’ll show some. Some of those large uncertainty intervals are due to the short time, so I’m not sure the ones with the largest uncertainty intervals will look the ‘weirdest’. They just have very high values of φ and α. But I can show those and also show the really ‘weird’ ones.

  10. The reason they dont throw retarded models out seems largely political.

    My impression as well. The authors raise a fuss if theirs isn’t included in the mix.

  11. Mike Z
    That’s one of them.
    It does have very large uncertainty intervals computed under the ARMA assumption.

  12. (It’s worth noting there was a recent paper that actually discusses the ratio of the spread due to model differences to the spread due to “weather”.

    I’m pretty sure I’ve read the paper in question, or at least one like it. We don’t have the information here to assess that.

    But in the end, all that’s happening here is this:

    GISS observation gives trend x over y years, and the uncertainty intervals just barely brush against 0.2 C/decade.

    Some model runs also give trend x over y years, but for whatever reason, you’re able to stretch the uncertainty intervals across 0.2 C/decade.

    I don’t think this is nearly as significant as you do. I see all the things you’re saying; I just don’t think it’s especially meaningful.

    But it’s the way you set up your test, and now we know what your test does.

  13. I don’t think this is nearly as significant as you do. I see all the things you’re saying; I just don’t think it’s especially meaningful.

    I’m not sure how significant you think I think this specific result is. Based on the specific result since 1980 only, done with data we have now, we’d decree the models are just inside the confidence interval consistent with earth data. Do you disagree with this?

    It’s the comparisons with earlier years and not using “0.2 C/decade’, but rather the trends models projected based on those years that are saying models are wrong (with GISS). The GISS results based on 2001 also say the models are ok.

    With Hadley and NOAA, we get rejections for more recent years too. But the GISS result from 1980 alone? If that’s the only graph I showed, I don’t think you’d be concerned. If I only showed the one from 2001 with only GISS, I also wouldn’t say that was much evidence against the models. (Although some of the people here, seeing that for short times my 5% claimed false positive is only 2%, might point that out and mention that those uncertainty intervals ought to be smaller.)

    Right now with current data, it’s the comparisons with Hadley and NOAA that are making the models look bad. The models don’t fail for all possible start years– but it’s a huge swath of years.

    I’m going to add NOAA, and GISS to the bar and whiskers graphs too.

  14. The reason I shrug at your test is perhaps this: there are model runs with trends further from 0.2 C/decade than the observations. Yet I daresay those individual runs end up somewhere closer to that mean (than wherever they are now), over the course of 30 years. Does the fact that their (sometimes ridiculous) uncertainty intervals after 9 years span 0.2 C/decade tell you anything about that longer term result?

    I don’t think so.

  15. Carrot Eater:

    Does the fact that their (sometimes ridiculous) uncertainty intervals after 9 years span 0.2 C/decade tell you anything about that longer term result?

    No of course not.

    But it still is diagnostic of the models in other respects. Not everything the models spit out is directly related to long term temperature trend.

    How well it handles short-term fluctuations may also tell you something about how well it does on regional scale climate changes. In a way, knowing the regional impact of climate change is almost more important than knowing that there is an overall temperature trend.

  16. Carrot–
    There are models that have 30 year trends averaged over many runs that differ from the model mean. Some are higher, some are lower. Why shouldn’t those be correct?

    If the model mean can be shown to be biased, that only suggests that the models in the lower range might be right. Biased by 40% wouldn’t mean the trend is 0 C/century– it might be 1.5 C/century. That is still useful information.

    It’s also worth noting that because models do have different long term trends, the result of a Carrot test should have a rejection rate larger than α. This is because the correct way to test the null is to test individual runs against the converged result for that model.

  17. Lucia:

    I’m just saying the 30 year trend for any individual run from any given model is not going to be, say, 0.

    Yeah, they have different long term trends. And it’ll be a while before we know which is doing better, in that respect. And it’s something of a tunnel vision with that, anyway; one could end up looking more ‘right’ for wrong reasons.
    As Carrick notes there are other things of interest here as well.

  18. Lucia,

    Perhaps I missed something, but it seems to me that the efforts here are directed at correcting the variance of the OLS trend estimator, instead of using a more efficient trend estimator in the presence of serial correlation.

    C-O estimator is not more efficient than OLS when the regressor is trendy, as time is.
    http://books.google.de/books?id=yKQN62Itmg0C&pg=PA193&lpg=PA193&dq=cochrane-orcutt+efficiency&source=bl&ots=cAPWIs_Lry&sig=3UUIrXjRs2OSpYkKgogVf9TWkac&hl=en&ei=9bqiS8LLAomYnQO4z-WmCw&sa=X&oi=book_result&ct=result&resnum=2&ved=0CA0Q6AEwATgK#v=onepage&q=cochrane-orcutt%20efficiency&f=false

    I have learned quite a lot from this paper. Perhaps it may be useful here as well
    http://ideas.repec.org/p/max/cprwps/1.html

  19. Lucia,
    “As in a color that does not rhyme with my first name.”

    Lucia… fuschia. Rhymes perfectly; or perhaps you pronounce your name differently from what I imagine. Is it Luci’a? or Lu’cia?

  20. I had a small brainwave this morning (yes, most of mine are small…)

    I was thinking about the whole 0.2C/decade thing, because it doesn’t seem right to me. The IPCC claims that the 20 year average over the years 2011-2030 would be 0.69C higher than the average of 1980-1999. Now this may equate to about 0.2C/decade, but it doesn’t mean that an appropriate test is to test for 0.2C/decade. It has to do with the order in which the results appear: an average is not affected by the order you get data, whereas a trend is. And if you use an arbitrarily short time period you would be more likely to get a ‘weird’ trend. For example the trend over the last 10 years isn’t very high, but the average of the last 10 years compared to the previous 10 is about +0.2C. So on that measure the IPCC projection is largely correct.

    I don’t know how to go any further than that though, I don’t know what an appropriate test would be.

  21. Re: carrot eater (Mar 18 17:36),
    BTW–I think I did have a boo-boo in the spreadsheet and it “mattered”. Right now, I think rejection on this is going to be over 5%. Rather than posting the “fixed” and then another “fixed” and then yet another “fixed”, I’m having this be blank until I can double check again. (Anyway, embarrassing to have a boo-boo! That’s what I get for wanting to post quickly!)

  22. Re: SteveF (Mar 18 18:13),

    I know of three correct ways to pronounce the name spelled “Lucia”. I grew up correcting people who wanted to rhyme it with fuschia. I was named for my Cuban grandmother and born in El Salvador. It’s Luci’a with a soft “c”.

  23. eduardo–
    I’m not using CO. I’m using OLS. Yes, the efforts are at correcting the estimate of the variance of the OLS trend estimated from the sample.

    Thanks for the links.

  24. lucia (Comment#38583),
    ” I was named for my Cuban grandmother and born in El Salvador.”
    .
    Too bad, it means we probably can’t nominate you for POTUS (at least not without a lot of right-wing screaming!).
    .
    Latin names are funny. In Brazil, it would almost always be Lu’cia (with a soft ‘c’), which is odd, since all normal Portuguese words have the accent on the penultimate syllable, just as you pronounce your name.
    .
    But a Latina who lives in the cold of Chicago? What a surprise!

  25. And if you use an arbitrarily short time period you would be more likely to get a ‘weird’ trend.

    Yes. This is reflected in larger uncertainty intervals which we generally get when we use shorter time periods. So, we need to see a bigger difference between the predicted trend and the observation to decree the difference statistically significant.

    For example the trend over the last 10 years isn’t very high, but the average of the last 10 years compared to the previous 10 is about +0.2C.

    The trend since 1980 is lower than projected. With NOAA and Hadley the difference is statistically significant. With GISS it’s not.

  26. Well… SteveF…

    I ask why I live in cold, cold Chicago all the time too!

    The more common question people ask when they look at my family members and read the variety of last names is how the heck various family members ended up in Cuba, El Salvador, and Guatemala!

  27. Lucia,
    “Lots of people assume I suggest the reason for the discrepancy is model are too sensitive. That may not be so and I don’t claim it. I don’t know the reason.”

    I am a bit surprised by this comment. Sensitivity near the low end of the IPCC range is the simplest explanation for the apparent discrepancy between models and temperature measurements. The discrepancy between the satellite measured global lower troposphere trend and the models’ projected lower troposphere trends (as Chad has shown) is quite significant, which suggests that many of the models may have tropospheric heat transport not quite right. The simplest explanation seems to be that the models are too sensitive… Occam’s razor and all that…

  28. SteveF–
    I agree that low sensitivity is the simplest explanation. But this test doesn’t discriminate between different causes. But that doesn’t mean I know the cause. Wrong forcings is a not very complicated explanation. Solar having more effect than most people believe is a simple explanation. None of these are complicated– the question is, given other information, which is true?

  29. Lucia,

    ”But this test doesn’t discriminate between different causes.”

    True enough. All the more reason to get into ARGO data analysis… therein lies the key.

  30. Lucia: Yeah, take your time and double-check.

    Rick: This winter has been warm over Greenland and parts of the Arctic, and cold over parts of the US. We’ve known that. The thermometers on the ground show it; the satellites show it. So what?

  31. Carrot–
    I’m pretty sure you will now be happy with the outcome. I’m getting 9% rejections.
    However, I also think we can’t conclude much based on this!

    I’m not only double checking this sheet– but I’m also checking an old Kruskal-Wallis test which tests whether the runs from individual models are drawn from different populations. (That is, the mean trend from GISSE is different from ECHAM, etc.)

    That result is saying “yes they are different”. I don’t think this is controversial, but the test shows they are different enough to discriminate. Since my argument for why I get more than 5% rejections is based on these being different, I also want to see if that sheet is correct or incorrect. So… it will take a while.

    I have an odd schedule compared to most people. I work part time, so I blog more during the week and less on weekends. So, that means … probably Monday!

    (I mention this proactively so you don’t need to wonder if something might magically appear! )

  32. Steve— many eyes do NOT make light work of finding mistakes in complicated spreadsheets that also contain side calculations. I’m pretty sure I found the error, and I have 9% rejections. Part of the time issue is that I want to:
    1) Show the correct graphs.
    2) write a new post discussing.
    3) Check a few things because if I’m going to speculate why it’s ok to get 9% rejections, I want to make sure that’s not based on a fantasy idea that the model means are different, but that I can actually back up that they are with another calculation. (I don’t mind revealing the demonstration in a separate post, I just want to make sure my impression that I’ve shown they are is not also a mistake. I did a Kruskal-Wallis test long ago but didn’t post. Now… I want to make sure that’s ok too!)

  33. It’s not a matter of being happy or sad; it’s a matter of figuring out what your test is actually doing.

    I was going to suggest some simple ways of comparing the models. Just take all the 9-year trends from all 50-something runs, and then see how they correlate with the long-term trends in all those runs. Or with the published sensitivity of the model.

    Basically, are the lowest short-term trends in the ensemble more likely to be coming from the models with lower sensitivity?

    It’s already obvious that the models are generating some of their own weather noise (just look at the RC histogram, and how it narrows over time), and this explains a lot of the spread over 9 years. But these simple measures would help tell us whether we can really see the effect of model sensitivity over 9 years.

    It’s a shame some modeling groups only submitted one run.

  34. Carrot–

    It’s not a matter of being happy or sad; it’s a matter of figuring out what your test is actually doing.

    Agreed actually. 🙂

    But people do often like to see their predictions pan out.

    I was going to suggest some simple ways of comparing the models. Just take all the 9-year trends from all 50-something runs, and then see how they correlate with the long-term trends in all those runs. Or with the published sensitivity of the model.

    Of course the models generate weather noise. They have all kinds of weather noise– and the same Kruskal-Wallis test can be applied to tell us if the weather noise in all models is drawn from a population with the same noise property.

    Yes. It is a shame some modeling groups only submitted one run. A fair number only submitted two. That makes certain tests impossible to perform.

    Before concocting a test and especially doing it, I usually like to know what specific question we think it will answer. Then, if possible, I like to find a formal test contained in a statistics book.

    The question I want to ask is: Are the trends for individual models all drawn from the same population? (If the answer is yes, then the difference in trends between runs from two different models would be deemed “noise”. If no, then at least part of that difference is a manifestation of different biases in models.) Kruskal-Wallis answers that question.

    What specific question do you want the test to answer? There will always be some correlation between the short term trend of a run and the long term trend of that same model. After all, the short trend is computed from a subset of the run. So, what are you trying to ask?

    Basically, are the lowest short-term trends in the ensemble more likely to be coming from the models with lower sensitivity?

    Maybe they will; maybe they won’t. But shouldn’t we have a formal answer to whether or not the differences in the short or long term trends are statistically significant? That’s the Kruskal-Wallis test.

    I’d be interested in someone doing the correlation test – but only after the trends have been shown distinct with the Kruskal-Wallis test. If the difference between trends is real, then examining the correlation makes sense. If the difference is real, and there is a statistically significant correlation, that would point to sensitivity being the issue. Otherwise, it could be something else.

    It’s already obvious that the models are generating some of their own weather noise (just look at the RC histogram, and how it narrows over time), and this explains a lot of the spread over 9 years. But these simple measures would help tell us whether we can really see the effect of model sensitivity over 9 years.

  35. The question I might pose is this (not asking you to do it; I could do it myself):

    Take a model with a climate sensitivity that’s higher than the model mean.

    Then compute all possible 9-year trends.

    In how many of those cases did our selected model give a trend that’s lower than the multi-model multi-run mean trend over that 9 year period?

    This percentage should be less than 50%. I’m sort of curious, how much less.

    What would this mean, formally? Probably nothing. But it’s the sort of thing I’d calculate, to get a feel for what’s going on.

    Alternately, I would make a bunch of RC-style histograms, and color code the bar segments by model sensitivity. Say, if a model has low sensitivity, then that model’s contribution to the histogram would be in blue. Then, the histograms would tend to be blue-heavy on one side, red-heavy on the other. But one could see how mixed up it was.

    Again, what would it mean formally? Nothing. As you can tell, I play around with data in different ways, before doing a formal statistical test. That’s just me.

  36. Oh… I play around with data too. But I guess the informal tests I think of have to do with what I’m trying to learn about the data.

    If you want to see if the models with higher sensitivities show higher warming, why not just do a correlation of the change in temperature over the 21st century to the sensitivity? Isn’t that an easier way to learn the same thing?

  37. If you want to see if the models with higher sensitivities show higher warming,

    For a fixed set of forcings, and a very long time period, they will do this by definition. Exactly by definition.

    It’s more the relationship between short-term and long-term trends I’m interested in.

    A high-sensitivity model will absolutely give higher 100 year trends than a low one. It’s what happens down in the 9 year span you’re looking at that interests me. Mainly because, you’re looking at 9 year spans.

    When did you start this line of inquiry, by the way? Did you start with 6 or 7 year trends?

  38. Carrot I was thinking something different.

    If you have 55 runs that start at some date ( say 1900 ) would it make any sense to screen runs like this:

    At 1910 you rank models by their decadal error. you then check their error at the 30 year mark. Does decadal accuracy correlate with 30 year accuracy? just thinking out loud.

    I mean seriously If I were modelling I’d be really tempted to abort runs that were way off after 30 years.. would that be cherry picking?

    I’m also wondering if you can ever get decadal accuracy.

    My understanding is that models are spun up to an equilibrium state ( based on assumed forcings ). However good your simulation is, you are probably going to mispredict the first el nino. You might get the frequency of these events down right, but from the very nature of how things are set up you almost have to be out of phase with reality. In other words that equilibrium starting point is pretty much a convenient fiction.

  39. Carrot–
    Around jan 2008. I’d picked 2001 as the start date based on publication of the SRES before doing any test. That’s why 2001 always sticks around in the test.

    The first test I did said “reject”.

    Since that time, I’ve been incorporating suggestions about treatment of uncertainty intervals. I tested with different start years, etc. When the model runs became available, I switched from testing 0.2C/decade as a point to testing using the model spread. When the Santer paper came out, I used the method in that paper. (I now use ARMA instead of AR1, which gives wider uncertainty intervals.)

    The thing is, no modification that makes sense with respect to testing for bias in the mean makes the rejections really go away. At most, rejections with GISS for shorter periods go away. But the rejections with Hadley and NOAA just stick around.

    Gavin’s method doesn’t make this go away because it’s not a test for bias in the mean. It just isn’t.

  40. mosher: problem with that is the hindcasting runs don’t all use the same forcings

  41. CE: arrg.

    Forgot about that. I do think the community would benefit
    from a standard set of forcings. especially on the solar, I think some of us were kinda puzzled by the various approaches taken there. Not that it would make AGW false, but after reading though ModelE ( not the whole model, leif just asked me about solar so I looked, or maybe gavin answered..anyway ) and seeing the 11 year cycle put in I thought “neat, that’s easy”, although the climate might be insensitive to the minor fluctuations in TSI its nice to put that kind of detail in, no huge computational load, etc etc” While others just flatline the figure. I’d rather say we modelled it and it made no difference than argue that we didnt model it because it should make no difference.

  42. Steve

    My understanding is that models are spun up to an equilibrium state ( based on assumed forcings ). However good your simulation is, you are probably going to mispredict the first el nino. You might get the frequency of these events down right, but from the very nature of how things are set up you almost have to be out of phase with reality. In other words that equilibrium starting point is pretty much a convenient fiction.

    mis-predict is the wrong word because the entire process is designed to make no attempt to synchronize the El Niños with the earth.

    The equilibrium idea is pseudo-equilibrium. The idea is not much different from defining steady state for averaged quantities in turbulent pipe flow. If you measure velocity at any point, it’s not steady state. But if you set up a pipe flow, you can reach a stationary point where the mean velocity, turbulence intensity and all velocity moments are no longer functions of time.

    The idea of climate models is they are predicting averages of the sort familiar to any engineer who often only cares about the average of a property– or at most some fairly simple statistics about the system.

  43. Re: steven mosher (Mar 19 14:58),

    Forgot about that. I do think the community would benefit from a standard set of forcings. especially on the solar,

    Who is going to enforce it? Every single modeling group with their own climate model is going to want to use the set of forcings they consider most sound.

    Also, there are positive and negatives to everyone using the same forcing. What if they latch onto the wrong forcing for solar?

    On the one hand, people using different forcings within the range supported by existing data is good because the world gets access to predictions based on the full range of forcings that might have reasonable scientific support.

    On the other hand, modelers’ assessments of which forcings are most believable can be influenced by which forcings make their model hindcasts best fit available data for things like global surface temperature. (This is not necessarily intentional. But this sort of two-way feedback happens in nearly all modeling to some extent.)

  44. well, that’s the point of the defined scenarios. put everybody on a more equal footing on forcings.

    solar – put the cycle in, don’t put it in; what is hard to put in is long term changes. nobody can predict those worth a darn, and before the satellite era, they’re harder to pin down anyway. proxy time.

    because it’s not a test for bias in the mean. It just isn’t.

    You keep saying that. And I just say this just isn’t something you can meaningfully test for, in the short term.

  45. Carrot

    You keep saying that. And I just say this just isn’t something you can meaningfully test for, in the short term.

    I know you keep saying this. But either a) you provide no reason, or b) the reason you provide is bizarre and not based on any statistical principle.

  46. well, that’s the point of the defined scenarios. put everybody on a more equal footing on forcings.

    Yes. But it’s only applied to the future. The defined scenarios aren’t applied to the past.

  47. I know you keep saying this. But either a) you provide no reason, or b)the reason you provide is bizzare and not based on any statistical principle.

    Repeat your exercise with 1 year periods.

  48. Here:

    The trend is positive and exceeds 0.2 C/century. However, the uncertainty intervals explode, and 0.2 C/decade lies well inside the uncertainty intervals. If it’s considered the null, a test starting 12 months ago would fail to reject it. There is insufficient evidence to reject no warming.

    The problem with short time tests is that the false negative rate gets high. So we should not take this “fail to reject” as evidence that the true mean is higher than 0.2C/decade for the same reason we can’t take the failure to reject 0 C/decade since 1995 to mean no warming.

    This has nothing to do with how we interpret rejections at α=5%.

    Out of curiosity, what did you think we’d see?

  49. CE. shorter periods simply mean bigger CIs

    You can still test with shorter periods; all it means
    is that you have wider CIs and lower power.

    Now if you want to say something analytical like:

    with a system that has natural cycles that look like this,
    and shot noise ( volcano kinda) that looks like this, and
    weather noise that looks like this, you have to look
    for XYZ months to find a climate signal of the following
    size with the following confidence.

    I mean essentially that is what you are saying, that the climate signal is so weak relative to the other signals that you have to
    wait for a longer period of time ( more data ) to have a test
    with the appropriate POWER to pull that signal out.

    Am I misconstruing what you think?

  50. Given that I haven’t put any time into how you’re calculating your uncertainty intervals yet, I don’t know.

    The whole thing is coming down to “however Lucia is calculating these intervals, do they overlap with 0.2 C/decade”

    Until I get a handle on what all you’re doing, I’m not entirely sure what that means. I don’t know if I’m supposed to expect them to overlap 0.2 C/decade, or not.

  51. Carrot–
    All other things being equal, the shorter the time intervals, the wider the uncertainty intervals become. This feature is exactly the same for ARMA(1,1) residuals as for the case where you estimate the uncertainty intervals for white noise residuals.

  52. Carrot–
    It just occurred to me that I can’t clarify for you because I don’t know where you are having trouble with my method. For example, I don’t know if you are unfamiliar with 1) the meaning of the standard error for the trend when the residuals are white, 2) if you are worried about the ARMA(1,1) assumption, 3) if you don’t understand why I think the different methods are required to test Q1 and Q2 discussed in the previous comments thread, or any other possible thing.

    If the problem is not understanding (1), it’s straightforward enough for me to provide monte-carlo examples to show how that standard error relates to the standard deviation of trends from repeated runs. On the other hand, if you understand that, and the problem is (2) or (3) or something else, that little ‘tutorial’ type post would be useless and would come off as condescending. (After all, if you know that, it seems sort of insulting to suggest I assume you don’t!)

    So, if you’ve looked at stuff enough to know just which bits you do get and which you don’t, letting me know could help me communicate why I think what I think. (And if it turns out I’m wrong…well so be it. )

  53. All other things being equal, the shorter the time intervals, the wider the uncertainty intervals become.

    That’s a general truth; goes without saying. It’s the particulars of what your test is doing on these data that I’m curious to see.

    Hence, the ‘carrot test’.

    Though, on this note, I suggest a graph that would probably be easy to make, with your spreadsheet. Can I see a graph of the max trend and min trend over time? Meaning, y-axis is trend, x-axis is time; showing how your uncertainty intervals narrow since 2001, and whether the center is drifting.

    It just occurred to me that I can’t clarify for you because I don’t know where you are having trouble with my method.

    It’s mainly the MEI step that I haven’t sorted out yet. The rest of it, mechanically I know what you’re doing. It’s a question of how appropriate it is, and what you’d really expect to see. Again, hence the carrot test. If the results are actually sensitive to whether you use ARMA 1-1 or AR-1 or whatever else, then some care is required.

  54. Carrot

    Can I see a graph of the max trend and min trend over time? Meaning, y-axis is trend, x-axis is time; showing how your uncertainty intervals narrow since 2001, and whether the center is drifting.

    I’m not sure that’s so easy to make with EXCEL. It’s easy for white noise, but a PITA for AR(1) or ARMA.

    But I’m planning to adjust a script to do the computations. (It’s the right time to do it.) So, I can get that for you pretty soon.

    It’s mainly the MEI step that I haven’t sorted out yet.

    The MEI step is to add MEI to a linear regression. So, I say

    Temp = F(time, MEI) instead of just Temp= F(time). In EXCEL, that’s just
    LINEST(Temp, time, mei) for three series.
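
    For readers who prefer code to spreadsheet formulas, a rough Python equivalent of that LINEST step is below (the series names are assumed; this is a sketch, not the actual spreadsheet):

    ```python
    # Regress temperature on time and MEI together; the coefficient on time is
    # then the ENSO-adjusted trend. Inputs are assumed to be aligned 1-D arrays.
    import numpy as np

    def trend_with_mei(temp, time_years, mei):
        X = np.column_stack([np.ones_like(time_years), time_years, mei])
        coefs, *_ = np.linalg.lstsq(X, temp, rcond=None)
        intercept, trend, mei_coef = coefs
        return trend, mei_coef
    ```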

    Some of my readers have suggested I use ONI.

    (I actually specifically added MEI, because Tamino was criticizing me for NOT accounting for ENSO, but he specifically used MEI. Some of my readers suggested I check ONI, which might be better suited toward temperature because MEI is more suited toward precipitation– so I may switch to ONI. First, I’m going to check the literature on that to see if people use ONI or MEI more frequently for this.)

    Again, hence the carrot test.

    Tests I can show:
    1) If I generate a series where the noise is ARMA, the rejections are less than 5%. (That leaves the question: is the noise ARMA?) This is a standard test to check the false positive rate because you really do know the “true” trend; a self-contained sketch of this sort of synthetic check appears at the end of this comment. (It’s the exact same notion as the Carrot test. The only problem with your carrot test is that 0.2 C/decade is the trend for the multi-model mean, but it is not necessarily the trend for each of the 22 individual models. So some of the rejections are not false. They are true. We don’t know the proportion of false/true because we don’t know the true trend!)

    2) I can compare the actual St. Deviation of trends for models with multiple runs to the st. dev. estimated by the ARMA method. (This I could actually do fairly quickly. I was going to do it anyway. I’ve done it for AR(1), and AR(1) estimates are about 10% too small. The ARMA widens them, and just eyeballing for earth noise, I’m guessing it’s going to make them tooooo wide, but just a little bit.)
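
    A self-contained sketch of the kind of synthetic check described in (1) and (2): generate ARMA(1,1) noise around a known trend, fit least squares trends, and compare their actual spread to the naive white-noise standard error. The ARMA parameters below are illustrative, not fitted to any model.

    ```python
    # Monte Carlo check: with ARMA(1,1) residuals, the white-noise formula
    # understates the true spread of OLS trends; a corrected SE should not.
    import numpy as np

    rng = np.random.default_rng(0)

    def arma11(n, phi, theta, sigma):
        e = rng.normal(0.0, sigma, n)
        x = np.zeros(n)
        for i in range(1, n):
            x[i] = phi * x[i - 1] + e[i] + theta * e[i - 1]
        return x

    t = np.arange(120) / 12.0                  # ten years of monthly data
    X = np.column_stack([np.ones_like(t), t])
    true_trend = 0.02                          # C/year

    slopes, naive_se = [], []
    for _ in range(2000):
        y = true_trend * t + arma11(len(t), phi=0.85, theta=-0.5, sigma=0.1)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        s2 = np.sum(resid**2) / (len(t) - 2)
        naive_se.append(np.sqrt(s2 / np.sum((t - t.mean())**2)))
        slopes.append(beta[1])

    print("actual spread of OLS trends:", np.std(slopes))
    print("mean white-noise SE:        ", np.mean(naive_se))   # noticeably smaller
    ```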

  55. People have been telling you for a while that you’re not properly modeling noise. For instance, El Nino oscillations. The long drawn out nature of your sort of semi-publishing, semi-working on things is kind of annoying too. you have people like Moshpit citing you…and then when I ask which post has the story, the whole thing is a moving target and never really clearly stated.

    that’s ok, that you don’t publish, don’t synthesize. You’re doing it for free. But then others should blow off your in-progress work. Not just for the likelihood of error, but for the inscrutability of trying to clearly and concisely read what the hell your point is!

    P.s. I had a well renowned weather noise stats guy, whom you’ve actually CITED, and no he doesn’t want his name used, look at a little bit of your meandering (like I said there is no one concise statement, so I took pity and did not try to make him read back through the whole meandering journey!). He said you lacked a feel for (or an interest in) understanding the dynamics of well known (even from history) climate variability. Glad that Carrot Eater finally penetrated your poise and got you to THINK! It’s amazing how long it took you. Months and years!

  56. I’ve always wondered if TCO’s lack of people skills is a substance thing, or a brain chemistry thing… any Psychiatrists among you statusticians?

  57. Carrot Eater:

    If the results are actually sensitive to whether you use ARMA 1-1 or AR-1 or whatever else, then some care is required.

    I think you’ll find the statistical processing matters more as you make the analysis interval shorter.

    For long periods of time the temperature trend becomes more robust, meaning it depends less and less on the statistical model you use to analyze the data with. That doesn’t mean you can’t do the shorter periods right of course, just that they do require more care.

    On a side note, isn’t there a bit of irony that many of the same people who are poo-poo’ing Lucia about her looking at short-term trends, turning around and taking at face value studies e.g., of climate in Australia over the last decade that must have an explanation in human activity?

    Last I checked, the models become less robust, rather than more, as you reduce the area of the Earth you are trying to cover.

  58. Carrick–
    Even for longer times, using least squares when the noise is ARMA(1,1) is still a problem. The only difference is that if you wait long enough and a null hypothesis is wrong, there is a good chance your test statistic is so far away from the boundary separating “statistically significant at p” from “not statistically significant at p” that it doesn’t matter. To some extent, if you have the luxury of lots of clean data, it doesn’t matter what you do. The answer to some questions becomes obvious.
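    For reference, a sketch (my gloss, not Lucia’s derivation) of why the naive least-squares interval stays wrong at any record length. Writing the OLS slope as a weighted sum, $\hat\beta = \sum_i w_i y_i$ with $w_i = (t_i - \bar t)\big/\sum_j (t_j - \bar t)^2$, its variance under noise with autocovariance $\gamma_k$ is

    $$\operatorname{Var}(\hat\beta) = \sum_i \sum_j w_i\, w_j\, \gamma_{|i-j|},$$

    whereas the usual OLS formula keeps only the $i = j$ terms, $\gamma_0 \sum_i w_i^2$. For ARMA(1,1) noise the autocorrelations are

    $$\rho_1 = \frac{(1+\phi\theta)(\phi+\theta)}{1 + 2\phi\theta + \theta^2}, \qquad \rho_k = \phi\,\rho_{k-1} \quad (k \ge 2),$$

    so when $\phi$ and $\theta$ are positive (the usual case for monthly temperature anomalies), the cross terms at the short lags that dominate the sum are positive, the true variance exceeds the naive estimate, and the reported intervals are too narrow no matter how long the record; what improves with record length is only how far a wrong null lands from the rejection boundary.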

  59. lucia (Comment#38651) March 19th, 2010 at 3:15 pm

    Having prescribed variations for solar is also an option.

    Typically, you’d do something like a low boundary, a high boundary, and a best guess. You might do such a test with a lower-order model. At least that is how we would do it: we would use faster-running models to test the sensitivity of the parameter space, and that would drive our selection of settings for the higher-order models (sometimes a dangerous thing, but then that was telling).
    I’m gonna guess that the first-order effect of TSI is linear. But who knows, the climate system may be very sensitive to variations in TSI. Not sure how.

  60. Steve–
    I’d guess to leading order, sensitivity to solar forcing should be the same as sensitivity to forcings of any type. I’d also guess to first order, the sensitivity to all forcings is linear.

    Very simple models can only hope to capture sensitivity to lower order. More complex ones can hope to do better. The question is “do they?”

  61. Now, maybe. Were you always? Or were some of your earlier posts citing significance before you took that into account? I seem to remember complaining about it as a blind spot a while ago, before you did this.

    And my point is not to dredge up the past, but to make the point that work that is so in progress should not be relied on…as well as to point out a pattern in your thought process that is not sufficiently skeptical of your own hypotheses.

  62. Oh…and why are you pre-moderating my comments? What was wrong with the last one? (Which was before you turned on the pre-moderation)?

  63. Re: PolyisTCOandbanned (Mar 20 15:26), You were hardly the first person to bring up ENSO. I posted an order-of-magnitude estimate based on Atmoz values way back in… May 2008? I incorporated a correction based on a method suggested in an RC post back in July 2008.

    This has been discussed in comments, and I deferred fully correcting until such time as I could consider both volcanic eruptions and ENSO at the same time. A method of doing both cleanly occurred to me.

    Because even at 8 years the time span was sufficient to include some oscillation, the main effect of not correcting for ENSO was to increase variability in the OLS trend; this also resulted in wider error bars.

    This has been discussed at length. Sorry if the forum is inconvenient for you and annoys you. As I told you long ago, if you don’t want to read it, don’t.

    As for the pre-moderation: You were always pre-moderated. My policy toward you is that you will be premoderated until you have posted a sufficient number of not-annoying comments to prove yourself trustworthy and not liable to set off troll eruptions. (My guess is this will be never.)

    The reason you seemed to not be pre-moderated is that, owing to the changed fake email address and the changed fake name, the pre-moderation plugin didn’t recognize you. Now it does. I’ve been looking into a plugin that requires first-time commenters to respond to an email to avoid moderation. However, the one I found didn’t seem to work out of the box. If I am given sufficient incentive, I will renew my search.

  64. TCO,

    Why not be civil (or maybe this is civil for you)?

    And why not provide appropriate links and quotes regarding your complaints, so that the rest of us can maybe figure out what you are talking about?

    If you’re complaining about Lucia’s comment moderation policy, well, that’s different. She’s earned a reputation for quite a high threshold in that department. So if there’s a problem there, maybe start by looking… elsewhere?

  65. I’m gonna guess that the first-order effect of TSI is linear. But who knows, the climate system may be very sensitive to variations in TSI. Not sure how.

    This is an interesting problem, because of the intuitive assumption that when the solar cycle shows an increase (decrease) in output, that explains, say, a temperature increase (decrease) as an explanation of natural variation.

    We’ll use Lean 2010 as an example.

    Direct linear association of natural and anthropogenic influences explains 76% of the variance in the observed global surface temperature during the past 30 years (and also in the past 120 years). Figure 2 shows how a model that combines these influences clearly identifies the cause of the rapid global temperature rise from 1992 to 1998 as the result of ENSO-induced warming following Pinatubo-produced cooling. The cause of the lack of overall warming in the last decade is also identified, the result of decreasing solar irradiance in the declining phase of cycle 23 from 2002 to 2009 and La Niña cooling countering anthropogenic warming.

    Seems OK; however, the spectral irradiance, i.e., in the bands of absorption and interest (say H2O and CO2), has actually increased in the declining phase before leveling off at minima, i.e., the forcing signs are inverse.

  66. Lucia:

    I think you mentioned doing something like this: As a sanity check, can you generate say a thousand years of synthetic data, 0.2 C/decade trend, add ARMA noise of whatever order, and see what your test does over intervals of any span?

    I have some other things in mind about looking at the specific model runs that failed your test, but I can probably do that from the spreadsheet you put out.

  67. I think you mentioned doing something like this: As a sanity check, can you generate say a thousand years of synthetic data, 0.2 C/decade trend, add ARMA noise of whatever order, and see what your test does over intervals of any span?

    Sure. But I have codes organized a bit differently.

    My test codes can create 1000 years of ARMA data for checking various things. But to test over specific intervals (say 8 years), I generate 8 years. Test.
    Then I generate another 8 years, test.
    Etc.
    The only difference between what I do and what you are suggesting is that after 8 years, I initialize with a random value. But I can easily just continue from the last value of the previous segment.

    I’ve already done this, but it’s not discussed on the blog. I planned to eventually, so I can tweak it your way. No problem!
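    A minimal sketch of that loop, for illustration only: generate ARMA(1,1) noise around a known 0.2 C/decade trend in 8-year blocks, apply a trend test to each block, and tally the rejection rate. The ARMA parameters are arbitrary placeholders, and the test shown is the naive white-noise OLS test (which is expected to over-reject here); Lucia’s actual test uses ARMA-corrected uncertainties, which this sketch does not reproduce.

    import numpy as np

    def arma11(n, phi=0.5, theta=0.3, sigma=0.1, rng=None):
        # ARMA(1,1) noise: x[t] = phi*x[t-1] + e[t] + theta*e[t-1]  (placeholder parameters)
        if rng is None:
            rng = np.random.default_rng()
        e = rng.normal(0.0, sigma, n + 1)
        x = np.zeros(n)
        for t in range(n):
            prev = x[t - 1] if t > 0 else 0.0
            x[t] = phi * prev + e[t + 1] + theta * e[t]
        return x

    def rejects(y, true_trend, dt=1.0 / 12.0, crit=1.96):
        # Naive OLS test: reject if the fitted trend differs from the known true trend
        # by more than crit times the white-noise standard error.
        t = np.arange(len(y)) * dt
        b, a = np.polyfit(t, y, 1)
        resid = y - (a + b * t)
        se = resid.std(ddof=2) / np.sqrt(((t - t.mean()) ** 2).sum())
        return abs(b - true_trend) > crit * se

    rng = np.random.default_rng(1)
    true_trend = 0.2 / 10.0          # 0.2 C/decade expressed in C/year
    n_months, n_trials = 96, 2000    # independent 8-year blocks
    months = np.arange(n_months)
    hits = sum(
        rejects(true_trend * months / 12.0 + arma11(n_months, rng=rng), true_trend)
        for _ in range(n_trials)
    )
    print("rejection rate:", hits / n_trials)

    If an ARMA(1,1)-corrected interval is substituted for the naive one, the rate should land near (or below) the nominal 5%, which is the check described above.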

  68. Lucia

    This is really good. Congrats, I think now you are getting somewhere with all this.

    I think it may be good to look at the characteristics of the model runs too. I don’t know how to explain it, but perhaps there are some model runs that exhibit features that are ‘strange’. It may be that if you remove the models that exhibit strange behaviour, the model mean becomes a better approximation of reality.

    Perhaps compare the behaviour (or features) of the observed temp with the model runs. I don’t know how this would be done…

  69. Lucia: Thank you again for being willing to try out suggestions, and for being honest about whatever little spreadsheet glitch you found.

  70. Nathan

    I don’t know how to explain it, but perhaps there are some model runs that exhibit features that are ’strange’.

    There are models that exhibit features that are ‘strange’. For example, some have way too much “internal variability” (i.e. weather noise.) Some probably have too little.

    Perhaps compare the behaviour (or features) of the observed temp with the model runs. I don’t know how this would be done…

    By eyeball, it’s obvious some models have way too much noise. (Others likely have too little, but it’s less obvious.)

    There are two big difficulties with demonstrating this to someone who would “like” to think the models’ noise is similar to earth variability:

    One difficulty is that the *first* thing you have to do is at least run a test to show the models disagree with each other and that the disagreement is statistically significant. That is a minimum requirement for arguing that we shouldn’t use a test that just compares the trend to the spread of the models (i.e. the test Gavin does and Carrot wants to do; that test would be fine if all the models shared the same mean trend, mean variance, and mean of any other statistic).

    It turns out to be easy to show that the models all disagree with each other on… well… everything I’ve checked! (It’s rather odd to feel the need to show the models differ from each other when, in fact, I’m pretty sure most climate modelers think they do differ from each other. But if they do, the test Gavin showed at RC has very, very, very low power to detect bias in the mean trend of models. So you have to do the t-test to detect that. It’s only in the context of the fact that people want to use that test that the proof of a fact nearly everyone concedes most of the time becomes necessary.)

    Now moving on to comparing individual models: After showing models differ from each other, you can start comparing individual models to the earth. In some cases, that’s very difficult because there are very few runs for most models. In all cases, you have to at least impose an assumption about the “noise” to do a good test. And…. that’s difficult to do in any remotely clean way.

    Carrot–
    Well… if I weren’t honest about spreadsheet glitches, someone would find them soon enough! 🙂 (But really, there is no point in claiming something based on a computational error, is there?)
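    To make the contrast between the two kinds of test concrete, here is a minimal sketch with made-up per-model trends (not actual AR4 numbers, and not Lucia’s code). The “spread” check asks whether a trend falls inside roughly two standard deviations of the individual model trends; the t-test asks whether the mean of the model trends differs from that value, and has far more power when runs merely scatter widely.

    import numpy as np
    from scipy import stats

    # Hypothetical per-model trends in C/decade (illustrative values only).
    model_trends = np.array([0.28, 0.31, 0.22, 0.35, 0.18, 0.30, 0.26, 0.24, 0.33, 0.29])
    comparison_trend = 0.12  # hypothetical trend to compare against, C/decade

    # Spread-style check: is the comparison value within +/- 2 sd of the model trends?
    within_spread = abs(comparison_trend - model_trends.mean()) < 2 * model_trends.std(ddof=1)

    # One-sample t-test: does the mean of the model trends differ from the comparison value?
    # (Treats the comparison value as exact; a fuller test would fold in its uncertainty.)
    t_stat, p_value = stats.ttest_1samp(model_trends, popmean=comparison_trend)

    print("within 2-sd spread:", within_spread)
    print("t = %.2f, p = %.4f" % (t_stat, p_value))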

  71. You’re just going to run aground because of the low number of runs per model.

    In the end, simple linear trend + some sort of noise may be helpful in making some illustrations, but actually requiring that real life or expectations look like that is just going to lead you astray, and I think that’s what we’re seeing here.

  72. Carrot–
    There are enough runs for cross model comparisons. I can show the underlying trend for the collection of models differs during the first 30 years of the century.

    However, showing individual models differ from the earth’s trend requires assumptions. Given reasonable ones, we can sometimes show things– sometimes we can’t.

  73. Lucia, carrot eater,
    .
    It seems to me that models which have internal (weather) variation far away from the known (measured) variation on Earth should be considered very suspect, since clearly incorrect model weather noise means that the model fails to accurately capture the dynamics of the system. Any model that is an arguably accurate representation of the system will reasonably match the measured variability at all time scales, as well as match the long term trend. Tweaking (optimizing) a model’s parameters and its assumed forcings can always make the temperature hind-casts look pretty good, as the very strong correlation between assumed aerosol effects and model-predicted climate sensitivity shows. It seems to me that correct internal variability ought to be at least as important a validation factor as the hind-cast accuracy of the long term temperature trend, and one that is not so subject to easy adjustment by choosing the aerosol forcing that makes the model look good in hind-casts.
    .
    It’s much like finding that the measured turbulent velocity variation in a process flow is grossly at odds with the predictions from a model of that process. Any engineer faced with a clear discrepancy in this case would conclude (we hope) that the model is not an accurate representation of the process, and so its projections doubtful. If it were his or her model, then we would hope our engineer would focus on what is wrong with the model, and explicitly discount model projections until the cause of the discrepancy is identified and fixed.

    Lucia, we kicked this issue around a bit some time last year when Easterling and Wehner published a paper in Geophysical Research Letters which claimed to show that the variability in the ECHAM model proved the recent low rate of observed warming was not unexpected/unusual, even though the ECHAM model temperature variability was comically higher (by eyeball, 50%-100% higher) than the historical record. I sent Easterling and Wehner an email message raising this concern, but they did not reply.
    .
    So maybe it would be good to screen models for internal variability that is consistent with the actual temperature record before trying to use those models’ projections to test if the Earth’s temperature history deviates from model projections. You would need to know (I think) which models include known volcanic forcing and which don’t, since this factor would significantly influence apparent consistency with temperature variability. If a model’s variability is way wrong, then we know before doing the test that the results won’t mean very much.
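    A minimal sketch of the kind of screening SteveF describes, using placeholder random series in place of real observations and GCM output; in practice the two inputs would be observed monthly anomalies and an individual model run over the same window, and any volcanic-forcing differences would need to be accounted for as noted above.

    import numpy as np

    def detrended_sd(y, dt=1.0 / 12.0):
        # Standard deviation of residuals about a least-squares linear trend.
        t = np.arange(len(y)) * dt
        resid = y - np.polyval(np.polyfit(t, y, 1), t)
        return resid.std(ddof=2)

    # Placeholder series standing in for observed and modeled monthly anomalies.
    rng = np.random.default_rng(2)
    obs = rng.normal(0.0, 0.10, 120)
    model_run = rng.normal(0.0, 0.18, 120)

    ratio = detrended_sd(model_run) / detrended_sd(obs)
    print("model/obs variability ratio: %.2f" % ratio)  # far from 1 flags 'strange' internal variability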

  74. Lucia,

    Two more thoughts on internal variability of climate models.
    .
    Could there be a correlation between internal variability and model calculated climate sensitivity? Are the more sensitive models more variable or less variable than the less sensitive models, or is there no correlation? Do models that reasonably match the measured temperature variability of Earth diagnose similar climate sensitivities?
    .
    Since the temperature record includes at least some variability related to measurement uncertainty, can we not reasonably conclude that climate models (which do not include measurement uncertainty) ought to show slightly less variability than the temperature record?

  75. Lucia:

    Even for longer times, using least squares when the noise is ARMA(1,1) is still a problem. The only difference is that if you wait long enough and a null hypothesis is wrong, there is a good chance your test statistic is so far away from the boundary separating “statistically significant at p” from “not statistically significant at p” that it doesn’t matter.

    I guess my thought is that climate fluctuations that aren’t forced are bandwidth-limited (there is a lower-frequency limit to the “source region” for these climate fluctuations).

    Suppose, for the sake of argument, we take the putative 55-year oscillation observed in some temperature proxy measurements (e.g., middle figure in this link).

    For a long-enough time window, the secular temperature trend will always overwhelm any temporal fluctuation associated with climate “noise”.

    So if what you are interested in is just the temperature trend, and not modeling short-period climate fluctuations, then “just waiting long enough” works.

    What you are saying is right though… the model you’ve selected is still wrong (and this should be reflected in statistical tests), but what I’m saying is simply that the trend estimate becomes “robust against erroneous modeling of short-period climate fluctuations”.

    On the other hand, as I pointed out, if what you are interested in (totally hypothetically of course, *coughs*) is the 10-year trend in drought in Australia and its relationship to anthropogenic global warming, then it seems salient to worry about how well the models are getting temporal fluctuations over this same sort of interval.

    The global warming activists appear to want it both ways: They want us to worry about short-period climate changes such as the Australian drought, but not worry about it if the global climate models fail to describe global climate changes over this same interval. Somehow in their universe, these two views are fully consistent.

    What I am saying is something different. If temperature is a “blunt instrument” (in Eli’s words), then the distribution associated with short-period climate change is something that should be robustly predicted by any of the “good” climate models.

    On the other hand, if we want to throw away consideration of global averages of short-period climate fluctuations from the models, it seems reasonable we must also throw away regional short-period model predictions as well. What’s good for the goose is good for the gander.

  76. Attribution of a single drought is always going to be a crapshoot. Most you can say is maybe it’s a bit worse than otherwise, maybe it’s part of a pattern.

  77. Carrick

    Parts of Australia have had reduced rainfall for over 30 years. That’s not short term. Southwest Western Australia has had its rainfall reduced by about 30% since the 70s… The change is quite big. I know one year isn’t ‘important’, but in Perth we have had no rain of significance since Nov 26th last year. We might have some today, let’s hope so.

    The drought in the East and Southeast lasted around 7 years, and this was on top of a series of droughts in the 80s and 90s.

    Maybe you could look at the stats for rainfall in Australia? I doubt you’ll find it’s a short term problem. Possibly you will find a large number of short term problems… But these should probably be viewed as one large problem.

    “They want us to worry about short-period climate changes such as the Australian drought, but not worry about it if the global climate models fail to describe global climate changes over this same interval. Somehow in their universe, these two views are fully consistent.”

    I’m not even sure how these views are inconsistent. It’s not just the period that is labelled ‘drought’ that is important. The whole of southern Australia has had a marked reduction in rainfall coupled with an increase in evaporation (more clear days, hotter weather). I think the problem is one of labelling.

  78. Nathan:

    Maybe you could look at the stats for rainfall in Australia? I doubt you’ll find it’s a short term problem. Possibly you will find a large number of short term problems… But these should probably be viewed as one large problem.

    You seem familiar with it. Why don’t you do it and present the results to us if you think they are so obvious?

    Seriously, what you are doing isn’t that different from Laura S, namely making claims and then expecting everybody else to test them for her.

    What I’ve seen doesn’t seem so obvious with respect to Australian rainfall patterns: the current pattern of decline (circa 1995 to present) isn’t that different from the circa 1935-1950 pattern. See, e.g., this.

    Beyond that, long-term rainfall has increased. (Yes, it stops at the end of 2005… but we’re talking long-term trends here, not short-term ones; you can neglect 4 years against 105.)

    Anyway, that’s not even my point… These patterns could as easily be related to climate fluctuations as to the “fingerprint of man on climate.” This is why it is so necessary for climate models to get these sort of patterns right and why what Lucia is doing is useful.

    If they can’t be relied on, we need to recognize they are worthless for helping us interpret how our climate is changing around us. In which case we are left to more robust testing, which means in this case very long-term trends where we can neglect natural fluctuations compared to long-term secular trends.

  79. Carrot Eater:

    Attribution of a single drought is always going to be a crapshoot. Most you can say is maybe it’s a bit worse than otherwise, maybe it’s part of a pattern.

    The same applies to one example of a favorable weather improvement. In the US, growing seasons are longer and rainfall is up over 100 years. I think that is even true in Australia in spite of their recent decade-scale problems.

    There’s no way around it, there are a lot of people who are fitting natural fluctuations to patterns in their minds. That’s not an efficient way to analyze data.

  80. I went back and looked.

    Here’s Australia’s annual rainfall data.

    Here it is by decade:

    columns:
    1 year span
    2 average rainfall (mm)
    3 ranking (“1” is highest)

    1900-1910 432.04 7
    1910-1920 444.22 6
    1920-1930 426.84 9
    1930-1940 417.37 11
    1940-1950 429.82 8
    1950-1960 459.02 5
    1960-1970 423.52 10
    1970-1980 518.15 1
    1980-1990 459.29 4
    1990-2000 478.00 3
    2000-2010 488.26 2

    By that measure this last decade was the second highest in recorded history for Australia.

    We’re back to what I was saying: there are those who judge it OK to look for patterns in noise, as long as the patterns conjure fearful images. But take a well-defined statistical method, apply it to data and model, and suddenly it’s open for ridicule by some (*cough* hi Eli *cough*).

    All I’m gonna say.
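    For what it’s worth, a minimal sketch of the decade averaging and ranking behind tables like the one above. The file name and column layout are assumptions (the BOM “raw data set” links give plain-text year/value pairs, but the exact format may differ), and the decade-endpoint convention used for the table above isn’t stated, so ten calendar years per bin is assumed here.

    import numpy as np

    # Hypothetical file of year / annual rainfall (mm) pairs downloaded from the BOM links above.
    data = np.loadtxt("bom_annual_rainfall.txt")
    years, rain = data[:, 0].astype(int), data[:, 1]

    rows = []
    for start in range(1900, 2010, 10):
        mask = (years >= start) & (years < start + 10)   # ten calendar years per bin (assumed)
        rows.append(("%d-%d" % (start, start + 10), rain[mask].mean()))

    # Rank wettest to driest ("1" is highest), as in the table above.
    order = sorted(range(len(rows)), key=lambda i: -rows[i][1])
    rank = {idx: r + 1 for r, idx in enumerate(order)}
    for i, (label, avg) in enumerate(rows):
        print("%s  %7.2f  %d" % (label, avg, rank[i]))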

  81. Carrick,

    Increased annual rainfall for Australia as a whole is a predicted consequence of climate change. However, on a regional level, this is predicted to break down into increased rainfall in the north and decreased rainfall in the south. And these predictions have come true. I should note, however, that these predictions were *not* expected to come true at this early stage.

    The south-east of Western Australia and Canberra (where I live) are two places where it appears as though what is termed a step-change has occurred. There is more evidence that what has occurred in Western Australia is a result of climate change, and a mechanism has been proposed for the change, with evidence suggesting that the proposed mechanism is real. In this case, it is increased precipitation in Antarctica (another prediction of the effects of climate change), which is reducing the moisture in the air masses that move from Antarctica north to Australia.

    As far as I know, no-one has proposed a mechanism as yet for Canberra. But the CSIRO suspect that a similar step change has occurred here. They would agree with you that there is not yet enough data to be sure – indeed, I think that they suggested that we need another decade of data to know in a statistical sense whether or not such a step change has occurred in Canberra.

  82. Carrick

    Note I was talking about southeast and southwest Australia – that’s where most of the people live and most agriculture is.

    So, Southeastern Aust:
    http://www.bom.gov.au/cgi-bin/climate/change/timeseries.cgi?graph=rain&area=seaus&season=0112&ave_yr=0

    Southwestern Aust:
    http://www.bom.gov.au/cgi-bin/climate/change/timeseries.cgi?graph=rain&area=swaus&season=0112&ave_yr=0

    Murray Darling basin:
    http://www.bom.gov.au/cgi-bin/climate/change/timeseries.cgi?graph=rain&area=mdb&season=0112&ave_yr=0

    The effect of the drop in rainfall has also been exacerbated by the increase in temps, as evaporation has increased in southwest Australia.
    Evap in SW WA:
    http://www.bom.gov.au/cgi-bin/climate/change/timeseries.cgi?graph=evap&area=swaus&season=0112&ave_yr=0

    All three of those regions have shown a marked increase in max and min temps since the drought in the 40s, so the water requirements of agriculture (and people) are higher. The combination of this drought with a decline in the ‘ordinary’ rainfall level and increasing temps has made this drought far worse.

    Note that the reduction in rainfall in the south, with increased rainfall in the north (which typically result in flooding events) is in line with projections made by the CSIRO – this is what we were told to expect.

  83. It should be noted that Canberra has had well above average rainfall over the last two months, just for comparison with the short-term stuff for Perth. 🙂

  84. Carrick, I urge you to read the conclusions section of the doc you linked.

    Here:

    “The long-term rainfall deficiency since October 1996 across South Eastern Australia (south of 33.5ºS and east of 135.5ºE) documented by MT08 was described as being severe but not unprecedented in the instrumental record. With an additional 3 years of below average rainfall, that statement is no longer true. The recent 12 year, 8 month period is the driest in the 110 years long record, surpassing the previous driest period during WWII. The spatial extent of the deficiency covers most of the south-western part of eastern Australia and extends along significant orographic features eastward and northward. The seasonal signature of the rainfall decline has also evolved. It remains dominated by a strong and highly significant autumn rainfall decline, but has been supplemented by recent declines in spring, particularly after 2002. The spring decline is the dominant feature of the very dry 2006-2008 period.

    This change in the relative contributions by the autumn and spring seasons now more closely resembles the picture provided by climate model simulations of future changes due to enhanced greenhouse gases. However, the growing magnitude of the rainfall decline is far more severe than any of the IPCC-AR4 model projections except for the lowest deciles from the model uncertainty range, forced with the highest emission scenarios occurring later in the 21st century (2050 to 2070) (CSIRO and Bureau of Meteorology, 2007).

    The most important characteristics of the ongoing rainfall decline (spatial extension, intensification and change in seasonality) are well aligned with the recent evolution of the STR and its known influence on SEA rainfall. Other large-scale influences were briefly evaluated. It appears unlikely that the ENSO mode of variability has contributed to the worsening of the rainfall decline in the last 3 years. On the contrary, it appears likely that the Indian Ocean mode of variability (with three positive IODs in a row) may be linked to the strong spring signal in 2006-2008. However, that does not change the fact that the IOD is unlikely to be responsible for the largest component of the rainfall decline (the autumn part) and, based on the limited evidence provided here, it is unclear whether the IOD is a contributor, or simply a covarying response to other factors. Finally, the long-term evolution of the SAM remains unlikely to explain the long-term decline in SEA due to the seasonal nature of the influence of SAM on SEA rainfall, but its role (both positive or negative) is visible while updating month by month anomalies.

    One of the goals of the new SEACI program involves “investigating the causes and impacts of climate change and climate variability across south eastern Australia”. This is now more relevant than ever, particularly as we are dealing with the worst rainfall deficit in the region within more than a century long instrumental record.”

  85. David,

    I think you should pipe it over here!!

    Actually we’re lucky in Perth. We have a big aquifer and they’ve built some desal plants. So no one will run out for a while here. Best solution for us is to recharge our aquifers with recycled water.

    Sadly it won’t save the street trees in my area. For some reason they planted Queensland Box trees, and a lot are dead or dying because of our recent low rainfall (they were planted in the 50s, I believe).

  86. Thanks for the comments, David. Of course I agree with you that generally one expects longer growing seasons and greater precipitation from a warming climate, but that one could see regional deviations from that pattern (especially within latitude bands).

    That said, hopefully you recognize the conundrum of using results which do not confirm the prediction of a theory as evidence to support that theory.

    Also, I’m curious… can you provide a breakdown of average rainfall by decade for smaller regions? (North versus South would be OK). I’m not that familiar with Australia’s weather service, and anyway I have a report I have to finish writing tonight.

    The recent 12 year, 8 month period is the driest in the 110 years long record, surpassing the previous driest period during WWII.

    So it’s comparable to that previous driest period then, right?

    And that’s by cherry-picking the starting point. If you extended it to 15 years, what happens? If the conclusions flip-flop, you’ve got a problem with robustness.

    You mentioned 30 years. I couldn’t find anything about 30 year intervals that looked particularly bad.

  88. Carrick,

    http://www.bom.gov.au/cgi-bin/climate/change/timeseries.cgi?graph=rranom&area=nsw&season=0112&ave_yr=10

    These provide timeseries that can be used with 10-year averages (the above is for NSW), and you can also get the data and run your own tests on them.

    As an example:

    http://www.bom.gov.au/web01/ncc/www/cli_chg/timeseries/rranom/0112/nsw/latest.txt

    The BOM is actually very good, imo, at giving lots of information. However, sometimes moving around the BOM website is not that intuitive (that’s what I find, anyway).

  89. For Canberra, the last two 30-year periods have been more than two standard deviations below the mean – the only time that this has happened in the records available to me, which admittedly are only from 1940 onwards.

  90. Fair enough, here’s the same table as before, but for Southern Australia.

    1900-1910 373.56 6
    1910-1920 373.15 7
    1920-1930 363.36 10
    1930-1940 369.72 8
    1940-1950 357.86 11
    1950-1960 405.30 3
    1960-1970 375.43 5
    1970-1980 422.25 1
    1980-1990 387.95 4
    1990-2000 407.19 2
    2000-2010 365.98 9

    Note that the second-ranked decade (1990-2000) is only 42 mm higher than the ninth-ranked (2000-10). That’s not that much difference by weather standards.

    Care to look at 30-year patterns?

  91. Carrick

    What would convince you?

    Did you read the recent paper linking increased snowfall in Antarctica with reduced rainfall in Southwest western Australia?

  92. Carrick

    The drying in Southwest WA looks to me to be starting in about 1970…

    Is that not ‘close enough’ to 1980? Or do you think it should start exactly at 1980?

  93. Nathan:

    Did you read the recent paper linking increased snowfall in Antarctica with reduced rainfall in Southwest western Australia?

    Link it and I might read it.

    Let me turn this around.

    It seems to me the real problem is I don’t need affirmation of the basic theory…I think it’s reasonably sound, so I’m not grasping for every straw that comes along and my initial approach is to take what the science predicts at its word and go from there. So when I see data that aren’t conforming to the model, my immediate response is to say “this probably isn’t related”.

    What is it that convinces you there is a causal relationship between a regional-scale drought and global warming? David Gould himself admits this pattern is at odds with the theory. Why should a result that violates the expectation of a theory suddenly be regarded as strengthening the theory?

    If you wanted to link the drought to regional-scale land-use changes, I’d find that a lot more plausible.

  94. Nathan:

    The drying in Southwest WA looks to me to be starting in about 1970…
    Is that not ‘close enough’ to 1980? Or do you think it should start exactly at 1980?

    Here’s the decadal data:
    1900-1910 608.48 9
    1910-1920 624.98 6
    1920-1930 614.12 8
    1930-1940 631.60 4
    1940-1950 585.16 10
    1950-1960 687.76 1
    1960-1970 628.07 5
    1970-1980 684.52 2
    1980-1990 619.98 7
    1990-2000 634.40 3
    2000-2010 554.41 11

    1980-2000 looks to me like it bucks your trend.

    I’m always a lot more convinced when I have data and model line up, and given we know there are natural climate fluctuations of decade and longer length, it seems crazy-making to me to ascribe every weather shift we see to AGW.

  95. Carrick where are you getting your data from?

    This is the third time I have linked to the rainfall for southwest WA

    http://www.bom.gov.au/cgi-bin/climate/change/timeseries.cgi?graph=rain&area=swaus&season=0112&ave_yr=0

    “What is it that convinces you there is a causal relationship between a regional-scale drought and global warming? David Gould himself admits this pattern is at odds with the theory. ”

    WA has very predictable weather patterns. Our rainfall (in the winter) is dictated by cold fronts moving up from the Southern Ocean. What we have seen is that high pressure systems in the Great Australian Bight tend to linger into autumn for longer, and arrive in spring earlier. These high pressure systems deflect the cold fronts south, preventing the rain from arriving. This poleward movement of the high pressure system (the Hadley Cell, I believe) is expected under AGW.

    David Gould said the opposite of what you think:

    “The south-east of Western Australia and Canberra (where I live) are two places where it appears as though what is termed a step-change has occurred. There is more evidence that what has occurred in Western Australia is a result of climate change, and a mechanism has been proposed for the change, with evidence suggesting that the proposed mechanism is real. In this case, it is increased precipitation in Antarctica (another prediction of the effects of climate change), which is reducing the moisture in the air masses that move from Antarctica north to Australia.”

    The CSIRO modelling predicted more drought in the south and more rain in the north…

  96. Nathan:

    Carrick where are you getting your data from?

    Your links.

    I follow the “raw data set” links and process from there.

  97. Carrick,

    I did not say what you think I said. The predictions from the attempts at regional modelling of the effects of climate change for Australia are coming true. The main thing that the predictions are getting wrong is the *timing* of the changes. As an example, for Canberra, rainfall decline has already exceeded the prediction for 2030.

    However, I did confirm that at present for Canberra the CSIRO do not think that there is enough data to come to a firm conclusion. They believe that it is likely that a step change has occurred. But they will not know for sure for around 10 years. For WA, they are sure. And they also believe that they have found the mechanism.

    Based on my own amateur analyses of Canberra climate data (found at http://evilreductionist.blogspot.com/), I am more of a pessimist than the CSIRO. I believe that runoff to Canberra dams will be effectively zero by 2017. And I believe that rainfall will decline to below desert levels by midcentury. However, I accept that at the moment the data is still equivocal.

  98. Thanks for proving my point, though, Nathan.

    In the almost complete absence of any real data supporting it, you are obviously warm and cozy with the notion that the climate shifts you are seeing are the result of anthropogenic warming, all the while being obviously edgy (at best) with the much more refined analysis that Lucia is doing over similar time scales and for the full Earth.

    It is as I said: the models necessarily become less reliable, not more, as you decrease the scale sizes you are analyzing. And the effect of local climatic fluctuations becomes more important, not less, as you decrease the scale size you are analyzing.

    No irony at all here, right?

  99. David Gould:

    The predictions from the attempts at regional modelling of the effects of climate change for Australia are coming true. The main thing that the predictions are getting wrong is the *timing* of the changes

    Time scales matter for models too.

    Seeing the same qualitative effect you are expecting, but not over the time scale expected, is not a confirmation of the model. It’s a contradiction to it.

  100. Carrick,

    For Western Australia, every 10-year rolling average from the 10 years ending in 1973 to the present has been below the long-term average. Yes, there was a bump up in those averages in the middle. But they did not cross the average threshold. I think that that long a time period below the long-term average is strong evidence that something odd is going on. It is not what you would expect by chance.
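    A minimal sketch of the rolling-average bookkeeping David describes, with a placeholder random series standing in for the actual BOM annual totals; note that adjacent 10-year windows share nine years, so consecutive below-average values are strongly correlated rather than independent.

    import numpy as np

    # Placeholder annual rainfall series (mm); in practice, the BOM SWA annual totals.
    rng = np.random.default_rng(3)
    years = np.arange(1900, 2010)
    rain = rng.normal(700.0, 80.0, years.size)

    window = 10
    rolling = np.convolve(rain, np.ones(window) / window, mode="valid")  # mean of each 10-year window
    long_term_mean = rain.mean()

    # Count how many of the most recent rolling means sit below the long-term mean.
    below = rolling < long_term_mean
    run = 0
    for flag in below[::-1]:
        if not flag:
            break
        run += 1
    print("consecutive trailing 10-year means below the long-term mean:", run)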

  101. Carrick,

    I agree. The models are wrong. And I think that they are even more wrong. That is why I am so pessimistic regarding climate change.

  102. Basically, knowing that the models are getting things right except for the timescale is not actually comforting …

  103. One correction. The last table I gave was from SEA, not SWA. Here’s SWA:

    1900-1910 709.49 3
    1910-1920 617.86 10
    1920-1930 734.79 1
    1930-1940 713.06 2
    1940-1950 695.36 5
    1950-1960 677.10 6
    1960-1970 706.54 4
    1970-1980 645.25 8
    1980-1990 637.37 9
    1990-2000 659.92 7
    2000-2010 604.97 11

    Looks to me like this is a drying pattern that started in 1920, not 1980.

    Also David, I’m not sure what the basis for your pessimism is. It’s not like you’re getting zero rainfall in SEA (or are we shifting the boundaries yet again for where we’re looking at?).

    You’re still getting (on decadal scales) around 550-mm of rainfall annually. That isn’t great, but it is close to the historical annual average of 625±100 mm/year, and nowhere close to desert like conditions (which would be like 250-mm).

    If you are worried about the increase in human demand, that is a real issue. We have water shortages in the SE US also, where we get around 1200 mm annually.

  104. Carrick

    “Seeing the same qualitative effect you are expecting, but not over the time scale expected, is not a confirmation of the model. It’s a contradiction to it.”

    Are you serious?

    “In the almost complete absence of any real data supporting it, you are obviously warm and cozy with the notion that the climate shifts you are seeing are the result of anthropogenic warming, all the while being obviously edgy (at best) with the much more refined analysis that Lucia is doing over similar time scales and for the full Earth.”
    I will find the study linking AGW, SW WA rainfall and precip in Antarctica.

    The problem I had with Lucia’s analysis was the short timescale and the fact that it didn’t go anywhere. Carrot Eater has prodded her enough to get her to take it somewhere, and I have mentioned above that I am excited about where it is going. You can’t compare the two.

    I get the feeling this is little more than an attempt at a ‘gotcha’.

  105. David Gould:

    Basically, knowing that the models are getting things right except for the timescale is not actually comforting …

    If you are seeing an effect, and it isn’t matching the model, you don’t know that the effect has anything to do with the cause the model invokes. After all, it’s not like the models created the concept of “long-term drought” and it had never been observed before the models were run. We have dry periods hundreds of years long in the US SW, followed by wet periods. These are from well before human industrialization, so I’m pretty sure that’s not an explanation.

    You either believe the models or you don’t. You don’t get to pick and choose when you are going to believe them, at least not without a solid basis for argument.

  106. Carrick,

    Note that I was talking specifically about Canberra re desert conditions. It is Canberra that the CSIRO believes has probably undergone a step change in climate. The decadal average for the last decade was more than 2.5 standard deviations below the mean.

    And runoff is being even more badly affected. Last year (for example) we received 23 per cent of annual average runoff. For the last decade, the average has been 38 per cent. Now, rainfall has only declined by around 18 per cent. Yet runoff has declined 62 per cent. This is bad, bad news.

  107. Carrick

    “Also David, I’m not sure what the basis for your pessimism is. It’s not like you’re getting zero rainfall in SEA (or are we shifting the boundaries yet again for where we’re looking at?).

    You’re still getting (on decadal scales) around 550-mm of rainfall annually. That isn’t great, but it is close to the historical annual average of 625±100 mm/year, and nowhere close to desert like conditions (which would be like 250-mm).”

    Seriously, this is just silly. Why would decadal averages matter?

  108. Carrick,

    Re models: I understand what you are saying. But models can be wrong in a number of different directions and in a number of different ways. I would suggest that if a model is proving correct in its resolution of effects but is wrong in the timescales, then that is indeed evidence that the effects of climate change are occurring faster than we think. Is it strong enough evidence to say definitively? In the case of Western Australia, Australian scientists believe it is; in the case of Canberra, Australian scientists believe that there is not sufficient evidence as yet.

  109. Nathan, you understand this is a truism, right?

    “Seeing the same qualitative effect you are expecting, but not over the time scale expected, is not a confirmation of the model. It’s a contradiction to it.”

    If I had a model for solar eclipses and it occurred six months early, I would call that a failure of the model. Wouldn’t you?

    Seriously, you think that a long-term drought not predicted by the models is a success of the models?

    The problem I had with Lucia’s analysis was the short timescale and the fact that it didn’t go anywhere. Carrot Eater has prodded her enough to get her to take it somewhere, and I have mentioned above that I am excited about where it is going. You can’t compare the two.

    Why can’t I compare the two? This sounds more like a plea for me not to. 😉

    As we’ve established:

    • They have similar time scales (one decade).
    • Lucia is comparing full-globe integrated data to models.
    • You’re looking at regional-scale changes.
    • I’ve pointed out the problems with going to regional scales: the models’ fidelity gets worse, and the effect of regional-scale “weather noise” increases in its influence (the AGW signal becomes harder and harder to actually discriminate).
    • Lucia is using standardized tests that are well-established to compare model to data; you are eyeballing graphs with no specific model comparison.

    Seems there’s a problem here, Houston.

    I get the feeling this is little more than an attempt at a ‘gotcha’.

    If pointing out blatant logical contradictions is a “gotcha”, it’s a gotcha.

    But I was shooting for a bit higher point than that, which is the thing that never ceases to amaze me, namely how humans fix belief.

  110. Carrick

    Why did you just average each ten year period? That’s useless.

    There’s no logical contradictions at work here. The CSIRO modelled data and the effects were WORSE than their model. You can claim the model is wrong and just chuck the results if you want, but that’s dumb.

    “But I was shooting for a bit higher point than that, which is the thing that never ceases to amaze me, namely how humans fix belief.”
    Good grief, man, it’s simply because you ignore the data we give you and instead use dumb things like ranking each ten-year period. And then claiming things like oh, each ten-year period isn’t so bad… Lunacy.

    You even ignored the conclusions of the document you linked.

    You have also failed to consider the impact of rising temps with the reduced rainfall. We pointed that out to you earlier, but you’d rather just look at ranked ten-year periods.

  111. David Gould:

    I would suggest that if a model is proving correct in its resolution of effects but is wrong in the timescales, then that is indeed evidence that the effects of climate change are occurring faster than we think.

    You would need to tie this into the rest of the planet. You can’t isolate SWA from the rest of the planet; the effects are correlated.

    It may be true that it is caused by AGW. Without model results to substantiate it, though, it’s a hard call to attribute any weather event (decade-scale or not) to any specific cause. And whether or not you’ve gotten somebody else to sign on to agreeing with you doesn’t make this statement more or less true.

    And as I pointed out, as you make the scale more and more regional, the averaged amplitude of regional scale weather actually grows (until you reach the spatial correlation length for weather, around 1000-km or so, where the amplitude begins to saturate).

    Anyway, compared to the sorts of historical events that have occurred even within the last 100 years, what is going on there is neither profound nor nearly as widespread as the droughts seen in the 1930s and 40s. (That does not prove that this isn’t AGW related, I realize that; I’m simply pointing out it is on the same scale as other, naturally caused, fluctuations, so caution is necessary.)

  112. Nathan:

    Why did you just average each ten year period? That’s useless.

    Nathan, this just shows how nonexistent your understanding of statistics is, nothing more. If we are trying to get rid of short-term trends, of course we use longer-period averages. 10 years is the minimum I would ever use without a weather noise model; 30 years is preferred by me.

    But since I started with the premise of decade scale changes, it would be “natural” to use decade scale measures to test it.

    There’s no logical contradictions at work here. The CSIRO modelled data and the effects were WORSE than their model. You can claim the model is wrong and just chuck the results if you want, but that’s dumb.

    It’s not “dumb” to point out that the model predicts one thing (droughts in 2030 according to David) and the data says something different (drought in 2000-10) and conclude they are not consistent.

    Good grief, man, it’s simply because you ignore the data we give you and instead use dumb things like ranking each ten-year period. And then claiming things like oh, each ten-year period isn’t so bad… Lunacy.

    Hm… Name calling as a form of scientific proof.

    They teach you that in grad school Nathan? In place of statistical analysis, I mean?

    You have also failed to consider the impact of rising temps with the reduced rainfall.

    I ignored it, because it’s (politely) science fiction.

    Warming temperatures generally equal more rainfall, not less. David Gould said as much.

    You are just getting a bit wild here because again the conclusions don’t match your premises, and you obviously lack the math skills to argue sensibly. Had I ranked global mean temperature in a similar manner:

    1880-1890 -0.27 13
    1890-1900 -0.25 10
    1900-1910 -0.26 11
    1910-1920 -0.27 12
    1920-1930 -0.17 9
    1930-1940 -0.04 8
    1940-1950 0.04 4
    1950-1960 -0.02 7
    1960-1970 -0.01 6
    1970-1980 -0.00 5
    1980-1990 0.18 3
    1990-2000 0.31 2
    2000-2010 0.51 1

    I’m pretty sure you wouldn’t have raised a similar objection.

    (I bring this one up anytime somebody talks about the “cooling” 2000-10 decade.)

    Ranking by single years makes little sense, considering the amount of fluctuating signal that is present. Using decades drops the amount of variance…

    [And yes, ranking data is standard practice. See “median”. But if you’re going to rank, it is generally sensible to do it with long term averages, or by other means to reduce the overall variance first.]

  113. Carrick

    “Anyway, compared to the sorts of historical events that have occurred even within the last 100 years, what is going on there is neither profound nor nearly as widespread as the droughts seen in the 1930s and 40s.”

    The document you linked to earlier demonstrates this is not true. You read that document; how can you have failed to notice this?

  114. Nathan:

    The document you linked to earlier demonstrates this is not true. You read that document; how can you have failed to notice this?

    I have already posted numbers from the Australian weather service demonstrating it was not atypical. And you are referring to a cherry-picked interval; I addressed that also. Cherry picking is very bad form as a means of statistical inference; it proves nothing.

  115. Carrick

    “You have also failed to consider the impact of rising temps with the reduced rainfall.

    I ignored it, because it’s (politely) science fiction.

    Warming temperatures generally equal more rainfall, not less. David Gould said as much.”

    ARGH!!! The reason you don’t understand this is because you are being deliberately obtuse. Rising temps are a fact in Australia. Rising temps mean that plants and animals and people NEED more water. So the reduction in rainfall is WORSE because we need MORE water. Do you understand?

    “You are just getting a bit wild here because again the conclusions don’t match your premises, and you obviously lack the math skills to argue sensibly.”

    You are playing a stupid gotcha game Carrick. The conclusions are good enough for CSIRO, BOM, and a whole range of scientists. They’re good enough for me too.

    You also lack the character to admit that the paper you linked to, to further your case, actually supports mine.

    How is your decade, by decade analysis better than the BOM one?

    http://www.bom.gov.au/cgi-bin/climate/change/timeseries.cgi?graph=rain&area=swaus&season=0112&ave_yr=10

  116. Carrick

    The author of that document disagrees with you.

    There is a big difference between averaging each ten-year block and looking at the 10-year moving average.

  117. Nathan (Comment#38846) March 21st, 2010 at 10:53 pm

    Snowfall increase in coastal East Antarctica linked with southwest Western Australian drought

    As increased precipitation is not apparent at the Russian Antarctic stations, it seems East Antarctic precipitation is not ubiquitous.

    E.g., Mirny:

    http://www.aari.ru/resources/plot/plot.php?slope=-0.019236494437586595&inter=51.50189706966493&h=250&w=400&d=mir/prec.txt&m=12&mt=annual%20period&s=Mirny%20observatory%20(89592)&p=Precipitation%20(mm)

  118. Nathan:

    Rising temps are a fact in Australia. Rising temps mean that plants and animals and people NEED more water.

    Sigh.

    You’re not discussing precipitation now, you’re discussing water usage. That’s an entirely new topic and one that conflates many issues: It has to do with land usage changes and resource allocations. Global temperature plays some role there, but changes in agricultural patterns are certainly important as well.

    You are playing a stupid gotcha game Carrick. The conclusions are good enough for CSIRO, BOM, and a whole range of scientists. They’re good enough for me too.

    Politely, I don’t need anybody to spoon-feed me my thoughts. If you do, find another place to discuss those thoughts. I would like to reserve this thread for rational discussion.

    How is your decade, by decade analysis better than the BOM one?

    It’s neither better nor worse. It’s just another way of looking at data (tabular rather than smoothed graphic). And it includes information not on the chart, namely ranking.

    I also pointed out to you above that I initially followed the wrong link of yours for SWA. Here it is again without typos:

    1900-1910 709.49 3
    1910-1920 617.86 10
    1920-1930 734.79 1
    1930-1940 713.06 2
    1940-1950 695.36 5
    1950-1960 677.10 6
    1960-1970 706.54 4
    1970-1980 645.25 8
    1980-1990 637.37 9
    1990-2000 659.92 7
    2000-2010 604.97 11

    When I follow the right link I see the same pattern (see my comment to DG):

    Declining precipitation since 1920. That was hardly attributable to a phenomenon that was supposed to start circa 1980.

  119. Nathan:

    The author of that document disagrees with you.

    Actually, you don’t know that he does; you are inferring this.

    There is a big difference between averaging each ten-year block and looking at the 10-year moving average.

    If I even understand the comment, the only difference is I choose the intervals a priori.

    Selecting a 12-year, 8-month (not 10-year) period to give a particular result certainly is “cherry picking”, OTOH. The quote:

    The long-term rainfall deficiency since October 1996 across South Eastern Australia (south of 33.5ºS and east of 135.5ºE) documented by MT08 was described as being severe but not unprecedented in the instrumental record. With an additional 3 years of below average rainfall, that statement is no longer true. The recent 12 year, 8 month period is the driest in the 110 years long record, surpassing the previous driest period during WWII.

    12 year, 8 month period

    Cherry picking.

  120. Carrick

    “1900-1910 709.49 3
    1910-1920 617.86 10
    1920-1930 734.79 1
    1930-1940 713.06 2
    1940-1950 695.36 5
    1950-1960 677.10 6
    1960-1970 706.54 4
    1970-1980 645.25 8
    1980-1990 637.37 9
    1990-2000 659.92 7
    2000-2010 604.97 11”

    This doesn’t show a trend starting in 1920… It shows one dry decade surrounded by wet ones.

    “You’re not discussing precipitation now, you’re discussing water usage. That’s an entirely new topic and one that conflates many issues: It has to do with land usage changes and resource allocations. Global temperature plays some role there, but changes in agricultural patterns are certainly important as well.”

    We were discussing drought, Carrick, and I tried to indicate to you that one thing that makes this drought worse is that it is hotter. When it is hotter, things need more water. Drought is about water usage. Changes in ag practices in Australia have tended to make drought less problematic (as they have everywhere in the world). However, this drought is worse.

  121. Carrick

    Let’s assume your data series is

    3, 10, 2, 5, 6, 4, 8, 9, 7, 11

    When does the upward trend start? My money is on about the 8. Definitely the 9.

    Certainly not the 10.

  122. Nathan:

    We were discussing drought, Carrick, and I tried to indicate to you that one thing that makes this drought worse is that it is hotter. When it is hotter, things need more water. Drought is about water usage.

    If you want to make it about water usage, fine.

    We were until just now clearly discussing it as “a shortage of rainfall”, and it was very clearly about changing precipitation levels… show me one link of yours, for example, that discusses available water supplies rather than changes in precipitation patterns.

    David Gould very well, on his own blog, may be covering other issues too. I haven’t had a chance to pop over there and look.

    Changes in ag practices in Australia have tended to make drought less problematic (as they have everywhere in the world). However, this drought is worse.

    /facepalm

    Having just said it was about available water capacity, you’ve now popped back to the “shortage of rain fall” usage again.

    Changes in AG practices usually reduce the water available for other uses. Many of the droughts, including this one, wouldn’t have been “near as bad” were it not for increase in human demand for available water. (If I had to pick, increase in population density is more important to this point than AGW for available water, but that’s just a guess.)

  123. Carrick

    That time period was not cherry picked; the document was an update of an earlier attempt.

    “This note intends to contribute to the overall program goal by updating the description of the rainfall decline in SEA, its continuation since 2006 and changes in characteristics (i.e. magnitude, spatial extension and seasonality).”

    And it was written in May 2009 – hence the extra 3 years… It’s not cherry picked. He updated an earlier study looking at the previous 10 years.

    And this wasn’t the only point the author made. He did look at trends, etc., and worked out which seasons had reduced rainfall.

  124. Carrick

    Drought is not just about precipitation. We were always discussing drought. You focussed on precipitation.

    “Having just said it was about available water capacity, you’ve now popped back to the “shortage of rain fall” usage again.
    Changes in AG practices usually reduce the water available for other uses. Many of the droughts, including this one, wouldn’t have been “near as bad” were it not for increase in human demand for available water. (If I had to pick, increase in population density is more important to this point than AGW for available water, but that’s just a guess.)”

    OK… what? The improvements in ag practices mean we get far more for a particular quantity of water. So the improvements in ag practices mean that we get more food for the same amount of water.

    “Changes in AG practices usually reduce the water available for other uses.”
    It’s not the changes in practices that reduce water availability. It’s the fact they are there. Farmers in Australia have become better at using water, so as they change their practice they become better.

    I can’t see your figure; it won’t load for me. Also, I left out the wettest year in my short list, so you would need a 1 after the 10.

    I can’t believe you think the 1920s started a declining trend when the next two decades were the wettest in the 20th Century.

  125. Nathan, anytime you pick the starting or ending points of an interval to make a particular inference, that’s “cherry picking”. That doesn’t make his statement wrong; it just reduces its utility for making inferences.

    Had he kept the interval the same and just moved it by three years, I wouldn’t personally call it cherry picking, but it still lacks the rigor of a priori selection of the analysis intervals.

    Ten years is in fact an arbitrary interval, as is starting and stopping on the “0” boundaries. But that’s a strength, not its weakness. As soon as I allow the data to modify my selection criterion, I begin the process of cherry picking… aka “data mining”.

    As an aside, the “1-decade” interval length was picked to approximately coincide with the 9-year PDO; this averages out part of the variability associated with that, just as the 5-year average drops out most of the ENSO-related variability, and the 30-year average kills most short-period climate variability.

  126. Nathan:

    Drought is not just about precipitation. We were always discussing drought. You focussed on precipitation.

    We were both discussing drought as in “lack of rainfall”, as in changes in precipitation pattern. You can have the same rainfall pattern and end up with a shortage of water; it happens.

    And every single URL you or I linked to was about changing precipitation patterns and their relationship to AGW and GCMs. Not much room in any of that for, e.g., turf grass.

    I can’t believe you think the 1920s started a declining trend when the next two decades were the wettest in the 20th Century.

    I was linking SWA. Which are you looking at?

    try this.

    You could try a little less sophistry next time.

    I’m out of here. Bed time.

  127. Carrick, of course the trend will be down when you start from the wettest year in the 20thC. I must say that is a particularly useless figure, as we were discussing when the trend changed, not what the trend is from the wettest year.

    Carrick
    The point he was making was that this was the longest drought… I don’t see how you can do that any other way than by picking the start… Remember the drought hasn’t broken yet, and he wrote that in May 2009. Although some may say the drought broke early this year (or very late last year), we still haven’t seen that much rain. So in fact his 12 years 8 months could be extended to 13+ years. His conclusions had little to do with this time period, and were based around reductions in autumn and spring rains.

  128. I got your figure.
    I don’t see how showing a trend starting from the wettest year is useful. We’re discussing where the long-term trend changed, not what it is from the wettest year.

    If you use the BOM graph and look at the 10-year moving average, it’s pretty obvious there’s a drop in rainfall in the 70s.

  129. SteveF (Comment#38780) March 21st, 2010 at 10:38 am

    Any model that is an arguably accurate representation of the system will reasonably match the measured variability at all time scales, as well as match the long term trend.

    Accepting this assertion for the moment, can we extend it to spatial scales as well, i.e., should we also require that any given model match the variability at all spatial scales as well as at the global mean scale?

  130. Carrick (Comment#38801) March 21st, 2010 at 5:55 pm

    For a long-enough time window, the secular temperature trend will always overwhelm any temporal fluctuation associated with climate “noise”.

    So if what you are interested in is just the temperature trend, and not modeling short-period climate fluctuations, then “just waiting long enough” works.

    Are we allowed to average right through glaciations and deglaciations while we’re waiting for the true trend to stand out above the noise?

    Oliver

    P.S.: Sorry to break in on the nice exchange you guys have going.

  131. Sometimes an image helps …

    Looks like SW Australia definitely has got a downward trend. Only thing is, it started around 1920.

    On the other hand, SE Australia was around equally dry for the first half of last century. Hard to see a trend there.

  132. BTW, Carrick, David Gould and Nathan, there is a series of posts at David Stockwell’s about the SW Australian drought situation. David and the author of the paper linking the SW Australia drought to increased precipitation in E Antarctica had a lengthy and cordial debate which you might like to follow through. The first post (which has links to the remainder in the series) is here.

  133. oliver (Comment#38872),
    “should we also require that any given model match the variability at all spatial scales as well as at the global mean scale?”

    Sure. But it may be difficult to come up with measurements of sufficient quality (low enough uncertainty) to do meaningful tests at different combinations of spatial and time scales. For the satellite period (1978 onward), there certainly is the needed temperature data available, but ~32 years is probably not enough time to evaluate variability on anything other than short time scales.

    If a model is a fair representation of physical reality, then it should make good predictions and good hind-casts at all spatial scales (down to the grid scale of the model), all time scales, and all altitudes. This requirement doesn’t seem extreme at all, since modelers claim that their models are based on fundamental physical laws and measurement-based “parameters”, and that they are a fair representation of reality, capable of making accurate predictions of future warming. The substantial divergence in model predictions shows that most claims of model concordance with reality are obviously false. I sure would like to hear climate modelers admit this irrefutable fact, and I sure would like to stop hearing that the science is settled.

  134. oliver:

    Are we allowed to average right through glaciations and deglaciations while we’re waiting for the true trend to stand out above the noise?

    I did specify constant forcing. If you go too long, then variations in long-term solar forcings will doubtlessly need to be modeled.

  135. Also there are probably long-term biosphere related changes, salinity driven oscillations and all sorts of other things. What is useful is that proxies seem to suggest we have a “frequency gap” between these longer term mixings and the shorter-term AO oscillations. These “gaps” are not uncommon when you are “switching” domain region from e.g., coupled AO oscillations to e.g. biosphere driven fluctuations. I probably don’t need to mention that the biospheric and any other really long term fluctuations should be considered highly speculative at this point.

    But anyway, the issue is eventually the models have to do something right if they are going to do anything beyond just confirm back of the envelope calculations + “hunch factors” for environmental CO2 sensitivity + water vapor feedback + other associated feedbacks.

  136. Nathan:

    I don’t see how showing a trend starting from the wettest year is useful. Were discussing where the long term trend changed, not what it is from the wettest year.

    You use what objective statistical measures tell you is the relevant year the trend started, and this is circa 1920, not what your eyeball + faith tell you it should be (1980).

    The relevant metric is the third column (chi-square):

    Period       Trend (mm/yr)   Chi-square
    1980-2010    -1.62            2.5026042
    1970-2010    -0.9829          2.8408512
    1960-2010    -1.8847          4.8739593
    1950-2010    -1.4525429       5.9633914
    1940-2010    -1.3382143       6.1158867
    1930-2010    -1.3265595       6.1187392
    1920-2010    -1.37755         6.2097402
    1910-2010    -0.6905697      32.166547
    1900-2010    -0.70096364     32.17546

    Note the sudden “hop” when you extend the period to the 1910-20 decade.

    What this is telling you is that most of the variability observed since 1920 can be explained by a linear decline of about 1.3 mm/year in rainfall. The graph tells you that visually, but it’s a lot harder to dispute objective criteria.

    It’s not our problem if you lack a rudimentary understanding of statistics, and it’s definitely your problem if you are trying to claim no trend where a trend clearly exists.
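
    For anyone who wants to reproduce this kind of table, here is a minimal Python sketch of the sort of calculation described above. It is illustrative only: the decadal averaging follows the description, but the file name and the chi-square normalization (sigma) are assumptions, so the numbers will not match the table exactly.

    import numpy as np

    def decadal_means(years, rain, start, stop=2010):
        """Non-overlapping decade averages of annual rainfall from start to stop."""
        t, y = [], []
        for y0 in range(start, stop, 10):
            m = (years >= y0) & (years < y0 + 10)
            if m.sum() == 10:                      # complete decades only
                t.append(y0 + 4.5)                 # decade midpoint
                y.append(rain[m].mean())
        return np.array(t), np.array(y)

    def trend_and_chi2(t, y, sigma):
        """OLS slope (mm/yr) and a chi-square of the decadal means about the
        fitted line, for an assumed per-decade noise level sigma (mm)."""
        slope, intercept = np.polyfit(t, y, 1)
        resid = y - (slope * t + intercept)
        return slope, np.sum((resid / sigma) ** 2)

    # Hypothetical driver; the file name and sigma are placeholders:
    # years, rain = np.loadtxt("swa_annual_rainfall.txt", unpack=True)
    # sigma = 40.0   # e.g. interannual std / sqrt(10); an assumption, not Carrick's value
    # for start in range(1980, 1890, -10):
    #     t, y = decadal_means(years, rain, start)
    #     print(start, *trend_and_chi2(t, y, sigma))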

  137. SteveF (Comment#38879) March 22nd, 2010 at 6:55 am

    oliver (Comment#38872),
    “should we also require that any given model match the variability at all spatial scales as well as at the global mean scale?”

    For the satellite period (1978 onward), there certainly is the needed temperature data available, but ~32 years is probably not enough time to evaluate variability on anything other than short time scales.

    If a model is a fair representation of physical reality, then it should make good predictions and good hind-casts at all spatial scales (down to the grid scale of the model), all time scales, and all altitudes.

    One should remember that models, almost by definition, tend to exclude some fraction of physical reality. A coupled model may completely exclude the atmospheric and ocean tides and yet fairly represent wind driven circulation and that might be okay. Your requirement may be a bit too stringent to be useful at separating the good from the bad.

    Additionally, if one is trying to make hindcasts, the initial conditions (and boundary conditions) are almost never known on anything close to the model grid resolution.

    But I agree, 32 years is awfully short for this sort of thing.

  138. Oliver:

    Additionally, if one is trying to make hindcasts, the initial conditions (and boundary conditions) are almost never known on anything close to the model grid resolution.

    Which is my point in comparing distributions.

    “Apples-to-apples.”

    Does the kind of climate generated by this model match the type of climate we actually observe? Notice that what Lucia is doing is probably the least-challenging test of the models that you can imagine.

    Comparing the correlation in rainfall patterns or temperature trend north to south Australia for example (or even in adjacent temperate zones) is a far more challenging test of the models than the global average.

  139. Carrick,

    Generally, it is true that precipitation increases with temperature. However, this ‘general rule’ varies regionally. In Canberra, for example, precipitation is negatively correlated with maximum daytime temperature. This is also true for a number of regions across the United States, for example – there was a study done to attempt to improve the resolution of climate models.

    All models are wrong; some models are useful.

    This is a statement that I have seen on this blog in particular, and elsewhere, a number of times in relation to climate.

    If a model is getting the general effects right but the timing of those effects wrong, then it is indeed wrong. But it also may be useful, in that it is giving us *some* correct information. (It is certainly more useful than a model that gets nothing correct. ;))

    As to climate models used to examine Australia being correlated across the globe, they may be – I have not examined the issue. However, I know that both CSIRO and the University of New South Wales have been working on models that look at Australia specifically. I do not know how well they would translate to other regions of the globe – it would depend on how they were constructed.

    Re what I am looking at, I am looking at both precipitation (the ‘desert’ prediction) and available water (the runoff prediction). I am focusing on Canberra because that is where I live.

  140. David Gould:

    Generally, it is true that precipitation increases with temperature. However, this ‘general rule’ varies regionally. In Canberra, for example, precipitation is negatively correlated with maximum daytime temperature. This is also true for, for example, a number of regions across the United States – there was a study that was done to attempt to improve the resolution of climate models.

    I agree. I was speaking in generalities, of course. “All things being equal, which they rarely are”.

    If a model is getting the general effects right but the timing of those effects wrong, then it is indeed wrong. But it also may be useful, in that it is giving us *some* correct information. (It is certainly more useful than a model that gets nothing correct.)

    In this case, the effect seems to have occurred before the cause, at least as far as the models go.

    In any case, in the circles I hang out in, a model that predicts an effect that isn’t synchronized to the cause is a big problem, especially if the effect is a generic property of any model or data set (droughts are not confined to the late 20th century and early 21st… there is nothing to be learned from this in the absence of a model that can provide insight into why an effect is occurring at a particular point in time).

    In regards to Canberra, the more localized you make your study, the less likely it is you are looking at climate rather than weather, unless you are willing to wait a bloody long time!

  141. Carrick,

    Well, the CSIRO suggest that we will have to wait another decade to know whether or not the decline in precipitation in Canberra is just another drought, albeit the worst one in our records, or whether there has been a shift in rainfall patterns in Canberra. So they certainly agree that more data is required for Canberra. They do not seem to agree that more data is required for the south of WA. However, that is a much larger area than tiny Canberra. 🙂

  142. Re the effect occurring before the cause, I do not see that. As an example, in Canberra the decline in rainfall has only occurred relatively recently – over the last 20 years. This does not seem to be out of whack with global increases in temperature. While there has been a decline in WA rainfall, the most dramatic decline has also been relatively recent.

  143. Carrick do the same test on yearly data.
    I bet you get a different result.

    People who have studied this in WA, including the BOM, give the year of change as 1976.

  144. David Gould:

    They do not seem to agree that more data is required for the south of WA. However, that is a much larger area than tiny Canberra

    It is easy to check, and I suspect it involves physics that they are unaware of.

    While there has been a decline in WA rainfall, the most dramatic decline has also been relatively recent

    According to what I posted for SWA, the decline has been at nearly a constant rate since 1920. You had an upwards “blip” in 1990-2000, but that doesn’t affect the overall assessment.

  145. Nathan:

    Carrick do the same test on yearly data.
    I bet you get a different result.

    Define “different”.

    There’s a reason for fitting to 10-year averages, as I mentioned above. Primarily, it removes the effects of serial correlation in the data associated with atmosphere-ocean oscillations. Fitting to yearly data can still be done, you just have to be a bit more careful, and the answer won’t change significantly as long as you are appropriately cautious.

    People who have studied this in WA, including the BOM, give the year of change as 1976.

    Again, if you want to discuss “thoughts that other people put in my head”, I’d say that’s appropriate for a different thread.

    In this one, I’d prefer to stick to rational thought and rational analysis. What you say they claim, at least, isn’t borne out by the data.

    If you want to point out an error or a different way of analyzing the data, that’s interesting. Blanket statements appealing to authority, not so much.
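
    For what it’s worth, here is one standard way of being “a bit more careful” with yearly data: fit the trend by OLS and inflate its standard error using the lag-1 autocorrelation of the residuals (an effective-sample-size adjustment). This is just a sketch of a common approach, not necessarily the adjustment Carrick has in mind.

    import numpy as np

    def trend_with_ar1_se(t, y):
        """OLS trend plus a standard error inflated for lag-1 autocorrelation of
        the residuals, via the usual n_eff = n * (1 - r) / (1 + r) adjustment."""
        t, y = np.asarray(t, float), np.asarray(y, float)
        n = len(y)
        slope, intercept = np.polyfit(t, y, 1)
        resid = y - (slope * t + intercept)
        r = np.corrcoef(resid[:-1], resid[1:])[0, 1]     # lag-1 autocorrelation
        n_eff = n * (1 - r) / (1 + r) if r > 0 else n    # effective sample size
        s2 = np.sum(resid ** 2) / (n_eff - 2)            # residual variance
        se = np.sqrt(s2 / np.sum((t - t.mean()) ** 2))   # standard error of the slope
        return slope, se

    # slope, se = trend_with_ar1_se(years, rain)   # e.g. on annual rainfall data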

  146. One of the differences between earlier periods and more recent periods is the change in SD or, to put it another way, rainfall variability. For example, the last 40-year period had a standard deviation of around 77 mm. This is well below the SDs recorded for all 40-year periods prior to 1975. Graphing these 40-year SDs gives a rather dramatic picture, actually.

    http://evilreductionist.blogspot.com/2010/03/graph-of-standard-deviations-for.html

    (Yes, I have not put the years in – I will edit that later.)

  147. David Gould:

    One of the differences between earlier periods and more recent periods is the change in SD or, to put it another way, rainfall variability

    Yes I had noticed this too. Not sure what it implies. Of course part of that is simply a statement that mean rainfall drops (at lowest order you’d expect weather noise to scale with the magnitude of the quantity).

    Here is decadal average variance, normalized by the mean rainfall:

    1900-1910 0.16
    1910-1920 0.16
    1920-1930 0.15
    1930-1940 0.15
    1940-1950 0.16
    1950-1960 0.15
    1960-1970 0.14
    1970-1980 0.12
    1980-1990 0.12
    1990-2000 0.13
    2000-2010 0.13

    Simplistic arguments for effects of global warming would claim the variability should increase, not decrease. Even were that really true, I doubt it would apply here.

    My wild-a$$ed guess is part of the reduction in variability comes from the additional “water sink” from anthropogenic activity (that reduces the amount of water to be recycled into regional climate).
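
    In case anyone wants to reproduce that table: a short sketch, assuming “variance normalized by the mean” means the per-decade standard deviation divided by the per-decade mean (a coefficient of variation), which seems consistent with the size of the numbers above. That reading, like the code, is an assumption.

    import numpy as np

    def decadal_variability(years, rain):
        """Per-decade rainfall variability, taken here as std/mean (one plausible
        reading of 'variance normalized by the mean rainfall')."""
        out = []
        first = int(years.min()) // 10 * 10
        for y0 in range(first, int(years.max()), 10):
            m = (years >= y0) & (years < y0 + 10)
            if m.sum() == 10:                  # complete decades only
                out.append((y0, rain[m].std(ddof=1) / rain[m].mean()))
        return out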

  148. Carrick,

    Re arguments around AGW, rainfall SD for Australia as a whole has certainly increased. Running 40-year sets, there is a sudden step change from between 57 and 60 mm up to between 84 and 95 mm between the 40-year period ending in 1972 and the 40-year period ending in 1974. You may be correct: it could simply be a function of the total rainfall.

  149. Carrick

    Look at the analysis that Tamino did for Victoria (Australia) rainfall.

    http://tamino.wordpress.com/2008/06/08/victoria-rainfall-fall-rain/

    What he did was take each 11 year average, not just the average of each decade. This is a much better way of observing the change in the trend.

    You can’t find a change in the trend by simply fitting a line of best fit to the data. That tells you nothing about the change. The biggest change in SW WA is the lack of high-rainfall years; we haven’t had any high-rainfall years since the 1960s. Something changed in the 70s.

    “What you say they claim at least isn’t borne out by the data”
    Well, you certainly haven’t tested whether there was a change in the 70s. Your test won’t show any changes in trend.

  150. By the way

    Had a huge storm in Perth yesterday… 68mm of rain in my neighbourhood. Massive hail too… Dented my car… 🙁

    Guess the Summer drought is over.

  151. Nathan:

    What he did was take each 11 year average, not just the average of each decade. This is a much better way of observing the change in the trend.

    Adding one year is “much better”?

    Why do you think 11 is a “magical” number, Nathan? What possible meaningful mechanism are you going to employ to demonstrate to us that adding one more year to the average magically changes everything?

    (It doesn’t change the outcome btw.)

    You can’t find a change in the trend by simply fitting a line of best fit to the data.

    That will be news to millions of people who do just this.

    Well you certainly haven’t tested if there was a change in the 70s. Your test won’t show any changes in trend.

    I “certainly” did test for that separately, I just didn’t show it. The trend is less from 1970 to current than from 1920 to 1970, but not by a statistically significant amount.

    Glad to hear about the rain, but sorry to hear about the car.

  152. Carrick

    Why do you make this so hard? He did EACH 11-year average, rather than just the first ten years, then the next ten years, etc.

    How will you detect a change in trend by doing a line of best fit?

  153. “The trend is less from 1970- current than from 1920-1970, but not by a statistically significant amount. ”

    So try again, but this time using each ten year period rather than the average of each decade.

  154. Nathan:

    Why do you make this so hard? He did EACH 11 year average. Rather than just the first ten years, then the next ten years etc.

    You didn’t exactly explain it clearly. But whatever.

    Consecutive values are highly correlated due to the large amount of overlap… there is no new information provided by including those points. It is done that way for visual display purposes only.

    People sometimes will feed smoothed data into OLS codes, but it is just crackers to do so. There is simply no point to this.

  155. Carrick,

    What Nathan is saying is that using 10-year or 11-year *rolling* averages is better than picking 10 years, then the next 10 years, then the next 10 years, and so on. I do not really know; I generally do rolling averages, but there is no magic number of years – I often do 30-, 40- or 50-year rolling averages if I have enough data.

    And he is correct that you cannot detect a change in trend by doing a line of best fit through the whole of the data, but I am guessing that you know that and just misunderstood, because obviously you do use lines of best fit for parts of the data and then compare them, as you have done.

  156. Nathan:

    So try again, but this time using each ten year period rather than the average of each decade.

    It would be an analysis error to do what you suggest.

  157. To clarify: I meant that I do not really know if one way is better than the other. I cannot see how it would make that much difference.

  158. David Gould:

    What Nathan is saying is that using 10-year or 11-year *rolling* averages is better than picking 10 years, then the next 10 years, then the next 10 years and so on. I do not really know, but I generally do rolling averages, but there is no magic number of years – I often do 30, 40 or 50 year rolling averages if I have enough data.

    This is mainly useful for visual representation of the data, though. Ideally what you would do is use a brick-wall filter to low-pass filter the data, then decimate the data according to Nyquist’s theorem. That’s what is ideal for OLS, not an “oversampled” version of the data.

    In fact, when I do a ten-year average, part of what I’m trying to get away from is the correlation in the data from, e.g., ENSO (yes, I’ve checked, and that signal is present in the rainfall). It wouldn’t make a bit of sense to try to fit a smoothed version of the data… correlation in residuals is your enemy when you are trying to do OLS trend estimates.
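
    A sketch of the filter-then-decimate idea, using a plain running mean as a crude stand-in for the brick-wall filter (so this is illustrative, not necessarily the filter Carrick has in mind):

    import numpy as np

    def smooth_and_decimate(rain, window=10, step=5):
        """Low-pass filter (here a simple running mean over `window` years) and
        then keep every `step`-th point. For data band-limited below 1/window
        per year, the Nyquist criterion says sampling every window/2 years
        (step = 5 for a 10-year filter) loses no information.
        (scipy.signal.decimate bundles a proper anti-alias filter with the
        downsampling, if you want something less crude.)"""
        kernel = np.ones(window) / window
        smoothed = np.convolve(rain, kernel, mode="valid")   # running mean
        return smoothed[::step]                              # decimate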

  159. Carrick,

    But does it make that much of a difference? When I generate synthetic data, the trends do not appear to change all that much between using 10-year rolling averages and 10 year sets. However, the autocorrelation increases dramatically (obviously).

  160. Can you give an example of how it might be applied calculation-wise in looking at WA rainfall data?

  161. Here’s the rainfall rate fluctuation spectrum for SWA.

    Figure.

    Notice we get the ENSO and PDO (the 9-year) peaks.

    My decadal-average is just a crude method of rejecting the PDO peak. This could seriously be improved upon, but what Tamino did is no better than what I did (unless he’s using Hann-weighted overlapping averages or something slick like that).

  162. David:

    However, the autocorrelation increases dramatically (obviously).

    I worry about autocorrelation distorting your trend estimates, which is why I don’t include it. (If I do, I figure I have to go back and study the data and make sure it’s not “breaking” the analysis.)

    Beyond that, I’m lazy, and the software tool I am using is one I wrote over 25 years ago (…, I’m getting old). It needs to be updated to include autocorrelation. But without addressing autocorrelation, I can’t use the chi-square test or similar methods on the data.

    The Nyquist theorem (or Nyquist sampling theorem, Shannon’s theorem, etc.) basically states that if you have a band-limited signal (suppose, for example, you have smoothed data with no signal at frequencies above 1/10-years, read 1/”10 years”), and you sample that signal at a rate of at least twice the maximum frequency (in this case 2/10-years = 1/5-years), then you can reconstruct the entire signal at any time t from this discrete set of points using the sampling-theorem interpolating formula.

    The gist of this is that over-sampling a band-limited signal buys you nothing: for example, if your signal is band-limited to 1/10-years, then having annual values for that signal gives no more information than had you sampled it at 1/5-years.

    Hope that helps, and sorry about the jargon.
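
    For reference, the interpolating formula in question (Whittaker-Shannon) reconstructs a signal band-limited below 1/(2T) from samples spaced T apart:

        y(t) = sum over n of y(nT) * sinc((t - nT)/T),   where sinc(x) = sin(pi*x)/(pi*x)

    so with T = 5 years you can recover the smoothed series at any time t, not just at the sample points.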

  163. David, here is the rain fall rate fluctuation spectrum for the 10-year smoothed version of the same data set I showed above.

    figure.

    Note that energies with periods below roughly 20 years are drastically suppressed (in an ideal filter, all signals with periods shorter than 20 years would be exactly zero). Nonetheless, we see that the sampling rate set by the Nyquist theorem is about 1/10-years, and 10-year averages are probably overkill (OTOH, adding the high-frequency data back in won’t influence the trend estimate very much).

    I get confused on this myself at times, which is where having a tool that can generate the spectrum helps.

    What I do is compute the “Welch spectral periodogram”, which involves using half-overlapping windows of data, then averaging the square-of-amplitudes across multiple windows. Semiliterate explanation is here.
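
    For the curious, here is a minimal scipy sketch of that procedure (half-overlapping, linearly detrended, tapered segments, averaged, then converted to an amplitude scale). The Hann taper and the 30-year segment length are illustrative assumptions, not a reproduction of the actual tool used.

    import numpy as np
    from scipy import signal

    def welch_amplitude_spectrum(rain, fs=1.0, nperseg=30):
        """Welch periodogram of annual data: half-overlapping, linearly detrended,
        tapered segments, averaged, then scaled so a pure cosine of amplitude a
        reads back as roughly a at its frequency."""
        f, pxx = signal.welch(rain, fs=fs, window="hann", nperseg=nperseg,
                              noverlap=nperseg // 2, detrend="linear",
                              scaling="spectrum")
        return f, np.sqrt(2.0 * pxx)   # power-per-bin -> one-sided amplitude

    # freqs, amp = welch_amplitude_spectrum(rain)   # freqs in cycles/year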

    Also, in a similar vein, using a Hann window provides superior performance to the unweighted average.

    #! /usr/local/bin/awk -f
    #
    # Computes a Hann-weighted running average of a two-column (year, value)
    # data set using a circular buffer.
    #
    BEGIN{
        nyears = 10;                 # length of the averaging window, in samples
        nbuf = 0;                    # next slot to fill in the circular buffer
        pi = atan2(0, -1);           # awk has no built-in pi
        initWindow();
    }
    {
        t[nbuf] = $1;                # time (year) column
        y[nbuf] = $2;                # data (rainfall) column
        if (NR >= nyears) printAverage();
        if (++nbuf >= nyears) nbuf = 0;
    }
    function initWindow(    n, arg)
    {
        for (n = 0; n < nyears; n++) {
            arg = 2*pi*(n + 1/2)/nyears;
            win[n] = (1 - cos(arg))/2;      # Hann window coefficients
            wsum += win[n];
        }
    }
    function printAverage(    n, j, tsum, ysum)
    {
        tsum = ysum = 0;
        # Walk the buffer oldest-to-newest so the weights line up with time order.
        for (n = 0; n < nyears; n++) {
            j = (nbuf + 1 + n) % nyears;
            tsum += t[j];
            ysum += y[j]*win[n];
        }
        print tsum/nyears, ysum/wsum;       # window-center time, Hann-weighted mean
    }
    
  165. Carrick

    So if you did this to the SW WA rainfall, what do you get?

    “I computed the rate of change of autumn rainfall (in mm/yr) for every 31-year time span in the data set, starting with 1900-1930 and ending with 1978-2008. I also computed error ranges (2 standard deviations, representing 95% confidence levels).”

    That’s what Tamino did for the Victorian Autumn rainfall.
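
    Here is a sketch of the calculation in that quote (a trend over every 31-year span, with ±2-sigma bands). The error bars below are plain OLS standard errors; Tamino may well have treated autocorrelation differently, so treat this as illustrative only.

    import numpy as np

    def rolling_trends(years, rain, span=31):
        """Trend (mm/yr) and a 2-sigma OLS error bar for every `span`-year window."""
        out = []
        for start in range(int(years.min()), int(years.max()) - span + 2):
            m = (years >= start) & (years < start + span)
            t, y = years[m], rain[m]
            if len(t) < span:
                continue
            slope, intercept = np.polyfit(t, y, 1)
            resid = y - (slope * t + intercept)
            se = np.sqrt(np.sum(resid ** 2) / (len(t) - 2) / np.sum((t - t.mean()) ** 2))
            out.append((start, start + span - 1, slope, 2 * se))
        return out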

  166. Carrick (Comment#38981) March 22nd, 2010 at 10:12 pm
    Consecutive values are highly correlated due to the large amount of overlap… there is no new information provided by including those points. It is done that way for visual display purposes only.

    Doesn’t keeping successive values in the moving average retain more information than doing the same thing and then decimating the output (equivalent to “the” discrete decadal averages)?

    Carrick (Comment#38989) March 22nd, 2010 at 10:36 pm
    My decadal-average is just a crude method of rejecting the PDO peak. This could seriously be improved upon, but what Tamino did is no better than what I did (unless he’s using Hann-weighted overlapping averages or something slick like that).

    I wouldn’t imagine that the choice of window is the fundamental problem in this analysis. Averaging over 10 years when the dominant period may be 12 years, however, is a serious problem. Isn’t this the fundamental problem with trying to apply Fourier methods to very short timeseries relative to the timescale of the phenomenon of interest?

    Maybe I am just misreading the discussion. Clarification would be much appreciated!

  167. Oliver:

    Doesn’t keeping successive values in the moving average retain more information than doing the same thing and then decimating the output (equivalent to “the” discrete decadal averages)?

    Not in the sense of the sampling theorem. The decimated version (that is at the Nyquist sampling rate) contains all of the information available for a band-width limited signal.

    Averaging over 10 years when the dominant period may be 12 years, however, is a serious problem

    Except I’ve looked at this already: the dominant periods for fluctuations are less than 10 years.

    Isn’t this the fundamental problem with trying to apply Fourier methods to very short timeseries relative to the timescale of the phenomenon of interest?

    The question is, what are the timescales of the phenomenon of interest?

    From my point of view, we have a band of frequencies/periods associated with ocean/atmospheric fluctuations, and a lower-frequency (upper-period) bound on those fluctuations of maybe 55 years, depending on how you classify it.

    I agree there are a lot of other snakes in the briar besides these, like long-period biosphere-climate interactions, and the possibility of chaotic-like behavior like climate “hopping” between climate stability points. But I guess you have to start somewhere and even if problems are uncovered, these still teach us something.

  168. Nathan:

    (Tamino): “I computed the rate of change of autumn rainfall (in mm/yr) for every 31-year time span in the data set, starting with 1900-1930 and ending with 1978-2008. I also computed error ranges (2 standard deviations, representing 95% confidence levels).”

    Here’s the problem with this from my perspective:

    If you drop the climate region to an area as small as Victoria and simultaneously limit yourself to a very narrow window in time, you are seriously running the risk that any climate variations you are seeing are simply natural climate periods.

    Averaging over larger areas and longer time scales than this ameliorates the problem, because spatial and temporal averaging reduces the amplitude of these natural fluctuations compared to the AGW signal you are looking for. What Tamino has done is arguably the worst thing you can do, because he has simply reduced the physical scale to the point where global climate models simply lack the temporal and spatial resolution to be able to resolve any circulation features (roughly, you need the grid resolution to be 10x finer than the features you’re trying to model). To model Victoria, which is only on the order of 750 km across, it would appear you need a grid size of around 75 km or so; compare that to current state-of-the-art resolutions of global climate models, which are around 250 km.

    You could argue that Victoria is part of some “larger pattern” but then there is no excuse for not looking at the “larger pattern” directly. But because the models can’t resolve Victoria-scale weather/climate really at all, they can’t tell you anything about the time scales you’d need to resolve AGW signals above the background of natural climate fluctuations.

  169. Carrick (Comment#39003) March 23rd, 2010 at 1:19 am

    Oliver:
    Doesn’t keeping successive values in the moving average retain more information than doing the same thing and then decimating the output (equivalent to “the” discrete decadal averages)?

    Not in the sense of the sampling theorem. The decimated version (that is at the Nyquist sampling rate) contains all of the information available for a band-width limited signal.

    Keeping the rolling average points gives some clue as to what’s going on in the record — look at the dip near 1960. If that point had been picked up by decimation, you might conclude that the decade was unusually dry, but when viewed in the context of surrounding “smoothed” points you can see that it is a notch — an artifact of the smoothing window length since you are picking up a pair of dips but no real trend or “hill.”

    In any case, the signal does not appear to be strictly bandlimited, either looking at the plot at the Australian Govt BOM or judging from the amplitude spectrum.

    Averaging over 10 years when the dominant period may be 12 years, however, is a serious problem
    Except I’ve looked at this all ready: The dominant periods for fluctuations are less than 10-years.

    Your spectral peak is broadly spread around 9 or 10. Does this mean that every quasi-oscillation was shorter than 10 years?

    Out of curiosity, how long was your periodogram subrecord length?

  170. Oliver:

    Keeping the rolling average points gives some clue as to what’s going on in the record — look at the dip near 1960.

    It is useful for humans for visual presentations, because we aren’t DFTs or OLSs.

    My only point is there isn’t any additional information in the over-sampled version of the temporal series (since you can reconstruct it to an arbitrary sampling rate from the Nyquist-limited series). Feeding an over-sampled time series into an OLS won’t give you improvement on your trend, but it isn’t guaranteed to not introduce bias to your result either. It can’t help, and without some care, it can hurt. That translates to there’s no reason not to use an oversampled version if you’re careful, but there’s no processing advantage to it.

    In any case, the signal does not appear to be strictly bandlimited, either in looking the plot at the Australian Govt BOM or judging from the amplitude spectrum.

    No signal ever really is of course, but if you’re referring to the “smoothed” version (low-pass filtered) of course that is.

    Your spectral peak is broadly spread around 9 or 10. Does this mean that every quasi-oscillation was shorter than 10 years?

    Those are the only ones I observed, even when I went to a single DFT over the full interval.

    In temperature data, I often see 22-year (full solar cycle) and a 55-year (some call it a PDO oscillation) peaks as well. These are also observed in careful analyses of really long term proxy records. (See e.g. this very interesting figure.)

    Out of curiosity, how long was your periodogram subrecord length?

    I tried a range of numbers. I settled on 30 years (I think). Each window of the data was linearly-detrended before it was processed, and I used a Welch window function if you’re curious.

    These are “amplitude” spectra, BTW, meaning they are normalized so that if you took the transform of

    y = a cos(2*pi*f*t),

    you’d get “a” for the answer at frequency “f”.
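
    A quick numpy check of that normalization convention (not the actual analysis code, just a sanity check that the convention behaves as described):

    import numpy as np

    # The amplitude spectrum of y = a*cos(2*pi*f*t) should read back close to a at f.
    a, f, n = 3.0, 0.1, 300                     # amplitude, cycles/year, record length
    t = np.arange(n)
    y = a * np.cos(2 * np.pi * f * t)
    amp = 2.0 * np.abs(np.fft.rfft(y)) / n      # one-sided amplitude normalization
    freqs = np.fft.rfftfreq(n, d=1.0)
    print(freqs[np.argmax(amp)], amp.max())     # ~0.1, ~3.0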

  171. I also zero-padded the time windows like a crazy man, probably by a 16x factor.

  172. Carrick (Comment#39081) March 23rd, 2010 at 3:47 pm

    Oliver:
    In any case, the signal does not appear to be strictly bandlimited, either looking at the plot at the Australian Govt BOM or judging from the amplitude spectrum.

    No signal ever really is of course, but if you’re referring to the “smoothed” version (low-pass filtered) of course that is.

    Of course I was referring to the unsmoothed one, which is quite band-unlimited, as opposed to the low-pass filtered version which is only somewhat band-unlimited!

    Your spectral peak is broadly spread around 9 or 10. Does this mean that every quasi-oscillation was shorter than 10 years?

    Those are the only ones I observed, even when I went to a single DFT over the full interval.

    In temperature data, I often see 22-year (full solar cycle) and a 55-year (some call it a PDO oscillation) peaks as well. These are also observed in careful analyses of really long term proxy records. (See e.g. this very interesting figure.)

    This is interesting; while tinkering around myself I saw several peaks when computing a power spectrum over the full interval: at least one in the 10–15 year band and one in the 20–25 year band. A MTM method confirmed at least these two peaks (as well as other “peaks” around 17 and 30 years).

    Out of curiosity, how long was your periodogram subrecord length?

    I tried a range of numbers. I settled on 30 years (I think). Each window of the data was linearly-detrended before it was processed, and I used a Welch window function if you’re curious.

    Thanks.

    The Welch periodogram is fairly sensitive to the choice of subrecord length when trying to tease out the content at > 10-year timescales. Compare 30, 40 and 60-year windows; the “long tail” of the spectrum looks quite different! I typically do see the ~20 year peak, though.

    This is interesting, though. I was looking at the MEI (Multivariate ENSO Index) and did not feel that the rainfall oscillation ~10 years really lined up well with either ENSO positive or negative. I have my reading cut out for me!

  173. oliver:

    Of course I was referring to the unsmoothed one, which is quite band-unlimited, as opposed to the low-pass filtered version which is only somewhat band-unlimited!

    This is of course true. If you have an estimate of the amplitude of the longer-period components, generally the impact of the fluctuating part of the signal can be assessed, even if you don’t know its local phase. So just having longer-period signals isn’t necessarily a killer; it just places a limit on the detectability threshold for trend estimation that is independent of shorter-period fluctuations.

    This is interesting; while tinkering around myself I saw several peaks when computing a power spectrum over the full interval: at least one in the 10–15 year band and one in the 20–25 year band. A MTM method confirmed at least these two peaks (as well as other “peaks” around 17 and 30 years).

    Do you remember what kind of data this is?

    This is interesting, though. I was looking at the MEI (Multivariate ENSO Index) and did not feel that the rainfall oscillation ~10 years really lined up well with either ENSO positive or negative. I have my reading cut out for me!

    I’m not surprised to see it here. Regional-scale data is essentially “tuned” to find an AO oscillation component; the global metric, in some sense, minimizes its importance.

    (Think of the example where you had an oscillation that was large + in the northern hemisphere and large – in the southern… even though the impact on climate is large, the net impact on the global metric could be much smaller.)

  174. Oliver, since you seemed to want it, I generated the MTM PSD for SWA.

    Figure.

    While it’s true there are peaks above 10-year periods, at least for this data set their amplitudes are probably weak enough that we can consider the data “band-pass” limited below a 10-year period (notably the 1/2 SS and full SS show up; if these translate into changes in clouds & cloud albedo …). While I recognize that is an approximation, IMHO it seems a good one in view of the strength of the other climatic fluctuations in this data set.

    There are “odd” peaks at 17 and 35 years as well.

  175. Carrick,
    Most curious. Your MTM PSD looks quite different from mine in relative magnitude of peaks (although most peaks show up at the same frequencies).
