Year End Trend Comparison: Individual model runs 2001-2008.

Are any of you wondering how the surface temperature trends from individual model runs compare to observed global mean surface temperature trends?

As many know, I’d started comparing IPCC projections to observations way back in… February? March? Early on, I compared data to the “about 2C/century” IPCC projection discussed in the AR4. Some criticized this test because it did not involve examination of the individual model runs underlying the projections.

Over the course of the year, model runs became available at “The Climate Explorer”. Eager beaver that I am, I obtained the individual model runs that continue with SRES A1B scenarios. I even pestered poor Geert Jan at The Climate Explorer because a few runs were unavailable. He not only uploaded those at my request, but even went the extra mile and linked the SRES A1B projections to their individual forcing files.

I manually downloaded files as they became available, wrote a script to turn these into anomalies based on Jan 1980-Dec 1999 (the IPCC baseline for comparison), computed the least squares trends, and found the uncertainty intervals using the method discussed in section 4.1 of Santer et al. 2008 (aka “Santer17”).
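For readers who want to follow along, the gist of the trend-and-uncertainty calculation looks roughly like the sketch below. This is a simplified illustration rather than my actual script: it assumes a monthly anomaly series is already in hand, and the AR(1) adjustment is my reading of section 4.1 of Santer et al. 2008.

```python
# Sketch only -- not the actual script behind the figures.
import numpy as np

def trend_with_adjusted_se(t, y):
    """OLS trend of y(t) and its standard error, with the standard error
    inflated for lag-1 autocorrelation along the lines of Santer et al. 2008,
    section 4.1 (effective sample size n_eff = n*(1-r1)/(1+r1))."""
    t = np.asarray(t, dtype=float)   # time in years
    y = np.asarray(y, dtype=float)   # monthly anomalies, deg C
    n = len(y)
    X = np.column_stack([np.ones(n), t])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]   # lag-1 autocorrelation
    n_eff = n * (1.0 - r1) / (1.0 + r1)             # effective sample size
    s2 = np.sum(resid**2) / (n_eff - 2.0)           # adjusted residual variance
    se = np.sqrt(s2 / np.sum((t - t.mean())**2))    # adjusted SE of the trend
    return beta[1], se                              # trend (deg C/yr) and its SE
```

The whiskers in the figure below are roughly two of these adjusted standard errors on either side of each least squares trend.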

Those of you familiar with that paper will note the resemblance between Santer17 figure 3 and the following figure:

Figure 1: Comparison of simulated and observed trends in GMST. Model results in the top panel are the results from 57 individual realizations of experiments with SRES A1B external forcings performed using 26 different A/OGCMs. Observational estimates shown as trends 1 and 2 are land-ocean values from GISTemp and HadCrut2 respectively. The whiskers indicate 2 sigma confidence intervals adjusted for temporal autocorrelation. The top panel is based on monthly data from Jan 2001-Dec 2008; the lower panel is based on monthly data from Jan 1980-Dec 2008.
Figure 1: Year End Trend Comparison

What does the top panel in Figure 1 suggest?

The discrete symbols in Figure 1 above show the mean trends over time based on observations and model simulations; the whiskers indicate two standard errors, based on the assumption that all temperature deviations from a linear trend in time are well described as AR(1) noise. That assumption is likely flawed; one can argue that the AR(1) model may either under- or overestimate the uncertainty in the statistical trend, but that discussion will be deferred for now.

To the far left of both panels, the two dark symbols correspond to HadCrut and GISTemp. The group of open symbols are models that included volcanic aerosols as external forcings; the dark symbols to the right are models that did not include volcanic aerosols as forcings.

For those who don’t know how to interpret the graph at all: first, the graph is mostly a visual aid. You can call the obvious agreements and disagreements, but close calls are impossible to diagnose visually. As a general rule (a sketch of the underlying check follows this list):

  1. Any trend treated as a specific estimate that lies outside the edge of the whiskers for the data, which represent 2 standard errors in the mean trend, is inconsistent with the observations at a confidence of approximately 95%. I say approximately because the actual t-test requires specifying the number of degrees of freedom. The correct multiple corresponding to the 95% uncertainty intervals can be a little less than 2, or it can be larger. (For these data, it’s close to 2.)
  2. If the edges of the 2-sigma whiskers don’t overlap, the two means are inconsistent at the 95% confidence level. Caveats apply.
  3. If the 2-sigma whiskers overlap a little, the two means may still be inconsistent with each other.
  4. If the 2-sigma bars overlap a lot, the two means are not inconsistent with each other.
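Here is the check that underlies those rules, sketched in the spirit of the paired-trends test in Santer17. Treat it as an illustration rather than the paper’s exact recipe: the difference in trends is normalized by the combined adjusted standard errors and compared to roughly 2.

```python
import numpy as np

def trends_inconsistent(b_model, se_model, b_obs, se_obs, crit=2.0):
    """Rough paired-trends check: normalize the difference in trends by the
    combined standard error and compare to ~2 (the exact multiple for 95%
    confidence depends on the degrees of freedom)."""
    d = (b_model - b_obs) / np.sqrt(se_model**2 + se_obs**2)
    return abs(d) > crit
```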

Questions and Answers for 2001-2008 assuming that the test in Santer17 applies:

  1. Is the estimate of 2C/century, treated as a point estimate, consistent with either Hadley or GISS during this period? No. This value, mentioned in the “about 2C/century” statement in the AR4, falls outside the range consistent with either Hadley or GISTemp.
  2. How many model runs have trends below the Hadley least squares trend? 3 out of 57 (i.e. 5.2%).
  3. How many model runs have trends below the GISS least squares trend? 5 out of 57 (i.e. 8.8%. A perfect score would be 50%.)
  4. How should we think about the 5.2% and 8.8% values above? First, let us suppose that the spread across all model realizations represents scatter due to “weather!” only, with no variability due to model biases. How often would we expect to see this sort of “weather” excursion? To figure that out, multiply the values by two (converting the one-sided fractions into two-sided probabilities of an excursion this unusual in either direction) to get 10.4% and 17.6%. If Hadley is the more correct metric, we’d expect this sort of excursion 10.4% of the time. As 10% corresponds to the IPCC’s break point between “very low confidence” and “low confidence”, we might conclude that our confidence that models are correct should be rather low.
  5. How many rejections do we get if we compare the simulation trends to the Hadley trend at a significance level of 5%? 35 out of 57 (i.e. 61%. Theoretically, if the statistical model applies, we expect 5% rejections.)
  6. How many rejections do we get if we compare the simulation trends to the GISS trend at a significance level of 5%? 16 out of 57 (i.e. 28%. Theoretically, if the statistical model applies, we expect 5%. A sketch of this counting follows the list.)
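For concreteness, the counting behind the answers above amounts to something like the sketch below. The array names are placeholders for the trends and adjusted standard errors plotted in Figure 1.

```python
import numpy as np

def rejection_summary(model_trends, model_ses, b_obs, se_obs, crit=2.0):
    """Fraction of runs with trends below the observed trend, and fraction
    rejected by the rough paired-trends check at ~2 sigma."""
    model_trends = np.asarray(model_trends, dtype=float)
    model_ses = np.asarray(model_ses, dtype=float)
    frac_below = np.mean(model_trends < b_obs)
    d = (model_trends - b_obs) / np.sqrt(model_ses**2 + se_obs**2)
    frac_rejected = np.mean(np.abs(d) > crit)
    return frac_below, frac_rejected
```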

Reflections on Q&A’s so far

How should we think about the fact that only 5.2% and 8.8% have trends lower than the two observed trends discussed above?

First, let us suppose that each model realization is statistically independent of all other realizations and that the variation around the mean represents scatter due to “weather!” only. That is: none of the variability is due to model biases relative to the mean. Suppose further that the models, if run an infinite number of times, would on average match an infinite number of realizations of earth weather, and that the spread really represents the spread of earth weather.

How often would we expect to see this sort of “weather” excursion? To figure that out, multiply the values by two to get 10.4% and 17.6%. If Hadley is the more correct metric, we’d expect this sort of excursion 10.4% of the time. Yet, this happened in the single realization we have, with the start point selected as the January after the SRES A1B scenario was made available on the web.

Recall that on page 22 of the technical summary of the WG1 report to the AR4, 10% corresponds to the IPCC’s break point between “very low confidence” and “low confidence”, and 20% is the break point between “medium confidence” and “low confidence”.

So, borrowing the IPCC terminology, if we think the spread in trends over all realizations is entirely due to “weather noise” with no contribution due to the biases in individual models, we might very well say that we have “low confidence” that the model projections agree with the observations.

Of course, having low confidence that the models agree with the observations of weather that actually occurred on earth tells us nothing about why the models seem a bit off. Zeke and I have been speculating in comments on another thread. It could mean any number of things.

Moving on to the last two bullets: how are we to interpret the fact that we get 61% and 28% rejections of individual trends based on the paired t-test? Bear in mind, by rejection, we are saying that we think there is less than a 1 in 20 chance that the specific model realization (i.e. run) comes from a model whose mean outcome matches the mean corresponding to the earth’s weather.

First, bear in mind that this test already takes into account the uncertainty in our ability to detect the “underlying mean” or “deterministic trend” from a time series of data. We aren’t simply comparing whether the specific measured trend from Hadley matches the specific trend from a model: we are considering the size of those big error bars in figure 1.

The way the statistical test is constructed, if we test at a significance of 5%, then we should expect to reject about 5% of model runs as inconsistent with the data when the truth is that the runs are consistent with the data.
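To see why roughly 5% is the expectation, here is a toy Monte Carlo sketch. The noise parameters are made up for illustration, and it reuses the trend_with_adjusted_se sketch from earlier in the post. With both series sharing the same underlying trend, the paired check should reject only about 5% of the time, though small-sample bias in the estimated autocorrelation can push the rate somewhat higher.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1_series(n, trend, sigma, r1):
    """Synthetic monthly series: linear trend plus stationary AR(1) noise."""
    e = np.empty(n)
    e[0] = rng.normal(0.0, sigma)
    for i in range(1, n):
        e[i] = r1 * e[i - 1] + rng.normal(0.0, sigma * np.sqrt(1.0 - r1**2))
    t = np.arange(n) / 12.0
    return t, trend * t + e

trials, rejections = 2000, 0
for _ in range(trials):
    t, y_model = ar1_series(96, trend=0.02, sigma=0.1, r1=0.5)   # 96 months = 8 years
    _, y_obs = ar1_series(96, trend=0.02, sigma=0.1, r1=0.5)
    b_m, s_m = trend_with_adjusted_se(t, y_model)
    b_o, s_o = trend_with_adjusted_se(t, y_obs)
    d = (b_m - b_o) / np.sqrt(s_m**2 + s_o**2)
    rejections += abs(d) > 2.0

print(rejections / trials)   # in the ballpark of 0.05 when the statistical model applies
```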

However, instead, we are rejecting 61% of the time when compared with Hadley and 28% of the time when compared with GISS.

This cannot be the outcome modelers would hope for.

That said, the results of the 57 paired t-tests comparing models to an individual observation like Hadley are not uncorrelated. In the first place, every one of the tests involves the one realization on earth. So, if the earth has had a cold snap we would expect an excess number of rejections on the high side with correspondingly fewer on the low side. Still, 61% and 28% rejections, all on the high side, are rather high rates of rejection. To think the model projections are consistent with the data, we must believe the earth experienced quite a serious excursion of “weather noise”.

Second, if you examine the figure, it should be obvious that many of the rejections come from runs corresponding to specific individual models.

This is because, notwithstanding snarky suggestions elsewhere, the spread across models is not entirely due to “weather!”. A sizeable fraction is due to differences across models. You may, for example, notice that all the NCAR CCSM runs have large trends in the 1980-2008 panel. That model happened to have 7 runs, and its trends run high, so it contributes disproportionately to the number of realizations rejected.

All in all, I think few would suggest the individual model realizations of trends mostly agree with the trend observed from 2001 to 2008. The observed and simulated trends mostly disagree; the simulated trends tend to be on the high side.

Or at least, things look bad if one presents the sorts of graphs and analyses discussed in section 5.1 of Santer17.

Upcoming

I’m sure you’ve noticed the lower panel starting in 1980?

I picked that as a start date because the IPCC AR4 uses Jan 1980-Dec 1999 as their baseline. Projections are communicated relative to the average temperature during that period. I’ll be doing the same Q&A for the comparison of simulations and observations for that period in an upcoming post. I’ll also be discussing what we get if we apply the test described in Santer17 section 5.2.2.

56 thoughts on “Year End Trend Comparison: Individual model runs 2001-2008.”

  1. Lucia,

    The whiskers on some of the model runs are quite ridiculous. Does that mean the models have failed to produce anything resembling real world weather? If so, does it make sense to include them in the validation?

  2. Good stuff lucia.

    If someone could have done this at every prediction point going back to the early 1980s, it would have shown exactly the same thing each time.

    Temps are just not responding the way the models have predicted.

    In Hansen’s latest presentation linked to on the other thread, the explanation of this discrepancy comes down to two points – either they are misunderestimating (to quote a Bush-ism) the aerosols impact or the oceans are absorbing more of the forcings than predicted. He then goes on to show how long the oceans will continue to absorb the surface temps and it is a shocking 1500 years. The ocean absorption factor gets us to 60% equilibrium in 100 years but it takes another (over) 1,000 years to get to the other 40%.

    The models have accounted for this discrepancy by adjusting the Temp C change expected per Watt/m2 impact. Hansen quotes the equilibrium Temp C impact at 0.75C per W/m2 but the models are currently using 0.3C to 0.4C per W/m2.

    Above you noted the forcings estimate for each model is available. I’d like to know what the average forcing estimate for the models is in Watts/m2. ModelE is up to 1.92 W/m2 from 1880 to 2003 – do you have a comparable figure for the other models? or just 2001 to 2008?

  3. Raven–
    The notes by some of the modeling groups admit flat out they don’t reproduce the statistical properties of earth weather. If you examine the time series, some of these are amazingly odd. I nicknamed one “planet alternating current”.

  4. Bill– Loads of information about temperature output, forcings etc. are available at pcmdi. I just haven’t tried to dredge it all up. So, I don’t know the answer to your question.

  5. Lucia, could you better describe the results of your analysis?
    “All in all, I think few would suggest the individual models realizations of trends mostly agree with the trend observed from 2001 to 2008. The observed and simulated trends mostly disagree; the simulated trends tend to be high side.”
    What does this actually mean? The models are wrong? The models don’t reflect weather very well? What do you conclude from this?

  6. Bill Illis– It occurred to me that the TAR and SAR might come out ok. They both project about 1.5C/century as a nominal rate. Depending on the specific start dates selected as most appropriate to check the projections, those would probably come out ok. The actual trend since 1980 is about 1.6 C/century. That’s lower than the FAR models hindcast for that period, but it’s pretty close to 1.5 C/century in the TAR and SAR.

    The FAR was way high.

  7. Nathan– I’m going to be writing more posts with additional information. But based only on what’s discussed here, a large number of trends in individual runs are inconsistent with the observations at the 95% level.

    I’m not sure I understand what you are asking? What do you mean by “models are wrong”? The trends predicted by models do not appear to match the range consistent with the trend observed.

  8. I’m asking what this actually means:
    “The trends predicted by models do not appear to match the range consistent with the trend observed.”
    Is this important? What conclusions should be drawn from this? Where does this lead?

  9. Lucia,

    Planet Alternating Current certainly illustrates why calculating confidence intervals with ensembles of runs from different physical models is a dubious practice that tells us nothing about the predictive value of the climate models.

    Have you seen any information on the IPCC criteria used to decide which models are ‘good enough’ to include in the ensemble? Can anyone with a model show up or are there specific benchmarks that need to be met? If the IPCC uses benchmarks to exclude certain models, then no one could argue against developing a different set of benchmarks that would be used to select the models to use when comparing models to predictions.

  10. Lucia,

    Might I suggest that you look at some contour plots from individual days
    from several models and compare them with real weather at the same time. I think you will be shocked at the differences in the placement and magnitude of the jet stream and the equatorial weather.

    Jerry

  11. Lucia, I am a little puzzled at some of the values you have in the 1980-2008 trend plot. Specifically, giss-eh has 7 ensemble members (two of which seem to be exact duplicates), even though there are only 3 ensemble members that have the sres-a1b continuations in climate explorer (at least when I looked just now). The OLS trends for those three runs are 0.17, 0.22 and 0.11 deg C/dec – not the uniformly larger than 0.2 deg C/dec values you show. Possibly there is some weirdness in how the 20C3M and SRESA1B runs were spliced? Checking the individual time series might reveal something odd.

    Gavin– You may need an eye exam:

    -L

  12. Nathan:
    This: “The trends predicted by models do not appear to match the range consistent with the trend observed.”

    means 1) the models predicted the earth’s trend; 2) we observed the earth’s temperature trend; 3) if we account for the uncertainty due to “weather noise” using the method suggested in Santer17, way, way, way too many of the model trends are different from the earth’s trend to permit us to accept the idea that the models, in general, correctly predict the earth’s trend.

    The disagreement tells us nothing about its cause.

  13. Gavin–
    By the way, according to table 10.4 in chapter 10 of the IPCC WG1 document, GISS submitted 5 20th century runs and 4 SRES A1B’s for the AR4.

    I’ve pestered Geert when the numbers at The Climate Explorer are fewer than in the IPCC document, and he has always promptly updated from whatever is available at PCMDI. I think he sometimes has fewer than exist because the agencies hadn’t uploaded all their runs the previous time he updated to add new runs. Because he does this in his 10% time, he’s not checking daily.

    Are the additional 2 twentieth century runs and 2 SRES A1B runs at PCMDI? I’d love to have them.

    BTW: Thanks for inadvertently confirming that my trends match yours. 🙂

  14. Gavin–
    I’m avoiding color because I’m doing this as a hobby and I don’t want to rack up page charges. Even before you made that error I was thinking I needed to add numbers to the legend. If you look carefully, you’ll see black circles and black squares are duplicated. This software package has nice features that let me plot from the output of my script. Unfortunately, there are a limited number of symbols available.

    GISS EH is one of the models that, based on the various tests I happen to have run, doesn’t look bad.

  15. Jerry-
    I haven’t downloaded gridded data. So, I can’t look at contour plots based on data I have on my mac. Anyway, my plan is to wrap up a discussion of the ability to project, and then consider other suggestions.

    But if you have comparisons of contour plots, I’d love to see them!

  16. I have yet to fully understand the fundamental theoretical basis of the Ensemble-Average approach used by the IPCC. So this is very likely a stupid question.

    If an Ensemble-Average approach is necessary, on what basis can individual results be used for anything?

    If individual results can be used in meaningful ways why is the Ensemble-Average approach necessary?

    Can anyone give me an analogy?

    Thanks

  17. Dan–
    As far as I can determine the reason given in the AR4 is this:

    They average over models because they think this will cancel biases in individual models. That’s pretty much it.

    So, presumably, they recognize that, as models have different sensitivities, time constants etc, at least some are biased relative to “truth” which is whatever actually applies to the earth.

    Then, they figure that if there are ranges of values of parameter choices, the outcome will be that some models will pick parameters that end up with sensitivity a bit high; some a bit low. Similar things will happen for other features. However, one might hope that if a large number of choices are made, the mean result will be unbiased. (A toy illustration of this hope follows at the end of this comment.)

    It’s a hope.

    To some extent, I have to admit I’d do the exact same thing prospectively. What else would you do if you were trying to make projections at all? (The alternative is to say we can’t make any.)

    But, by the same token, I think it’s important to compare the projection to reality and see how things panned out. Right now, things ain’t panning out so good. The average projected trend looks high for whatever reason.

    JohnV has suggested that, if solar forcing is at a particular level, the models’ failure to account for it could be the cause of the mismatch. If so, this would suggest that, in future, modelers should at least attempt to include the solar variability in models. After all, no matter what they want to say, mismatches over 8 years will be noted. And as you see, I have trends for 28 years illustrated above. You can probably guess what I will be saying . . .
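    In toy form, the hope looks something like this (made-up numbers, assuming the individual model biases really are independent and centered on the truth):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    true_trend = 0.2                            # deg C/decade, made up
    biases = rng.normal(0.0, 0.05, size=20)     # independent model biases
    model_means = true_trend + biases

    print(np.std(biases))                            # individual models scatter roughly 0.05
    print(abs(model_means.mean() - true_trend))      # the multi-model mean typically lands much closer
    ```

    Averaging helps only to the extent the biases really are independent and centered on zero; if the models share a common error, no amount of averaging removes it.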

  18. Lucia – Perhaps this is a silly question, but it has been a long time since I’ve done such an analysis. For the situation where the 2 sigma whiskers overlap just a little, isn’t the probability that the means are not consistent greater than 95%? For example, if the observed mean is, let’s say, 0.0 and the upper two-sigma value for the observed mean is 0.2, and a model mean is 0.32 with a lower 2 sigma value of 0.19, it seems that the chance that the means are consistent is less than 5%. Is this what the paired T-test addresses? If you could help refresh my memory of such things I would appreciate it.

    Overall, this seems like the type of analysis that should have been done at some point in time already since it clearly shows that some models are doing an ok job (e.g. GISS) of emulating what has happened and others (e.g., NCAR) clearly are not. Further, it would suggest to me that only those models which at least have overlap between the 2-sigma values of model run and observations for the 1980-2008 period should be used in further model ensembles (this seems like a very lenient test and a minimum basic requirement to me).

  19. BobNorth–

    For the situation where the 2 sigma whiskers overlap just a little, isn’t the probability that the means are not consistent greater than 95%?

    If the whiskers just touch we reject the hypothesis of equal means, and would do so at a confidence level greater than 95%. So, if I’m understanding what you said correctly, you are getting this right.

    You seem to be describing what I say here

    If the 2-sigma whiskers overlap a little, the two means may still be inconsistent with each other.

    If the standard errors for the two cases were equal, the 1.4*sigma error bars touching is very close to the boundary for 95% confidence in most cases. (The precise value depends on the number of degrees of freedom.) But you can’t just use 1.4 as a rule either, because the correct distance is roughly 2 times the square root of the sum of the squares of the standard errors.

    In the limit where case A has almost no uncertainty compared to the other case B, you pretty much just see if the best estimate for the case with the smaller uncertainty (case A) falls inside the 2-sigma of the one with the larger uncertainty (case B). If the two have equal uncertainties, you can try to estimate where the 1.4 sigma error bars are and use that.

    So, it’s really difficult to tell whether some intermediate cases are inconsistent based on the graph. But if you look, many cases on that graph are obvious.
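    To put numbers on Bob’s hypothetical example, the arithmetic is just this (a quick sketch using his values):

    ```python
    import numpy as np

    # Bob's hypothetical numbers: observed mean 0.0 with an upper 2-sigma of 0.2;
    # model mean 0.32 with a lower 2-sigma value of 0.19.
    b_obs, se_obs = 0.0, 0.2 / 2.0               # standard error 0.1
    b_mod, se_mod = 0.32, (0.32 - 0.19) / 2.0    # standard error 0.065
    gap = abs(b_mod - b_obs)                     # 0.32
    threshold = 2.0 * np.sqrt(se_obs**2 + se_mod**2)   # about 0.24
    print(gap > threshold)   # True: inconsistent at roughly the 95% level, even though
                             # the 2-sigma whiskers overlap slightly (0.19 < 0.2)
    ```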

  20. Bob North–

    Overall, this seems like the type of analysis that should have been done at some point in time already since it clearly shows that some models are doing an ok job (e.g. GISS) of emulating what has happened and others (e.g., NCAR) clearly are not.

    I would not disagree on this. That said, I suspect others will. If so, we’ll get to read their reasons. 🙂

    I would actually go further. At some point, I’m going to be discussing the issue of the cases that included volcanic treatment vs. those that did not. It will turn out that those that did use volcanic treatment are, on average, off, while those that did not will not be rejected based on the 1980-2008 trend.

    This is a bit odd, as it means that the models that failed to include the known cooling due to volcanos during the early part of the record got the trend more right. But did they do so for the right reason? Or would their trends be equally high if they had included the known cooling during the early portions?

    I think on another thread, I told Zeke that I will eventually be creating a projection where I filter this way:
    1) Exclude models that are rejected based on trend mis-match from 1980-now using the next test I will be discussing. (A multi-model mean.)
    2) Exclude models that didn’t include volcanic forcing during that period, and so may have decent matches for entirely the wrong reasons.
    3) Exclude models with only one run. (Or maybe, exclude only those models that had only one run and whose run was rejected. I have to think about that.)

    I haven’t done this yet. But it’s worth seeing what this screen does.

  21. re: Dan Hughes

    I find the ensemble concept confusing as well. I don’t understand how inferences drawn from an ensemble – of individual runs using different assumptions and parameterizations (therefore different physics) – can be valid. Shouldn’t the comparisons with observation be limited to each individual model run?

  22. Layman,

    Lucia explained it nicely in Comment #9231. There is a notion of an overall “function space” of models which all cluster in some way around “realistic” but with independent errors.

    One possible objection is that the individual errors may not really be independent.

  23. Layman lurker–
    I did compare to individual runs. However, there is no reason not to compare to ensembles of runs from an individual model. There is information in that comparison. In fact, if we had a zillion billion model runs from an individual model, I’d be thrilled with that. That way, we’d know the mean trend from a model quite precisely and the only uncertainty would be due to the real earth weather noise.

    (In fact, when I started comparing, I was treating the “2C/century” in “about 2C/century” as a multi-model average of a zillion billion models. I did that because the AR4 itself provides sketchy information– and I did not yet know where to get the model runs.)

    However, when comparing to an ensemble of models, you need to remember that the observation is only one realization. So, you can’t treat the observed trend as a “point” trend. You need to account for the uncertainty in the observed trend.

    OMS–
    I’d be very surprised if individual errors in models were independent. As a practical matter, it’s almost inconceivable.

  24. Jack–
    Good question. But to whom? One could say that since models only make projections in probability, then one can’t falsify, since anything could happen if we accept the possibility that 1 in a billion jillion is still not impossible.

    I prefer to look at whether models can do specific tasks and talk about whether that seems true or false based on data. So, I try to test only: Did a specific set of projections match the thing they were intended to predict? In this case, the thing is the rate of change in the “underlying climate trend”, which is a sort of average of the weather. So, we can only detect it using weather measurements and by accounting for uncertainty in detecting the average.

    If the projections disagree with climate trends at the 95% confidence level, I say that projection is false. Of course, by extension, if the models designed to create projections can’t do that, there is a problem with that aspect of the models. That is: for some reason, they don’t project well. (This reason may be poor choices in forcing files… but in that case, we need to learn to create better forcing files!)

    This is not to say they are useless– they may be able to tell us the north pole is cold, the equator is warm and explain things like why desserts exist etc.

  25. lucia (#9228),

    I have a friend at NCAR who saw some of the contour plots and was astonished at the discrepancies between the model output and reality
    ( not a surprise to me based on my experience with the models and my mathematical training).
    I think that once you look at the differences between the models and reality on a day to day time frame (instead of averages), it will be a real eye opener.

    Jerry

  26. Jerry,

    I don’t understand why you expect the day to day hindcasts to match either the average or any given run since we are dealing with a chaotic system.

    It seems to me that the statistical properties of the daily weather variations are the more relevant metric.

  27. Lucia
    “The disagreement tells us nothing about its cause.”

    Thanks…
    Hmmm… So do you think it’s a problem with the weather noise assumption in Santer17? Or is it that the CO2 sensitivity is too high? Can you express an opinion?

  28. Nathan–
    I don’t think it’s a problem with the assumption of AR1 in Santer 17 for reasons I have not yet explained. CO2 sensitivity too high is one of the possibilities but there are others.

    The fact is, one piece of evidence reduces the total number of possible theories, but it doesn’t tell us everything. If you read the discussion between Zeke and me, you’ll see he suggests the ocean mixing could be set too fast. There are many possibilities, and the models being off on GMST over a period of time doesn’t let us identify the precise thing that would be off.

  29. Interesting… Yes, it is a conundrum… Wonder what is going on… Does the same thing happen if you use the yearly averages rather than the monthly?

  30. Lucia (Comment #9245): “[Models may] explain things like why desserts exist . . . .”

    That may be your best typo of all time. 🙂

  31. Raven (#9254),

    Look at the contour plots. The jets are in the wrong location. These are not minor weather variations, but serious model deficiencies. Not unexpected given the problems with too large a dissipation, ill-posedness of the hydrostatic system, inaccurate parameterizations tuned to overcome too fast an enstrophy cascade, etc. Why do you think that they never show the contour plots of the horizontal fields and only spaghetti diagrams in time? Could it be that the game would be transparent even to the uneducated masses?

    Jerry

  32. Thanks Jerry,

    I agree that the inability to produce major atmospheric features such as the jet streams is evidence that the models have serious problems.

    I thought your last comment was a complaint that the models predicted that Jun 5 in NY would have a temp of 25 degC when it actually had a temp of 15 degC.

  33. Raven
    What Jerry means is that you will see discrepancies or even absurdities in instantaneous values while it might be that you wouldn’t see anything in time averaged values .
    To give an image – the model computes a 100° temperature in Greenland for day1 and – 100° for day2 .
    If you look only at 2 day averages (0°C) you will see nothing special and would say that everything is reasonable .
    Jerry’s point is that as the continuous equations are ill posed , the solutions have absurd properties that can only be seen on small time scales .
    Once you average they might SEEM to go away but an average of garbage is garbage even if it happens to look reasonable .
    .
    Regarding your previous post .
    You cannot say in the same breath that a system is chaotic and that there are statistical properties in it .
    These characteristics are mutually exclusive because a chaotic system has
    NO statistical properties .
    You can run any statistical tests you want on the Lorenz system (or any other known chaotic system) and all will fail without exception .
    So it is either chaotic or stochastic but not both .

  34. TomVonk#9281 – well, yes. I must admit I’m a little troubled by all this linear trending of climate data. Just a visual inspection of, say, Hadcrut says “a straight-line model is not appropriate”. To go from “there’s a warming trend of 2degC/century” to “global average temperatures will have risen by 2degC by 2100” is a complete absurdity. If it made sense at all we’d have to evacuate beach resorts every time the tide started coming in.
    In my old stats textbook I recall a figure which showed a scatterplot which was clearly a quadratic function plus noise. The caption read, “The correlation coefficient should not be calculated”. Some statistics actually are meaningless. Or am I missing something?

  35. A general comment on weather as “chaotic” systems.

    I can definitely not claim much competence on the mathematical/formal aspects, but I suspect there are important differences in what different people mean when employing the term CHAOS.

    As far as I have understood (it might indeed not be very far :-)) the various understanding(s) of chaos can be assigned to two broad categories: 1) “Chaotic chaos” and 2) “Ordered chaos” (i.e. SOME degree of order)

    The little understanding I might have is from reading two very different books way back when: Chaos by James Gleick, and “Order out of Chaos” by Ilya Prigogine.

    Re 1: Apparently this interpretation of chaos assumes that chaotic systems in a fundamental way defy good (or useful) “real-time” or predictive descriptions.
    I wonder (even suspect) that the term *chaos* sometimes is being used as an ad hoc wildcard to “rescue” theories when real world data do not match the PREDICTIONS inferred from the theory.

    Re 2: This interpretation acknowledges the difficulties in description, but tries to explore the “quasi steady states” and other ordering properties in the system. This is the realm of phenomena like Lorenz attractors, “strange” attractors, etc.
    To me, this approach seems intuitively applicable to larger-scale phenomena like ocean currents, ENSO, NAO, AMO.

    Cassanders
    In Cod we trust

  36. Cassanders, there are not really different understandings of chaotic systems .
    They are just ordinary systems described by a set of non linear first order ODEs exhibiting specific properties , non predictability (even in principle) being one of them .
    They have nothing mysterious or exotic and the mathematical tools are those of hamiltonian mechanics .
    An excellent introduction for those who are interested and have much spare time and that I’d suggest is http://www.amazon.com/Chaos-Nonlinear-Dynamics-Introduction-Scientists/dp/0198507232 .
    .
    Of course spatio temporal chaos like in diffusion processes or in fluid mechanics is a much more complex and as of today non resolved matter .
    But “simple” temporal chaos (most known example is the Lorenz system) is rather well explored .
    There are difficulties when the number of the space phase dimensions is large or quasi infinite and one enters in the ergodic theory which is mathematically a measure theory and as such rather hard .
    But that’s about it ?
    Btw your intuition is right – quasi periodic processes with seemingly “random” fluctuations like ENSO etc are often a signature of low dimensional temporal chaos .
    It is not a proof but the similarities are indeed right there .

  37. @TomVonk
    Thanks for the clarification. I guess expressing my thoughts as questions is more appropriate.

    Can I assume there is no true dichotomy here? I.e. can a system previously described as truly chaotic (sic) change to a (low dimensional temporal) chaos if/when structuring (ordering) mechanisms have been discovered?

    Cassanders
    In Cod we trust

  38. I do not know what you call “truly” chaotic .
    A system is chaotic or isn’t . When it is , it is so truly and stays forever so .
    .
    The notion of “low” dimensional and “high” dimensional chaos is not a really rigorous distinction .
    When the dimension of the phase space is low (low number of independent degrees of freedom) it is only easier mathematically because the number of equations is low .
    When it is high (typically the classical N body system with N big) it is intractable because the computer power is not big enough to solve numerically .
    .
    I suspect that what you wanted to ask was if a system that has been thought STOCHASTIC with some more or less exotic probability density function can be later recognised as chaotic .
    As I said above the answer is unknown for spatio-temporal chaos but my conjecture is yes .
    For purely temporal chaos the answer is yes but it is rather hard .
    The problem is that chaotic systems are EXTREMELY good at simulating randomness and trends .
    The key is in the time scales .
    If you look at a chaotic system on a wrong time scale you can see randomness or trends or both .
    Unfortunately when you don’t know the governing equations and have only empirical time series you can’t begin to guess what the right time scales are .
    There are rigorous mathematical methods that allow one to tell the difference but here the problem is the number of points in the time series .
    You can easily imagine that if a 20 or 60 year time scale played a role in a process then you’d need dozens of such pseudo periods what means hundreds of years of data .
    As far as climate is concerned we do not have, by FAR, even the beginning of the amount of data we’d need to say something about its dynamical properties .
    Not to mention the fact that climate is a spatio-temporal process which is notoriously more difficult than purely temporal, non space dependent processes .
    Typically the GMT is a variable about which one can’t say much because it is spatially averaged and the non linear dynamic theory deals only with real , local , time dependent variables .

  39. Thanks for the effort, Lucia. I was quite surprised to see GISS producing something reasonable, perhaps the underlying reason for Hansen reducing his decadal prophecy to 0.15C?
    However, the point of this lurker’s comment is to ask: is 0.15C within expected natural variation?

  40. Clothcap–
    What is or is not within the bound of natural variation is a matter of some contention. 🙂

    However, if we accept that all residuals from the linear fit are due to “weather noise”, and that AR(1) describes that noise, and we use a confidence level of 95% to test a claim that the trend is 0.15 C/decade based on data since Jan 2001, then according to Hadcrut a mean trend of 0.15C/decade does not fit within the natural variability, while according to GISTemp it does fit within the natural variability.

    The 0.15C/decade does fit the mean trend since 1980 very, very well. Moreover, the observed least squares since 1980 is actually higher than 0.15C/decade.

  41. It seems to me that if the climate is truly chaotic, then the actual temperature series is just one realization of an infinite number of possibilities. Some of the possibilities would fit some models, but most probably would not. If the climate is chaotic the modeling is really a silly endeavor.

  42. Raven,

    To add to Tom’s clarification, I will cite a mathematical example that I posted on CA. It is possible to take any time dependent system (even one that does not describe the earth’s atmosphere) and add a forcing term to obtain any solution (not just a mean) that one wants. This is essentially what the modelers have done, i.e. tuned the forcings to obtain the mean they want to match historical trends. However, because of the counterexample on CA this does not mean that the system of equations is accurately describing the earth’s environment. In fact, that is why the modelers must retune the forcings when they reduce the mesh size or find a discrepancy with reality (such as a major volcanic influence). Thus one needs to look at the solution on a day to day basis to see if it is accurately describing reality (which it is not).

    Jerry

  43. My question on “weather” or climate chaos is whether the chaotic unpredictability is simply the distribution of heat energy around the system or does it also affect the actual whole planet radiation balance?

  44. Alan–
    My guess is probably both. The instantaneous distribution of temperature has to ultimately affect instantaneous heat loss.

  45. In which case, Lucia, does a decade long “failure to warm” without obvious cause correspond to a failure of the forcing model, a redistribution of heat into some unknown planetary heat store, or an explicable chaotic “weather” event that has increased outgoing radiation?

  46. My question on “weather” or climate chaos is whether the chaotic unpredictability is simply the distribution of heat energy around the system or does it also affect the actual whole planet radiation balance?

    Like Lucia said it is both .
    Let’s take the following example in the domain that has already been discussed – the energy dissipation .
    If we assume equilibrium (let’s use “equilibrium” in some spatially and temporally averaged sense) then the kinetic energy transferred by viscosity transforms into heat and the fraction is constant .
    Now if you assume that this process is chaotic , then it is by definition out of equilibrium and this fraction will vary .
    When it is less than the “equilibrium” value you will get a windy cool Earth and when it is more you will get a calm warm Earth .
    The chaotic nature of the system will make it vary unpredictably between the 2 states .
    Of course this is only a simplified picture intended to show how it works .
    .
    So you see that both the energy distribution and the radiative properties will vary in such an example of a chaotic process .

  47. If I understand you properly, Tom, you are hypothesizing that the amount of insolation distributed between surface heat and wind kinetic energy varies chaotically.

    But can that detectably impact the radiation balance (i.e. the outgoing planetary radiation) over any substantial time span? Both the hotter surface and the energetic atmosphere are sources of energy that can be converted to outgoing radiation?
