FGOALS GCM: Planet Alternating Current?!

As I previously mentioned, I’m just looking at global mean surface temperature “model-data” from the GCMs that form the basis for the predictions/projections in the AR4. As we often read, each model has its strengths, each has its weaknesses, but somehow the average is “best”.

So, I can’t help wondering if the extremely weird “weather” from planet FGOALS is a “weakness”? Or do we expect the “weather noise” for the earth’s surface temperature to remind us of alternating current? Below, I have compared the 12 month lagging average of the FGOALS temperature hindcast/prediction from 1980-2030 (SRES A1B after 2000):

Statistical features for 12 month averaged monthly data from 1980-2029 (a sketch of how such statistics can be computed follows the list):

  1. Average rate of increase: 1.7 C/century, 1.6 C/century, 1.6 C/century for runs 0, 1, 2 respectively. (Average= 1.6 C/century. Stdev=0.01 C/century.)
  2. Residuals for linear regression: 0.19 C, 0.20 C, 0.19 C for runs 0, 1, 2. (Real earth residuals: 0.13 C. That is to say: simulated “weather noise” is about 50% more variable.)
  3. Qualitative description of weather noise: Corrupted sinusoid? Or in lay terms: It looks nothing like real earth weather noise. If real weather noise were that predictable, forecasting the next record in GMST would be easy!
  4. The “weather noise” in this model looks nothing like the other 4 models I have downloaded so far. I guess we’d say this is “not robust”?
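For anyone who wants to reproduce these sorts of numbers from the monthly downloads, here is a minimal sketch of the calculation. The file name and the two-column layout (decimal year, monthly anomaly in C) are placeholders– adjust them to whatever the Climate Explorer actually serves up:

```python
import numpy as np

def trailing_12mo_mean(monthly):
    """12-month lagging (trailing) average of a monthly series."""
    kernel = np.ones(12) / 12.0
    # 'valid' drops the first 11 months, where a full 12-month window isn't available
    return np.convolve(monthly, kernel, mode="valid")

def trend_and_residual_sd(time_years, series):
    """OLS trend (C/century) and residual standard deviation (C)."""
    slope, intercept = np.polyfit(time_years, series, 1)
    residuals = series - (slope * time_years + intercept)
    return slope * 100.0, residuals.std(ddof=2)  # ddof=2: two fitted parameters

# Placeholder layout: column 0 = decimal year, column 1 = monthly GMST anomaly (C)
data = np.loadtxt("fgoals_run0_monthly.txt")
years, anom = data[:, 0], data[:, 1]

smoothed = trailing_12mo_mean(anom)
t_smoothed = years[11:]  # align the time axis with the 'valid' convolution window
trend, resid_sd = trend_and_residual_sd(t_smoothed, smoothed)
print(f"trend = {trend:.2f} C/century, residual sd = {resid_sd:.2f} C")
```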

Is there anything else to say about FGOALS?

There are at least two more things worth observing:

  1. Projections from this model are included in the AR4.
  2. So far, I have found no discussion of how this sort of weird weather noise in models might affect our confidence in the unproven ability of models to predict the evolution of GMST.

73 thoughts on “FGOALS GCM: Planet Alternating Current?!”

  1. These are some of the actual models used by the IPCC to forecast global temps?

    Did anyone even bother to look at them?

    Did they take submissions from high-school earth science classes?

  2. lucia,
    It would be much appreciated if you could provide a little bit of background info (or a link) for each model as you write it up. I think I remember a summary of the models in AR4 — do you have that link? Thanks.

    PS: I’ve only had time to read lately. I hope to have time for the solar Monte Carlo analysis later this month.

  3. Clark–
    Obviously, people look at the output. The “average” for this case is included in Figure 10.5 in chapter 10 of the AR4. Its distinctive features are difficult to see when hidden in the spaghetti, and also when someone just pulls out individual years. The “features” really only become apparent if we examine the correlogram (which looks really weird) or the monthly data. Smoothing by putting through an annual average filter really made that “AC” pattern pop out.

    It is features like these that make me believe it is more sensible to use observations of earth “weather noise” rather than “model weather noise” when performing hypothesis tests.

    I am, however, downloading all the data and just looking at various features before running tests. (I’m eyeballing the correlogram, looking at the monthly data, seeing how the data compare, etc.)

    I’ll be showing things as I eyeball the data! 🙂
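    In case anyone wants to eyeball the same correlogram, here is a minimal sketch, assuming the `smoothed` 12-month-averaged series and `t_smoothed` time axis from the sketch in the post (detrending first so the trend itself doesn’t dominate the autocorrelation):

```python
import numpy as np

def correlogram(series, max_lag=120):
    """Sample autocorrelation at lags 1..max_lag."""
    x = series - series.mean()
    denom = np.sum(x * x)
    return np.array([np.sum(x[:-k] * x[k:]) / denom for k in range(1, max_lag + 1)])

# Remove the linear trend before computing the autocorrelation
slope, intercept = np.polyfit(t_smoothed, smoothed, 1)
detrended = smoothed - (slope * t_smoothed + intercept)

acf = correlogram(detrended, max_lag=120)
print(acf[:24])  # a quasi-sinusoidal alternation of sign here is the "AC" signature
```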

  4. JohnV–
    I should add, I’m downloading the models with multiple runs for the SRES A1B first, and those with only 1 run afterwards. I’m doing it this way so I get an idea of variation within models as I go along.

  5. lucia,

    The AR4 table is useful. I was also looking for basic model info such as the institute that developed the model, physics included in the model, etc. I found this page at PCMDI:

    http://www-pcmdi.llnl.gov/ipcc/model_documentation/ipcc_model_documentation.php
    (linked from http://www-pcmdi.llnl.gov/ipcc/about_ipcc.php)

    Under “Known Biases and Improvements” for FGOALS_g1.0 (not FOALS), I found this document:

    http://www-pcmdi.llnl.gov/ipcc/model_documentation/more_info_iap_fgoals.pdf

    which includes the following quote (page 2, section 1(3)):

    As most coupled GCMs without any flux correction, the coupled GCM FGOALS_g1.0 suffers from prominent cold biases in the tropical Pacific or the so-called “Double ITCZ”. Because the simulated SST is 1℃ colder than the observed in the central equatorial Pacific in the stand-alone OGCM, then the cold bias is amplified through air-sea coupling and thus 2-3℃ cold bias in SST and uplift of thermocline in the tropical Pacific can be found in the coupled GCM, which results in very strong and regular ENSO variability, e.g., the standard deviation of Nino3 index is about 2.1℃ (the observed value is 0.85℃).

    A newer version of the model (FGOALS-g1.1) attempts to improve this and other problems, but was not ready in time for AR4.

  6. JohnV–
    It looks like you know where to find the information you wanted, and you found it. 🙂

    And we learn precisely what I showed: This model, with the whopping amazing ENSO, was used to form the basis of the AR4.

    Plus, your sleuthing reveals that the relatively poor ability to predict “weather noise” is recognized by the modelers and even posted.

    What one now must ask is this: Why would Gavin suggest we should replace the properties of observed “earth weather noise” with “model weather noise” –which is admitted as being unrealistic by the modelers themselves– when doing hypothesis tests?

    Had this poor fidelity been a secret one might assume Gavin’s suggestion was made in good faith. We’d simply assume he believes the models correctly describe “weather noise”, or at least do it reasonably closely. But, now that we read the modelers are perfectly aware the “weather noise” is unrealistic, what is one to think? 🙂

  7. lucia,

    I didn’t think the information would be that easy to find. It was just a couple of clicks from where you downloaded the data.

    There are a few ways to estimate the real-world weather noise. Ideally we would have many years of temperature data that is not “contaminated” by volcanoes or other effects. Since we actually have very few such years, the weather noise must be estimated. You’ve estimated by looking at the most recent 7.5 years and by choosing a small number of older years that are not contaminated (using very strict criteria). I’ve estimated using a less-strict criteria for selecting years and found substantially more noise. Gavin has estimated using the models. They are all estimates — including yours.

    Questioning if a decision was made in “good faith” is not helpful.

    It’s true that some models over-estimate the weather noise. Apparently these are known limitations. It will be interesting to see if other models under-estimate the weather noise.

  8. JohnV–

    That page is not just a few clicks from where I downloaded the data. That page is at an entirely different site! I downloaded it from the climate explorer: http://climexp.knmi.nl , a site Gavin recommended. The climate explorer has the advantage of providing annual average data, and it’s all publicly available without anyone getting permission. That means everyone can check and look at the data themselves.

    But, you seem to have found the information you wanted. It would not have occurred to me to include that in my post, as my only motive is to show features of the GMST predicted by the models used in the AR4. The information you posted a) does not discuss GMST b) discusses problems the group intends to fix, but that fix did not affect the AR4.

    So, from my point of view, that information is interesting as an aside. In particular, it is interesting in the larger context surrounding why I am even looking at this model weather noise: Gavin suggested that using the real earth weather when hypothesis testing is bogus (as in not just another choice one might make.) He insists, in quite strong language, that one must use the weather noise from the models used in the AR4.

    In that context, the information you bring up shows that modelers themselves know the weather noise in at least some of those models is largely unrealistic.

    I think my readers will be interested in your comment and I value it. But your tone seems to suggest that somehow I should develop some sort of psychic power, anticipate your interests, and include information you specifically want– particularly as you believe it would be so easy for me to do so.

    On your suggestion that is not helpful to comment on possible bad faith:
    First recall that one of the main reasons I am looking at all this “weather noise” is because Gavin insists that model weather noise must be used to hypothesis test the model trends. The language he uses for alternative choices is that they are “bogus”. So, he is not suggesting model weather noise is just one possible legitimate choice among others.

    Each of us must decide whether we believe this tacit knowledge is universally shared and/or a particular source of climate information gives balanced or fair information. To that end, if you can easily show us that modelers are perfectly aware that models do not correctly reproduce “weather noise”, then one must wonder why Gavin insists it must be used.

    This is true whether some models have unrealistically high or unrealistically low weather noise. (The entire goal of this exercise is to figure that out. But some models happen to have weather noise that is not only too high, but also appears insane. Sophisticated statistical tests are not required to show they are just nuts.)

    On the volcano issue: I looked at the characteristics of earth weather during periods without volcanos at your insistence.

    As for your consistent characterization of my criteria for eliminating years with volcanos as “strict”: I didn’t use a “very strict” criterion for volcanic aerosol noise. I used the criteria used by modelers themselves– that is, those who ran models to study the effects of volcano noise on weather.

    I neglected exactly the same small volcanos they neglected. In fact, my cutoff actually includes the smallest volcano Robock included in his paper and includes a volcano GISS Model E uses when running GCM runs. Obviously, the zillions of smaller volcanos that erupt every year are neglected by modelers and by me. We have no choice: data aren’t even compiled and collected for those.

    My graph explaining my cutoff is shown below:

    click for results
    If you look carefully, you will see I included one stratospheric volcanic eruption that is thought sufficiently important to include in the forcing file at NASA GISS!

    My reasoning is explained here.
    http://rankexploits.com/musings/2008/what-period-should-we-use-to-compare-uncertainty-bands/

    I told you before that I don’t precisely recall your cut-off criterion, nor your reasoning for selecting it. I am aware that, based on your choice, you get a different answer. When I said this before, you suggested that you thought you were sufficiently clear in comments and that I should hunt down the details you sprinkled in comments everywhere.

    So, given the scattered nature of your documentation of what you did, my response to that is: Sure, if you include periods when the dustveil is varying– as occurs when aerosols are dropping out of the sky after a volcano– then you will get more variability.

    That is precisely the effect I am trying to screen out. So, I don’t include them. That’s the reason I screen out years with measurable amounts of stratospheric aerosols. So, obviously, I’m not going to keep some in!

    You are welcome to adopt other choices, and describe your results. But the fact is, WordPress does not make it easy to search for comments by a particular commenter, and even if it did, I’m not going to try to dive through your many comments to try to re-constitute what, precisely, you did. So, as it stands, I don’t know your cutoff, your precise method, nor your precise results.

    If you want to explain what you did, in one coherent post, where someone (like me) can read it, and understand your choices, we might be able to discuss why I prefer one cut off and you prefer another one. But, as it stands, all I know is you did something different from what I did and you got different results!

  9. lucia,

    My mistake regarding the source of the data.

    I did not intend to convey the tone that you heard. Inferring tone from written comments is notoriously unreliable. I was merely curious about the “FOALS” models (actually FGOALS) and asked if you had any info at hand. When I dug a little myself, I found interesting documentation and shared it. That’s all.

    IIRC, Gavin’s comment about “bogus” weather noise was restricted to using the noise on a single 7.5-year period to estimate the noise on all 7.5-year periods. As you have said, and as I now understand, this procedure is valid if (and only if) the residuals are AR1 noise.

    Anyways, this is getting off-topic and I’m out of time. I knew I should have avoided posting today…

    One final question: If you’re so inclined, what is the standard deviation in 90-month trends from the FGOALS runs? How does it compare to your values of between 1.1C/century and 1.5C/century?

  10. Yes. The method I use is valid if and only if AR1 applies. That is a separate issue from whether or not “model weather” is good bad or indifferent. I’m trying to figure out how good or bad the AR1 bit is in a variety of ways. That’s not my main motive for looking at model weather. (That will not be entirely clear to others until I do the tests I plan to do– if I can. The really horrible weather cases are going to complicate things. But some of the models have sufficiently well behaved weather that I may be able to do what I hope to do!)

    I entirely disagree with your interpretation of what Gavin meant, based on what he actually wrote at his blog and in comments here. I don’t even see why you would imagine he thinks he is criticizing the use of AR1 in particular. His “bogus” comment was not aimed at the specific issue of AR1 noise. In fact, he didn’t even mention the issue of AR1 noise or red noise in his post advancing his contention that one must use “model weather noise” to test models, and he did not specifically criticize the use of AR1 noise in comments here. (He did suggest we don’t have enough data to accurately obtain the coefficients, which is true. But that, in principle, can be dealt with by running Monte Carlo. When we do that, we still falsify!)
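    For what it’s worth, here is a minimal sketch of the kind of Monte Carlo I have in mind: draw the lag-1 coefficient from its sampling uncertainty each time, generate zero-trend AR1 “weather”, and collect the fitted trends. The rho, its standard error, and the residual sigma below are placeholder numbers, not estimates from any particular dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1_trends(n_months, rho_hat, rho_se, sigma, n_sims=10_000):
    """Distribution of OLS trends (C/century) from zero-trend AR1 noise whose
    lag-1 coefficient is itself uncertain (drawn from N(rho_hat, rho_se))."""
    t = np.arange(n_months) / 12.0  # time in years
    trends = np.empty(n_sims)
    for i in range(n_sims):
        rho = np.clip(rng.normal(rho_hat, rho_se), -0.99, 0.99)
        innov_sd = sigma * np.sqrt(1.0 - rho ** 2)  # keep the marginal variance at sigma^2
        x = np.empty(n_months)
        x[0] = rng.normal(0.0, sigma)
        for k in range(1, n_months):
            x[k] = rho * x[k - 1] + rng.normal(0.0, innov_sd)
        slope, _ = np.polyfit(t, x, 1)
        trends[i] = slope * 100.0
    return trends

# Placeholder parameters for illustration only
null_trends = simulate_ar1_trends(n_months=90, rho_hat=0.6, rho_se=0.1, sigma=0.2)
print(f"95% range of zero-trend 'weather' trends: ±{1.96 * null_trends.std():.1f} C/century")
```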

    Hmmm… Believe it or not, compared to other models FGOALS has medium-low standard deviations for 90 month trends! I calculated for 91 month trends (because that gets us through July) and I got ±2.1 C/century as the standard deviation based on the three runs going from Jan 2001 to now. (But I only did the first 91 months.)

    Let me go do the other batches. . . If I calculate for all possible 91 month trends between Jan 2001 and 2030, I get ±1.6 C/century.
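    Here is a rough sketch of that calculation, assuming a monthly anomaly array `anom` on a decimal-year axis `years` (variable names are placeholders, carried over from the sketch in the post):

```python
import numpy as np

def rolling_trends(time_years, series, window=91):
    """OLS slope (C/century) for every contiguous window of `window` months."""
    trends = []
    for start in range(len(series) - window + 1):
        t = time_years[start:start + window]
        y = series[start:start + window]
        slope, _ = np.polyfit(t, y, 1)
        trends.append(slope * 100.0)
    return np.array(trends)

# Restrict to the span of interest, e.g. Jan 2001 through 2029
mask = (years >= 2001.0) & (years < 2030.0)
trend_spread = rolling_trends(years[mask], anom[mask], window=91).std(ddof=1)
print(f"spread of 91-month trends: ±{trend_spread:.1f} C/century")
```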

    I’m mostly avoiding calculating too many of these in favor of just eyeballing data for features that are so un-earthlike they pop out by eye. FGOALS caught my eye when I looked at the correlogram, which starts oscillating between 0.6 and -0.6 for many, many lags. Basically, the average GMST for earth looks like someone stuck a thermometer in the ENSO pool region, put it on steroids, and let that rip!

  11. lucia,

    I did not say that Gavin’s “bogus” comment was directed at the assumption of AR1 noise (at least not directly). I said it was directed at using the uncertainty in a single 7.5-year trend to estimate the uncertainty between 7.5-year trends. (I only mentioned AR1 to save you the effort of explaining when and why the procedure is valid).

    BTW, I hope you’ll be able to add a Search box to this template.

  12. Well… JohnV– if he said what you suggest he said, then he doesn’t believe in standard statistical methods. Estimating the uncertainty between many N year trends based on a single N year trend is precisely what is taught in statistics courses when doing linear regression. So, unless you say something quite sophisticated– which Gavin did not say– that more general statement would be precisely wrong, in a very boneheaded way.

    In contrast, complaining about the precise statistical model (e.g. AR1 or something else) can be sensible.

    If you are going to try to explain what Gavin meant, you might want to include quotes of what he actually said. 🙂

  13. Lucia, I will be interested to read the final outcome of your investigation into which “noise” to use when comparing model predictions to actual nature.

  14. lucia, we’re arguing right past each other (again).

    As for including quotes of what was actually said, that’s why I asked for a search box. As it is, the only place I can find the word “bogus” is here:

    http://www.realclimate.org/index.php/archives/2008/05/what-the-ipcc-models-really-say/langswitch_lang/en

    I’ve found lots of places where you claim that using model noise is invalid, but no cases where Gavin says (properly determined) real-world noise is invalid. Please provide any quotes that back up your assertion that: “The language he uses for alternative choices is that they are “bogus””.

    Anyways, I asked a simple question about the “FOALS” model this morning (it’s actually “FGOALS”). I managed to answer my own question. Then I incurred your wrath for saying that accusing someone of bad faith is not helpful. Accuse away, if you must, but it does lower the level of your blog.

  15. Why is it that most warmers are assholes?
    Admin: Please don’t use that word. — lucia

  16. JohnV–
    You didn’t raise my wrath. You posted a comment with many ideas interlaced; some of those are ideas you have re-posted many times in comments. I responded. I disagree with you on whatever you meant by “not useful”. That is a very vague statement, and my general experience is that when people post the claim that something is “not useful” without suggesting either why it’s not useful or what might be useful, all they mean is “What you did does not advance my goals.”

    I explained that I think it is sometimes useful to comment on whether or not something is done in good faith. If we disagree, that’s fine with me.

    The link you provide is precisely the place where Gavin uses that term. As I noted, he never addresses AR1, or says anything remotely like you suggested he meant. I read it to suggest that he is saying his way and only his way is the correct way to perform a hypothesis test. His way is to use noise estimated from “model weather”. I don’t know how that post can be read differently.

    Gavin also discussed his ideas here, and it’s quite clear he thinks one must use “model weather noise” to test the models.

    I’ll look for a search box– but that still won’t make it easy to find things you said in comments. The search box will be based on Google, and as such, you can already google your name and the topic. It is very, very difficult to find your description of what you did with regard to volcanos.

  17. lucia,

    My reading of Gavin’s post and comments is that *your* conclusions are “bogus”. Not that *any* analysis done with real-world weather noise would be bogus. There are many ways to estimate real-world weather noise. Assuming AR1 and using the C-O uncertainty from 2001 to 2008.5 is one way. As I read his comments, he thinks that way (your way) is bogus.

    In the past I have also resorted to questioning the motives of others, as you are doing with accusations of bad faith. In my experience, it is not useful for anyone and does not advance anyone’s goals. It only lowers the level of conversation.

  18. JohnV–
    Gavin doesn’t mention me, my blog, or my analysis specifically. He does not describe my method.

    So, I assume he is talking generally, and suggesting his method is “the” one.

    Had he specifically stated what he meant, I could rebut.

    Evidently, you do read him to be rebutting me, yet he did not do so specifically. I disagree with your idea about calling or not calling people on bad faith. I think all this indirectness– not linking to the post one may or may not be rebutting, not stating the argument one is rebutting, and all the elliptical circumlocutions– is an indication that one is acting in bad faith.

    I will say so. And if you suggest that it is never useful to state this sort of thing directly, I will disagree. While I respect your opinion, I think it is sometimes best to state how a certain behavior appears. If the person engaging in that behavior, or their supporter, wishes to explain why that specific behavior is in good faith– then great!

    I happen to think that discussion would be useful. But for it to occur, one must sometimes say that certain behavior looks like it is done in bad faith.

    If you think otherwise, fine. But, in that case, bear in mind that your suggesting my pointing it out is “not useful” is likely to lead to a conversation where we discuss the specific behavior that I believe was done in bad faith.

  19. Lucia, thanks for an interesting look at the models. I’ve taken a look at the GISS Model EH 20th century hindcast results (5 runs) from the KNMI site, and have posted the results up at

    http://homepage.mac.com/williseschenbach/.Pictures/giss_eh_20th_century.jpg

    A few notes on the graphic.

    1) The model runs do marginally worse than a straight line at matching the HadCRUT3 record …

    2) The model responds much, much more strongly to volcanos (e.g. Krakatau 1883, El Chichon 1982, Pinatubo 1991) than does the real world. The agreement with the volcanic forcing is supposed to be one of the things that shows the models are good at modeling the real world … go figure.

    3) There are parts of the record where the models are in good agreement with each other, and others where the models disagree widely. One would be tempted to think that where the models are in good agreement is a time where, for whatever reasons, the natural variability is small, and vice versa. One would also be tempted to think that where they are in good agreement, they would be more likely to be close to the observations, and vice versa.

    However, there is no correlation between the width of the model hindcasts and the RMS error between the model mean and observations (correlation = 0.05).

    4) The observations are outside the 5-run envelope more than half the time.

    5) The average error from the 5-run mean in standard deviations is 1.94 (using the models’ standard deviation for each year vs. observation for that year). A rough sketch of how items 4) and 5) can be computed follows these notes.
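    Here is a minimal sketch of that sort of envelope and standardized-error bookkeeping, assuming annual-mean arrays `model_runs` (shape: runs × years) and `obs` on a common time axis; the names and layout are placeholders, not the exact procedure behind the figure:

```python
import numpy as np

def envelope_and_sigma_stats(model_runs, obs):
    """Fraction of years outside the run envelope, mean error in run-spread
    standard deviations, and the envelope-width vs. error correlation."""
    env_lo = model_runs.min(axis=0)
    env_hi = model_runs.max(axis=0)
    frac_outside = np.mean((obs < env_lo) | (obs > env_hi))

    run_mean = model_runs.mean(axis=0)
    run_sd = model_runs.std(axis=0, ddof=1)          # per-year spread across runs
    sigma_error = np.mean(np.abs(obs - run_mean) / run_sd)

    width = env_hi - env_lo
    abs_error = np.abs(obs - run_mean)
    width_vs_error = np.corrcoef(width, abs_error)[0, 1]  # does a wide spread predict a big error?
    return frac_outside, sigma_error, width_vs_error
```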

    While I’m sure that Gavin could explain why in fact when properly understood this represents a stellar modeling success … those results don’t impress me much.

    My best to you,

    w.

  20. Funny,

    A while back on RC I asked Gavin if it wouldn’t make sense to throw out obviously bad models. He waved his arms, did a spin, and disappeared in a flurry of fluff talk.

    This model is just bad on its face.

    I’ll repeat my engineering focus. Find the best models, improve them. Having 21 “models” where some are utterly dumb does nothing for the credibility of the science. Heck, I’d issue ModelE or MIT’s GCM to everyone else and say, there’s your code base, work from that.

    shrugs.

  21. Mosh, you say:

    Heck, I’d issue ModelE or MITs GCM to everyone else and say, there’s your code base, work from that.

    These models are all cousins, the gene pool is way too small to pick one yet. I haven’t found one yet that’s worth a bucket of warm spit, although the GISS model is one of the better of a bad family …

    w.

  22. Willis–
    I’m going to be looking at the variability during the periods with no stratospheric volcanic activity. I’ll be doing something systematic eventually. But, for now, I’m just downloading and eyeballing.

    Some of the models are just weird. But, they are used for projections anyway. As a technical matter, I can’t understand why they don’t screen. Of course, it’s possible if you set up 5 reasonable criteria, none of the models would pass.

    By reasonable, I mean criteria that a) are quantifiable, b) could be described to scientists who aren’t climatologists and c) wouldn’t make outsiders bust out laughing when they read how weak the screening was.

  23. SteveMosher–

    This model is just bad on its face.

    Based on the note JohnV found, it appears the modelers themselves may believe this model is not ready for prime time. But, it was submitted to the AR4 and used for projections anyway.

  24. FWIW, I also think that only the best models should be used. I suspect there are selection criteria in place, but I do not know what they are. (For example, Lumpy would probably not be acceptable).

    If the existing set of 21 (?) models was pared down to a shorter list, I imagine there would be complaints about “cheating” in the selection process.

  25. JohnV–
    There might be complaints of “cheating”, or favoritism. Likely we’d witness each modeling group writing papers describing what’s wrong with other models, rather than just highlighting what’s good about their own models.

    That would mean climate scientists would be acting like scientists in other fields! After all, in other fields, people rather openly criticize shortcomings of rival models.

    Somewhere in the AR4 there is some prose discussing acceptable models. Lumpy is not acceptable. If I remember correctly, the model has to attempt to predict temperatures over the surface of the planet, and at multiple levels in the atmosphere. I didn’t see any criteria for accuracy or precision, but I may have just overlooked that. Whatever they do require, it appears if it’s a GCM, it makes the grade. It can be utterly inaccurate or crummy, but it makes the grade.

    But, if someone finds a performance metric, then… well.. maybe there are some. 🙂

  26. Lucia,

    Here are the criteria for accepting a GCM that could be deposited in the IPCC DDC: http://www.ipcc-data.org/sres/gcm_data.html

    To this end, the IPCC TGCIA defined a set of criteria that were applied to identify a small number of GCM experiments whose results could be deposited at the IPCC DDC. Models should:

    be full 3D coupled ocean-atmospheric GCMs,
    be documented in the peer reviewed literature,
    have performed a multi-century control run (for stability reasons), and
    have participated in CMIP2 (Second Coupled Model Intercomparison Project).

    In addition, the models preferably should:

    have performed a 2 x CO2 mixed layer run,
    have participated in AMIP (Atmospheric Model Intercomparison Project)
    have a resolution of at least T40, R30 or 3º latitude x 3º longitude
    consider explicit greenhouse gases (e.g. CO2, CH4, etc.)

    Notice the use of the word ‘should’ instead of ‘must’ or ‘shall’.

  27. lucia,

    I should have been more clear. Researchers competing to make the best models would be a good thing. Researchers highlighting the flaws in their own models is already happening (check the links from the site I linked above).

    The complaints of cheating that I worry about are different. Instead of you posting about the limitations of FGOALS or ECHO-G, another blogger would be posting about the strengths of the excluded models. Any warming bias in the included models would be taken as evidence of cheating by the IPCC (perhaps not by you, but by some).

    As I said though, on balance I think it’s best to include only the best models. Or to show two sets of results — one for the best models and one for all models. The big problem is creating an objective set of criteria for “best”:

    – temperature noise characteristics?
    – 20th century temperature trends?
    – what level of the atmosphere?
    – ocean heat content?
    – glacier and sea ice?
    – relative or absolute humidity?
    – precipitation?

    The criteria would depend on the goal. An exaggerated ENSO is terrible if determining the noise in 7.5-year trends, but it may have little effect on 30-year trends. If studying precipitation, excellent performance in temperature trends may mean little. Choosing a different subset of models for each type of study would be a terrible idea because it would increase the probability of over-fitting.

    Choosing the best models is a good idea, but implementing the selection criteria could be a real problem.

  28. JohnV,

    Judith Curry over on CA (sorry, no link) talked about a symposium where the issue of downselecting models was discussed. It’s an active debate, I believe. Gavin, in one short comment on RC, seemed to be on the “democratic” side of things, wanting to let everyone play. But clearly, there are models that hindcast better than others. So, why let the poor performers continue to play when you move to “forecast” mode? It makes no engineering sense to me. I’m not ascribing any motives to this decision, since I cannot observe motives. It’s just odd. It seems such a waste of resources. Pick the best hindcaster and improve it, in an orderly, fully documented way. Perhaps people have been mesmerized by the consensus rhetoric. 21 models are better than 1. Zoolander logic.

  29. Willis,

    If push came to shove I’d pick MIT because of the superior documentation.
    sorry gavin.

  30. JohnV–

    I looked at the links at the site you posted. Based on several sets of documentation I examined, it would appear modelers fail to disclose many oddities one finds easily when looking at these data. It’s possible the modelers don’t consider some of these oddities to be oddities.

    In most fields, flaws are found more quickly when outsiders have an incentive to shoot bullets in a model.

    As for your list– including all of those sounds like a good start. However, I suspect that if someone created modestly strict criteria for those things, and enunciated them, every single model would fail.

    This idea that a model can be good at one thing (like predicting GMST) but bad at others (like predicting ENSO) is one of the bizarre ideas climate scientists disseminate through blogs. It may make sense for econometrics models, but it makes no sense for models based on physics.

    In terms of historic performance of models describing transport of mass, momentum and energy, getting medium size features wrong is strongly correlated with getting the average behavior wrong. So, getting a major driver like ENSO wrong is quite likely to cause other things to be wrong. (Or, whatever it is that causes ENSO to be wrong, causes the other things to be wrong.)

    Consider this: Since radiation losses vary as T^4, unphysically large, wild oscillations in temperature are likely to result in excess radiative losses. So, a “wide ENSO” planet might tend to have a cold bias, particularly in the regions affected by ENSO.

    Of course, other mistakes could counteract this cold bias. But… look: FGOALS is cold! Imagine that!

    Recall, the claim is that we can have confidence in long term forecasts despite the fact this has never been shown to work and the reason we can have this confidence is the models are good at describing other features.

    There is a reason we read this claim: The fact is, if other features are clearly wrong, we should have little confidence in the unproven predictive ability! Getting the details right is a necessary, but not sufficient, condition for expecting projections to be right!

    If this is the justification for why we should believe model long term forecasts for GMST, then models should be good at describing all other features.

    With respect to the AR4: Models with weird weather noise are included in the forecast. It is possible that, by some happy accident, they will be correct.

    But why should we have confidence in the prediction?

  31. Well, if the models are weird, I wonder what type of relationship modelers have with them:
    [Carolyn is introducing Lester to the Real Estate King]
    Carolyn Burnham: My husband, Lester.
    Buddy Kane: It’s a pleasure.
    Lester Burnham: Oh, we’ve met before, actually. This thing last year, Christmas at the Sheraton…
    Buddy Kane: [pretends to remember] Oh yeah, yes…
    Lester Burnham: It’s OK, I wouldn’t remember me either.
    Carolyn Burnham: [laughs nervously] Honey, don’t be weird.
    Lester Burnham: OK honey, I won’t be weird. I’ll be whatever you want me to be.
    [Lester kisses Carolyn wildly, then looks at the Real Estate King]
    Lester Burnham: We have a very healthy relationship.

    Are the models whatever the modelers want them to be?

    There is a reason we read this claim: The fact is, if other features are clearly wrong, we should have little confidence in the unproven predictive ability! Getting the details right is a necessary, but not sufficient, condition for expecting projections to be right!

    If this is the justification for why we should believe model long term forecasts for GMST, then models should be good at describing all other features. -lucia

    I thought that

    WeakModel(1) + WeakModel(2) …+ WeakModel(n) = Infallible Consensus

    which is why we never leave any models out of the grand summation. In that setting, if any model gets some features right, the Consensus can take credit and all failure is just expected variation. It can be confirmed but never falsified.

  33. Lucia,

    If you are looking for quantifiable criteria for model selection, I’d consider something Willis Eschenbach said, “The model runs do marginally worse than a straight line at matching the HadCRUT3 record.”

    Now I don’t know what Willis did to get those results, since I have long forgotten the math required to figure it out on my own, but it made me think.

    Seems to me that a model should fit the data better than a less advanced model. I can’t think of a model less advanced than a straight trend line through the data. Call me naive, but I’d think professional models should produce better results than a 10 year old with a ruler.
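    A minimal sketch of that “ruler benchmark”, assuming annual arrays `obs` (e.g. HadCRUT3) and `model_mean` on a common `years` axis (the names are placeholders, not Willis’s actual procedure):

```python
import numpy as np

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

def straight_line_benchmark(years, obs, model_mean):
    """Compare the model-mean hindcast against a plain linear fit to the observations."""
    slope, intercept = np.polyfit(years, obs, 1)  # the "10 year old with a ruler"
    line = slope * years + intercept
    return {"model_rmse": rmse(model_mean, obs), "line_rmse": rmse(line, obs)}
```

    Note the straight line is fit to the observations themselves, which gives it an in-sample advantage; a fairer version would fit the line on an earlier period and extrapolate.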

  34. Raphael–
    I agree that’s a useful test. I’m particularly interested in issues that later affect our ability to apply hypothesis tests to forecasts.

    Basically, the question in my head is the one Roger Pielke Jr. has been asking:
    What types of weather, should they occur, would falsify consensus predictions?

    In particular, I think it’s important to find methods that have low “type 2” (false negative) errors at a specified confidence limit.

    So, I’m looking at properties of “weather noise”.

  35. lucia,

    I disagree with your assertion that individual models can be strong in one area but weak in other areas. Consider your Lumpy model. If you train it using data to 1980 and let it predict GMST to 2008, how well does it do? It probably does quite well on GMST with no ENSO or weather noise at all.

    You’re stretching a little when stating that a large ENSO would reduce the average temperature appreciably because of the T^4 relationship with radiation loss. I realize this wasn’t your key point, but it was easy to test…

    I did a little analysis in Excel using a sinusoidal ENSO with various magnitudes. In each case I computed T0 (the nominal temperature) such that the fourth root of the mean value of T^4 would be 300K. That is, T0 so that the average outgoing radiation would be the same as for a constant temperature of 300K:

    T = T0 + Te * sin(theta)
    where
    T0 is the average GMST
    Te is the amplitude of the sinusoidal ENSO on GMST

    Te = 0.5 K
    T0 = 299.9994 K

    Te = 2.0K (4x larger than above)
    T0 = 299.9900 K

    Te = 8.0K (16K temp swing for ENSO)
    T0 = 299.8409 K

    That is, an ENSO with a swing of 16K from peak to trough (on GMST) would reduce T0 by only about 0.16K. It’s a marginal effect at best.
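    For anyone who wants to check these numbers without Excel, here is a minimal sketch that solves for T0 by bisection so that the cycle-mean of T^4 matches a constant 300 K emitter (a closed form also exists: mean(T^4) = T0^4 + 3·T0^2·Te^2 + (3/8)·Te^4):

```python
import numpy as np

def t0_for_constant_emission(te, t_ref=300.0, n=100_000):
    """T0 such that the cycle-mean of (T0 + te*sin(theta))^4 equals t_ref^4."""
    theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    target = t_ref ** 4
    lo, hi = t_ref - 1.0, t_ref  # T0 must sit slightly below t_ref
    for _ in range(60):          # bisection; far more precision than needed
        mid = 0.5 * (lo + hi)
        if np.mean((mid + te * np.sin(theta)) ** 4) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for te in (0.5, 2.0, 8.0):
    print(f"Te = {te:4.1f} K  ->  T0 = {t0_for_constant_emission(te):.4f} K")
```

    Run as-is, this reproduces the three T0 values quoted above to the digits shown.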

  36. Pingback: Model Noise Revisited - cont’d « Scientific Prospective
  37. JohnV–
    Note that I said “In terms of historic performance of models describing transport of mass, momentum and energy, getting medium size features wrong is strongly correlated with getting the average behavior wrong.”.

    Lumpy doesn’t fall in this category. In this regard, Lumpy is more like a constrained econometrics model than a GCM. Lumpy is a curve fit. Lumpy isn’t just weak at predicting ENSO, it doesn’t even try.

    For models that are set up to compute the GMST as a consequence of computing details the way GCM’s are set up, getting the details right should matter. It always has in transport models of this type.

    If the details don’t matter, then we shouldn’t need GCMs at all. Simpler energy balance models should be fine at projecting GMST. If climatologists believed they were, they would, presumably, use them. They don’t.

    You are right that the T^4 problem isn’t strong enough.

  38. JohnV, you say:

    That is, an ENSO with a swing of 16K from peak to trough (on GMST) would reduce T0 by only about 0.16K. It’s a marginal effect at best.

    Since the effect you describe is about a quarter of the size of the warming for the entire 20th century, I’d hardly call it “marginal”.

    I also don’t understand you when you say:

    I disagree with your assertion that individual models can be strong in one area but weak in other areas.

    It is well known that each model has strengths and weaknesses. To take one example of hundreds, the GISS Model E gets the albedo right, but the cloud cover wrong. It also is good on the east coast of continents, but bad on the west coast of continents. So your statement makes no sense.

    w.

  39. Willis E:

    My ENSO comment wasn’t about trends. Lucia had suggested that a large ENSO in a model would reduce the model GMST. I checked and found that the effect on the GMST would be small unless the ENSO was extremely large. Lucia and I agree on that point.

    There was a typo in my statement about model strengths and weaknesses. I should’ve said “I disagree with your assertion that individual models can’t be strong in one area and weak in other areas.” I think your disagreement is with lucia on this one.

    lucia — Willis E is saying *your* “statement makes no sense”. Are you gonna take that? 🙂

  40. JohnV– Your sentence that he criticized doesn’t convey the meaning I intended. My sentences include examples– and those matter.

    But the idea that we can count on a model predicting the evolution of GMST well, while getting the details wrong, is dubious. It would be one thing if the ability to forecast GMST despite getting details like El Nino wrong had been demonstrated. But the idea this can be done is a conjecture.

    So, any detail a model gets wrong should be generally expected to shed doubt on forecasting ability for GMST.

  41. Can I ask a simple question of everyone?

    It is apparent to anyone who does any research that the models are relatively poor representations of the real world climate. They are slowly improving, some perhaps at a faster rate than others.

    Why then are climate scientists so reluctant to discuss the limitations of their models and why do press releases from scientific bodies always seem to emphasise the ‘skill’ of the models?

  42. lucia asks:
    “What types of weather, should they occur, would falsify consensus predictions?” and states: “So, I’m looking at properties of “weather noise”.”

    The problem that concerns me is deep ocean “weather”, which, when it eventually surfaces, is interpreted as atmospheric “climate”, as it evolves quite slowly. We do not have enough data to assess ocean “weather” variability. Yet that is the most likely X-factor that (a) the models are missing and (b) could account for unexpectedly anomalous decadal-scale warming (1980-1998) and cooling (1999-2008) “trends”.

    IOW the types of falsifying patterns that one should be searching for in the models probably do not occur in the empirical record, for lack of looking. e.g. THCs (and ITCZs) that don’t behave “properly” (if only we knew what “proper” was).

    A damnable situation, if this is correct.

    Still – characterizing model noise as best we can is the only way forward, so kudos.

  43. lucia, thank you for your response. You say:

    JohnV– Your sentence that he criticized doesn’t convey the meaning I intended. My sentences include examples– and those matter.

    But the idea that we can count on a model predicting the evolution of GMST well, while getting the details wrong, is dubious. It would be one thing if the ability to forecast GMST despite getting details like El Nino wrong had been demonstrated. But the idea this can be done is a conjecture.

    So, any detail a model gets wrong should be generally expected to shed doubt on forecasting ability for GMST.

    It appears that the misunderstanding may be that we need to distinguish hindcasting from forecasting. It is easy, almost trivial, to hindcast the evolution of GMST while getting the details wrong. As my example showed, the GISS Model E gets a whole raft of details very wrong. Here’s a quote:

    Model shortcomings include ~25% regional deficiency of summer stratus cloud cover off the west coast of the continents with resulting excessive absorption of solar radiation by as much as 50 W/m2, deficiency in absorbed solar radiation and net radiation over other tropical regions by typically 20 W/m2, sea level pressure too high by 4-8 hPa in the winter in the Arctic and 2-4 hPa too low in all seasons in the tropics, ~20% deficiency of rainfall over the Amazon basin, ~25% deficiency in summer cloud cover in the western United States and central Asia with a corresponding ~5°C excessive summer warmth in these regions. In addition to the inaccuracies in the simulated climatology, another shortcoming of the atmospheric model for climate change studies is the absence of a gravity wave representation, as noted above, which may affect the nature of interactions between the troposphere and stratosphere.

    Note that some of these errors are over 50W/m2. However, none of these errors in the “details” stop the GISS model from at least somewhat successfully hindcasting the GMST.

    However, I would say that they will stop the GISS model from successfully forecasting the GMST.

    Best to all,

    w.

  44. The use of models clearly has a historical reason. Models exist. Modeling the Earth’s climate starting from first principles is a valid endeavour. It is relatively useful in meteorological forecasting. If we ever get the models right, they WILL be very useful. I think models are useful TODAY when it comes to understanding qualitatively the main features of our climate.

    There has been a very large investment of money and human resources in models over the past 30 years. It is just not possible for the researchers in that field to say: “Well, after all that investment, the models are still not good enough, so we just won’t use them”. So they pretend that the models have “skills”, even though they can hardly quantify them. Yet at the same time, they also proclaim that they can be improved, thus the need for continued funding.

    Modelers are not stupid. They know perfectly well that the models perform poorly. But what to do? It’s easy to talk about quantitative criteria, but what should they be? You want to fit temperature, humidity, wind, cloud cover, and so on, on a 3-D grid including oceans, and over a long period of time. It’s not like you can calculate a simple R2 parameter! Furthermore, we don’t even have the data to compare most of these parameters! The papers I’ve read that compare models are reduced to more or less eyeballing the results, or, worse, resorting to “intermodel comparison”, leaving aside any comparison with the real world! All the models look bad in the end, but some look less bad. So, model intercomparison is really a VERY IMMATURE field!

    In any other context, the modelers would readily acknowledge that. In any other context, climate modeling would be a minor, poorly funded research field (maybe it still is!!). But in the charged political context of AGW, this must be kept as the modelers’ dirty little secret. Providing the forecasts that will save the planet is their big chance to become a bigger, important research field, with the budgets that go with it.

    So the big question that comes up again and again is: are the models useful for forecasting GMST in a doubled-CO2 world? Personally, I think not. At least, not more useful than much simpler, empirical, approaches. Since it appears that what we are interested in is a single, scalar variable, namely GMST, under the influence of a limited number of scalar forcings, it would make a lot of sense to treat the Earth as a black box, with known inputs and outputs, and just attempt to find the transfer function that best emulates the output given the same inputs, including what is called “noise”. Of course, there are more sophisticated variations on that theme (à la Scafetta and West, for example). In the end, I don’t see how that type of approach is intrinsically worse than complex models that perform poorly.

    But admitting that fact would take the spotlight off the modelers. An entire research field would find itself looking for another justification for its existence. You never want to find yourself in that position.

    So we may discuss the merits of models ad infinitum. We may prove over and over again that they have no skills. The likely result is that nothing will change.

  45. From my perspective, the proper question is not “are the models any good”? Yeah, sure, they can be useful. There are things to learn from the models … although it’s hard to separate the wheat from the chaff.

    The proper question for me is “why has the modeling effort progressed so little in a quarter century”?

    I mean sure, there is better resolution and more layers and less flux adjustment and more forcings and the like, but we’re no closer to getting any clarity on climate sensitivity. Thirty years and untold millions of person-hours, and we’ve gone from “1.5 to 4° per doubling” to “2 to 4°/doubling”. Same thing with the forcing from a doubling of CO2. The IPCC FAR models used values from two watts/m2 to over four watts/m2. Not much bang for the millions of bucks spent, when no progress is made on such fundamental questions.

    I contend this is because the very basic idea, the underlying modeling paradigm, is wrong. Current climate models all treat the climate like a ball on a billiard table. You push the ball of climate a little north, it goes a little north. You push it east, it goes east. You push it twice as hard, it goes twice as far. No preferred position, a level linear playing field.

    But reality is not like that. The most useful representation of the earth is as a giant heat engine, with the ocean and the atmosphere as the working fluids. As specified by the Constructal Law, flow systems like the climate are constantly re-organizing to maximize global performance subject to global constraints. Bejan has shown great success in the initial steps of this way to understand climate.

    Modeling this concept of the planet requires a totally different model than those in current use. That’s why I think that the modeling effort to date has been so unsuccessful … because the underlying model paradigm is flawed at its core. The climate is not some docile beast where, when CO2 goes up 2 units, there is a predictable linear amount of change. The table is not flat, the earth has preferred temperatures and preferred configurations and preferred regime changes, which are not easy to disturb.

    The climate is not a ball on a level table, free to push in any direction. The climate is a raging river of energy, a tera-watt scale torrent which is constantly shifting and changing to maximize its performance. And like any raging river, if you push it a little east, it is just as likely to go north or south, and not east at all.

    To model that does not require resolution improvements and better tuning of parameters in the current fleet of models. That’s just shifting deck chairs on the Titanic.

    Modeling the dynamic, active, constantly maximizing nature of the climate requires an entirely new approach to the conceptual modeling of the earth’s climate, with entirely new computer models to match.

    w.

    In any other context, the modelers would readily acknowledge that. In any other context, climate modeling would be a minor, poorly funded research field (maybe it still is!!). But in the charged political context of AGW, this must be kept as the modelers’ dirty little secret. Providing the forecasts that will save the planet is their big chance to become a bigger, important research field, with the budgets that go with it. -Francois O

    I think that is a bit harsh. With a few prominent exceptions, I think that the modelers have been candid about the degree of uncertainty and of completeness of the working assumptions. The claims made on behalf of the models are the problem.

    What lucia is doing here is extraordinarily valuable not just as a very substantive nuts and bolts test of the reliability of the models but also for what it has told us about how the quasi-official alarmist spokesmen regard the models–something not really subject to any tests of validity. Single or short-term events can serve as discrete confirmations of alarmism (an iceberg has melted!) but doubts require as much as 30 years of highly contrary weather to even raise the possibility of disproof. Heads I win and I only lose if there are 30 tails in a row.

    It is also instructive that when there are reasonable dissents about the quality of science (McKitrick and McIntyre); issues raised about whether other climate factors such as land use are getting short shrift due to the obsessive focus on CO2 (the Pielkes); and/or whether this proposed policy response is proportionate and cost-effective (Bjorn Lomborg), the reaction is loud, orchestrated, hostile and rather ad hominem. The modelers are not the ones doing that.

  47. George,

    Sure, if you go to the source literature, you find that candidness. But never outside of it. Where do you see a press release for a new result based on models that would warn that models are not that great? Just last week there was some news about a paper by Soden claiming that extreme weather will become more frequent. Brian Soden is a very competent guy, and he knows maybe more than anyone else that these predictions are far from foolproof. But he keeps publishing, and allowing these press releases to come out. Good for him! He’s having a great academic career!

    But it is worse than that. Willis is right in saying that they need a new paradigm. Current approaches have clearly failed. But there’s one reason for it. Models are FORCED to be stable! They have a bad habit of going astray, so there are all sorts of little tweaks to keep them in check, whether it’s flux adjustments or what else. See that condition on a “stability run”? The models must prove that they are stable. So my guess is that the only way to do that is to keep everything linear. You just can’t introduce nonlinearities, otherwise it quickly gets out of control. But in the real world, the climate is very nonlinear, on all sorts of scales. It’s full of feedbacks! But how do you include that in a model? Either you understand the feedbacks, and we don’t understand most, or you keep trying to obtain them from first principles, with a few parameters, and you get all sorts of instability, so you tweak the parameters to keep it stable. Then you claim that if it is stable, then it is GOOD! Modelers are happy just to get something stable!

    Modelers are not going to commit professional suicide by admitting that they have failed. It might take a long time to shift that paradigm. I’ve seen it elsewhere. In my own field, 20 years ago, it was all about the “optical transistor”, something that would supplant electronics on speed. Today, you still have tons of papers published that are still trying to push that idea. Yet, it has really been dead for at least 15 years. It’s just not going to work, for reasons that are well known to everybody. In the meantime, silicon-based electronics has seen tremendous progress, and Moore’s law is alive and well. But many researchers are still making their living on the optical transistor and its derivatives. Such is life in the academic world.

    I agree with you that what Lucia is doing is admirable. But let’s face it, her influence on the course of things in academia is zilch. Even McIntyre is hardly acknowledged. The scientists quickly close ranks when an outsider tries to expose their incompetence. The reality is that the system of public funding for science has grown too big, and must support too many people. None of them want that to change.

  48. I agree with both George Tobin and FrancoisO that in the literature we find all sorts of admissions that many of the assumptions underlying projections are dubious, and in consequence, the model projections are more uncertain than one would gather from reading climate blogs written by those who promote the splendiferousness of GCMs.

    Notwithstanding Gavin’s blog post, does the IPCC consensus state anywhere that the variability in 8 year trends in models is due to “true earth weather noise?” No. In fact, if you check the literature, you will find that they admit that a) there is variability across models and b) many models don’t have realistic “weather noise”.

    Fair reading of the AR4 suggests that the IPCC did not wish to communicate the idea that the full spread across all realizations is meaningful when assessing the variability or uncertainty in projections.

    * The IPCC does not use dispersion over all individual realizations of model runs to represent their uncertainty interval and doesn’t even include illustrations of these things. They use the dispersion over the average projection for each model. This averages out the weirdness of the “model-weather” — as would be appropriate if one truly believed the weirdness of model weather didn’t matter, and only the average was meaningful.

    * When a model group runs many realizations, the spaghetti string illustration showing projections for the GMST show averages over all realizations. Individual realizations are only shown when a model group ran only one realization.

    So, these choices which the IPCC actually made would suggest that, contrary to Gavin’s suggestion, the IPCC as a group does not consider “model weather noise” to be a respectable representation of “earth weather”!

    The difficulty is that some idiosyncratic ideas are being disseminated by climate modelers who blog!

  49. I very much appreciate the nuanced and insightful commentary at this blog, and on this thread in particular. If we can keep the noise low and the signal high, this could be a very productive forum.

    lucia’s views on IPCC presentation of scientific and model uncertainty coincide closely with mine. I know the GCMs are poor approximations. The modelers know that too. That is why they are forced to discuss these limitations in the supplementary information in the IPCC reports (e.g. too-cold high latitudes in FGOALS). However I feel there is insufficient proof that the GCMs are “good enough” to support alarmist projections. “Precautionary principle” is a policy tool that I happen to agree with (in principle), but is far beyond the scope of my interest. I want to understand exactly what lucia is exploring: “weather noise” in these models.

    Specifically: what is the probability that 20th c. CO2 sensitivity has been overestimated by focusing on an instrumental period (up to 1998) when the warming trend might have been exaggerated by internal weather noise. And the flip side of the coin: what is the probability that the current cooling trend is occurring as a result of internal weather noise *despite* the effects of GHGs.

    To answer this question (a 20th c. backcasting problem) requires a very good understanding of internal weather noise, including ocean “weather”.

    Next is the forecasting problem (21st c.). That you got a certain forcing in the past does not imply you will get the same result in the future. If strong negative feedbacks tend to kick in at high T, there could be a cap on the degree of warming. We have seen this in the paleoclimate record. So why should it not happen also as a consequence of AGHGs? THIS is the question. How far can the little blue planet bend before it breaks?

    This, FO, is why one-dimensional EBMs will not do. You need a 3D hydrosphere to initiate the kinds of powerful negative cloud feedbacks that could cap warming.

    Gavin Schmidt and his minions drive me crazy because they will not tackle these two questions (backcasting overfit bias, future negative feedbacks) head on. Their aversion and obfuscation leave me feeling they are protecting an agenda of self-service. Because I know they are intelligent, I assume they understand these questions the way we at this blog do.

    Thank you for your work, lucia. Now that the paleoclimate hockey stick is scientifically disproved, I look forward to learning more about “internal climate/weather noise” and the role it plays in estimating external forcings during warming and cooling periods.

  50. bender—
    I’m largely motivated by the question: “What does it take to falsify any particular projection?” This is the question Roger Pielke Jr. has been asking.

    After all, it often appears that while some vocal modelers claim projections could in principle be falsified, they insist on methods that make hypothesis testing a practical impossibility.

    One approach to hypothesis testing is entirely empirical. I’ve been correcting for red noise. It is legitimate for people to suggest that this means my uncertainty intervals are too small– and I’m exploring that. But, that direction remains primarily empirically based. (How to get correct uncertainty bars empirically is difficult. We can only work with the empirical data we have. So, obviously, the focus can’t be on the deep ocean.)
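
    For anyone wondering what “correcting for red noise” amounts to in practice, here is a minimal sketch (not the actual code behind the tests posted here) that fits an OLS trend and inflates its standard error by the usual AR(1) factor sqrt((1 + r) / (1 - r)):

    import numpy as np

    def trend_with_ar1_correction(y):
        """OLS trend of y, with a standard error inflated for lag-1 autocorrelation.

        A sketch only: fit the slope by least squares, estimate the lag-1
        autocorrelation r of the residuals, then scale the naive standard
        error by sqrt((1 + r) / (1 - r)) to allow for red noise.
        """
        y = np.asarray(y, dtype=float)
        t = np.arange(len(y), dtype=float)
        slope, intercept = np.polyfit(t, y, 1)
        resid = y - (slope * t + intercept)

        r = np.corrcoef(resid[:-1], resid[1:])[0, 1]              # lag-1 autocorrelation
        se_naive = np.sqrt(np.sum(resid**2) / (len(y) - 2)
                           / np.sum((t - t.mean())**2))            # usual OLS slope standard error
        return slope, se_naive * np.sqrt((1 + r) / (1 - r))        # red-noise-inflated standard error

    # usage (hypothetical series): slope, se = trend_with_ar1_correction(monthly_anomalies)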

    I post these results from time to time. And while they can be criticized, doing a test and correcting for red noise is at least an approach people understand. I also post “slide and eyeball” comparisons from time to time; the graphical methods give everyone some idea how things are tracking.

    Gavin suggested (if you can call the tone of his post “suggesting”) that one should estimate the expected uncertainty intervals for “weather” using the variability of weather from models. I think that’s bunk. But, since he suggested it, I am willing to consider it, provided the “model weather noise” is not clearly different from “real earth weather”.

    So, I’m downloading, looking at it, and trying to identify systematic tests. I’d expected it to be easier to identify one because I expected the weather noise in models to actually resemble the data. But, some of the weather noise is so odd, I’m not quite sure what to make of it.

    Still, I think I may have some ideas– and I may even email you to get them reality checked.

    Ocean data would be good…but it’s not as readily available.

  51. The comments from #4981 on down (but not the ones above) are cut off from the left side of my screen. I’m missing the first 8 characters or so. Using IE6.

    Just thought I would advise since it may be associated with the “new look”.

  52. Gavin often says that “the models produce a THC”. What is the test statistic for deciding that an alleged model THC exists or, better, is realistic? I mean, eyeballing these things is fine, to a point – until you start imagining things that are not *really* all that prominent in the output. Relevant to the thread because the THC is a major source of hemispheric-scale noise, and we know a little bit about how it works, empirically.
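
    One possible answer, offered only as a sketch: treat an index of the overturning circulation from the model and from observations as two time series, and ask whether their variability is statistically distinguishable. The function below is a crude variance-ratio version of that idea; the index series are placeholders, and serial correlation would need to be handled before taking the p-value seriously:

    import numpy as np
    from scipy.signal import detrend
    from scipy.stats import f as f_dist

    def variance_ratio_test(model_index, obs_index):
        """Crude sketch of one possible "realism" statistic for an ocean index.

        Compares the variance of two linearly detrended series with an F-ratio.
        It only asks whether the model's variability has roughly the right
        amplitude, not whether the mechanism behind it is right, and it ignores
        autocorrelation, which would reduce the effective degrees of freedom.
        """
        m = detrend(np.asarray(model_index, dtype=float))
        o = detrend(np.asarray(obs_index, dtype=float))
        F = m.var(ddof=1) / o.var(ddof=1)
        dfm, dfo = len(m) - 1, len(o) - 1
        p = 2.0 * min(f_dist.cdf(F, dfm, dfo), f_dist.sf(F, dfm, dfo))   # two-sided p-value
        return F, p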

  53. Lucia,

    Obviously, if there were “types of weather” which “would falsify consensus predictions,” there would be some question about the accuracy of the models. But I would ask, “Are the models good enough for their purpose?”

    I would draw a parallel between climate model ensembles and ensembles used for hurricane track predictions. Because of uncertainties and lack of information, no single model for hurricane tracking is adequate to determine a storm’s final destination. When the models are used as an ensemble, the likely track is much clearer.

    If a hurricane rounds Cuba and is predicted to hit Texas, but makes a sudden turn and hits Florida instead, does that falsify the model? Or does it just show that the model isn’t 100% accurate?

    I don’t know much about either modelling or validation of model ensembles. But I wonder if considering this parallel might help. Could you show a hurricane tracking ensemble falsified using the same techniques? Would it be a fair test of that ensemble? I am not suggesting you look at data for hurricanes. I am simply using it to look at the question from a different perspective.

  54. Bender, you say:

    “This, FO, is why one-dimensional EBMs will not do. You need a 3D hydrosphere to initiate the kinds of powerful negative cloud feedbacks that could cap warming.”

    My point was that it may be that you can never get there: the system may just be too complex to be modeled accurately, so you will always get unsatisfactory, if not unfalsifiable, results. This is pretty pessimistic, I agree. But maybe we need one or more major breakthroughs before that problem can be tackled. So in the meantime, a simple model may just be good enough, at least for policy purposes.

    When I did my work on the carbon cycle, I was amazed that I could get such good fits of CO2 vs time with a very simple model using a few coupled differential equations. I was also amazed that nobody had tackled that problem before. Of course the problem was, in the end, that I could get good fits with an infinite number of parameter sets, so that was a strong limitation. But I really just needed more, and more precise, data. And that is a big problem with GCM’s too: we lack the real-world data. If more money had been poured into gathering as much data as possible over the past 20 years, maybe there could be an intensive effort to get a single model that matches those data accurately enough. All easy to say, of course…
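
    For readers curious what “a very simple model using a few coupled differential equations” can look like, here is a minimal two-reservoir sketch (an illustrative version, not the model described above), with hypothetical exchange coefficients and an assumed emissions curve:

    import numpy as np
    from scipy.integrate import solve_ivp

    def two_box_carbon(t, y, emissions, k_ao, k_oa):
        """Toy two-reservoir carbon cycle (illustrative only).

        y[0] is the atmospheric carbon anomaly, y[1] the ocean/biosphere anomaly.
        emissions(t) is an assumed anthropogenic source; k_ao and k_oa are
        hypothetical first-order exchange coefficients.
        """
        atm, ocn = y
        net_uptake = k_ao * atm - k_oa * ocn             # net atmosphere-to-ocean flux
        return [emissions(t) - net_uptake, net_uptake]

    emissions = lambda t: 5.0 * np.exp(0.02 * t)         # hypothetical emissions, GtC/yr

    sol = solve_ivp(two_box_carbon, (0.0, 100.0), [0.0, 0.0],
                    args=(emissions, 0.1, 0.02))
    print(round(float(sol.y[0, -1]), 1))                 # atmospheric anomaly after 100 years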

    the model projections are more uncertain than one would discover reading climate blogs written by those who promote the splendiferousness of GCM’s.

    Lucia, the situation is also much worse when one considers the way the media report climate change. They may never refer to the fact that results are based on models, just ‘new research’ or ‘scientists say’. If they do refer to the models, they will rarely qualify it with a statement of limitations. Thus the wider general public never gets close to the matters that are being discussed in certain parts of the blogosphere.

    Francois O,

    I couldn’t agree more with your comments in #5013 and #5024. They were what I was driving at by my simple question posited above.

  56. Raphael–

    Roger’s question is hypothetical. I don’t necessarily mean to ask “Has weather that falsifies predictions occurred?” Though, if one finds a method to test the hypothetical question, one naturally then applies it to weather we’ve had.

    In your example on hurricanes, the answer is “that depends”. The problem is, in your hypothetical, you didn’t describe what the ensembles project or what we know about real hurricane tracks.

    Also, what makes sense in testing depends on what is studied and what the model projections looked like.

    If a hurricane projection model always said that the 67% confidence interval for the hurricane landfall location was anywhere between Nova Scotia and Panama, and the average location of a “hit” was Maryland, the model would be useless.

    And yes, there are questions one can ask to falsify based on the average “hit” location!
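
    As a sketch of what “falsify based on the average hit location” could mean in practice, one could run a one-sample test on the miss distances between predicted and observed landfalls. The function below is a deliberately simplified version (latitude-only misses, and no allowance for dependence between successive forecasts of one storm):

    import numpy as np
    from scipy.stats import ttest_1samp

    def mean_miss_test(predicted_latitudes, observed_latitudes):
        """One-sample t-test of whether the average landfall miss is zero.

        A sketch only: real track verification uses great-circle errors and
        has to account for dependence between successive forecasts of a storm.
        """
        miss = (np.asarray(observed_latitudes, dtype=float)
                - np.asarray(predicted_latitudes, dtype=float))
        return ttest_1samp(miss, popmean=0.0)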

  57. Illustrative of the problem of advocacy versus science, I note that today’s ‘Climate Debate Daily’ has a link to a 2005 Real Climate entry, Dummies Guide to the latest “Hockey Stick” Controversy. This link was posted presumably in response to Steve McIntyre’s recent reaming of the new Ammann/Wahl defense of the ‘stick and its reappearance in the CCSP Global Climate Change Impacts draft.

    The 2005 article was très RC in that (a) both the title and tone drip with more condescension than average and (b) it ducks the substance of the criticism entirely. The Wegman report (done by statisticians who did a Tamino on Mann et al.) lays out a very substantive set of methodological criticisms beyond what McIntyre and McKitrick did. The report also says that Mann does not appear to understand principal component analysis, which is precisely what is recited to us Dummies in the 2005 RC post. The RC ‘defense’ simply repeats the same problematic conclusions with an eye roll, as if that were forever sufficient.

    I happen to think that paleoclimate work is fascinating and I have great respect for the difficulties involved. It’s a shame that a valuable but infant discipline was called upon to do so much politicized heavy lifting. Instead of being hammered for overselling the conclusions of their work, Mann et al. should have been in a position to be respected for moving the discipline as far as they have, instead of correctly being seen as agents for an agenda.

  58. I do not believe there was a press release for this report,
    Climate Models: An Assessment of Strengths and Limitations. Maybe I just missed it. Also, to follow up on Willis’ question,

    “why has the modeling effort progressed so little in a quarter century”?

    I would note that David Rind has published a paper asking the same question:

    It is doubtful that averaging different formulations together will end up giving the “right” result, especially because we have no way of knowing whether the various choices that have been made even circumscribe the proper sensitivity.

  59. Lucia,

    Thanks for your patience. Believe it or not, I did actually take a few classes of this mathy stuff. 🙂 Unfortunately, I now realize that remembering enough for an A and understanding are not mutually inclusive. Of course, I no longer remember enough to even get an A, though I certainly had hoped the understanding of the principles remained. But, alas, I discovered a small (huge) gap in my understanding because of one visualization used by a professor which failed to serve its purpose.

    Ah the curses of pre-programmed responses to certain visualizations, and the desire to get a degree rather than a desire to learn.

    Wise man say, when the teacher draws a bullseye demonstrating low precision and high accuracy, you shouldn’t wonder about weapon and ammo specifications, range to target, or skill of shooter.

    Wise man say, when learning about accuracy and precision, and the teacher draws a bullseye with a shot pattern demonstrating high precision and low accuracy, your thoughts should not wander to why the shooter did not adjust between shots.

    Sorry, it was rather a “eureka!” moment. And allow me to say, things just make a lot more sense now.

  60. Ellis,

    Thank you for the link to the Rind 2008 paper. It deserves to be widely read. At the moment, only CO2 Science seems to have picked up on it.

  61. One thought occurred to me relative to multiple model usage. Is there any other discipline where this would be done in this fashion? The only other usage that comes to mind would be another meteorological phenomenon: hurricane forecasting. Is this endemic to only weather/climate phenomena? Anybody got any other examples?

  62. #5068 College football rankings. The various models that go into the composite computer ranking are very diverse, some fairly sophisticated. But they all attempt to answer the same question: who is more asymptotically likely to beat whom? Although apples and oranges, each model can be thought of as generating an upper triangular square matrix of winning probabilities. The computer ranking is effectively the ordinalized output of the average probability matrix. Similar to climate science in that estimating these probabilities involves lots of heuristics and difficult-to-defend parameterizations. It’s an art.
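
    That description maps almost directly onto code. A minimal sketch, assuming each model supplies an n-by-n matrix of win probabilities (the inputs are placeholders, not any real ranking system):

    import numpy as np

    def composite_ranking(prob_matrices):
        """Rank teams from a list of n-by-n win-probability matrices.

        prob_matrices[k][i, j] is model k's probability that team i beats team j.
        Teams are ordered by their average probability of beating the field,
        taken from the element-wise mean of the matrices (the "average
        probability matrix" described above).
        """
        avg = np.mean(prob_matrices, axis=0).astype(float)   # average probability matrix
        np.fill_diagonal(avg, np.nan)                        # a team does not play itself
        strength = np.nanmean(avg, axis=1)                   # mean P(beat any other team)
        return np.argsort(-strength)                         # team indices, best first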

  63. Here’s what Stephen Schneider had to say in 1990 in ‘Global Warming. The Greenpeace Report’, p. 48, in a chapter titled The Science of Climate-Modelling:

    “Choosing the optimum combination of factors is an intuitive art that trades off completeness and (the modellers hope) accuracy for tractability and economy… such a trade-off between accuracy and economy is not ‘scientific’ per se, but rather is a value judgement, based on the weighting of many factors.”

    Here’s what Bader et al. said in the recent ‘Climate Models: An Assessment of Strengths and Limitations’:

    Climate modeling has been steadily improving over the past several decades, but the pace has been uneven because several important aspects of the climate system present especially severe challenges to the goal of simulation.

    Almost 20 years on from Schneider’s comments, it is clear that, whilst improvements may have been made, the models are not ‘fit for purpose’ to drive policy decisions, and the climate science community really needs to open up and acknowledge this fact.

  64. Tom–
    I’m looking at the monthly global mean surface temperatures included at the climate explorer. So far, I’ve also only examined those that ran SRES A1B cases for the AR4 and uploaded the runs to the climate explorer.

    It appears CCSM3 did not upload any runs for the SRES A1B. So… nope. They seem to be the only group that did not run an SRES A1B case. So, I’m not looking at CCSM3– at least not yet.

    But, of course, if you are interested in that model, you could download the data and look at it to learn whatever you hope to learn from looking at it. 🙂

  65. “But, of course, if you are interested in that model, you could download the data and look at it to learn whatever you hope to learn from looking at it.”

    Lucia, I appreciate your sense of humor! BTW, that was NOT a bald attempt at getting you to do my bidding, but simply a question.

  66. Tom–
    The monthly GMST data (which is all I’m looking at) is easy to look at. It’s just one long string of numbers. I actually may end up downloading that set, because I’m looking for tests I can actually do that have any hope of distinguishing something at the 95% confidence level.

    I’ve been trying different tests. On the one hand, the models do poorly generally. But on the other hand, the only tests that can't be shot full of holes over some statistical "issue" are the non-parametric ones. And for those, the limited number of runs is making things difficult. (If you want to show something happens less than 1 in 20 times, for many tests it's nice to have at least 20 samples. For some non-parametric tests, you only need 5 samples. More is better– but 5 is a key value because that's the smallest n where (1/2)^n < 0.05, meaning that a result of "I got heads five out of five times" happens less than 5% of the time. But for the types of tests where I need 20, I often have only 17 independent "samples", and for the type where I need 5, I generally only have 1-4 independent samples.)
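
    The "5 samples" arithmetic above, spelled out: for a sign-test-style result where every run goes the same way, the best attainable one-sided p-value with n runs is (1/2)^n, and n = 5 is the first value for which that drops below 0.05.

    # smallest n for which "n out of n in the same direction" is significant at the 5% level
    for n in range(1, 8):
        p = 0.5 ** n
        print(n, p, p < 0.05)
    # n = 5 gives p = 0.03125, the first value below 0.05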

  67. Lucia, always a pleasure. You say in your OP:

    Below, I have compared the 12 month lagging average of FGOALs temperature hindcast/prediction from 1980-2030 (SRES A1B after 2000):

    There is a hidden problem in this approach, which is that many of the GCMs use forcings in their hindcasts that they do not use in their forecasts. I don’t have the numbers here at the moment, but for example, many of them use things like black carbon and aerosols and solar and volcanoes in their hindcasts but not in the forecasts.

    Since you are using both hindcasts and forecasts, I’d have to assume that the variance would be much greater in the former than in the latter …

    All the best,

    w.

  68. Willis– The first thing I did was just “look” at the forecast/hindcast.

    When I look at quantitative issues, I’m going to look at a hindcast period and a forecast period. Both are “no-volcano” periods for most tests.
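
    A minimal sketch of the hindcast-versus-forecast variance check Willis suggests, assuming the monthly GMST series is already loaded as a 1-D array and split at a chosen index (both names below are placeholders):

    import numpy as np
    from scipy.signal import detrend

    def residual_std_by_period(monthly_gmst, split_index):
        """Spread of linearly detrended residuals before and after a split point.

        monthly_gmst: 1-D array of monthly global mean surface temperature.
        split_index:  index separating the hindcast period from the forecast period.
        """
        before = detrend(np.asarray(monthly_gmst[:split_index], dtype=float))
        after = detrend(np.asarray(monthly_gmst[split_index:], dtype=float))
        return before.std(ddof=1), after.std(ddof=1)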

Comments are closed.