Multi-Model Mean Projection Rejects: GISSTemp, start dates ’50, ’60, ’70, ’80.

On Friday, I promised I would create a graph showing the results of an uncertainty analysis testing whether the trend in the difference between GISSTemp observations and the multi-model mean of earth's surface temperatures, based on IPCC AR4 models extended into the 21st century, differs significantly from zero. My major result: the difference is statistically significant for longer time spans but not for shorter ones. The summary graphic is below; a brief discussion of the analytical choices follows:

Figure 1: Trend in the difference between GISSTemp and the multi-model mean of IPCC AR4 models extended into the 21st century using the A1B scenario.

The graph above shows that the computed least squares trend in the difference between GISSTemp and the multi-model mean projections is negative if we begin our analysis with start dates of March 1950, or January of 1960, 1970, or 1980. For these start years, the models predict more warming than observed, and the difference is statistically significant.
 
The magnitude of the excess warming in the multi-model mean of simulations is not trivial: a difference of -0.04 C/decade represents roughly 20% of “about 0.2 C/decade”, the nominal rate of warming the IPCC projects for the early part of the 21st century. More relevantly, it represents 37% of the trend in GISSTemp since March 1950. So, if one thinks mis-estimating the trend by 37% “matters” to policy, then this magnitude of bias matters.

Though this difference is non-trivial, given the amount of scatter in mean trends from individual models, the internal variability in models, and the earth’s own weather, it is still possible for those so inclined to create graphs that hide a difference of this magnitude, avoid mentioning the actual size of the difference in mean trends, and decree the match good. But make no mistake: the fact that some wish to hide the statistically significant and non-trivial difference between modeled and observed trends does not mean the models and observations agree.
 
It is also worth noting that, using the analytical choices discussed below, the difference between the multi-model mean and the GISSTemp observations is not statistically significant if we begin the analysis in 1990, 2000 or 2001. Though the difference is large, the uncertainty in the computed trend in the difference between GISSTemp and the multi-model mean is even larger.
 

Analytical choices

  1. The difference in warming is computed by subtracting the multi-model mean of monthly temperatures from observed monthly temperatures. This results in a time series of monthly temperature differences. The models chosen are those extended using the A1B scenario by the IPCC in the AR4. Both the 20th century runs and their extension into the 21st century are available at The Climate Explorer. Motivation for analyzing the trend in the difference is provided in a previous post.
  2. Trends are computed using multiple least squares regression with time and MEI lagged by 3 months as exogenous (i.e. causal) parameters. The MEI index starting in 1950 is available here. In the approach here, noise coefficients were computed based on the start date used to compute the trend. (It could be argued one should compute the noise coefficients based on the longest time period, assume these are correct for all times, and compute the uncertainty intervals that way. I have also done this. That method gives similar results, with somewhat smaller uncertainty intervals for start dates from 1980 forward.) A minimal code sketch of the regression appears below the list.
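
For readers who want the mechanics, here is a minimal sketch of the regression in item 2. It is not my actual script: `diff` and `mei` are hypothetical numpy arrays holding the monthly GISSTemp-minus-model-mean series and the MEI index aligned on the same months, and the lag is left as an argument (the comments below discuss the exact lag used).

```python
# Minimal sketch of the trend regression in item 2 (illustrative, not the
# script used for the post). Assumes `diff` is the monthly difference
# series and `mei` is the MEI index, both 1-D numpy arrays aligned on the
# same months.
import numpy as np

def trend_with_mei(diff, mei, lag=3):
    """OLS of the difference series on time and lagged MEI.

    Returns the trend in C per decade.
    """
    y = diff[lag:]                 # temperatures from month `lag` onward
    mei_lagged = mei[:-lag]        # MEI `lag` months earlier
    t = np.arange(len(y)) / 120.0  # time in decades (120 months/decade)
    X = np.column_stack([np.ones_like(t), t, mei_lagged])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]                 # coefficient on time = trend (C/decade)
```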

Summary

Analysis of the difference between the multi-model mean projection and GISSTemp indicates the difference between the two exhibits a statistically significant trend at a confidence of 95%, with models over-predicting warming. The difference in trend since 1950 represents 37% of the observed warming, suggesting that the multi-model mean has tended to over-predict warming. The analysis is not suited to detecting the underlying cause of the bias. It could be due to parameterizations in one or several models, inaccurate estimates of forcings during the period analyzed, failure of most models to include the 11-year solar cycle in projections, or observations that are biased low; and, as with all statistical analyses, the mismatch could be due to a very strong weather event. Irrespective of the reason, the multi-model mean of simulations shows more warming than GISSTemp observations, and that difference is statistically significant if the analysis is initiated in 1950, ’60, ’70 or ’80, but not when initiated in 1990, 2000 or 2001.

146 thoughts on “Multi-Model Mean Projection Rejects: GISSTemp, start dates ’50, ’60, ’70, ’80.”

  1. Gosh,

    And everybody asks me why I believe in a 1.5C century.

    Err.. observational record is high by 15%
    and what’s a linear model from 1850 look like?
    and from a bias perspective is the table tilted toward finding positive feedbacks?

    Funny, Lucia. As a modeller we would always build a simple model first. And nobody got to change the code unless it made the model more skillful. That pissed off a bunch of people who thought the process that they modelled “must make a difference”.

  2. “a bias perspective is the table tilted toward finding positive feedbacks?”

    Maybe. Or not. The problems could be in the ocean mixing, the poor knowledge of aerosol loading, or failure to include the solar cycle. I did some fiddling by just adding a fake 0.1 C/6-year solar correction to the data for the past 6 years, and that doesn’t change any rejections. So… I don’t think it’s that.

    It may well be that some of the models are fine and others are wrong. But it’s pretty evident that the modelers need some method of diagnosing which models are likely wrong and throwing them away before making projections. One hopes they are starting to realize this, even if they aren’t willing to publicly admit that the multi-model mean projection for the metric with the supposedly “most robust” predictions is biased high.

  3. I never did get the ensemble idea. I suppose it’s a symptom of not having a way to test a model once it’s built… so if you have a model you think is good, just throw it into the mix.

  4. Hank,

    I think it’s also something of an artifact of the politics of the IPCC process. Some modeling groups are arguably quite a bit better than others, but national pride being what it is, it was much easier not to discriminate based on modeling group quality, include them all, and hope that the outliers fell out in the ensemble mean.

  5. Zeke, that doesn’t sound good – but a very democratic way to build consensus, I suppose.

  6. Zeke–
    I think the IPCC process creates a host of social dynamics not normally present in scientific research. Normally, if many scientists thought a particular model was poor, they would be reluctant to cite it. There is little overt criticism– it’s just mostly ignored. This helps save face and also permits a certain level of collegiality at professional meetings.

    With the IPCC process, if the model nominally meets the criteria of being an AOGCM, not having too much drift, etc., it is accepted. National politics being what they are– and the document being so visible– a group would have to agree to kick out a model and then explain the reason. They can’t deal with bad models by ignoring them. So, metaphorically, to toss a poor model, the group must slap the modelers who came up with it in the face, and the slap will be public. This is normally avoided in the more ‘normal’ scientific process.

    What’s worse is that in terms of science rather than policy, you actually want groups to take risks when incorporating new parameterizations into models. But you sort of know that taking a risk on a very different new parameterization in a full AOGCM puts you at risk of creating simulations that are on the edge of the distribution of all models and also more wrong than other models. If both occur, your model will be at risk of being decreed the odious one that is false. (Oddly, the former– being outside the pack– may be more dangerous than disagreeing with the data!)

    Having your model decreed not good enough for the ensemble is something you will have to explain to the funding agency! Not fun.

    So, there are lots of reasons– none having to do with better accuracy– why the group of climate scientists as a whole might be reluctant to decree any individual model false. Every one of the modelers knows that next year, if they try a new parameterization, their model might look bad.

  7. Lucia and others.

    One mail I did not get a chance to get to in the book; in fact, the whole ISSUE of GCMs and SRES raised in the mails hasn’t been addressed (yet……..hehe)

    Wednesday, 13 February 2008 15:46:33 : Filename: 1202939193.txt

    LOTS OF STUFF, but the big issue here is MODEL DEMOCRACY

    see:

    2. Is “model democracy” a valid scientific method? The “I” in the IPCC
    desires that all models submitted by all governments be considered
    equally probable. This should be thoroughly discussed, because it may
    have serious implications for regional adaptation strategies. AR4 has
    shown that model fidelity and model sensitivity are related. The models
    used for IPCC assessments should be evaluated using a consensus metric.

  8. Well… one hopes they do winnow. In fact, they have to. If they don’t decree some “probably false” — that is, not worthy of being treated as true– then they can’t make the poor agreement yesterday’s news.

  9. Lucia: A while back I posted a note at WUWT about GISS temps and Anthony responded that he considers GISS temps to be badly flawed. I was wondering if you agree, and if so, whether a similar analysis could be done with one or more of the other temperature series (RSS, UAH, etc.).
    Or: is there a reason you utilize GISS temps, e.g. because the models are tuned to that series?

  10. lucia – a thoughtful and helpful analysis. Of course, I’m on this kick about testability of models. What you say makes it occur to me what an impossible or nearly impossible task it is to create a model. So many dials to turn. It must be like oldtimey superheterodyne radio tuning. I remember my father telling me that an Atwater Kent had seven dials and the person tuning had to sit in a chair bent over the thing to keep the signal coming in.

  11. jack,

    Lucia’s test, if I understand it, is to compare against a global record that extends back to 1950.

    1. UAH & RSS start in 1979

    That leaves three choices.

    A. GISS
    B. HADCRUT
    C. NOAA

    These three choices are greatly intertwined. That is, they use roughly the same data streams. For the most part there is substantial agreement between them, with the differences amounting to a few tenths of a degree here and there.

    Are there problems with these datasets? Probably; the whole process of constructing each has not undergone rigorous (in an engineer’s mind) cross-checking and truly independent verification and validation.

    That fact doesn’t stop Lucia from doing her analysis. It’s like this:
    the climate science community takes these records to be largely true. GRANT THAT for the sake of discussion.

    NOW, ask the question: is the record you take as being largely true consistent or inconsistent with the GCM output?

    Simply, you can “bracket” the questions about GISS for the purpose of analysis.

  12. Jack–
    I don’t have any reason to believe GISSTemp is badly flawed. I compare surface temperatures to projections of surface temperatures. RSS and UAH measure the lower troposphere, which is a slightly different thing. Comparing surface temps to UAH is a Granny Smith to Golden Delicious comparison. Close but not the same.

  13. Hank– AOGCM tuning is quite different from radio tuning. It’s not that it can’t be done– but the approach is different.

  14. Certainly the IPCC (or maybe an ad-hoc group of modelers beholden to the IPCC) should get rid of models that make very bad projections (which, in fairness to Tamino, seems to be mostly what he tries to do in his most recent ‘I hate Lucia and don’t want her to say the model average is wrong’ post). Of course, modelers rejecting certain models sets a rather harsh Popperian precedent (specifically, if they predict like crap then they are crap) which could come back to bite any of them. So watch for the modeling community to be very cautious about throwing away bad models.

    An interesting point that modelers do not seem to address is why different models make different predictions. It would seem a fairly straightforward process to extract from each model key factors like assumed aerosol forcing, assumed ocean lag constants, water vapor feedback, etc.; models that do poorly would seem to present an opportunity to examine where they go wrong and perhaps help to improve the survivors.

  15. SteveF–
    Sure. Tamino seems to be advocating screening models. But he wants to screen the false models out of the original set, and then decree that it’s ridiculous to observe that the original set gave poor results.

    I’m sure the modelers do discuss why different models get the sorts of results they get. That doesn’t mean the public face of modelers– i.e. Gavin– is going to admit the problems in public. RC exists to spin; Gavin does it with enthusiasm.

    Screening to get a correct result is going to be difficult. The problem is that most models look very bad at doing something. Some get the surface temperature trend right but have the most screwy “weather noise” you can imagine. Should those be kept? Some seem to be very cold if you look at the non-anomaly method; some are quite hot. Do you get rid of those?

    Presumably, if the goal in screening is to knock out models that get the physics wrong, you don’t screen using surface temperatures only. Though… who knows. Maybe they will do that.

    We’ll see.

  16. Lucia,

    I wonder if Gavin and company can possibly imagine how much it would help their cause to have open and public discussions about the parameters that are incorporated into models, and the specific weaknesses in different models. As things stand, those on the outside (that is, the rest of us) mostly want to strangle them for being so secretive and duplicitous about model performance.

    I would sure like to hear someone like Gavin say something like, “Well you know, the GISS Model E gets the temperature change about right but is off in the calculated average surface temperature by x degrees, and we are not really sure why.” Lots of talented people would be willing to help figure it out if it were done in the open.

  17. The A1B SRES is useless for purposes of extrapolation. Current oil and gas use is well below the SRES estimates for 2010 already, and it’s not just due to the current economic crisis. The total consumption of petroleum from 2010 to 2100 is 2.8E12 bbl and consumption in 2100 is projected to be not all that much less than current (59E6 bbl/day compared to 85.5E6 in 2005). The low side estimate of total petroleum availability is probably Deffeyes at 3E12 bbl, and we’ve already consumed 1.5E12 of that. Coal is equally problematic. Gas is somewhat up in the air with the recent shale gas technological advances. Still, gas consumption in 2100 of 531E9 cu.ft. compared to the EIA estimate of 112E9 cu.ft in 2010 seems a stretch. It’s way worse if you assume that the overestimate of oil and coal consumption is replaced by gas. Of course, since the global domestic product is estimated by extrapolation of current national domestic products adjusted by MER (monetary exchange rate) instead of PPP (purchasing power parity), it’s likely that it’s way overestimated as well, which would reduce energy use substantially.

  18. SteveF: FYI, here is a link to the documentation of one of NCAR’s models. Some parameters are known physical constants, some are probably unobservable, and some are probably “fit”. You can back up from there to get more info and even the source code. I wonder if any sensitivity analysis has been performed. Considering the complexity of the system, it is no wonder that different models yield different results.

    http://www.ccsm.ucar.edu/models/atm-cam/docs/description/description.pdf

  19. DeWitt Payne (Comment#30825) January 19th, 2010 at 7:47 pm

    The A1B SRES is useless for purposes of extrapolation. Current oil and gas use are well below the SRES estimates for 2010 already and it’s not just due to the current economic crisis.

    Two words, India and China.

  20. Re: bugs (Jan 19 22:08),

    China is supposed to have coal reserves second only to the US. China is importing coal from Australia now. Does that tell you anything? Anthracite coal is essentially gone worldwide. Bituminous coal may well have peaked, and production is shifting to sub-bituminous. The energy content of each grade is lower than that of the previous grade. The ratio of energy returned to energy invested keeps shrinking for oil and coal.

    Nobody can burn oil that doesn’t exist. If global oil production peaked in 2005 then the estimates of oil consumption or production in almost all of the SRES are bogus.

  21. Lucia, I thoroughly enjoy your blog and learn a lot. On this topic, however, I am a bit confused. My memory of past statistics courses says that we should be doing a significance test against the null hypothesis. That is, that nothing unusual is going on; i.e., it is pretty well accepted that there has been warming going on since 1850 at a rate of 1-2 degrees per century. Shouldn’t we be doing a significance test of observations against that trend before we accept the alternative hypothesis that one or more of the GCMs is correct?

    It would at least be interesting to add that test to your analysis.

    Thanks, please tell me if I am completely out to lunch.

  22. On the winnowing of models: if they winnow based on how well the models fit some physical observation (necessarily in the past at the time of winnowing), doesn’t this just result in overfitting? For the ensemble model, it seems to me that this is no different than tuning the parameters of a single model to get a better fit.

  23. Lucia,
    As far as I can tell, this analysis has the same failing as your previous IPCC falsifications. You form an average trend from a population of model results, among which there is a lot of variability. And you form a summary trend from observed weather, which also has variability. You examine the difference for significance. But you test only the variation in the difference due to the weather noise, omitting the inter-model variability. It’s true that the model variability can’t be easily treated as random, but that doesn’t make it right to leave it out.

    I note that your 2001 and 2000 starts are now said to be not rejecting significantly. Is this a new finding?

  24. Nick Stokes,

    “about 0.2C/decade” would imply that 0.150000000001 – 0.24999999 should be ok.

  25. DeWitt Payne (Comment#30838) January 19th, 2010 at 10:46 pm

    Re: bugs (Jan 19 22:08),

    China is supposed to have coal reserves second only to the US. China is importing coal from Australia now. Does that tell you anything? Anthracite coal is essentially gone worldwide. Bituminous coal may well have peaked, and production is shifting to sub-bituminous. The energy content of each grade is lower than that of the previous grade. The ratio of energy returned to energy invested keeps shrinking for oil and coal.

    There’s the problem: people still burn it.

  26. Why use March 1950 instead of January?
    Why use less than 1% of the data for your analysis? There are 120 data points per decade of average monthly temperature; why confine your analysis to just one of them? Is that really representative?

  27. John Knapp

    we should be doing a significance test against the null hypothesis

    Yes. And one gets to choose which hypothesis is considered null. In this test, the null is: the trend in the difference between GISSTemp and the multi-model mean prediction for surface temperatures is zero. That null is rejected for the years indicated.

    Shouldn’t we be doing a significance test of observations against that trend before we accept the alternative hypothesis that one or more of the GCMs is correct.

    To do this test, you would need to specify the year spans and the data set to permit me to compute the trends. The two spans shouldn’t overlap.

    Regardless, I don’t know why we need to do this test before we would say GCM simulations match data. No matter what the long term trend, either the GCM simulations match it, or they don’t. We can ask that question and test.

  28. Wow, some of you have developed wacky ideas as to how the idea of an ensemble of models came to be the “truth”.

    In reality, ensemble forecasts are carried over by atmospheric scientists from meteorology.

    Ironically, they do not wish for these models to be tested as weather models would be. Which makes sense, except that they think that, in terms of averaging, the models will behave about the same…

  29. Nick Stokes

    As far as I can tell,I think this analysis has the same failing as your previous IPCC falsifications. You form an average trend from a population of model results, among which there is a lot of variability.

    Yes. Glad to read you don’t like the method everyone around here has been criticizing as stupid and likely to cause projections to be wrong. Averaging all the models together is the method the IPCC uses to create their best estimate of the projection. (If you search a bit, you’ll read Gavin’s patient explanations about why it’s a good method.)

    Please let the IPCC and others know you think it’s a poor method so they can avoid using it in future. When they create new projections using a new method, I will test those. If they pick a bad method again, I will test that same bad method.

    I note that your 2001 and 2000 starts are now said to be not rejecting significantly. Is this a new finding?

    With GISSTemp, 2000 was already not rejecting using the “red noise” model, and I think 2001 was alternating depending on the month. So, we’d gotten “fail to reject, fail to reject” in the past. However, the ARMA model widened the uncertainty intervals, so they are now safe if we test against GISSTemp. However, you’ll see this is not the case with Hadley– we’ll get recent rejections.

    This sort of behavior– failing to reject inconsistently across noisy measurements– is the sort of thing we are likely to see if simulations are wrong but the amount of data puts us in a region where type II error is high. (That is: when a small amount of data means we are likely to fail to reject even though the null is probably wrong.) Of course, it’s also what we would expect if one of the observational sets is whacked.

  30. Nick–
    I should add– those GISS error bars for 2000 and 2001 would be noticeably smaller than shown if I computed the “noise” parameters based on the data from 1950 to now and applied those. But those two years would still reject. I seem to recall freezing the noise parameters for the ARMA was the method Tamino recommended about a year ago. I’m not sure what he did in his more recent post.

  31. Nathan

    yes… see what you are doing here seems to be rather pointless.

    How so?

    So you might think this paradigm should have been still-born and never caught on. However, people have persevered with it over a number of years, trying to fix it with various additional “bias” terms or ensemble inflation methods, and generally worrying that the ensemble isn’t as good as they had hoped.

    If people and the IPCC are still clinging to the method, the IPCC AR4 projections are based on that method, and they have been resisting streams of evidence it doesn’t work, what is pointless about showing the public that the method the IPCC prefers doesn’t work?

    Maybe, with some luck, the scientists will change their method. They may manage to do so without ever making a public admission the results were wrong and the method was always obviously stupid. But there is still nothing pointless about showing the currently fashionable method does not work.

  32. Turboblocke (Comment#30849)

    Why use March 1950 instead of January?

    The MEI used in the correlation has been reported by NOAA since Jan 1950. However, it predicts future temperatures, and the best correlation is given with a lag of two months. That is: if we want the best regression for MEI and temperature, MEI in January predicts temperature in March.

    So, the data available for regression look like this:
    Time, MEI (lagged), Temperature
    Jan 1950, no data (would need Nov 1949), Temp_in_Jan
    Feb 1950, no data (would need Dec 1949), Temp_in_Feb
    March 1950, MEI_Jan, Temp_in_March
    April 1950, MEI_Feb, Temp_in_April

    And so on. So, I can fit the regression starting in March of 1950 or later, but I can’t fit a regression starting in Feb or Jan because I don’t have MEI data from Nov. and Dec. of 1949.

    That’s why 1950 starts in March, but all other fits start in Jan. I could switch to all March– but that would probably seem weird to people too.

  33. John Knapp–

    On the winnowing of models, If they winnow based on how well the models fit some physical observation (necessarily in the past at the time of winnowing) doesn’t this just result in overfitting. For the ensemble model, it seems to me that this is no different than tuning the parameters of a single model to get a better fit.

    The hope is that you eliminate the models that have some flaws in replicating the physical process. That said, the final test is always whether or not the new batch and the new “best estimate” correctly predict data that was not available at the time of the winnowing.

  34. lucia says:
    “The fact that some wish to hide the statistically significant and non-trivial in magnitude difference between model and observed trends does not mean the models and observations agree.”
    .
    How dare you suggest anyone would ever want to hide such a divergence! What are you – a conspiracy theorist?

  35. bender–
    No need for conspiracy. Each individual who happens to fall in the group “some” may be acting independently.

  36. bugs:

    There’s the problem: people still burn it.

    I’m pretty sure DeWitt’s point is that the price (cost per unit of energy) goes up when you switch to less desirable fossil fuels.

    For example, the supply of low-sulfur petroleum (light sweet crude) is smaller than the supply of high-sulfur petroleum. It’s expensive to convert to refining the high sulfur petroleum, and it costs more to produce it.

    We’re on a road where the costs of fossil based energy sources must increase over time, due to increasing scarcity, increased production costs, and decreased efficiency of the fuel. At some point, we will cross a threshold where alternative energy sources become more economically attractive, and the switch to the alternative fuels will happen in a natural, market driven way.

    Hopefully you aren’t so vested in central government-based approaches that you can’t admit that technological/market based solutions are able to solve the problem instead.

  37. Lucia:

    Yes. Glad to read you don’t like the method everyone around here has been criticizing as stupid and likely to cause projections to be wrong. Averaging all the models together is method the IPCC uses to create their best estimate of the projection. (If you search a bit, you’ll read Gavin’s patient explanations about why it’s a good method.)

    I recognize you are just repeating the same analysis method used by the IPCC and by individual authors in the climate community.

    I belong to the camp that thinks it is ill-advised to combine different models and treat the difference in models as if that were related to measurement uncertainty.

    Some models put more physics in than others, and simply comparing the variability as you add or drop different physical assumptions has no bearing on the modeling uncertainty.

    Obviously the model that is able to reproduce the most physics is the best model, and simply comparing it to another model that doesn’t include all of this physics in itself won’t tell you anything about the uncertainty in the simulated results from that more precise model.

    All it will tell you is what the effect of leaving out the extra physics in the more precise model is on the less precise model and nothing more.

    To illustrate this, I have a class of models I call the “single oscillator theory,” I have a second group I call “frequency-domain solutions” that are fast but neglect nonlinear effects, and I have “full nonlinear time-domain solutions”. For both the frequency- and time-domain solutions, I have approximations for 1-d, 2-d and then the full 3-d model.

    Which of these (assuming I’ve done the coding properly) is the most accurate?

    Why the 3-d nonlinear time-domain model of course.

    Everything else is an approximation of this, and differences between this most exact solution and the other simplified models just tells me the effect of relaxing assumptions like nonlinearity, or going from 3-d to 1-d models.

    The same goes here: Global climate models that include less physics are simply a test of the effect of not including that extra physics. They tell you nothing at all about what you don’t know about your “best” models.

    How does one test climate models given that you can’t just treat the collection of models as a statistical ensemble? It’s the way I’ve described in the past: Run multiple implementations of the same model, varying the input parameters of the model according to the uncertainty in their values, and treat that group of runs as an ensemble of data.
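
    For concreteness, here is a toy sketch of what I mean; `run_model` is a made-up stand-in for a real AOGCM, and the parameter values and uncertainties are invented for illustration only:

    ```python
    # Toy sketch of a perturbed-parameter ensemble for a single model.
    # The "physics" below is a made-up stand-in, not a real AOGCM.
    import numpy as np

    rng = np.random.default_rng(0)
    t = np.arange(600) / 120.0   # 50 years of monthly data, in decades

    def run_model(aerosol_forcing, ocean_diffusivity):
        # Stand-in physics: trend rises with (negative) aerosol forcing,
        # falls with ocean diffusivity, plus white "weather" noise.
        trend = 0.2 * (-aerosol_forcing) / ocean_diffusivity
        return trend * t + rng.normal(0.0, 0.1, t.size)

    # Perturb each parameter within its assumed uncertainty and rerun.
    runs = [run_model(rng.normal(-1.2, 0.4), rng.normal(1.0, 0.2))
            for _ in range(200)]

    # The spread of trends across runs estimates the uncertainty for
    # *this* model, not a spread across structurally different models.
    trends = [np.polyfit(t, r, 1)[0] for r in runs]
    print(np.mean(trends), np.std(trends, ddof=1))
    ```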

  38. I think Zeke Hausfather (Comment#30803) is not wrong that there were issues of collegiality that may have trumped any preference for precision and the ultimate exclusion of weaker models. But I suspect there is another dynamic at work as well.

    It is my understanding that most of the models project closer to lukewarmist outcomes than catastrophic scenarios. If you started pruning the models furthest from observations, the axe might disproportionately fall on the more dire scenarios.

    If the IPCC consensus were then presented as a clear majority opinion of about 1.5 to 2.5 degrees over the next century, instead of a range extending out to 5 degrees (with the implicit suggestion that the extreme outcome was just as likely), then the IPCC reports would be a lot less useful to those who pay for them.

    If instead you just lump together all the models and average them it permits the outliers to remain members in good standing and within the “consensus.”

  39. Nathan:

    I can see why the IPCC authorities did not admit Annan and Hargreaves to the inner sanctum.

    The notion that we can never really know what the right answer is and can only devise clever probabilistic approximations runs counter to the far more effective marketing image of scientists quite certain of their findings and models which are being fine-tuned even as I write this.

    Given that the modelers themselves still cling to the terribly unsophisticated “truth-centered paradigm” that Annan and Hargreaves have transcended, why should lucia be barred from critiquing the modelers on their own terms?

    I also find critiques like the one you cited at “Pollytics.com” rather annoying. The “decline” in this instance (rather than the one the Climategate emailers worried about) is in relation to the trend the “truth-centered paradigm” told us to expect. Nobody denies the existence of noise and variation. But being piously lectured about the insignificance of a decade of rather flat temperatures would be less annoying if many of the same folks had not hyped the 1990’s as clearly diagnostic of The Trend.

  40. Lucia-
    “Hank– AOGCM tuning is quite different from radio tuning.”

    Well, based on the metaphors I’m reading here, I can’t help wondering if an industrial art from the 1920’s or 1930’s (manual radio tuning) is being dropped for something from a time long past (winnowing the wheat from the chaff). Nonetheless, I believe you when you indicate that it’s something more involved than fiddling numbers.

    I would also remark that if people like Gavin expect the lay public to believe the science, lay people will expect a decent explanation or description of how the models work. It’s not like there isn’t a healthy market in America of laypersons wishing to learn more about science… judging from all the science shows and books and articles I see on relativity, quanta, dark energy, dark matter, the 11 dimensions of the multiverse, and so on.

    My question for the day is whether the “parameter” of parameterization has anything to do with a parameter of a parametric equation – which I have a vague recollection of from high school algebra.

  41. lucia (Comment#30853)
    “Glad to read you don’t like the method everyone around here has been criticizing as stupid and likely to cause projections to be wrong.”

    No, I’m not objecting to the method of summarizing the model results. Forming the average trend from the population of model results is not unreasonable. It’s just an estimate that you know is uncertain, because there is variation within the models. You subtract from it an observed trend, which is subject to weather noise. The difference has two sources of uncertainty– model variation and uncertainty about the observed trend. Yet in testing whether the difference is significantly different from zero, you consider only the uncertainty about the observed trend.

    Suppose you were asked whether the collective wisdom here is any good at predicting the monthly UAH. So you average the results. The UAH comes in, and is somewhere in the middle of the range of predictions, but different from the average. So you ask whether the difference is significant, based only on the acknowledged uncertainty of the reported UAH figure (let’s assume that is small). That difference is very likely significant. Did that tell you anything? No, because you know that the average of the predictions has an uncertainty of about 0.1 to 0.2 C based on the spread. The fact that the UAH comes in near the centre of the spread is the best you can expect. If the average had been not significantly different, based only on the small UAH uncertainty, you’d rightly dismiss that as a fluke.

  42. Nick

    Yet in testing whether the difference is significantly different from zero, you consider only the uncertainty about the observed trend.

    Wrong. The average surface temperature over N models also contains noise arising from the internal variations in the models. It would only go away if I had an infinite number of models. In which case, it would also go away if I took the difference, computed 22 trends, and then did the t-test differently.

    I’m pretty sure both ways work out similarly. But now I’m curious. I can do it the other way and see.

  43. HankHenry, “parameter” refers to any of the variables or functions of the equation that are assumed fixed for a given problem, and do not dynamically evolve over time in accordance with the dynamical equations of the system.

    Here is a really simple example:

    y''(t) = -g

    Here y''(t) is the acceleration of an object, and y(t) is a dynamical variable: it evolves over time in response to external forces in this example.

    Here “g” is the sole parameter of the system; it is the input into the problem that governs the behavior of the system. In this case, “g” is a parametric constant. You can also have parametric functions, e.g., see the parametric oscillator.
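
    A short numerical version may make the distinction concrete (a toy sketch only):

    ```python
    # g is a parameter (fixed for a given problem), while y and its
    # velocity v are dynamical variables updated each time step.
    def fall(y0, v0, g=9.81, dt=0.01, steps=100):
        """Integrate y''(t) = -g with simple Euler steps."""
        y, v = y0, v0
        for _ in range(steps):
            v -= g * dt   # dynamical update of velocity
            y += v * dt   # dynamical update of position
        return y

    print(fall(y0=10.0, v0=0.0))           # Earth
    print(fall(y0=10.0, v0=0.0, g=1.62))   # Moon: same dynamics, new parameter
    ```

    Changing g changes the system being modeled; changing y0 or v0 only changes the initial condition.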

  44. lucia (Comment#30817) January 19th, 2010 at 6:46 pm

    from memory there was a thread at RC where I brought this up.

    Gavin suggested what you did, that some models do well on temp and poorly on precip. others the other way around. I recall at the time I directed people to the Taylor diagram..

    Anyways, a couple of other points. Why have models with no volcanoes?

    Finally, there are many models. Some only have 1 run in the mix; others have 2, 3, 5, etc.

    The average of the models (seems to me) ought to be an average of the model averages, not an average of all the runs– by that I mean WRT calculating the N in the s.d. about the model mean.

  45. Lucia:

    The average surface temperature over N models also contains noise arising from the internal variations in the models

    I don’t agree with this at all. The differences between models need have nothing to do with “noise”.

    For example, if one model includes convective effects associated with water vapor and the second doesn’t, the first is a better representation of reality than the second, and the difference between the two models is not a noise, but an error in the simpler model.

    Put another way, Nick’s point is basically valid, though you still have variability in the climate system associated with weather. This might be handled by running a series of simulations with small “tweaks” of the parameters of the system (e.g., Monte Carlo simulation) to produce a realistic measure of the natural variability of the system, which in turn could be folded into the testing for goodness of fit.

  46. Yippee.

    Jules and James seem to concur with the English major:
    one member, one vote.

    So our conclusion is that all this worry about the spread of the ensemble being too small is actually a mirage caused by a misinterpretation of how ensembles normally behave. Of course, we haven’t actually shown that the future predictions are good, merely that the available evidence gives us no particular cause for concern. Quite the converse, in fact – the models sample a wide range of physical behaviours and the truth is, as far as we can tell, towards the centre of their spread. This supports the simple “one member one vote” analysis as a pretty reasonable starting point, but also allows for further developments such as skill-based weighting.

  47. Shoot–
    I don’t think it’s conceptually possible to subtract the observation from each model series individually. If we do that, the difference series aren’t linearly independent and I can’t just do a t-test on the group. I need to think about that.

    I think the way I’m doing it doesn’t violate the assumptions. (I might run some Monte Carlo after some thought.)

    But either way: The method I’m using still includes noise in the multi-model mean.

  48. lucia (Comment#30883)
    “But either way: The method I’m using still includes noise in the multi-model mean.”

    That’s good – but could you please explain how you estimate and include it?

  49. HankHenry (Comment#30870)
    My physics lecturer defined a parameter as a variable constant.

  50. Nick– When you average over 22 models, the resulting time series is still noisy. That is to say: suppose we create a synthetic series with a known trend and some noise process. Now, create 22 such noisy series and average them.

    The resulting series is still noisy– though less so than the original series. So, we can imagine this average is approximately equal to the underlying series. (It will be if we run an infinite number of series and average.)

    If you hand someone the 22 series, they can try to determine the underlying trend two ways:
    1) Average the 22 noisy series and fit a trend to the average. They can try to diagnose the type of noise process (ARMA, AR, whatever) and then, based on that diagnosis, estimate the uncertainty in the mean trend from this one noisy series.

    2) Or they could compute 22 trends, then average. They could then estimate the difference between their sample average and the underlying trend in the ordinary way for any sample of 22 trends. That is, they find the standard error of the mean (SE = std. dev./sqrt(22-1)), then find the multiplier for the 95% confidence intervals (about 2), and the 95% confidence intervals are about ±2*SE.

    Both methods work. The first relies on being able to diagnose the type of noise structure; the second does not. In the example above, you’d probably use the second method, because you don’t need to diagnose the noise structure. But in more complicated problems, you may not have the option of the second method for some reason.

    In my recent posts, I’m using method (1) because, when comparing the models to the multi-model mean, it is the earth’s noise that dominates. So we can’t do method (2) for the earth– we have to diagnose the type of noise. Unfortunately, the “noise” is not the same thing as “the residuals from a linear trend”. Some of those residuals are due to volcanoes. So, instead, I get that “signal” out by subtracting the multi-model mean. But, just as some noise remains in the mean described in (1), some noise remains in our multi-model mean of the AOGCM simulations. So, the residuals to a fit of the difference between observations and models are not purely the earth’s weather noise; there is also a contribution from the noise remaining in the multi-model mean as computed from a finite sample of models.
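
    To make the two methods concrete, here is a toy numerical sketch. It uses white noise for simplicity (the real residuals look more like ARMA(1,1)), and the numbers are illustrative, not from my actual analysis:

    ```python
    # Toy comparison of the two methods described above, with white noise
    # standing in for the more complicated real noise structure.
    import numpy as np

    rng = np.random.default_rng(1)
    n_models, n_months = 22, 600
    t = np.arange(n_months) / 120.0                # time in decades
    series = 0.2 * t + rng.normal(0, 0.1, (n_models, n_months))

    # Method (1): average the 22 series, fit one trend to the mean series.
    m1 = np.polyfit(t, series.mean(axis=0), 1)[0]

    # Method (2): fit 22 trends, average them, and use the standard error
    # of the mean across the 22 slopes.
    trends = np.polyfit(t, series.T, 1)[0]         # one slope per model
    m2 = trends.mean()
    se = trends.std(ddof=1) / np.sqrt(n_models)

    print(m1, m2, "95% CI ~ +/-", 2 * se)
    ```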

  51. Carrick–

    The differences between models need have nothing to do with “noise”.

    The differences between monthly mean surface temperatures across models have contributions from both the “noise” (i.e. model internal variability) and the “signal” (whatever the model would predict if you ran that AOGCM one bajillion times.)

    Nick is worried I’ve lost the noise part and so I’m claiming to detect a bias when, he thinks, noise would drown that out. I haven’t lost the noise. This test is saying the bias — the part that you are talking about– exists and stands out above the noise.

    Even 22 runs from one individual model isn’t enough to take out the noise. Even if it were, the effect of that noise would be negligible in any t-test to determine if the mean trend matches the observation. If we actually had one bajillion runs of each model, the only uncertainty in comparing the model mean trend to the observations would be the uncertainty in the trend associated with the observation.

  52. The test I’m applying supports the notion that the ‘star’ in James’s cartoon is not co-located with the center of the models– which would be the bullseye in the skewed target below:

    You get to choose whether vertical or horizontal is temperature trends– but basically, the tests show the temperature is not equal to the “bullseye” temperature. The observed temperature trend is low.

  53. Re: Carrick (Jan 20 09:51),

    We’re on a road where the costs of fossil based energy sources must increase over time, due to increasing scarcity, increased production costs, and decreased efficiency of the fuel. At some point, we will cross a threshold where alternative energy sources become more economically attractive, and the switch to the alternative fuels will happen in a natural, market driven way.

    Yes, sort of. However, the economics are not as clean as some would have you believe. Say oil is $X/bbleq and switchgrass ethanol is $2X/bbleq. If oil increases to $2X/bbleq then switchgrass ethanol becomes attractive, right? Not really. Because the cost of the energy to produce the switchgrass ethanol has to be factored in, so when oil doubles, the switchgrass ethanol production cost also increases. If we’re really, really lucky, the factor for the rate of increase of switchgrass ethanol compared to oil is less than one, that is, the ratio decreases from the original factor of two, and the price curves cross some time not too far away. Of course technology could play a role as well. If someone invents a magical process to separate ethanol from water that uses much less energy than distillation, the ethanol price would go down and the curves would cross sooner. I’m not holding my breath. That also assumes that switchgrass ethanol is a primary energy source. Is the energy returned greater than the energy invested to produce it? An EROEI of greater than 2 would be nice. I’m not sure that’s anywhere close to true either.

    Of course you could tax oil and subsidize ethanol to make the price competitive. We’ve already seen the economic distortions from that sort of thing with corn ethanol. Not a pretty sight.

    As far as a smooth transition, that’s not necessarily in the cards either. It takes time to design, permit, construct and bring on line large scale alternate fuel plants, even longer if the misanthropic end of the green spectrum throws a monkey wrench into the works with legal challenges every step of the way. The process can’t even start until someone is willing to commit the money. That size of investment means the investors must be very confident of profitable operation. That may not happen fast enough to prevent an economic crisis that will make current problems look trivial.

  54. Lucia:

    The differences between monthly mean surface temperatures across models have contributions from both the “noise” (i.e. model internal variability) and the “signal” (whatever the model would predict if you ran that AOGCM one bajillion times.)

    If you had two models trying to simulate the exact same physical processes but approximating the physics differently, you could call that “internal model variability”. (I do that sometimes too… write source for different ways of approximating a result, and use the degree of agreement as a measure of the uncertainty in the calculation.)

    The problem here though is that the models aren’t all trying to solve the same physics, and if you have one that includes a physical effect and another that doesn’t, the model that includes the physical effect (if it matters) is going to be a better model.

    The difference between the models in that case is just a measure of how important that physical effect is, rather than a statement of uncertainty in the models where none exists.

    I’m not going to sweat it if I can’t get you to agree with me, but only in climate have I seen this approach of lumping models together used. In any other field I’ve seen, you compare the models individually with the data and the ones that have the best goodness of fit win. And if none of the models describe the data well, you try to understand what is left out, and incorporate it later.

  55. Carrick–
    I have to agree with you. But if you see two runs from the same model, you will notice that each predicts different monthly mean temperatures for matched months. This happens because the dynamics in the models are non-linear, and so solutions exhibit “turbulent-like” or “chaotic” variability. That’s the “noise” Nick is worried I have averaged out.

    For the cases with replicate runs, you can tell that the mean trends differ between models. That is: if you average over many runs, you can see that model “A” shows a distinctly higher trend than model “B”– on average. These differences arise because of different parameterizations, or the decision by one group to include an effect vs. another group to exclude it. These differences are not “noise”. They are “signal”.

    But if we want to report whether the “signals” in models A and B differ, we need to recognize that the actual time series we have did include “noise”. I’ve done that for the observations and the model mean. Given the number of models and the magnitude of the “noise”, the trend over time is large compared to the noise. This means the “signal” in the multi-model mean differs from the earth’s trend, and the difference is sufficiently large to be distinguished from the noise.

    What Nick is suggesting is I did something to neglect the magnitude of the noise in my tests. I didn’t. I included it.

  56. DeWitt:

    Yes, sort of. However, the economics are not as clean as some would have you believe. Say oil is $X/bbleq and switchgrass ethanol is $2X/bbleq. If oil increases to $2X/bbleq then switchgrass ethanol becomes attractive, right? Not really. Because the cost of the energy to produce the switchgrass ethanol has to be factored in so when oil doubles, then switchgrass ethanol production cost also increases.

    Ethanol is a very bad example unless you are getting it from beets; choose biodiesel if you want to discuss a more practical biofuel.

    I was thinking about solar or wind power, which are only weakly linked to the cost of energy production. Wind power is already break-even, and there will be a price point where solar powering your home is an economically efficient alternative to large power plants.

    Another example comes from the agricultural facilities where they are composting manure to generate electricity via the methane produced by the anaerobic bacteria, providing power to the farm and also to the grid. At the moment, this only works for large farms (economy of scale), but as energy prices go up over time, this technology will become economically feasible even for smaller farms.

    Not everything scales directly with the cost of petroleum.

  57. Lucia:

    I have to agree with you. But if you see two runs from the same model, you will notice that each predicts different monthly mean temperatures for matched months. This happens because the dynamics in the models are non-linear, and so solutions exhibit “turbulent-like” or “chaotic” variability. That’s the “noise” Nick is worried I have averaged out.

    I’m not sure what Nick was worried about (or at least my interpretation of what he said is different than yours), but I think that what you do for multiple runs of a given model is correct.

    But I think the approach taken by the IPCC, Santer and others of combining these models, OTOH, is flawed. It is interesting to test it since it is a manipulation that they do in this field (probably because it artificially inflates the variance), and it’s interesting that even with the inflated variance the models still don’t do a good job of describing the data over longer time periods.

  58. Carrick–

    But I think the approach taken by the IPCC, Santer and others of combining these models, OTOH, is flawed.

    I agree. Lumping all the models together to create a best estimate isn’t a method that works for different models in other fields, so why should it work here?

    As it happens, the mean trend you get from that procedure disagrees with the data.

  59. Dewitt, my brother took me to a switchgrass seminar put on by the University of Illinois. The new thing seems to be miscanthus giganteus over switchgrass. There was also a man from Celegene at the seminar who talked about Celegene’s great interest in searching out new genes, while also saying everything they were learning was proprietary.

    What you say about fuel costs involved in production is very true. The advice given was: don’t bother with this unless there is a processing plant nearby… since the crop is so bulky. The optimistic point that was made was that once they crack the problem of turning cellulose into carbs or fermentable sugars, these switchgrass/miscanthus processing plants can be annexed right onto the existing corn ethanol plants. Another fascinating thing I discovered was that during World War I there was a plant in Terre Haute, Indiana that turned corn into acetone, butanol, and ethanol in something referred to as the ABE process. (At that time the shortage was acetone, which was needed to produce cordite for the war.) The man who developed this ABE process is considered the father of industrial fermentation and was named Chaim Weizmann. He’s an interesting story in and of himself.
    I tend to agree with you that the development of ethanol as the fuel we all use has some big drawbacks (from the standpoint of both political policy and practical economics). It was interesting to learn, though, that when the times called for it, industrial fermentation was an option used to relieve a shortage.

  60. Lucia #30886,
    I still can’t see how you are taking account of the inter-model variability in your significance test, using method 1. You say “They can try to diagnose…”, which sounds like you haven’t. If you just test that average, as a number, against the observed weather trend, then you’d get the same answer whether the models varied or in fact all predicted exactly the same trend.

    You mentioned a t-test. That sounds like the right idea. You test the difference between two means, allowing for the variance of each. But what variance did you attribute to the model mean (trend)? It sounds to me like it was zero.

  61. Nick–
    I treat the noise process as ARMA(1,1) for the purpose of estimating the uncertainty intervals. This may or may not be correct– but the residuals do appear to follow this form of process. We can argue about whether or not this looks right– but it seems like a reasonable fit. If you want to discuss this, fine, but my wording above reflects the fact that, in a hypothetical situation, the people looking at the data might not have sufficient data to make a reasonable diagnosis. That they might only be able to try to diagnose in some situations, and only succeed in others, doesn’t give you any information about whether I have or have not.

    The noise process looks like ARMA(1,1) here. Is it for sure? No. But when testing these models, we can’t avoid the need to assume a form of a process and analyze based on that.

    If you just test that average, as a number, against the observed weather trend, then you’d get the same answer whether the models varied or in fact all predicted exactly the same trend.

    I don’t know what you are even trying to say here.

    What do you even mean by testing an average as a number? Or get the same answer as what?

    You test the difference between two means, allowing for the variance of each.

    No. I computed the model mean monthly surface temperatures based on the 22 models.

    This creates a time series (1), which contains noise.

    Then I take the monthly mean surface temperatures from GISS, which are themselves a time series (2).

    When I subtract time series (1) from (2), I get a third time series (3).

    If the models were faithful to observations, time series (3) would have no trend. Also, if the models are faithful to the observations, subtracting (1) from (2) eliminates the shared volcano signal, which is not due to weather but to shared forcing. So, we are left with a time series that should not contain persistent, temporally autocorrelated deviations from zero known to be caused by volcanoes.

    However, time series (3) contains noise arising from the earth’s weather in (2) and the residual noise from the average over the 22 models in (1).

    So, then I do a time series analysis on the time series (3).

    I’ll clarify a paragraph in the post to make it clear the difference is a difference in monthly mean temperatures, and so the result is a time series with monthly values.
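
    In code, the construction looks roughly like this. It is a sketch rather than my actual script: `model_runs` and `giss` are hypothetical arrays, and the trend-with-ARMA(1,1)-errors fit is done with statsmodels’ ARIMA for illustration:

    ```python
    # Sketch of steps (1)-(3) and the trend test. Assumes `model_runs` is
    # a (22, n_months) array of model monthly mean anomalies and `giss`
    # is the matching GISSTemp anomaly series (hypothetical inputs).
    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    def trend_in_difference(model_runs, giss):
        multimodel_mean = model_runs.mean(axis=0)   # time series (1)
        diff = giss - multimodel_mean               # series (3) = (2) - (1)

        # Linear trend with ARMA(1,1) errors; trend="ct" fits a constant
        # plus a slope in time alongside the ARMA error model.
        res = ARIMA(diff, order=(1, 0, 1), trend="ct").fit()
        slope = res.params[1] * 120                 # per month -> C/decade
        se = res.bse[1] * 120
        return slope, (slope - 2 * se, slope + 2 * se)   # ~95% interval
    ```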

  62. I thought there was a swathe of evidence that the GISStemp data has issues.
    John Theon has an interesting perspective
    http://epw.senate.gov/public/index.cfm?FuseAction=Minority.Blogs&ContentRecord_id=1a5e6e32-802a-23ad-40ed-ecd53cd3d320

    From the perspective of assessing the various models, surely somewhere there is a simple chart of the climate models– processes, assumptions, datasets, physics, etc.

    All I can find so far is FORTRAN code and input data– at least it’s there– so it will probably take outsiders years of work and supercomputer access to review. : (

  63. Mosher

    Annan doesn’t agree with you as he has climate sensitivity much higher than your 1.5.

    Lucia

    You didn’t explain why ‘about 0.2/decade’ translates to ‘exactly 0.2/decade’. Shouldn’t your ‘target’ level have a range? Your red bar should be thicker.

  64. Nathan–

    You didn’t explain why ‘about 0.2/decade’ translates to ‘exactly 0.2/decade’.

    Why would I explain that when I’ve never said it does?

    Zero is zero. The red bar should be smaller, but I want people to be able to see it.

  65. Nathan–

    Annan doesn’t agree with you as he has climate sensitivity much higher than your 1.5.

    Oh… and you should work on your reading comprehension skills.

    Mosher didn’t say or suggest climate sensitivity is 1.5. You will note his comment doesn’t use the words climate sensitivity. Also, he is commenting on an article that does not discuss climate sensitivity. Also, if you know anything about dimensions (Length, Time, Mass), you will notice the dimensions after 1.5 do not correspond to a climate sensitivity.

    It is rather bizarre that you should decide Mosher is discussing climate sensitivity.

  66. lucia #30906,
    Thanks for that explanation, which clarifies a few things. I had understood that you computed the trend of each series (ave model and GISS) separately, and tested the difference between those numbers. Hence the reference to t-tests.

    What you have done gives the same trend difference result, but the testing does take account of the month-to-month variation of the 22-model average. What it still doesn’t take account of is the variation in the population from which that average came. That went out when you did the averaging, and plays no part in the tests.

    You could relate it to a question of what the 95% confidence means. It means roughly that, if you could somehow repeatedly redo the test, allowing all the things that could vary to do so within appropriate distributions, then in 95% of cases the trend of the difference series would be negative. So it’s a question of what is meant by “all the things that could vary”. You’ve allowed for different observed “weather”, and for models making different time approximations to weather. But you have implicitly required that any rerun would be done with the same 22 models with the same parameter settings (I think this point is what Carrick was getting at too). That isn’t very interesting, because the IPCC has clearly shown that different models give widely different results. There are many ways you could have chosen the subsets (like your 22), and to show that the models generally overpredict, you’d have to show that the result wasn’t sensitive to that particular choice. In other words, do something about variance between models.

  67. Carrick,

    “I was thinking about solar or wind power, which are only weakly linked to the cost of energy production. Wind power is already break even, and there will be a price point where solar powering your home is an economically efficient alternative to large power plants. ”

    Break even in what respect?? Please give me some links on this as I do not know of any installations that have reached this point!!

    Britain’s wind was a spectacular failure this winter, for instance. EVERYWHERE I have read about either subsidises wind or does not have wind.

  68. Nick

    What it still doesn’t take account of is the variation in the population from which that average came. That went out when you did the averaging, and plays no part in the tests.

    This is just wrong. The variation in monthly values across the population contributes to the noise in the time series for the average.

    Go ahead and run monte carlo for this:
    1) Create 22 synthetic series with known white noise. (To make it simple and to avoid your having to figure out a noise model for the residuals.)
    2) Compute the mean monthly “temperature” value over the 22 series call that a “model mean”.
    3) Compute the trend m for the model mean series in step 2 and also compute the uncertainty in that trend from the linear regression. Call this sm. (The formula should be in your sophomore year statistics book.)
    4) Repeat a bajillion times.
    5) Compute the average “m” from the bajillion runs. Call that M.
    6) Compute the average “sm” from a bajillion runs. Call that SM.
    7) Compute the standard deviation of “m” based on the bajillion values of “m”. Call this SD.

    Compare SD² to SM². (Compare variances because SM² is the unbiased estimator; the standard dev is biased.)

    Note the similarity in SD² to SM². Then think about that.

    Next, if you want, expand the method to examine the ratio t = (m − mtrue)/sm, where mtrue is the trend you used to generate the series and m and sm are the values from step 3 above. Using the number of degrees of freedom (the number of data points in the time series minus 2), find the critical value for this ratio and determine whether each case accepts or rejects the null at whatever confidence level you used.

    Then, keep track of all of the accepts/rejects and see if you accept/reject the true trend at the rate consistent with the confidence level you chose.

    Then think about that.

    After you figure out that you are working from a premise that is just wrong, and that we can perfectly well average the 22 series and then compute the trend, or compute the trends and then average, we can discuss this more. (A sketch of this Monte Carlo is below.) But for now, you are just wrong.
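
    Here is a sketch of that experiment, with “a bajillion” cut down to 2000 runs. It follows the numbered steps above, using white noise and a true trend of zero.

    ```python
    # Sketch of the Monte Carlo in steps 1-7, plus the t-test extension.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n_months, n_models, n_runs = 360, 22, 2000
    mtrue = 0.0                                   # trend used to generate series
    t = np.arange(n_months, dtype=float)
    sxx = ((t - t.mean()) ** 2).sum()

    m = np.empty(n_runs)                          # step 3: trend of each model mean
    s_m = np.empty(n_runs)                        # step 3: its regression std. error
    for i in range(n_runs):
        runs = mtrue * t[:, None] + rng.standard_normal((n_months, n_models))
        mean_series = runs.mean(axis=1)           # step 2: the "model mean"
        slope, intercept = np.polyfit(t, mean_series, 1)
        resid = mean_series - (slope * t + intercept)
        m[i] = slope
        s_m[i] = np.sqrt(resid @ resid / (n_months - 2) / sxx)

    M, SM = m.mean(), s_m.mean()                  # steps 5 and 6
    SD = m.std(ddof=1)                            # step 7
    print("SD^2 = %.3g   SM^2 = %.3g" % (SD**2, SM**2))   # note the similarity

    # Extension: the t-test should reject the true trend ~5% of the time.
    tcrit = stats.t.ppf(0.975, n_months - 2)
    print("rejection rate:", (np.abs((m - mtrue) / s_m) > tcrit).mean())
    ```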

  69. Lucia
    When you say that “a difference of -0.04 C/decade represents roughly 20% of ‘about 0.2 C/decade’ ”,
    you fail to note that the “about 0.2/decade” has only one significant digit, and spans a range encompassing 0.150000001 – 0.2499999999 etc., so the roughly 20% statement is not really accurate, is it? It’s actually within the range of ‘about 0.2/decade’.

    As to Mosher’s 1.5 climate sensitivity, he has made that claim before. All I was saying was that it may look (to Mosher) that Annan is supporting his general ideas as a Luke Warmer (which is where he is going with all this), but in fact Annan supports a higher sensitivity even though he understands the models are ‘one vote one member’ or however Mosher put it.

  70. Lucia
    ” The variation in monthly values across the population contributes to the noise in the time series for the average.”
    Well, it diminishes the noise. If all the models gave closely similar results, you’d get more noise, and less likelihood of significance.

    But it’s not getting at the main issue, which is model selection. What statistical significance means is that you’d probably get the same result if things that you can’t specify had turned out differently. One of the things that should be allowed to turn out differently is model selection. There’s nothing in your analysis that reflects that. There is no number that actually measures the model spread.

  71. Nathan–
    You fail to realize that I know that the IPCC rounded down when they said “about 0.2 C/decade”. We know the actual numbers they projected, and I’m actually being nice to them.

    As for your statement about what Mosher claimed before, maybe you should find the claim and cite it before just posting a response to some comment you think you remember from some conversation somewhere else. It’s really silly to just change the topic of conversation out of the blue with no warning and without giving people reading any clue what you think you are talking about.

  72. Nick–
    The null hypothesis assumes all individual models and the observations are drawn from a population with the same underlying trend. That is: The spread is all due to internal variability. If this is not so, then at least some of the models are wrong.

  73. Nick– Out of curiosity, why do you think the “main” issue is model selection as opposed to whether some models give false trends? I don’t see one as being any more important than the other. Why would you bother with model selection (your main issue) unless you believed (or conceded) that some IPCC models are false (that is, give incorrect, biased projections)?

  74. Lucia

    Mosher would know what I am talking about.

    “You fail to realize that I know that the IPCC rounded down when they said “about 0.2 C/decade”. We know the actual numbers they projected, and I’m actually being nice to them.”

    I don’t think you’re being “kind to them”. It isn’t being particularly kind if, when their claim is that the first two decades warm at ‘about 0.2C/decade’, you attempt to show they are wrong because reality doesn’t reach exactly 0.2C/decade.

    Over at Open Mind, Tamino clearly shows that the present data lies well inside the spread of models. Don’t you think this shows that the models are doing a reasonable job of representing reality?
    And James Annan clearly articulated why using the model average as some sort of target/bullseye isn’t appropriate; the model average isn’t some measure of how well the models do compared to reality.

  75. Nathan–

    I don’t think you’re being “kind to them”. It isn’t being particularly kind If their claim is that the first two decades are ‘about 0.2C/decade’ then attempting to show they are wrong when reality doesn’t reach exactly 0.2C/decade.

    No, it would not. What I’m telling you is that if you either a) look at the tables or b) download the projections, you will discover that they rounded down. The projections call for more than 0.2C/decade.

    For some reason, you want to ignore the more detailed information in the report, don’t want to look at the projections, and speculate they might have rounded up from some lower trend. Well. They didn’t.

    And James Annan clearly articulated why using the model average as some sort of target/bullseye isn’t appropriate; the model average isn’t some measure of how well the models do compared to reality.

    Yes. James is now clearly articulating why he thinks the method the IPCC decided to use isn’t appropriate. It remains to be seen whether the IPCC will listen. However, it is the method they used. It turns out it can be shown it didn’t work.

    I’ve read the stuff at Tamino. He does not show what you think.

  76. lucia #30933
    “Nick– Out of curiosity, why do you think the “main” issue is model selection as opposed to whether some models give false trend?”

    All the models give a false trend. They give a range of estimates. You want to know whether the estimates are helpful. There are at least three sources of discrepancy between the models and the observed trend:

    1. Models in the population are disposed to predict different trends. That’s structural.
    2. From a given run of a model, you have some uncertainty about what the trend is (reflected in a regression). That’s the month to month variation issue, which is reflected in any model averages.
    3. The observed data also has an uncertain trend, also reflected in regression analysis.

    When you ask how reliably models relate to the observed trend, you need to account for all these sources of variation. You’ve allowed for 2) and 3), but omitted 1). But it’s there, and acknowledged. And it isn’t that some are true and some false; none is exactly true, and we don’t know a priori which are best. So we take an average of a group, knowing that because of the variation, different choices would yield different results. Since we don’t want a result that is tied to some specific choice, the variation due to choice needs to be added to the overall variance. The IPCC does that implicitly with their colored bands on the graphs.

    That is what’s missing in your Monte Carlo thought experiment above. The 22 white noise models are identical, and you’re just showing that variances don’t depend on which order you aggregate in (just an ANOVA kind of thing). I think you could have gone further – you’d get the same results from 1 model with 22 bajillion iterations.
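
    In rough numbers, the accounting I mean looks like the sketch below; every value in it is made up purely for illustration.

    ```python
    # Toy version of the accounting: three independent variance sources
    # added in quadrature.  Every number is made up for illustration.
    import math

    se_structural = 0.05   # 1. spread of trends across models (C/decade)
    se_model_mean = 0.02   # 2. regression uncertainty of the model-mean trend
    se_observed   = 0.04   # 3. regression uncertainty of the observed trend

    se_total = math.sqrt(se_structural**2 + se_model_mean**2 + se_observed**2)
    print("pooled standard error: %.3f C/decade" % se_total)
    ```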

  77. Nick–
    Oh? You think the bias in models is acknowledged? It’s acknowledged in some places but not others. For example, Easterling and Wehner and other recent papers describe the full range of trends exhibited by models as reflecting internal variability, and so explain recent low trends as consistent with weather. These comparisons of the observed trend to the full distribution implicitly assume a null hypothesis that the trend in models matches that of observations and that all model trends equal each other.

  78. Lucia

    “I’ve read the stuff at Tamino. He does not show what you think.”
    So what do you think it shows?

  79. lucia #30938
    It’s not a systematic bias issue. Models just solve a whole lot of PDE’s with physical modelling assumptions. They do it differently, and so will always come up with somewhat different estimates of specific trends etc. That’s universally acknowledged by anyone who knows anything about models.

    It doesn’t make sense to talk of bias here, because they aren’t designed specifically to generate trends – we don’t know exactly what trends they will generate in any given situation, and we don’t know what is correct.

  80. Nick–

    That’s universally acknowledged by anyone who knows anything about models.

    Nevertheless, peer reviewed papers published within the past year specifically assume the contrary in order to claim the models are consistent with the observations. So, “universal” must mean, except in peer reviewed papers in climate journals that are based on the contrary assumption.

    What do you mean it’s not a systematic bias issue? Either the mean over many runs of a model tracks the truth or it doesn’t. If it doesn’t, that’s a systematic bias between what the model predicts and reality. This simple definition doesn’t go away if you incant “PDE” or “modeling assumptions”.

    It doesn’t make sense to talk of bias here, because they aren’t designed specifically to generate trends – we don’t know exactly what trends they will generate in any given situation, and we don’t know what is correct.

    What we know is that the observations and the multi-model mean were almost certainly not generated from processes with the same trend. We don’t need perfect knowledge of the individual trends to know this. We don’t need to know what trends the models will create before we run the models.

  81. Nathan (Comment#30929) January 20th, 2010 at 8:45 pm

    “As to Mosher’s 1.5 climate sensitivity, he has made that claim before. All I was saying was that it may look (to Mosher) that Annan is supporting his general ideas as a Luke Warmer (which is where he is going with all this), but in fact Annan supports a higher sensitivity even though he understands the models are ‘one vote one member’ or however Mosher put it.”

    No. Here is my point. There are, for the sake of argument, 17 models. In the IPCC cases you will find that some of these models only submit 1 run. Others submit 3 or 5 or whatever. So maybe you have a total of, say (pulling a number out of the hat), 60 “runs”.
    My gut has always told me that it makes more sense to average the runs per model and then take a mean of all the models, so N = 17. I don’t know why I think that. Perhaps because I wouldn’t want a bad model polluting the sample with a large number of runs. For example, I build a really simple but bad model and I run 100 runs. Should I average those 100 bad runs with 1 run of a good model? Probably not. I should create a model average first.
    SO: one model, one vote. THOSE are James and Jules’ words.

    WRT the number 1.5C: that’s my number for the heart of lukewarmerism. Tom Fuller, for example, would go as high as 2C, and I’m betting some might go as low as 1.

    That is independent of the methodological point I’m making about how to average models. But again, I said that was my gut feel. (Hey, Jones had those about the MWP.) One model; one vote.

    In my ideal world, people would just adopt a standard model and everybody would…

    1. submit proposals for improving it, and those changes would get approved/disapproved.

    2. Run a bunch of runs with that model.

    I liken it to the situation I worked in. We were handed “blessed” code for doing radar cross section estimation. Was it perfect? Hell no. Did it get us in the ball park? Yes. Could we suggest improvements? Yes, through a formal process. Did the code get better? Yes. Did it ever match the “pole” tests? No. Same with CFD? I dunno, I didn’t work with that stuff.

    Anyways, I was just glad to see that Annan (a much bigger brain) agreed with my gut feel. That’s all.

  82. Nathan–
    I’ve already commented on his correction for the response to volcanoes, which is better than nothing, but which, alas, does a less good job of dealing with the effect than another, simpler method. Feel free to read that post. http://rankexploits.com/musings/2010/dealing-with-volcanic-eruptions-when-testing-models/

    When the variability due to volcanic eruptions is dealt with properly, the uncertainty intervals are smaller than when it is dealt with using a method that does not match the physical response of the climate to volcanic eruptions. In other words: Tamino chose a method that might appear to correct for volcanic eruptions but, if one understands the physics, is likely to do a poor job. The net effect is to correct less well than it should, resulting in larger uncertainty intervals than if one does a phenomenologically sound correction.

  83. Lucia

    I’m talking about a different post. He showed the recent GISS temps against the IPCC model runs in a post called “Models”. It might be the most recent.

    Had a look around in Chapter 10 of the IPCC, couldn’t find the tables you were talking about. Got a link handy?

  84. Nathan

    You’ve got things mixed up.

    1. One vote, one model is my gut feel about the right way to average. Annan agrees, FWIW. I don’t know; I have an argument for why I think that, but I’m reading and listening first to others who may know better.

    2. I think the model estimate is biased high.

    If the IPCC says 2C, I’m betting on at least 15% below that. I’m happy with a 1.5C per century number.

  85. Nathan

    Dr. Curry has the strength of her convictions to post at CA and answer questions.

    Tammy

    1. won’t let Lucia ask questions on his site
    2. won’t come here to defend his work
    3. relies on people like you to carry his water

    and your bucket has holes.

    You don’t understand what he says or what she says; you just confuse the discussion.

    Nick at least speaks for himself. I suggest you listen to him for the best arguments and learn. (carrick too)

  86. lucia (Comment#30944)
    Nevertheless, peer reviewed papers published within the past year specifically assume the contrary in order to claim the models are consistent with the observations.

    Can you give an example?

  87. Steven

    I’m not here for anyone else.

    Lucia doesn’t really answer my questions. For example:
    “I commented on that too. Tamino has chosen a worthless method that hides the discrepancy.”

    That’s not an answer. That’s a dismissal.

    She claims the IPCC is for some reason talking down their models, when they say to expect 0.2C/decade. Sounds like they’re just being conservative to me.

    She says she knows that Annan thinks the bullseye method is flawed, yet goes on to use it herself in this post. What’s the point in that? The reason I brought it up was to see if she agreed with his sentiments. I still don’t know. To do what she has done here, I think she first needs to demonstrate that the multi-model mean should approximate reality. If she can’t do that, then why bother with the post?

  88. I would like to see a model run ignoring the biased land-based data and just using satellite data, since all the model runs using land-based data from GISSTemp have warming bias.

    I would like to know what numbers are coded in various models for:
    * climate sensitivity parameter
    * net annual radiative forcing

    If they don’t hard code these numbers, I’d like to know which data they use to derive them.

  89. I don’t know much about models and how they are utilised, but I can use the IPCC scientists own words to find out. e.g.

    Filippo Giorgi, Senior Scientist and Head of the Physics of Weather and Climate Section of The Abdus Salam International Centre for Theoretical Physics in Trieste, Italy, writing to fellow IPCC lead authors…

    “We said that one thing to look at was the agreement with the old data and thus I noticed that relaxing the criteria determining what “agreement” means would yield a greater agreement”.
    “…but the fact is that in doing so the rules of the IPCC have been softened to the point that in this way the IPCC is not any more an assessment of published science (which is its proclaimed goal), but the production of results. The softened condition that the models themselves have to be published does not even apply, because the Japanese model, for example, is very different from the published one which gave results not even close to the actual … version …. Essentially, I feel that at this point there are very little rules and almost anything goes. I think this will set a dangerous precedent which might undermine the IPCC’s credibility, and I am a bit uncomfortable that now nearly everybody seems to think that it is just ok to do this”.
    “Do we soften our requirement, i.e. from “all the models except one need to agree with each other” to “all the models except two need to agree with each other” agreement? I do not feel strongly about it but am more in favor of not softening the criterion. We are looking for confidence and model agreement and should have stringent requirements on it”.

    In short, the authors had set a standard for model agreement previously. But upon preparing the report, they find their models do not “agree”, so they decide to change the standard AFTER THE FACT.

    Borders on fraud, doesn’t it? Kudos to Filippo Giorgi.

    Any thoughts?

  90. Lucia,
    Chad has a post which correctly emphasises the role of the variability of the model results. Here is a typical plot, showing the error bands of the weather noise, and the ranges of the various model results. The error bands don’t seem to use AR(1) type corrections, which would make them wider. And there are clusters of results from each of a few model types. But the idea is there.

    You can see the importance of the model variability. Your earlier kind of analysis would draw a horizontal line on that plot corresponding to the model average and say that because it lies outside the bands, the “IPCC central tendency” is falsified. Your latest kind would do something similar. But the reality shown here is that the models have a spread, and the observed (GISS) band is within that spread. That’s what your analysis leaves out.

    Now one might say that the models had too much spread to be good predictors. Comparison with recent weather doesn’t change that – they are what they are, and you can only falsify them as they are.

  91. Models just solve a whole lot of PDE’s with physical modelling assumptions. They do it differently, and so will always come up with somewhat different estimates of specific trends etc. That’s universally acknowledged by anyone who knows anything about models.

    Oh NO!
    Models don’t solve any PDE, even in the craziest interpretation of the word “solve”.
    Especially if it is implied that the “whole lot” contains the N-S equations.
    If they did, and assuming that the N-S equations have a unique solution, they would all agree with each other within the bounds of the chaotic regime.
    THIS is what is universally acknowledged by everybody who knows something about PDEs in general and N-S in particular.
    It should be forbidden to propagate fairy tales like that.

  92. Baa Humbug (Comment#30963) “In short, the authors had set a standard for model agreement previously. But upon preparing the report, they find their models do not “agree”, so they decide to change the standard AFTER THE FACT.”

    I hear an echo! And I could just scream. (Little old me just said this the other day when discussing the scientific method.)

  93. Nick

    Can you give an example?

    Easterling and Wehner. There are more, but I’ll get them after I get back from Highland Park. (Popsie is in the hospital again.)

  94. Nathan–
    You asked me

    Over at Open Mind, Tamino clearly shows that the present data lies well inside the spread of models. Don’t you think this shows that the models are doing a reasonable job of representing reality?

    I answered

    “I commented on that too. Tamino has chosen a worthless method that hides the discrepancy.”

    I could have made it clearer by saying:

    “I commented on that too. Tamino has chosen a worthless method that hides the discrepancy. No. I don’t think what Tamino did shows the models are doing a reasonable job at representing reality.”

    I would have thought, in context, you would have inferred that if I think he chose a worthless method of comparison, and that it hides the existing discrepancy, then no, I don’t think his comparison shows the models are doing a reasonably good job at representing reality.

    That is, unless you define “reasonable” as “off by roughly 40%, with that difference being statistically significant.”

  95. Nathan

    She says she knows that Annan thinks the bullseye method is flawed, yet goes on to use it herself in this post. What’s the point in that?

    You are a very confused person.

    Annan is not the IPCC. He is not even invited to the elite group of people who get to officially evaluate models. There is no rule against my happening to agree with Annan that the bullseye is flawed. That is, if you use the bullseye– which the IPCC does– you get flawed results.

    Annan’s paper does not change the fact that the bullseye is the currently used consensus method.

    You didn’t like the fact that I was discussing the bullseye of the models being off before Annan conceived of, wrote, and published his paper. You’ve been insisting I am wrong for whatever Nathan reasons you have. Now that his paper concurs that the bullseye is flawed, you somehow think I don’t get to point out that the models are off because… I’m right?!

  96. I don’t mean to disturb the colloquies but I see Autocad 2010 has a new parametric feature. It looks like you can take an idea like, “henceforth I want these two lines to be parallel in this gadget I’m drawing” and it will do that. I wonder if that kind of capability is something modelers use when they tune their models?

    I also see “parametric” is kind of a buzz word in certain designer circles.

  97. Nick–
    I fully understand that and have done the Santer analysis. That said:
    1) Tests in the literature this past year have treated the spread of all trends as nothing more than weather– not a component due to model bias. So, this is consistent with the assumptions in those tests.

    2) If I add the variability due to model spread, it’s not going to be as large as you think.

  98. Hank

    I wonder if that kind of capability is something modelers use when they tune their models?

    No. Parameter is a very generic word. Modelers do not tune their models that way.

  99. Nick–
    If I add in the variability across models, I then reject starting in ’50, ’60, and ’70. Later years don’t reject.

    I’ll post details later– probably next week.

  100. Nick Stokes (Comment#30965) offers links to an interesting set of graphs of the predicted trends of the models plotted individually rather than as an ensemble average.

    Is there any significance to the fact that even though error ranges mostly overlap with the GISSTemp observations, 75-90% of the mean trends predicted by the respective models are higher than the GISSTemp observations? Isn’t Chad’s graph an even more effective illustration of lucia’s essential point that the models (taken individually or averaged together) overpredict warming?

  101. George–
    It’s difficult to detect statistical significance from those graphs. Even if the illustrated error bars overlap by nearly 30%, that may translate to differences that are statistically significant at the 95% confidence level. Or not. It depends on the relative size. When I do the Santer type analysis, I’ve taken to plotting a variable he calls “d*”, which is the ratio of the difference between the means to the pooled standard errors for each quantity. That way people don’t need to try to figure out the pooled standard deviation by eyeballing two uncertainty intervals, and then mentally computing the square root of the sum of the squares of the two lengths.
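
    For anyone who hasn’t seen d*, the sketch below shows roughly how it works. Every number in it is a hypothetical placeholder; Santer’s paper gives the exact construction.

    ```python
    # Rough sketch of the d* ratio: the difference between mean trends over
    # the pooled standard errors.  All numbers are hypothetical placeholders.
    import math

    b_obs, se_obs = 0.11, 0.04   # observed trend and its std. error (C/decade)
    b_mod, se_mod = 0.20, 0.03   # multi-model mean trend and its std. error

    d_star = (b_mod - b_obs) / math.sqrt(se_obs**2 + se_mod**2)
    print("d* = %.2f" % d_star)  # compared against a critical t value
    ```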

    Chad has posted several times. The page Nick linked to mostly shows individual runs. But no, the fact that most of the runs are high doesn’t necessarily tell us much. It can just mean the earth’s trend is in the low end of the range expected for “weather”. Since everything is compared to that one value, a lot of the model runs are high. Santer’s paper does discuss how to roll all that up to say something about the mean. Chad discusses that.

  102. Lucia

    “Annan is not the IPCC. He is not even invited to the elite group of people who get to officially evaluate models. There is no rule against my happening to agree with Annan that the bullseye is flawed. That is, if you use the bullseye– which the IPCC does– you get flawed results.

    Annan’s paper does not change the fact that the bullseye is the currently used consensus method.

    You didn’t like the fact that I was discussing the bullseye of the models being off before Annan conceived of, wrote, and published his paper. You’ve been insisting I am wrong for whatever Nathan reasons you have. Now that his paper concurs that the bullseye is flawed, you somehow think I don’t get to point out that the models are off because… I’m right?!”

    He’s an author for the IPCC.

    No, you can’t say the models are off, because using the bullseye method (as you are doing) is flawed. That’s what Annan is saying. He’s saying that you can’t use the model mean as some sort of index of how well the models are doing. So you can’t use the bullseye method to demonstrate the models are off. All you are doing is showing that the bullseye method doesn’t work.

    Consider your work and compare it to his. You show that reality and the bullseye model mean don’t match. Then you stop and say “the models are flawed”, whereas Annan shows that the bullseye method is flawed, goes on to propose an alternate method, shows that reality better fits this alternate method, and then even gives a go at estimating climate sensitivity. That’s science. Note the difference: you get to a problem and throw your hands in the air and claim it’s all wrong. He gets to a problem and works through the problem to find out what it means.

    My main (probably only) complaint that I bring up here is that your work doesn’t actually mean anything. You get to a problem or issue then stop. You do it time and again – it’s a political tactic.

  103. My main (probably only) complaint that I bring up here is that your work doesn’t actually mean anything. You get to a problem or issue then stop. You do it time and again – it’s a political tactic.

    QFT. McIntyre has invented, and refined, the method to an art form.

    Advancing the science means sitting down and doing the hard yards, but McIntyre has instead resorted to harassment, bullying and the public pillory. Take away all that and see him publish a paper, and the inconsequential product he creates sinks without a trace.

    And for all his demands for openness and auditing, where were he and his friends when the AR4 was created? There was a red flag in there about Himalayan glaciers, but they all missed it. Instead it was one of the ‘team’ who found the error and raised it. Team Audit fails again.

  104. Nathan and bugs,

    The first step to solving a problem is realising you have a problem. Lucia and Steve Mc keep showing the problems but The Team, and you guys, stay in denial. Once you admit there is a problem with the way IPCC ‘climate science’ is done we can all move forward, but until then you’re just speed humps.

  105. Carrick,

    There are computations, then there is reality.

    I didn’t read the full comps. Do they include the increase in cost of energy MANDATED by the Federal Gubmint??

    I have read NOTHING that indicates that wind is going to break even in the near future. NOTHING installed is coming near to the expectations for output so the computations based on estimated output are not useful.

    As I mentioned before, the British farms produced near zero power during the recent cold snap. Previous years have been little better. What good is a power supply that only has rated output off-peak??

    http://www.telegraph.co.uk/finance/newsbysector/energy/6957501/Wind-farms-produced-practically-no-electricity-during-Britains-cold-snap.html

    The vaunted Danish wind farms would be a total disaster if they did not have power interchange where they can sell their excess production at discounted prices and have their NON-production replaced by hydro and nuclear from neighbors!!

    Their wind is operating at a loss. The Danish success comes from their BURNING pulp and other bio-replaceables very effectively!!

    http://masterresource.org/?p=4839#more-4839

    On this page they claim that wind provides 20% of domestic energy supply. This is a misdirection as most of that is exported at a discounted price as it is generated when not needed!! This site has a number of surprises if you dig into it!!

    http://ens.dk/en-US/supply/Renewable-energy/WindPower/Facts-about-Wind-Power/Key-figures-statistics/Sider/Forside.aspx

    I find this masterresource blog to be an excellent source of information.

    http://www.masterresource.org/2009/10/industrial-wind-technology-interview-of-jon-boone-by-allegheny-treasures/#more-5501

    CONSUMPTION of power by wind turbines!!

    http://www.aweo.org/windconsumption.html

    The future of Wind Turbines:

    http://nanok.com/vin/

    http://www.nbclosangeles.com/traffic/transit/Out-of-Control-Windmill-Blows-Plan-for-Sunday-Drive.html

    http://www.youtube.com/watch?v=HKkTUY2slYQ&feature=related

    http://www.youtube.com/watch?v=ZbMO7ufATBc&feature=related

    http://www.youtube.com/watch?v=4EmYe2u6J6g&feature=related

    http://www.youtube.com/watch?v=iDJpvzoh1Iw&feature=fvw

  106. AndrewK

    “Lucia and Steve Mc keep showing the problems but The Team, and you guys, stay in denial. Once you admit there is a problem with the way IPCC ‘climate science’ is done we can all move forward, but until then you’re just speed humps.”

    This is not actually true. Compare what James Annan did and what Lucia has done. Annan pointed out the problem, came up with a solution, and checked what impact that had. What Lucia and Steve do is point out problems and stop. Which is pretty well useless. They don’t show what impact the problem has, if any. They only want to show the IPCC is wrong; they don’t seem to care about how wrong, whether it matters, what the impact of the error is, etc.

    It’s not sufficient to say there is an error without analysing what that means.

  107. Nathan,

    Well, if Lucia has shown the same (or at least a similar) result as Annan, then she has done something almost unique in IPCC ‘climate science’ — confirmed a result. This should be a matter of celebration, not derision.

  108. Andrew Kennett

    She hasn’t done the same thing. James Annan pointed out that comparing reality to an ensemble model mean was not a good way of judging whether the models were accurately reflecting reality.
    So she did what James Annan claims is not good.

    Perhaps what she has done is prove his point that the ensemble model mean is not a good approximation of reality. BUT that doesn’t mean “the models are wrong”, as she stated.

  109. Nathan–

    James Annan pointed out that comparing reality to an ensemble model mean was not a good way of judging whether the models were accurately reflecting reality.

    You are very, very confused.

  110. Lucia

    Moving from the Truth centred paradigm (what you are using) to a Statistically indistinguishable paradigm.

    From their paper:
    “We consider paradigms for interpretation and analysis of the CMIP3 ensemble of climate model simulations. The dominant paradigm in climate science, of an ensemble sampled from a distribution centred on the truth, is contrasted with the paradigm of a statistically indistinguishable ensemble, which has been more commonly adopted in other fields. This latter interpretation (which gives rise to a natural probabilistic interpretation of ensemble output) leads to new insights about the evaluation of ensemble performance.”

  111. Lucia

    ok, what is so special about the multi-model mean and why should reality reflect it?
    You cannot say “the models are wrong” until you demonstrate that reality has to follow the multi-model mean for the models to be correct.

  112. Nathan

    ok, what is so special about the multi-model mean and why should reality reflect it?

    There is no reason why reality should reflect it. People have been saying that here since the blog was established. The multi-model mean is, as Annan notes, “the dominant paradigm in climate science”. He’s published a nice paper, but mere publication has not caused the community to instantly abandon that paradigm. It remains to be seen if they do. (They should– but who knows what will happen.)

    Regardless of what happens in the future, the paradigm of the truth-centered ensemble was the one the IPCC chose to make their projections, and since I test their projections as they made them, that means I test the multi-model mean. If the IPCC had not chosen that paradigm for making projections, I would not test it.

    You cannot say “the models are wrong”

    I don’t know why you think this is a problem for me. After all, what I say is that statistical tests indicate the IPCC projections from the AR4 are wrong. There is a difference between the projections and the models.

    Notwithstanding Annan’s perfectly reasonable observations, in the AR4, the IPCC used the multi-model mean to create their best estimate of projections. So, if one is going to test their projections, one will test the multi-model mean. The alternative is to test something that is not the IPCC projections.

    The fact that the IPCC may, in the future, come to admire the method Annan describes in his paper, does not change the fact that they did not use that method when creating projections back in 2006-2007. So, if one is to test their projections, one tests the multimodel mean. The reason is: That’s the basis for their projection.

  113. Nick Stokes,

    Chad has a post which correctly emphasises the role of the variability of the model results. Here is a typical plot, showing the error bands of the weather noise, and the ranges of the various model results. The error bands don’t seem to use AR(1) type corrections, which would make them wider. And there are clusters of results from each of a few model types. But the idea is there.

    I scaled the standard errors to account for lag-1 serial correlation.
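
    One common version of that scaling uses an effective sample size, as in the sketch below; the details here are illustrative and may differ from the exact implementation used for those plots.

    ```python
    # One common recipe for inflating a trend standard error for lag-1
    # serial correlation, via an effective sample size (n_eff).
    import numpy as np

    def adjusted_se(resid, se_ols):
        """Scale an OLS standard error by the lag-1 autocorrelation of the
        regression residuals."""
        r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
        n_eff = len(resid) * (1 - r1) / (1 + r1)
        return se_ols * np.sqrt(len(resid) / n_eff)   # wider when r1 > 0

    # Demo on synthetic AR(1) residuals with rho = 0.5:
    rng = np.random.default_rng(3)
    resid = np.zeros(240)
    for i in range(1, 240):
        resid[i] = 0.5 * resid[i - 1] + rng.standard_normal()
    print(adjusted_se(resid, se_ols=0.05))  # roughly 0.05 * sqrt(3) = 0.087
    ```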

  114. Lucia
    I brought the Annan paper up to see if you agreed with it – and I think you probably do, but you rarely actually say what you think is true.
    You claim that you are testing the IPCC claims. Now, if you understand that testing the ensemble mean doesn’t actually mean anything, why do it? It just looks opportunistic.

    You did actually claim the models were wrong above

    “Now that his paper concurs that the bullseye is flawed, you somehow think I don’t get to point out that the models are off because… I’m right?!”

    You still keep stopping at the point where it gets interesting:
    “So, if one is to test their projections, one tests the multimodel mean. The reason is: That’s the basis for their projection.”
    So why do you just stop when you find it doesn’t work? That’s the point I keep making: as soon as it gets interesting, you stop. It’s not enough to just say ‘it’s wrong’. What does it mean for the multi-model mean to be wrong? Why is that important?

    It happens with other ‘findings’ here too. I seem to remember you agreeing with Mosher about a sensitivity of 1.5 and that you are both Luke Warmers. Nowhere have I seen you define what this means, how it is different from higher sensitivities, what it means for proposed legislation, etc. It would be very interesting to see if there is any difference at all in terms of what needs to be done.

  115. Lucia

    “…but mere publication has not caused the community to instantly abandon that paradigm.”

    Well, it has only JUST been published – or rather accepted for publication – so it’s a bit early right now, but if you look around you can see it’s caused a bit of excitement.

    Heck, you have been, apparently, going on about this for years and haven’t really caused much of a stir – so calling it a mere publication is doing it a disservice.

  116. Hey Lucia — do you think that if you tested something other than what the IPCC used for its projections Nathan would attack you for using a strawman?

  117. Nathan–

    You claim that you are testing the IPCC claims. Now, if you understand that testing the ensemble mean doesn’t actually mean anything, why do it? It just looks opportunistic.

    I test the multi-model mean because that is the basis of the IPCC projection. Because it is the basis of their projection, if one wants to test their projection, one tests the multi-model mean.

    If the IPCC used astrology to create their projections, and I wanted to test the projection that could be called the IPCC’s projection, I would test a projection based on astrology.

    I don’t need to think the basis the IPCC selected is good to test their projections. I just need to recognize what they did and test that.

    So why do you just stop when you find it doesn’t work?

    Huh? If I’m testing the projections, I test them. If I’m cooking pancakes I can stop when the pancakes are cooked. I’m not required to fry bacon or make a souffle for people to say “Lucia cooked pancakes”. If they want they can even evaluate the quality of the pancakes. And guess what? They can stop at evaluating the pancakes and don’t have to say “Why didn’t you cook bacon too?”

    There are very few stupid questions in the world, but you seem to manage to ask them.

    It’s not enough to just say ‘it’s wrong’. What does it mean for the multi-model mean to be wrong? Why is that important?

    If the IPCC projections themselves had any importance in the first place (and there are those who claim they did), then the fact they are wrong has the same importance. If the projections were never important, then the fact they are wrong is unimportant.

    My impression is that policy makers would like to have some estimate of future temperature changes to make policy decisions of various sorts. If they do, then knowing the simulations over-project warming by 40% since 1950 would be useful information.

    As for your memory on other things, it appears to be as bad as your reading comprehension, your processing of simple logic, and your understanding of papers you read.

    It would be very interesting to see if there is any difference at all in terms of what needs to be done.

    The fact that you think a topic might be interesting does not obligate me to expend one iota of energy crafting posts for your express enjoyment. We have sometimes discussed these things on other threads, but it’s not really what I discuss most. If you honestly want to read discussions about that sort of thing, put on your big boy pants, stop whining and find a blog whose focus interests you.

    I discuss what’s interesting to me at this blog. That’s the way the world is.

  118. AndrewKennett

    Hey Lucia — do you think that if you tested something other than what the IPCC used for its projections Nathan would attack you for using a strawman?

    Who knows? But obviously, it makes no sense to create alternate projections, call them “the IPCC projections”, test those, and then draw conclusions about the accuracy of IPCC projections.

  119. Andrew Kennett (Comment#31008) January 21st, 2010 at 8:01 pm

    Nathan and bugs,

    The first step to solving a problem is realising you have a problem. Lucia and Steve Mc keep showing the problems but The Team, and you guys, stay in denial. Once you admit there is a problem with the way IPCC ‘climate science’ is done we can all move forward, but until then you’re just speed humps.

    You are confusing rejecting bad science from deniers with thinking they know all the answers and there is nothing left to learn.

  120. bugs:

    You are confusing rejecting bad science from deniers with thinking they know all the answers and there is nothing left to learn.

    The only one I know who says there’s nothing left to learn is Al Gore. “The debate is over,” he claims.

  121. Nathan,
    you said: “What Lucia and Steve do is point out problems and stop. Which is pretty well useless.”

    Why useless?
    When somebody (usually a reviewer of a paper) points out a problem in my work, then it is a reason for me to look at it and redo the analysis or rethink the outcome. I surely don’t expect that the reviewers will set up a different experimental take on the problem, completely on their own, and send me their results.

  122. Re: Carrick (Jan 20 15:42),

    Wrt biodiesel and scaling with the price of petroleum: I was employed for a short time, before I gave up in disgust, by a since-defunct local biodiesel plant. Their big complaint was that soybean oil from ADM was scaled directly to the price of petroleum. Since almost all biodiesel facilities are small, they have no leverage at all with the few suppliers of bulk vegetable oil. Fat sources other than pure vegetable oil are much more expensive to process because they usually have high levels of free fatty acids. Biodiesel transesterification (replacing glycerol with methanol) uses base catalysis, so any free fatty acids kill the catalyst. Acid-catalysed esterification to reduce the FFA content is much slower and hence more expensive than base-catalysed transesterification.

    As far as solar powered homes, we’re a very, very long way from that. The current price for a 246 watt (peak power) kit including a 110 V inverter and battery control module at Amazon is $1,400. That doesn’t include the price of batteries for 24/7 operation or factor in the reduced total power capability for 24/7 operation. The average home uses about 50 kWh/day if they don’t use electric heat. For about the same price you can buy a 6.5 kilowatt diesel generator that can run 24/7. If you want an expensive but quiet way to keep your RV batteries charged, then these panels might work for you. But don’t expect to be able to run your AC, and you’ll still need propane for the refrigerator, stove and heater.

  123. bugs,

    “You are confusing rejecting bad science from deniers with thinking they know all the answers and there is nothing left to learn.”

    Please show where you have found the information that Steve McI and Lucia are deniers. I am a denier and resent the comparison. Why they, they, they’re positively RATIONAL compared to me!!!

    Not to mention they are EXTREMELY rational compared to YOU!!

  124. LOL, Golly-geez DeWitt! Take an anti-depressant. Nothing ever gets accomplished if you start with the assumption it can’t be done. A bit more aggressive strategy than that is required here.

    Farm to co-op sized biodiesel is probably a good way to start… small presses go for about $5000 for on-farm biodiesel production. The farmers/co-op grow their own soy beans, and aren’t affected by market forces.

    You also shouldn’t start an argument over affordability with systems meant for RVs from amazon.com.

    Here are the numbers. The going rate for solar panels is $3/W. Average power draw for a home in the US is 1.4 kW; using the standard 3x multiplier gives a 5 kW requirement. Cost for equipment is $15k for panels; a 5000 W DC inverter plus batteries is $5k (inverters have gone way down in price since the last time I bought one). We don’t need long-term storage batteries if the grid is an alternative, so we store excess energy on the grid (sell it during the day, buy it back at night). Factor in another $2000 if you want 24-hour storage capacity on the batteries.

    The current going rate in the US is $0.12/kWh, so the average monthly energy expense is $120 (it’s worth noting that this number is HIGHLY SUBSIDIZED already; you aren’t actually competing with the grid on an even playing field). The average inflation rate over the last 12 years for energy in the US is 9%.

    Putting in the numbers, you break even on your investment in about 9 years. If you factor in the 30% tax credit (why not? You are already paying for part of that “cheap” fossil fuel through your state and federal taxes), break-even occurs in 7 years.

    This is an easy do-it-yourself project, but if you want to factor in paying a contractor, with the tax credit you are still talking break-even in 10 years.
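
    If you want to check the arithmetic yourself, here it is with nothing assumed beyond the figures above:

    ```python
    # Check of the break-even arithmetic: $20k system, $120/month in avoided
    # electricity, 9%/yr energy inflation, optional 30% tax credit.
    def years_to_break_even(cost, monthly_saving=120.0, inflation=0.09):
        saved, years, annual = 0.0, 0, 12 * monthly_saving
        while saved < cost:
            saved += annual
            annual *= 1 + inflation   # energy prices rise ~9%/yr
            years += 1
        return years

    print(years_to_break_even(20_000))         # 10 (fractional break-even ~9.4 yr)
    print(years_to_break_even(0.7 * 20_000))   # 8  (fractional break-even ~7.3 yr)
    ```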

  125. Re: Carrick (Jan 22 23:31),

    The farmers/co-op grow their own soy beans, and aren’t affected by market forces.

    They can ignore market forces, but that doesn’t mean they aren’t affected. The low EROEI on soybean based biodiesel means it isn’t a very good primary energy source, if it is one at all, not to mention the nitrous oxide that’s produced in the process.

    As far as selling solar or wind power back to the grid or net metering, the only reason utilities do this is because the government makes them. The costs are passed on to the rest of the rate payers. There’s also a cap on the total amount of solar power the utility has to buy. In California the cap is 0.5% of a company’s total generating capacity. PG&E is about to reach that cap. So a few individuals have been able to take advantage of the rest of us by using our tax money to subsidize the capital investment and having utility companies buy their excess power at a loss. Wonderful.

  126. DeWitt and Carrick – I’m sure you guys are aware of this, but the biggest impact on energy demand is through behaviour change and efficiency improvements. In the context of EROEI, reducing the amount you use is pretty hard to beat.

  127. Isn’t there more to it than EROEI? Cars need fuel in liquid form. I am guessing that you get a lot more energy per dollar when your dollar is spent on coal rather than crude oil or refined crude oil simply because crude oil is involved in making our cars move while coal isn’t – at least presently.

    Sure, one of the drawbacks of ethanol is that you have to use heat to distill alcohol, but it’s not like you don’t have to heat crude to refine it into a usable form. One other thing about ethanol from corn starches: you have to remember that there is a surplus of carbohydrates in the nation’s corn harvest. Why do you think farmers grow so many soybeans? It’s mostly because corn has too little protein in it and too much corn starch. Also bear in mind that corn is inherently a more productive crop than soybeans by a factor of more than three, and more than four on some soils, in terms of bushels per acre. So when we convert corn to alcohol, one of the net effects (provided you can get the mash to the feedlot fast enough and without spoilage) is that we need less of the inherently less productive crop – soybeans. Of course, I also recognize that soybeans have an additional use as a fuel in the form of biodiesel, which competes with other liquid fuels. All these considerations are why I always thought that commodity traders refer to grains as the “grains complex”. We’d probably need a good model to sort out all the economics of it.

  128. DeWitt:

    The low EROEI on soybean based biodiesel means it isn’t a very good primary energy source,

    Of course it’s a net energy source, especially if you are using biodiesel to produce it. Even the low-ball estimates from Pimentel admit to that… You get something like a ratio of 1.5 energy produced to energy used not including byproducts, and a ratio of 5 including byproducts. If you want me to respond to the N2O comment you tossed out, you need to be a lot more quantitative about what your objections are. I’ll note that the main source of N2O is the fertilizers used; they weren’t always used, and the switch to them was done at the behest of the federal government to start with.

    As to selling the energy back to the grid… Our taxes help pay for it, hopefully we can be allowed to use it too.

  129. Carrick,

    when speaking of reprocessing farm and food chain detritus as input for bio-diesel, has anyone sat down and computed the REAL cost of doing this??

    In the past most of this material was reused as feed, returned to the soil, and put to other interesting uses. By redirecting it to bio-diesel production you are removing micro-nutrients and minerals from the food system, for one issue. You also make more expensive all the prior processes that were using these by-products.

    A gross example of this has been the food cost increase, and higher land use for farming with associated fertiliser use etc, of corn for ethanol. Bio-diesel in any real quantities will cause upheaval also. Whether it would be cost effective in the end is impossible to estimate at this point.

  130. kuhnkat:

    when speaking of reprocessing farm and food chain detritus as input for bio-diesel, has anyone sat down and computed the REAL cost of doing this??

    That’s the nice part about farm-scaled “experiments”. You have a closed budget process, so you can track whether it saves money or not. I’m in a “wait and see” on this.

    In the past most of this material was reused as feed, returned to the soil, and put to other interesting uses

    This argument makes no sense. You’re only using the oil from the soy beans for biodiesel, and the other products would be used for commercial soy bean farming in any case.

    A gross example of this has been the food cost increase

    Actually that’s an example of gross exaggeration on the Y2K, climate catastrophe scale. I see you aren’t any more immune to it than the AGW fear mongers.

    Food cost increases rather ironically are tied to increases in fossil fuel costs firstly and secondly to increases in demand from developing nations. The main impact has been ecological, with the irony of that not lost on me either: We have increased farming land usage in this country, with farmers taking land that was part of the NRCS Conservation Reserve Program out of the system for use in growing corn for ethanol.

    At the moment, the unused land area in the NRCS CRP is roughly the size of the state of New York, so we have a long way to go before we hit a real practical limit.

    It’s already cost effective in Brazil and Argentina (countries that have about 1/10th our oil reserves), so it’s not exactly “impossible to estimate at this point”.

  131. Carrick,

    “This argument makes no sense. You’re only using the oil from the soy beans for biodiesel, and the other products would be used for commercial soy bean farming in any case.”

    Removing the oil from the soy bean means you cannot use the bean for many other typical uses. You are ignoring the fact that there is little of a plant in modern techniques that is “thrown away.” What is “thrown away” typically isn’t, as it is returned to the land. If it isn’t returned to the land, you increase the need for petroleum-based fertiliser. You will still need to increase the farmed acreage by huge amounts.

    “Food cost increases rather ironically are tied to increases in fossil fuel costs firstly and secondly to increases in demand from developing nations.”

    Let me think now. Brazil changed some food production to ethanol production. They allegedly also converted more rain forest to bio growth for ethanol. This increased their fuel supply, decreased their food supply, and food costs went up there and in countries they export to. Greenies in the US were quick to use Brazil as a shining example of how we should do it. Please explain.

    Now, I agree that the costs of biodiesel and ethanol are partly based on the cost of fossil fuels. How you can ignore this as a problem with bio-diesel and ethanol escapes me. If their production cannot reduce the cost of the fuel supply, it would appear to me that it isn’t a particularly good idea. If their production does not pay for itself, why are we wasting resources to produce them??

    In other words, why would I utilise enriched uranium to produce a product that could not be utilised to create at least a comparable amount of energy as a nuclear reactor or something else of more worth to society???

    In San Francisco, a city of many restaurants and tourists, if there were any concerted effort for say 20% of the population to create their own bio-diesel, they would run out of the oils now picked up for free. If many people took their yard clippings and clippings from the parks for the same, the need for petroleum based fertilisers would increase and they STILL wouldn’t come close to filling the need. If they actually did this anyway, instead of getting free oil and other waste to produce their bio-diesel, they would soon be PAYING for those wastes based on supply/demand.

    One of the environmentalists goals has been to remove land from farming and other unnatural uses. Bio-diesel and ethanol production IS pulling back a lot of idled farmland and we haven’t dented our need for portable energy.

    There IS NO ECONOMY OF SCALE with bio-energy as currently available. The closest I have seen is with the gene engineered bacteria producing fuel for the military. The congress and others have stopped funding the successful pilot plants. You still have to feed the bacteria SOMETHING that is not available for other uses.

    http://www.forbes.com/forbes/2008/1124/058.html

    Why soybeans???

    http://articles.orlandosentinel.com/2007-04-17/news/BIODIESEL17_1_biodiesel-sayers-billion-gallons

    1600 gallons per acre. Great, except, figure out how many acres of land would be required to replace current US gas and diesel use and tell me that won’t be a problem!!!!

    2006 annual US use: 50 billion gallons of diesel.
    2008: 140 billion gallons of gasoline.

    As bio-diesel has less energy, let’s say 200 billion gallons a year are needed (an underestimate). Say two crops a year from each acre: you’ll need about 62,500,000 acres to grow the crop, plus all the facilities to process it!! Total US farmland is about 922,000,000 acres. Do you really think taking this chunk of land out of the available supply won’t affect prices??? As you claim, what about the cost of fertiliser?? The total will have to be larger if you allow the crops to rotate or the land to rest (go fallow).

    Now, imagine the same calc at 200 gallons per acre instead of this experimental crop’s yield. It is ridiculous.

    Supply and demand rule.
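
    Check the arithmetic yourself if you like; here it is using the numbers above:

    ```python
    # The acreage arithmetic, using the numbers above.
    gallons_needed = 200e9            # rough annual US gasoline + diesel need
    us_farmland_acres = 922e6

    for gal_per_acre in (1600, 200):  # experimental vs. conventional yield
        acres = gallons_needed / (gal_per_acre * 2)   # two crops per year
        print("%4d gal/acre -> %6.1f million acres (%2.0f%% of US farmland)"
              % (gal_per_acre, acres / 1e6, 100 * acres / us_farmland_acres))
    ```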

    I don’t have time to respond to your full comment, but it wasn’t addressing any point I was making in any case.

    My observation about your original point making no sense was that if all you are doing is squeezing the oil from the bean/seed and burning it, that in itself isn’t going to rob the soil of micronutrients.

    Beyond that, we are so far from sustainable agricultural practices in the US that it makes no sense to stick a barb into biodiesel without addressing the stampeding herd of elephants flattening the village that is conventional agriculture.

Comments are closed.