How are AR5 models doing? (End of 2013)

HadCrut and GISTemp have now reported their 2013 annual average temperatures (NOAA/NCDC seems not to have done so). It seems as good a time as any to start looking at how the very recently official AR5 “projections” compare to data. For this post I will perform comparisons beginning in 1990 (because that’s a nice round year). This start year was used in one of the graphs in my November post discussing how to incorporate Cowtan and Way into the evaluation of model trends against observations. (Note: graphs in this post will show “Cowtan and Way”, but I have not updated that series to include any data beyond what was available in November 2013, so that trace should mostly be disregarded.)

For the purpose of comparison, I will compare the observed trends from GISTemp, HadCrut4 and NOAA/NCDC to the model mean trends of models contributing more than 1 run to the AR5. For each model, the model mean is shown with an open circle. It is possible to estimate the ±95% spread of trends due to “weather” for that model by computing the standard deviation and assuming trends are normally distributed. That spread is shown with a vertical trace and delimited by the innermost blue cross hatches. I have added an estimate of measurement error based on information from each of the measurement groups and indicated it with colored cross hatches: GISTemp is red, HadCrut4 is purple, NOAA/NCDC is green. The observed trend from each of those groups is shown with a long horizontal trace in the corresponding color. When the long horizontal trace falls outside the corresponding cross hatch for that measurement group, the observation falls outside the ±95% range one would expect from “weather” for that model.
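
To make the mechanics concrete, here is a minimal R sketch of how such a per-model spread can be computed. The object “runs” is a hypothetical placeholder (a list of monthly global-mean anomaly vectors, one per run of a single model, January 1990 to December 2013); the ±95% range simply assumes run-to-run trends are normally distributed, as described above.

    # runs: hypothetical list of monthly global-mean anomaly vectors,
    # one element per run of a single model, Jan 1990 - Dec 2013 (288 months)
    time <- seq(1990, by = 1/12, length.out = 288)

    # OLS trend for each run, converted to degrees C per decade
    run_trends <- sapply(runs, function(y) 10 * coef(lm(y ~ time))[2])

    model_mean_trend <- mean(run_trends)   # the open circle in the figure
    sigma_weather    <- sd(run_trends)     # spread of trends due to "weather"

    # approximate +/-95% range, assuming normally distributed trends
    model_mean_trend + c(-1, 1) * qnorm(0.975) * sigma_weather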

The graphs will also show similar information for the multi-model mean based on all models used in the AR5 (including those with only 1 run), with the “weather” spread estimated from the 17 models that have more than 1 run. Further details for the multi-model mean comparisons will be provided when I discuss the comparison.

Comparison of Modeled and Observed Trends in Global Temperature since 1990.
Observed trends from NOAA/NCDC, GISTemp and HadCrut4 are compared to trends from AR5 models below:

AR5 Trends since 1990
Starting from the left: comparing observations to the CanESM2 model, we see that HadCrut4, GISTemp and NOAA/NCDC all fall below the model mean for that model. They also all fall well outside the lower red, green and purple tick marks indicating the spread of trends one would expect in 95% of cases from “weather + measurement error” for that model. That is, applying a traditional frequentist statistical test at the ±95% confidence level to this model individually, one would reject the hypothesis that this model is correct. The same holds for the next two models to the right. However, we would not reject the hypothesis that CSIRO is correct if the test involved comparison to GISTemp or HadCrut4. We would reject it if we tested using NOAA/NCDC.

I’ll let readers eyeball the graph to figure out other close calls. The tallies of ‘reject’ results out of the 17 model comparisons are 15 for GISTemp, 16 for HadCrut4 and 16 for NOAA/NCDC. Accounting for correlation among the test results (which arises because all comparisons involve the same earth realization), I find that if all models were “correct” both in their mean trend and in their variance of trends, this sort of result would occur only 1.4%, 1.5% and 1.4% of the time respectively. (Note: the uncertainty in measurements is also included in this calculation.)

So this result strongly suggests that we should ‘reject’ the hypothesis that the model runs are drawn from models that correctly reproduce the earth’s trend since 1990.

We can now turn to a comparison with the multi-model mean including all models in the AR5. This is indicated by the trace that is 2nd from the right, labeled ‘MM (all models) + MM weather’. For this trace, I estimated the standard deviation of trends due to ‘weather’ using the pooled standard deviation from all models with more than 1 run. If we examine this trace, we see that the NOAA/NCDC and HadCrut4 trends fall outside the ±95% range of ‘weather plus noise’ estimated this way, while the GISTemp trend still falls inside. This suggests that, based on two out of three of the official temperature sets, the deviation between the “observation” and the “model mean” cannot be explained by the magnitude of “weather noise” in a typical model.

That is: this disagreement is not due to “weather” but rather due to the model mean being biased high. (I believe this view is the correct one. It is also consistent with the notion that the upper range of the models is inconsistent with the observations.)

However, if one wants to make the argument that GISTemp is “right” and the others “wrong”, then using GISTemp one would not reject the notion that the multi-model mean is within “weather noise” of the observation.

Further to the right I have plotted a comparison to the previous multi-model mean, but with the spread increased to include not only the spread due to “weather” but also the spread arising from the disagreement between the models (that is, it includes the difference between the means of, say, CanESM2 and FIO-ESM). Using this spread, the observations from GISTemp, HadCrut4 and NOAA/NCDC also fall outside the ±95% confidence intervals. Cowtan and Way may fall inside, but as I noted, that series has not been updated to include any 2013 data. I believe Cowtan and Way have posted new information but I have not incorporated it in any manner. I’ll try to snag that sometime soon, likely after NOAA reports their annual average for 2013.

Update Jan 25: I wanted to put the link to Cowtan and Way monthly temperature series so I can find it later. The links page has a link to monthly data. Currently, monthly data go through Nov. 2013.

92 thoughts on “How are AR5 models doing? (End of 2013)”

  1. Not to nitpick but it is Kevin Cowtan not Kevin Cotwin. This is a recurring issue for certain websites so it would be best to correct it here to avoid it being repeated by others.

  2. Robert,
    Thanks. Fixed. I thought I saw something indicating you guys updated to include 2013 data? Am I mistaken? If not, I’ll hunt for it so I can include it.

  3. Go Whitecaps

    Could you tell me what your values would be for RSS and UAH?

    No. I’m not going to the trouble of adding these to the graph because they are for the lower troposphere and should be compared to lower troposphere model output, not the surface runs used here. It’s some work to add/subtract and keep track of things. I don’t think it’s worth the effort to add oranges to a plot showing apples.

  4. Not sure if it’s too much work, but in what year/month did those numbers cross the 5% rejection threshold for good? How many months have those numbers been sub-5%?

  5. still says Cotwin, not Cowtan at 8PM Eastern.
    Cotwin and Way may fall inside– but as I noted: this has not been updated to include any 2013 data. I believe Cotwin and way have posted new information but I have not incorporated it in any manner. I’ll try to snag that sometimes soon– likely after NOAA reports their annual average for 2013.

  6. Ken,
    I don’t know, we don’t know if it’s permanent, and I don’t think it’s worth the effort to fish through and figure the answer out.

  7. Go Whitecaps!

    Sorry, way O/T (and wrong season), but my son was asked to play for a local 16U whitecaps team last summer, so got my attention.

    To get slightly back on topic, how are the models working up in Michigan?

  8. Hi Lucia,

    How do you do this anyway? I’m guessing you’ve downloaded the data from runs of the models from someplace and have some R code and/or scripts or something that you run to compare the ‘current’ modeled values and +/- standard deviation brackets to the observations? Where does one go to download the various models’ output runs?

  9. Lucia,

    Thanks for updating the comparisons. It seems the continuing model/reality divergence is beginning to have some influence. John N-G seems to pretty much agree (based on a recent post) that the reality-model divergence lends support to the hypothesis that the true climate sensitivity to GHG forcing lies below the GCM-diagnosed values. A small step, but a step in the right direction. Perhaps another decade of very slow warming (combined with certain influential climate scientists riding off into the sunset… e.g. Kenneth T and his ilk) will move the field from the wild-eyed hysteria of the last 20 years to more-or-less calm and rational analysis. Maybe not, but I’m trying to be optimistic. 😉

    How do you do this anyway? I’m guessing you’ve downloaded the data from runs of the models from someplace and have some R code and/or scripts or something that you run to compare the ‘current’ modeled values and +/- standard deviation brackets to the observations? Where does one go to download the various models’ output runs?

    I downloaded model data from the Climate Explorer: http://climexp.knmi.nl/. I rebaseline, get standard deviations, etc. One can also download gridded data and mask it oneself, but I don’t do that.
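
    For anyone wanting to reproduce the basic steps, here is a minimal R sketch. It assumes you have already saved a monthly series from Climate Explorer as a plain text file with two columns, decimal year and anomaly; the file name and column layout are illustrative assumptions, not the actual Climate Explorer format.

        # read one monthly series previously downloaded from Climate Explorer
        # (file name and two-column layout are assumed; '#' lines are comments)
        d <- read.table("cmip5_run_tas.txt", comment.char = "#",
                        col.names = c("time", "anom"))

        # rebaseline to a common reference period, e.g. 1980-1999
        ref <- d$time >= 1980 & d$time < 2000
        d$anom <- d$anom - mean(d$anom[ref])

        # least-squares trend since 1990, in degrees C per decade
        sel <- d$time >= 1990 & d$time < 2014
        10 * coef(lm(anom ~ time, data = d[sel, ]))[2]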

    SteveF
    At some point, if it doesn’t warm, people have to recognize it…. doesn’t… warm. Sure warmth can be “hiding” in the ocean– but how much? And why now when it just happens to be inconvenient for the theory an alarmist favors?

    Of course, the converse is if it suddenly starts to warm, one has to admit it was “weather noise”. Or something. But at least for now, warming has not resumed. And how long have I been saying it’s not warming as fast as the AR4 projected? March 2008?
    http://rankexploits.com/musings/2008/ipcc-projections-overpredict-recent-warming/
    And there were “those” who insisted that warming would resume ‘any moment now’: the next El Nino? The sun would warm up? What have you.

    The multi-model mean of AR5 models ‘projects’ even faster warming!

    Some are starting to admit the upper end doesn’t seem plausible. But we still read others insisting “we” need to take it seriously. But why? The high end of model projections just looks wrong. We don’t need to take it seriously just because it was spit out of a model.

  11. Lucia,

    Typo? ” But at least for now, warming has resumed.”

    Should be: ” But at least for now, warming has NOT resumed.”

  12. Lucia,
    “Some are starting to admit the upper end doesn’t seem plausible. But we still read others insisting “we” need to take that seriously. But why? The high end of model projections just look wrong.”
    .
    Yes, the models (at least the ‘hotter’ ones) have been looking very wrong for a very long time. The weird thing is that when you point this out to some of the ‘very climate concerned’, they reply that the accuracy of projected warming is not an important issue. Say what? If the consequences of warming from GHG forcing are threats, how can the accuracy of warming projections NOT be the MOST important issue? IMHO, this sort of arm-waving rationale is simply crazed.
    .
    Some (like John N-G) can find their way past the nutty arguments, which is a hopeful sign. We will know there is real progress in the field when 1) more models are ‘dialed back’ in sensitivity, like GISS Model E-R, to come closer to reality (tweaking cloud feed-back, while simultaneously adopting lower assumed historical aerosol offsets, would likely do it), and 2) the worst of the models are forced out of the CMIP collection. Alas, the latter is only likely when the public purse strings are being drawn tighter. The former is only likely when continued divergence begins to motivate chuckles and winks when climate modelers make presentations; I doubt we are more than 5-10 years of continued divergence away from those chuckles. It will be interesting to see the continued kicking and screaming in the field.

  13. Well heck, I think it’s been known the upper end is not realistic from the beginning. It’s just politically very important to be able to use those magic words “up to” and then some outlandish number.

    What’s really happened is perhaps an increasing, reluctant acceptance of the reality that intellectual dishonesty is less effective politics than you would think.

  14. Andrew_FL,
    Been known by whom? I think there are plenty of people ‘out there’ who think the upper end is plausible. Certainly, even proclamations by climate science groups treat them as if they should be taken seriously. We are starting to see journal articles that note the upper end is unlikely, and yet, somehow, the same papers will not acknowledge that this logically means the mean is probably too high. It must be, because the higher end is included in the computation of the mean and given equal weight. But the high end is improbable while the low end is not improbable!

  15. Lucia,

    Very nice!

    By the Way (ha-ha), at the top it says Cowtan but at the bottom it still says Cotwin.

  16. The response of the model-using community has been remarkable. Rather than questioning their assumption that the greenhouse effect of carbon dioxide is highly amplified by mysterious forces, they go on a frantic scavenger hunt for the “missing heat.” While they are at it, they should help O.J. find the real killer.

  17. Lucia, since our last exchanges here concerning how best to compare climate model and observed temperature trends, I have done some analyses where I assume the climate model and observed temperature series can be modeled with a deterministic trend, not necessarily linear, included in an ARMA model of the residuals.

    Those analyses lead directly to the trends from these climate model series with multiple runs belonging to a normal distribution, as you have assumed in your analysis/comparisons. I understand and appreciate the thought you have put into the comparisons of climate model and observed series that you present at your blog. You do that without assuming a fitted model, like ARMA plus deterministic trend, can be applied to the observed and climate model series. There is not much in the peer-reviewed literature that supports modeling these climate model series with a stochastic model like ARMA.

    The limitation of not assuming a model for these series is that the “weather” noise cannot be quantified for the models without having an extensive number of runs, and cannot be quantified for the observed series since we have only one realization of the earth’s climate. Fitting an ARMA model allows one to do Monte Carlo simulations and from those simulations to estimate the weather noise model trend variability. In my analyses I used a smoothed spline trend (it works more efficiently than using a segmented linear approach with breakpoints) to extract the residuals, which were in turn fitted to an ARMA model. In my mind the climate models all have the underlying assumptions of a deterministic trend and stochastic/chaotic weather noise. This assumption leads further to multiple runs of the same climate model all having the same deterministic trend, which can be obscured and varied by weather noise.

    Now, differences between trends from a single run, or even the means of a limited few model runs, and trends from observed series can be caused by differences in deterministic trends and/or weather noise, at least over relatively short time periods. I think the best way to analyze these differences and their sources is through attempts to model the time series.

    I have found that the ARMA models fitted to the climate model series can be different from those fitted to the observed series. I have also used the Kolmogorov-Smirnov function in R to determine whether the spline trend residuals of the climate models and the observed series come from the same distribution. I used the observed GISS series as the standard for comparison, and in turn used the ks.test score of the GISS series against the other 2 observed series to compare with the ks.test scores of the GISS series against the CMIP5 model series. I found from these ks.test scores that most of the CMIP5 models are shown to come from different distributions than the observed series. There were 5 CMIP5 climate models that stood above the remainder in ks.test scores and these were: FGOALS_g2, GISS_E2_HP1, GISS_E2_HP2, GISS_E2_HP3 and GISS_E2_RP1.
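
    A minimal R sketch of that kind of two-sample comparison (the residual vectors here are hypothetical placeholders for the spline-detrended series described above):

        # resid_giss, resid_hadcrut4, resid_model: hypothetical spline-trend
        # residuals for the GISS and HadCrut4 observed series and one CMIP5 model
        ks_obs_vs_model <- ks.test(resid_giss, resid_model)
        ks_obs_vs_obs   <- ks.test(resid_giss, resid_hadcrut4)

        # a model whose statistic is much larger than the obs-vs-obs benchmark
        # looks, by this criterion, as if it comes from a different distribution
        c(model = ks_obs_vs_model$statistic, benchmark = ks_obs_vs_obs$statistic)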

    FGOALS had 4 runs in the Historical period and only 1 run in the RCP series. The GISS models all had 5 or 6 Historical runs and 5 or 6 RCP runs. Based on those runs and the analysis above, I would probably use only the GISS models noted above in a comparison with the observed series.

    I am wondering if a Bayesian analysis might be applicable here, along with an inclusion of the Cowtan and Way observed series.

  18. Kenneth

    Fitting an ARMA model allows one to do Monte Carlo simulations and from those simulations to estimate the weather noise model trend variability.

    The difficulty is there is no certainty the ARMA model is “right”. So it’s not clear you have a better estimate than merely computing the standard deviation of the runs themselves. In any case, if you do fit the ARMA, there is a series of tests you need to do. For example, if you know ARMA_true and you fit ARMA_est, is the average trend variability for ARMA_est equal to the one for ARMA_true? (The answer is generally ‘no’.)

    I have also used the Kolmogorov-Smirnov function in R to determine whether the spline trend residuals of the climate models and the observed series come from the same distribution.

    I think the details that follow translate to “You found they come from different distributions”. I think they do too. But I find merely testing individual parameters (like the standard deviation of trends) is simpler and does not require the complication of fitting splines, extracting residuals, fitting ARMA to the residuals, running Monte Carlo, then doing a Kolmogorov-Smirnov test and so on. I’m not entirely sure what benefit there is in the more complicated test, particularly since both the spline step and the ARMA fitting step are fairly “heavy” statistical modeling steps, while merely finding residuals to a line is “light”.

  19. Kenneth Fritsch:

    The limitation of not assuming a model for these series is that the “weather” noise cannot be quantified for the models without having an extensive number of runs and cannot be quantified for the observed series since we have only one realization of the earth’s climate.

    I agree with your point about the weaknesses of using ARMA to model natural variability.

    However, if you can assume your noise distribution is stationary over time, then you can divide the time series into multiple subsegments. Of course this doesn’t help with really long-period stuff, but it solves the problem of only one realization of the actual Earth climate.
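
    A minimal R sketch of the subsegment idea, assuming the noise is stationary (the monthly anomaly vector is a hypothetical placeholder):

        # obs_anom: hypothetical monthly anomaly vector for the single realization
        # split it into non-overlapping segments and use the spread of segment
        # trends as a rough estimate of trend variability due to "weather"
        seg_trends <- function(x, seg_len = 240) {
          n_seg <- floor(length(x) / seg_len)
          sapply(seq_len(n_seg), function(i) {
            y <- x[((i - 1) * seg_len + 1):(i * seg_len)]
            coef(lm(y ~ seq_along(y)))[2]
          })
        }
        sd(seg_trends(obs_anom))   # crude sigma for 20-year (240-month) trends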

    My own method is to assume that natural fluctuations are a series of quasi-periodic (“tonal”) components plus some form of red noise. What I do is fit to the spectral peaks in the time series and model the remainder as 1/f^n noise (the value of “n” gets fit too). I use a simple phase diffusion model to reconstruct instances of the quasi-periodic portion of climate fluctuations. These explain most of the variance, so the 1/f^n portion is normally neglected.

    Here’s an example … I believe the top is real data, bottom is simulation.

  20. Two things stood out for me. One is that FIO-ESM is the only one that is close to the measured values. I’m curious about how well it tracks on a yearly basis and what it is about that model that is different from the others? The other thing that stands out is that the average of the models is about .3 deg/Decade whereas the IPCC is “projecting” .2. Sort of implies that they don’t even believe the models that they are using for their report. I’m sure there are weasel-wording statements that try to explain that away, but that is damning IMO.

  21. As I mull the replies of Lucia and Carrick, I was thinking that if GISS is the preferable observed series and the 4 GISS climate models are the preferable climate models, then it really is GISS all the way down.

  22. BarryW

    The other thing that stands out is that the average of the models is about .3 deg/Decade whereas the IPCC is “projecting” .2.

    0.2 was AR4. These models are AR5. The AR5 doesn’t quite use the multi-model mean. I’m showing it because even if they don’t use it, it exists.

    The authors in the AR5 did have to consider that papers indicating the high end is implausible have been published. But they still show the full spread, and so on.

  23. I’ll start with Carrick. You make a good point on segmenting a time series to obtain a measure of variability from a single realization like the earth’s climate. I have not attempted that approach, but it might be interesting to see how it would compare to an ARMA model – or a spectral model for that matter.

    I think that ARMA models can handle what you call quasi-periodic (“tonal”) components but I am not sure how well.

    I have noted that the ks.test is more discriminating in differentiating the observed series from the climate model series than any of the spectral analysis tools I have applied – or perhaps misapplied.

    Lucia, I have run some tests in attempts to determine how well the ARMA models plus deterministic trend fit the series data. I look primarily at the Box.test on the ARMA residuals to determine if the model removes all the autocorrelation. Actually I can fit an ARMA model well to most of these series if I am allowed to use nearly unlimited orders of autocorrelation. Some series require low orders while others can require up to 20 orders to obtain a Box.test statistic of 0.70. I am not recommending that method for fitting.

    I am not sure I understand what you mean by: “For example, if you know the ARMA_true, and you fit ARMA_est, is the average trend variability for ARMA_est equal to the one for ARMA_true. (The answer is generally ‘no’.)” I fit the ARMA model to the residuals of the data minus the trend. If you use a linear trend for extracting the residuals and the linear fit is not very good, the ARMA model will attempt to compensate for that poor fit, usually by increasing the ar and/or ma coefficients. I think it is important, if not critical, to consider trends not limited to linear when ARMA modeling the residuals. The recent plateau in temperature trends can cause this problem for ARMA modeling if the time period for doing a linear trend starts several years prior to the plateau.

  24. Kenneth

    I am not sure I understand what you mean by: “For example, if you know the ARMA_true, and you fit ARMA_est, is the average trend variability for ARMA_est equal to the one for ARMA_true. (The answer is generally ‘no’.)” I fit the ARMA model to the residuals of the data minus the trend.

    What I mean is this:
    1) Pick some ARMA model with know parameters. This is “ARMA_true” for the test.
    2) Create a series.
    3) Pretend you don’t know the parameters, but estimate them by fitting using whatever method you choose. These are ARMA_est (whatever they are).
    4) Will the parameters you get in (3) be exactly those in (1)? (The answer is: very, very rarely.)

    5) Now using ARMA_est, generate a shit-wad of runs.
    6) Using ARMA_true, generate a shit-wad of runs.

    Do whatever tests you do to determine whether ARMA_est has the ‘same’ properties as ARMA_true. The answer is they almost certainly won’t, because the parameters for ARMA_est will not be exactly equal to ARMA_true. What’s worse: the larger the number of runs, the more likely you are to decide that the parameters differ to a statistically significant degree. This is because 10000 * “a shitwad” gives you greater statistical precision than 1 * “a shitwad”, and so no matter how close ARMA_true is to ARMA_est, if they are not exactly, perfectly, totally and completely the same (which they won’t be), there will be some number of tests in your Monte Carlo run that will let you conclude that a property of interest generated using ARMA_est differs from the property in ARMA_true.

    So I really don’t think you can get around the problem of finite numbers of model runs by fitting ARMA and then imagining that running a whole bunch of ARMA runs overcomes the finite sample. This is because you will be creating an infinite number of runs for an estimated version of the ‘weather noise’, and that estimated version of the ‘weather noise’ is based on a finite number of runs.

    The fact that you can fit an ARMA model “well” doesn’t mean that the ARMA model is “true” in the sense of being the model that generates the data. Whatever you fit is only an estimate of the “true” model. This is always the case.

    The only real question is: can you get a more precise or accurate estimate of some property of the ‘weather noise’ by doing the complicated ‘spline/ARMA/Monte Carlo’ method or by just estimating based on the runs themselves? It’s not at all clear to me that you can. But maybe one can. One way to answer the question is to test your “method” using synthetic data; if you want to use this method you should do those tests.
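
    Here is a minimal R sketch of the kind of synthetic test described in steps (1) through (6), using arima.sim and arima; the ARMA order and coefficient values are purely illustrative:

        set.seed(1)
        true_model <- list(ar = 0.65, ma = -0.25)   # ARMA_true (illustrative values)

        # (2)-(3) one synthetic realization, then fit ARMA_est to it
        y   <- arima.sim(model = true_model, n = 600)
        fit <- arima(y, order = c(1, 0, 1))
        est_model <- list(ar = coef(fit)["ar1"], ma = coef(fit)["ma1"])

        # (5)-(6) generate many runs from each model and compare the spread of trends
        trend_sd <- function(mod, nrun = 1000, n = 600) {
          sd(replicate(nrun, {
            x <- arima.sim(model = mod, n = n)
            coef(lm(x ~ seq_len(n)))[2]
          }))
        }
        c(true = trend_sd(true_model), est = trend_sd(est_model))  # rarely identical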

  25. Sorry for the dumb question, but why are the 95% confidence bars symmetrical?
    I cannot understand why they should be in an auto-correlated series; there should be a tendency to cling to the starting value.

  26. Doc,
    I use a statistical model. This is described in these sentences above.

    For each model, the model mean is shown with an open circle. It is possible to estimate the ±95% spread of trends due to “weather” for that model by computing the standard deviation and assuming trends are normally distributed.

  27. Doc: By the way “autocorrelation” and “normal” are not either/or. They are properties for different sorts of things and have pretty much nothing to do with each other. “Clinging” to an initial value doesn’t affect the probability distribution functions for trends from a type of time series.

    Saying that you are surprised an autocorrelated series has a symmetric distribution of trends is a bit like saying you are surprised someone with red hair is tall.

  28. lucia (Comment #122898)-I am sorry, I should have been clearer. Obviously there are groups of people who “take seriously” the upper end of the models. I thought it was clear, though, that I meant to say scientists studying the issue, not environmental advocacy groups or ornithologists (or economists, for that matter). My assertion is that virtually no one studying this issue has ever thought 4+ degrees per doubling was even remotely probable. I believe the endorsement of pretending it was remotely probable was politically motivated.

    To be sure, we are seeing an increasing acceptance of papers suggesting the upper end is implausible recently. I don’t think these suddenly appeared ex nihilo, though.

  29. One more observation. From what you said, 1990 was just picked out of a hat for a start point. I ran HadCrut4 for trends ending in 2014, starting at ten years and going back one month at a time. Around 1993 the trends start falling from the value you used down to about half that by 1996 (and never rise above that value). Even by 1994 they’re just .1 deg/Decade, which is a 20-year trend. So it appears, to me at least, that the start time you used shows the models in the best light (assuming their trends remain at or above the values you’ve shown).

  30. lucia (Comment #122911)

    “The fact that you can fit an ARMA model “well” doesn’t mean that the ARMA model is “true” in the sense of being the model that generates the data. Whatever you fit is only an estimate of the “true” model.”

    I agree with you and Carrick that fitting an ARMA model to climate data is artificial. I am surprised by how good the fit is. The climate models and observed series do have autocorrelations and that is what the ARMA model handles. I am not sure I follow your arguments about generating a true ARMA model and then modeling it. I will do the exercise, as this is a simple task in R, and will assume a shit wad is 10,000 runs.

    I would guess what you are saying is that if I am modeling a single realization, that realization could be in the tails of the distribution of some true ARMA model, and when I model it I will fit a different ARMA model. That can be a problem with the observed series, where I have a single realization to work with, but less so with models where I can have 6 to 10 runs/realizations to look at.

    This problem is something that I have considered and thought about before. It appears to put the chaotic climate in an even more uncertain light and more difficult to compare with model results no matter how you go about that comparison.

  31. Kenneth

    That can be a problem with the observed series where I have single realization to work with but less so with models where I can have 6 to 10 runs/realizations to look at.

    If it can be a problem with a single realization, it can be a problem with a finite number of realizations. You can only get an approximation. There isn’t anything about fitting an ARIMA that makes it easier to get an exact value than it would be to get an exact value for a mean or standard deviation from a finite number of realizations.

    There isn’t any magic way to get around the fact that you have a finite number of realizations. Generally, trying to do something complicated– like fit an ARIMA with multiple parameters– makes the problem worse, not better– because you now have multiple parameters you are approximating all at once.

  32. BarryW (#122924): “I ran HadCrut4 for trends ending in 2014…”

    As it happens, I was just updating my spreadsheet, and this calculation was already done. This is the long-term view, and this zooms in on the last 50 years to illustrate your comment.

  33. Joseph W., I figured people would assume a well-respected journalist generally follows basic journalistic practices, thus when he doesn’t, it indicates a break from pattern.

    In any event, Rather demonstrated his awareness of the standards for a story like this in the segment for it and in his subsequent defenses of it. He specifically said the documents had been authenticated in the segment. He then relied heavily upon that claim to justify the story.

    In reality, the documents had never been authenticated. Two experts expressed doubt. A third saw only two documents and said they were copies with too low quality to authenticate. A fourth saw all the documents and explicitly said the same. He did, however, say he believed the signatures on two of them were real.

    Rather said nothing about the experts who expressed doubt. He said nothing about the copy quality being too low for authentication to be possible. Instead, he portrayed the fourth expert’s remarks about two signatures as an authentication of all four documents.

    There’s plenty more, but that shows why this is an argument for malice. Rather showed he knew the standards he was being held to. He mischaracterized things to act as though he had met them when he knew he had not. His disregard for the standards was either willful or reckless.

  34. BarryW, you wrote “One is that FIO-ESM is the only one that is close to the measured values. I’m curious about how well it tracks on a yearly basis and what is it about that model that is different from the others?”

    Having read a little about the FIO-ESM model here, http://onlinelibrary.wiley.com/doi/10.1002/jgrc.20327/abstract , it seems to me that this model is the only one to include non-breaking wave mediated mixing of ocean waters, which when added to the model has the effect of cooling northern hemisphere SST and warming southern hemisphere SST. So perhaps warming of the deep ocean is where the legendary missing heat really is hiding. 🙂

    There is a limit to how much heat you can hide in the deep ocean, because you are constrained by the thermosteric expansion coefficient of about 150-300ppm/°C, depending on pressure(depth) and temperature, for almost all of the volume. If the rise of the oceans since 1900 at a fairly steady 2mm/year were 100% thermal expansion, with no melting glaciers, etc., then given the average ocean depth of about 4000m, 0.002m/4000m = 0.5ppm/year. That translates to a temperature change of 0.5ppm/(150-300ppm/°C) = 0.0033 to 0.0067°C/year. If you multiply that by the ocean volume of 1.37×10^9 cubic km at 1cal/degree/cc, and divide by the surface area of the Earth, you get (1.37×10^24 cc)(1cal/degree/cc)(4.184watts/(cal/sec))(0.0033 degrees/year)/(31,536,000 seconds/year)(5.1×10^14m2) = 1.18 – 2.36 W/m2.

    So the extreme upper bound on the heat you can hide in the deep ocean that is consistent with the recent level of sea level rise is about 1-2W/m2.
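
    As a quick numeric check of the last step of that arithmetic (values copied from the comment above; the heat-capacity and geometry figures are as stated there, not independently sourced), in R:

        vol_cc  <- 1.37e24             # ocean volume in cubic centimetres
        dT      <- c(0.0033, 0.0067)   # assumed deep-ocean warming, deg C per year
        cal_J   <- 4.184               # joules per calorie (1 cal/degC/cc assumed)
        sec_yr  <- 31536000            # seconds per year
        area_m2 <- 5.1e14              # Earth's surface area, m^2

        vol_cc * cal_J * dT / (sec_yr * area_m2)   # implied flux, W/m^2
        # about 1.2 and 2.4 W/m^2, matching the 1.18 - 2.36 W/m^2 quoted above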

    The IPCC estimates the net anthropogenic radiative climate forcing to be about 1.5W/m2.

    You can heat the deep ocean by 0.0033 to 0.0067°C per year for a long time before anybody will notice. The best digital laboratory thermometers are only accurate to about 0.02°C.

  35. “lucia
    Saying that you are surprised an autocorrelated series has a symmetric distribution of trends is a bit like saying you are surprised someone with red hair is tall.”

    Perhaps I am wrong, but I think I know that warming is more difficult than cooling. A model should resemble Sisyphus pushing his rock up hill, with episodic loss of control.

  36. Doc

    Perhaps I am wrong, but I think I know that warming is more difficult than cooling. A model should resemble Sisyphus pushing his rock up hill, with episodic loss of control.

    Ok. But to me this just sounds like saying the explanation for why it’s surprising that someone with red hair is tall is that you know it’s more difficult to see things that are far away than things that are up close, so a model “should” resemble someone straining to see far-away things while periodically resting their eyes.

    (FWIW: I see no reason why warming should be more difficult than cooling, even if you say you “know” this is true.) But even assuming you are right:

    (a) How would warming being more difficult than cooling result in an asymmetric probability distribution function for trends? (Why wouldn’t it just result in the mean trend being negative, with the distribution about the mean still symmetric? No: the Sisyphus analogy is not enlightening here.)
    and
    (b) Previously, you invoked “auto-correlation” as the reason why you expected asymmetric probability distributions. What does your theory about warming being more difficult than cooling have to do with autocorrelation, and/or why would auto-correlation result in non-symmetric probability distributions?

    On the other issue: if you think for some reason that some sort of asymmetric probability distribution must apply to trends about a mean, you’re free to look for them. As far as I know, the only reason we might see these in the long run is volcanic eruptions interspersed with periods of smoothly evolving forcings. But that has nothing to do with your theory that warming is “more difficult” than cooling. It’s just that the ‘shade’ due to the aerosols is suddenly injected. But that sort of thing is irrelevant to the distribution for these model runs, which correspond to cases where all runs share the same forcing. In fact, as far as I can tell, model run distributions look more or less normal, though it’s difficult to test with such a small number of runs.

  37. Re: UnfrozenCavemanMD (Jan 25 23:01),

    You can heat the deep ocean by 0.0033 to 0.0067°C per year for a long time before anybody will notice. The best digital laboratory thermometers are only accurate to about 0.02°C.

    But the best thermometer, specifically a standard platinum resistance thermometer with a high precision resistance bridge calibrated by NIST, is accurate to ±0.001 degrees from -200 to 1000 C. You’re not going to put one of those in an ARGO float, but the thermometers in the floats are nearly as accurate. IOW, a trend of 0.003C/year would, in fact, be noticed right away. The actual trends that ARGO measures are less than that.

  38. Lucia, knowing what you know about models, recent ECS work, ENSO, ‘pause or reversal’, where is your ‘Lukewarmer Meter’ these days?

  39. From KNMI climate explorer it looks like the FIO-ESM model gives us about 1.7°C warming from pre-industrial to 2100. Below the 2°C disaster limit (apologies for the alarmism).

  40. The disaster limit actually corresponds approximately to “peak benefits.”

    As it happens.

    DeWitt Payne (Comment #122946)-Yup, the platinum resistance thermometers are all on board the satellites. Not aware of any other climate applications of the tech.

  41. Lucia, ”

    Bob

    where is your ‘Lukewarmer Meter’ these days?

    I don’t understand your question.”

    Simply, has the degree of your ‘lukewarmism’ changed in the last year?

  42. Lucia, ” No. My degree of “lukewarmism” hasn’t changed.”

    I am curious. Since the models are running very hot (my guess is because of the way they wrongly attribute clouds as amplifying CO2), there has been no warming for 17 years, and the oceans (ENSO, PDO, AMO) are most likely playing a larger role than previously imagined, what would it take to modify your lukewarmism? Of course, you need not answer.

  43. This is driving awfully close to trollish territory with obnoxious, loaded, rhetorical questions. It’s not the sort of thing I would tolerate on my blog.

  44. FIO-ESM is the “odd man out” in your figure, so I was wondering what its TCR is. Oddly, FIO-ESM was not one of the models listed in AR5 WG1 Table 9-5, nor in Forster et al. (2013) or Andrews et al. (2012) analyses of CMIP5 models. At a guess, its TCR is ~60% of the mean of the others’, or around 1.1 K.

  45. Regarding models and the state of global warmin’

    I note that actual temperature trends are lower than modeled:

    http://climatewatcher.webs.com/SatelliteEraTemperatures.png

    even though emissions trends are high:

    http://www.columbia.edu/~mhs119/CO2Emissions/TimeBombFig16.gif

    One might say the sensitivities are too high or the models are somehow goofy in other ways (and they may be).

    But when one looks at estimated radiative forcing, it’s actually quite low:

    http://ej.iop.org/images/1748-9326/8/1/011006/erl459410f5_online.jpg

    So, everyone should be a lukewarmer because RF theory says the warming should be low!

  46. Bob

    what would it take to modify your lukerwarmism?

    The temperature trend moving well above the mean would modify it to the high side. Unexplained, sustained negative trends sufficient to return to pre-industrial temperature levels would modify it to the low side. (Unexplained would be, for example, without volcanic eruptions. Sustained would be, for example, temperature falling to levels of the 1900s and remaining there long enough that the 5-year mean was typical of the beginning of the 20th century.)

    I’m pretty sure I’ve more or less said both before.

    There are other possibilities that could reverse my views either in the direction of “hell fire brimstone” or “cooler”. But it’s pretty impossible to list all possible hypotheticals.

  47. Andrew Fl, ” It’s not the sort of thing I would tolerate on my blog.”
    It was not intended to be a loaded question. That you interpreted it that way probably explains why no one goes to your blog.

  48. Lucia, I was thinking of the modeled lineshape of temperature following a volcanic eruption, a rapid drop in temperature followed by a slower rebound. I imagined that the periodic drops, with slower recoveries, would be present in the ‘projections’.

  49. Doc

    Lucia, I was thinking of the modeled lineshape of temperature following a volcanic eruption, a rapid drop in temperature followed by a slower rebound. I imagined that the periodic drops, with slower recoveries, would be present in the ‘projections’.

    This behavior would be irrelevant to the spread in trend estimates in my graphs. This is the spread in trends for cases where the data start in Jan 1990 and end with Dec. 2013. All runs from one model share the same forcing. If a volcano erupted in the model data, then all runs have that same “decline/recovery” and that’s in the mean. The variance is the spread around that mean.

    You are mixing this up with some idea about a histogram of trends for overlapping time periods all drawn from 1 run (e.g. GISTemp) of the earth’s data. That’s not what these are.

  50. No Bob, nobody goes to my blog because I am a nobody who asks questions nobody cares about. But asking questions of the nature “isn’t it obvious by now that you are wrong” is something that puts people on very thin ice with me. I’ve never banned anyone but I know what it would take.

    Well, if you really weren’t being a smartass, I apologize.

  51. lucia (Comment #122911)

    In order to estimate the variability of confidence intervals (CIs) for trends in climate model or observed temperature series, given the realizations that derive from the chaotic nature of these series, I started the process by doing 10,000 simulations using the arima.sim function in R for ARIMA(1,0,1) with n=600, ar=0.65, ma=-0.25, mean=0 and sd=1.

    From each of these 10,000 simulations, I in turn derived a new ARIMA model by determining the best fit using AIC scores. The most frequently fitted model turned out to be, as expected, an ARIMA(1,0,1) model with different coefficients than in the original simulated one. Less frequently occurring fitted models were ARIMA(1,0,0), ARIMA(2,0,0) and ARIMA(0,0,2). From experience I know that these different models can produce CIs for trends that are different from one another, but not by large amounts.

    From the pool of 10,000 new ARIMA models I did 4 further treatments in order to determine the variability in CIs, given the realizations: using the ARMA model to determine CIs with a single realization and with three realizations, and withdrawing samples of three and ten at a time and calculating a standard deviation. In all cases below, the models withdrawn from the pool of 10,000 were converted to a series using the arima.sim function in R with n=600, and then from these series a trend was calculated using the lm function in R. All trends in the four Cases below were multiplied by 1000 for better presentation.

    In Case1 I randomly withdrew 1000 samples from the pool of 10,000 models and for each of these samples I in turn did 1000 simulations and for each simulation calculated a trend by the process noted above. From each of these 1000 trends I calculated the 95% CIs and then calculated the standard deviation. The distribution of the 1000 CIs was indicated to be normal by application of the shapiro.test in R. The results for Case1 are as follows:

    Mean 2.5% CI = -0.85; Mean 97.5% CI = 0.85; Stdev 2.5% CI = 0.095; Stdev 97.5% CI = 0.095

    For Case2 I did the same as I did for Case1 except here I randomly withdrew 3 samples instead of a single sample and calculated the mean of those three samples. I then proceeded to calculate 95% CIs for the 1000 mean of 3 trends ending with a distribution of 1000 CIs. The results for Case2 are as follows:

    Mean 2.5% CI = -0.49; Mean 97.5% CI = 0.49; Stdev 2.5% CI = 0.022; Stdev 97.5% CI = 0.022

    For Case3 I randomly withdrew 1000 models from the pool of 10,000 models and converted the models to series and trends as described above. From these 1000 trends I withdrew randomly 3 samples at a time 1000 times and calculated a standard deviation for the three samples. From these 1000 standard deviations I calculated 95% CIs. The results for Case3 are as follows:

    2.5% CI = 0.066; 97.5% CI = 0.863

    For Case4 I did the same as for Case3 except I withdrew randomly 10 samples at a time and calculated a standard deviation of the 10 resulting trends. The results for Case4 are as follows:

    2.5% CI = 0.238; 97.5% CI = 0.647

    Putting these cases together and assuming my calculations are correct and the approach I used is valid, the ARMA simulations of temperature series realizations would appear to provide tighter CIs than calculating the standard deviations of the realizations. In addition the ARMA simulation can provide CIs for a single realization.

    Note above that the CIs in Cases 3 and 4 are for 1 standard deviation, while in Cases 1 and 2 the CIs are for 95% of the mean trend, or approximately 2 standard deviations, and the variability on top of those CIs is given by the CI standard deviation.

  52. Andrew_FL (Comment #122996)
    January 27th, 2014 at 10:11 am

    “Apples and oranges”

    Also it was the first question on the thread, asked and answered!

  53. However, if you’d *really* like, detrended annual average anomalies of the average of HADCRUT, GISS, and NCDC (detrended over the period 1979-2012) can be regressed against similarly detrended annual averages of UAH lower troposphere, a procedure which suggests that we should reasonably expect that short term trends, at least, will be about 1.44 times as large in the lower troposphere as in the average of surface products. Well, at least in reality. In models I think the global figure is probably closer to 1.2. The difference is mostly due to, I think, the reduced interannual variance of GISS. At any rate, this suggests that they would probably, after normalization, fall at about the .1 degree per decade line. If we use a smaller normalization factor, they’d be a bit larger, but still smaller than the surface trends. This is Klotzbach et al’s “divergence” of “actual” surface trends from “inferred” surface trends from the warming of the lower troposphere, for which there are a number of possible explanations:

    The first is that the lapse rate feedback switches sign over decades compared to interannual timescales. So the warming of the surface over the long term is a real signal, that just translates to much less warming of the whole atmosphere than you would think.

    The second is that either the surface, or the satellite data is just wrong.

    The last possibility is that a significant portion of the surface warming is due to a factor which is climatically real but nevertheless only heats the boundary layer; it does not heat the entire atmosphere. This implies that the lapse rate feedback is different for different types of forcing.

    If greenhouse gases somehow act in the manner of possibility three, then our current climate models do not capture the real nature of the response to greenhouse gases. Therefore to the extent they match the late twentieth century warming, they are *right for the wrong reasons*.

    My thinking is that this is mostly an exaggeration of the surface temperature trend (ie possibility 2) but I am open to arguments for the other possibilities.
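
    A minimal R sketch of the detrend-and-regress step described at the start of this comment (the annual anomaly vectors are hypothetical placeholders):

        # sfc_annual: hypothetical mean of HADCRUT/GISS/NCDC annual anomalies, 1979-2012
        # lt_annual:  hypothetical UAH lower-troposphere annual anomalies, same years
        yrs <- 1979:2012
        detrend <- function(x) residuals(lm(x ~ yrs))

        # slope ~ short-term LT/surface amplification factor (about 1.44 per the comment)
        coef(lm(detrend(lt_annual) ~ detrend(sfc_annual)))[2]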

  54. Not to nitpick but it is Kevin Cowtan not Kevin Cotwin. This is a recurring issue for certain websites so it would be best to correct it here to avoid it being repeated by others.

    There’s a good line here about extrapolating / in-filling data, but I won’t make it 🙂

  55. Lucia,
    This work seems to beg for a wider audience.
    Do you have plans to take it further than where the climate talk gets hot?

  56. Lucia, I take your point. I am always confused and inarticulate.
    I had a look at the directionality, not amplitude, of the signal measured by HADCRU4 Global S&L.
    Here is the global temp anomaly and rate, measured over 25 months, centered on the 23rd month.
    http://i179.photobucket.com/albums/w318/DocMartyn/HADCRU4andrateover25months_zps74cbbd26.png

    I gave every positive slope a value of +1 and every negative one a value of -1. The overall sum of the 1943 individual +1’s and -1’s comes to 1.
    Then, beginning with the first, I cumulatively added the monthly numbers, so a sustained cooling spell gives a downward slope and a warming spell an upward slope. I was just interested in the directionality of the trend, and not the trend itself.

    http://i179.photobucket.com/albums/w318/DocMartyn/DirectionofHADCRU4Globalrate25monthwindow_zps89ba9497.png

    The ‘directionality’ was toward cooling until the mid-60’s.
    I shall have a play with rate distributions and slink off into the corner.
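
    A minimal R sketch of the sign-of-slope bookkeeping described above (the monthly anomaly vector is a hypothetical placeholder):

        # anom: hypothetical monthly HadCRUT4 anomaly vector
        win <- 25
        n_win <- length(anom) - win + 1

        # least-squares slope over each sliding 25-month window
        slopes <- sapply(seq_len(n_win), function(i) {
          y <- anom[i:(i + win - 1)]
          coef(lm(y ~ seq_len(win)))[2]
        })

        direction <- sign(slopes)        # +1 for warming windows, -1 for cooling
        running   <- cumsum(direction)   # cumulative "directionality" of the record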

  57. You are missing a big piece of the picture. Climate models cover the entire globe while the temperature record does not include the polar regions where a lot of the warming is taking place:

    “Incomplete global coverage is a potential source of bias in global temperature reconstructions if the unsampled regions are not uniformly distributed over the planet’s surface. The widely used HadCRUT4 dataset covers on average about 84% of the globe over recent decades, with the unsampled regions being concentrated at the poles and over Africa. Three existing reconstructions with near-global coverage are examined, each suggesting that HadCRUT4 is subject to bias due to its treatment of unobserved regions.

    Two alternative approaches for reconstructing global temperatures are explored, one based on an optimal interpolation algorithm and the other a hybrid method incorporating additional information from the satellite temperature record. The methods are validated on the basis of their skill at reconstructing omitted sets of observations. Both methods provide superior results than excluding the unsampled regions, with the hybrid method showing particular skill around the regions where no observations are available.

    Temperature trends are compared for the hybrid global temperature reconstruction and the raw HadCRUT4 data. The widely quoted trend since 1997 in the hybrid global reconstruction is two and a half times greater than the corresponding trend in the coverage-biased HadCRUT4 data. Coverage bias causes a cool bias in recent temperatures relative to the late 1990s which increases from around 1998 to the present. Trends starting in 1997 or 1998 are particularly biased with respect to the global trend. The issue is exacerbated by the strong El Niño event of 1997-1998, which also tends to suppress trends starting during those years.”

    http://onlinelibrary.wiley.com/doi/10.1002/qj.2297/abstract

    In addition, many climate models seem to be getting the timing / intensity of the ENSO cycle wrong, but this will average out over periods of 10 – 20 years. The current dominance of La Nina, a weak sun and uptake of a lot of heat into the deep oceans are causing a temporary depression in warming, so your conclusions about climate models are premature.

  58. sault (#123052) –
    I agree that Cowtan and Way make a good point that coverage of the observational datasets is not complete. Rather than reconstructing temperature where there are no observations, thereby increasing the uncertainty, it seems a more reliable method would be to compare observations and model predictions only where there are observations. Ed Hawkins has done this. It doesn’t make the models look any better.

    As for the comment about ENSO etc., perhaps you overlooked that the trends in this post were calculated since 1990 (hence 24 years), and the error bars include “weather” (principally ENSO).

  59. HaroldW
    I didn’t neglect to mention the year. The post says “since 1990”, and the graph itself says both “since 1990” and “24 years”. I also didn’t neglect to say the spread of trends includes weather. I wrote:

    It is possible to estimate the ±95% spread of trends due to “weather” for that model by computing the standard deviation and assuming trends are normally distributed. That spread is shown with a vertical trace and delimited by the innermost blue cross hatches.

    I suspect you learned what the dates and lengths were and the fact that the spread of trends includes weather by reading my post. 🙂

    The issue of comparison: I don’t think the answer to the best way to compare is either/or. Both ways are good. However, leaving out grids to compute only where we have data requires downloading gridded data and doing something much more computationally intensive. If you’d like to do that, you’re welcome to do it.

    With respect to Ed’s graphs: they are all well and good. But, like the AR5, he’s shifting to a new baseline that makes comparisons look “better” than if they’d stuck to some constant one over time. Now this is “fair” of Ed in the sense that he is showing the comparison the AR5 authors wrote. But it’s not entirely fair of the AR5, because by constantly shifting baselines, the anomalies will generally tend to “seem” to agree merely because the difference between model anomalies and observations is subtracted out at some very recent point in history. So it’s not really a test unless we wait a long, long time. That would be ok if the need for waiting were based on physics. But it’s not: it’s based on an arbitrary choice to “decree” that comparisons will be made using anomalies, with the arbitrarily chosen baseline being very recent.

    (The agencies reporting anomalies do mostly stick with the same baseline over time. It’s just those comparing models who start rebaselining and shifting.) And he shows anomalies, not trends.

  60. Lucia (#123091) –
    I didn’t mean to imply that *you* weren’t clear about the start year; that sentence was intended as a response to commenter sault. He seemed to be claiming both that the models vs. observations discrepancy in your graph was at least partially due to the models’ inability to predict ENSO, and that “this will average out over periods of 10 – 20 years”. This seems to me to be inconsistent; a graph of 24 years would have “averaged out” the ENSO and therefore ENSO would not be a reason why the models’ trends diverge from observations.

  61. Looks like we can reject everything but the Chinese model FIO-ESM. Why do we continue to fund the other models?

  62. FGOALS_g2,

    interestingly doesn’t do volcanoes in the correct way

    but it is one of the better performers

  63. Hm, yup, a quick check reveals the ratio of the trends in percentage precipitation to temperature change in the last 20 years of the historical runs averages about 1.1% per K.

    Compare to the observed:

    Trend (real world) from 1987-2006 in precipitation: 1.4% per decade. Average surface trend over same period: .2 K per decade: ratio of 7% per K.

    It gets much worse if I infer the surface trend by using de-amplified LT satellite data.

    Given how important evaporative heat flux is to the surface energy balance, and by extension the energy balance at the top of the atmosphere, since the two must follow one another, if FIO-ESM gives realistic temperature trends it can’t be doing so for the right reasons, i.e. having realistic feedbacks and hence sensitivity. More likely it just has crazy internal variability.

  64. Kenneth Fritsch (Comment #122802),

    A continuation of the discussion from the ‘Liljegren blizzard effect’ thread.

    I took a first look at the dataset you referenced after downloading the station_cc and station_tg files.

    My thought was to look first at trends in cloudiness. I took a somewhat different approach from yours and looked for stations that had at least 15 days of reported cloud cover conditions every month during the years 1980-2010 (a la Cowtan & Way’s treatment of the gridcells). This gave me 199 stations. I have not looked at their geographic distribution.

    I then calculated the linear least squares trend for each station over this time period and looked at the distribution of the results. This of course assumes that the cloudiness scale is roughly linear in something that might later on relate to temperature. The mean trend was -0.001 cloudiness units per year with SD 0.024 units per year. I didn’t try any fit tests (and would welcome your thoughts on what more to possibly do) but visually that is very close to no trend in cloudiness.
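
    A minimal R sketch of that per-station trend step (the data frame name and column layout are hypothetical):

        # cc: hypothetical data frame with columns station, time (decimal year) and
        #     cloud (monthly mean cloudiness), pre-screened for >= 15 report days/month
        trend_per_station <- sapply(split(cc, cc$station), function(d) {
          coef(lm(cloud ~ time, data = d))[2]   # cloudiness units per year
        })

        mean(trend_per_station)   # about -0.001 units/yr in the comment above
        sd(trend_per_station)     # about  0.024 units/yr
        hist(trend_per_station)   # quick look at the distribution of station trends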

    I don’t think this “proves” that there’s “no trend” in cloud cover, but it makes it seem less likely that there’s a major change in global average annual cloud cover over this period that would be causing changes in temperature. Still, I would like to do the trends monthly, and in some sort of regionalized analysis, before moving on to look at temperature trends on cloudy vs. clear days.

    Do you think that makes sense? For example, let’s say there was a global trend toward increased cloud cover in the wintertime in a given hemisphere that decreased the annual average temperature on “cloudy days” more by increasing the average number of cloudy days in the winter than by decreasing the temperature on the average cloudy day in any given month. This would require doing something like a monthly anomaly by broad latitude ranges to find this type of effect, which I think is physically plausible, especially for internal variability. I’m tempted to think one should check pretty exhaustively for this sort of thing – e.g. a “local cloud forcing” – before moving on to temperature trends.

  65. Steve Mosher (#123113):
    “FGOALS_g2 … is one of the better performers”

    Forster et al. (2013) does not list FGOALS_g2, but it shows FGOALS_s2 with an ECS of 4.2 K and TCR of 2.4 K. AR5 WG1 Table 9.5 does not show FGOALS_s2, but has FGOALS_g2, with a TCR of 1.4 K (no ECS given). That’s a huge difference in TCR. Do you know anything about what distinguishes those two models?

    And what happened to FGOALS_s2? There’s a brief note here: “Withdrawn by the FGOALS group in early 2013.” Politically incorrect results?

  66. HaroldW,

    If TCR 2.4 / ECS 4.2 was replaced with TCR 1.4 for political correctness, that’s some politics I’d be interested in.

  67. bill_c,
    FGOALS is a Chinese effort. China is set on accelerating its development, primarily with fossil fuels. It does not seem to be in its political interest (at this stage, anyway) to agree to aggressive mitigation. Hence, a lower sensitivity might well be politically correct.
    Of course, this is purely speculation on my part, based on absolutely zero facts. It might be that they discovered a great big bug which called the project’s results into question, or that the project was terminated for budgetary reasons. The panda ate their homework. All sorts of possibilities.

  68. HaroldW (Comment #123202)

    When I was previously studying the literature on the CMIP5 models, I found that FGOALS-g2 was unique in working with the land, ocean and atmosphere components separately and then combining them through a coupler process. When I evaluated all the CMIP5 model temperature series, this model scored highest both in the Kolmogorov-Smirnov tests against the GISS observed series and in the Box.test for independence of residuals when I ARMA-modeled the CMIP5 temperature series. In other words, these tests indicated that this model was closest in character to the observed temperature series.
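
    For readers who want to see roughly what those two tests look like in R, a sketch under stated assumptions: the ARMA order and the synthetic placeholder series are mine, and the actual analysis may have prepared the series differently.

    set.seed(1)
    # Placeholder series; in practice these would be the FGOALS-g2 and GISS
    # temperature-anomaly series over a common period.
    model_anom <- as.numeric(arima.sim(list(ar = 0.6, ma = 0.3), n = 150))
    giss_anom  <- as.numeric(arima.sim(list(ar = 0.6, ma = 0.3), n = 150))

    # Kolmogorov-Smirnov test: how close in distributional character is the model
    # series to the observed series?
    ks.test(model_anom, giss_anom)

    # Fit an ARMA model to the model series and test the residuals for independence
    # (Ljung-Box); a large p-value indicates the residuals look like white noise.
    fit <- arima(model_anom, order = c(1, 0, 1))   # ARMA(1,1); the order here is an assumption
    Box.test(residuals(fit), lag = 20, type = "Ljung-Box")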

    I preferred using the GISS climate models, which scored second highest, when comparing climate models to observed temperatures, since FGOALS-g2, while having 4 Historical runs up to 2005, had only 1 run in the RCP series that extends beyond the current year.

    http://159.226.119.58/aas/EN/abstract/abstract2177.shtml

    “FGOALS couples the ocean, atmosphere, land, and sea ice through a coupler that coordinates the component models and passes the exchange of energy, momentum, and water among them…

    …FGOALS2 has two versions, FGOALS-s2 and FGOALS-g2, which share the same coupling framework, ocean and land components, but adopt different atmospheric and sea ice components. Although the development of FGOALS2 is based at LASG/IAP, it is a cooperative effort among several Chinese research centers/institutions. The First Institute of Oceanography, State Oceanic Administration, has contributed to the setup of the coupler used in these two versions. The Center of Earth System Science (CESS), Tsinghua University, has devoted itself to the code optimization of FGOALS-g2. More than half of the FGOALS-g2 CMIP5 experiments were performed on the supercomputer at CESS. The State Key Laboratory of Atmospheric Boundary Physics and Atmospheric Chemistry (LAPC), Institute of Atmospheric Physics, has been dedicated to the ocean carbon cycle module of FGOALS-s2. The School of Atmospheric Sciences, Nanjing University, has long been involved in the assessment of the model. The Supercomputing Center of the Institute of Atmospheric Physics and the Supercomputing Center of the Chinese Academy of Sciences have provided most of the computer resources for the FGOALS CMIP5 experiments.”

  69. Based on %precip/Temp, I’d guess FGOALS_g2 is somewhere close to ECS ~2 K.

    Now, its ratio is still *way* lower than the observed ratio (which I expect to be roughly inversely proportional to the sensitivity).

  70. bill_c (Comment #123116)

    I have read a paper on cloudiness (it may have been the one on temperature trends on cloudy versus clear days in the Arctic that I linked to in our previous discussion) where the authors were attempting to separate the temperature trends into the part due to the effects of clouds and the part due to changes over time in the number of cloudy days.

    What I did was to test directly the Weng implication that, over oceans, clear conditions lead to greater temperature trends in the lower troposphere, and further that this evidence implicated the unadjusted satellite measurements as introducing a bias toward lower trends through an artifact of that measurement as affected by clouds. Perhaps land temperature trends cannot be translated to ocean and vice versa, but the greater trends I saw under clearer conditions agreed with the Arctic paper and would suggest that the Weng implication about the uncorrected satellite measurements might merely be confusing the effect of cloudiness on temperature trends.

    I was hoping to find more analyses in the literature on the effects of cloudiness on temperature trends and was rather surprised that I did not. The effect is probably more complex than a simple measure of cloud cover can capture, since the kinds of clouds likely matter.

    Remember here that I compared trends using the daily mean temperatures for localized areas (stations) and used the temperatures that fit one of two cloudiness levels. I may have a blind spot here, but I do not see how these trends would be affected by a change over time in cloudiness. That change, the way I see it, would only provide fewer clear and more cloudy days in the later years, but would not change the basis for measuring trends. If I had merely measured trends without reference to cloudiness then, yes, I can see where the trends could be affected by both changes in cloudiness over time and the effect of cloudiness on the trend.
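
    For reference, a minimal sketch in R of the cloudy-day vs. clear-day trend comparison as I understand it. The column names, the annual-mean step, and the placeholder data are my assumptions; the 0-1 and 7-8 okta cutoffs follow the guess made later in this thread, not a confirmed description of the actual analysis.

    # Placeholder daily data; in practice this would be daily mean temperature plus
    # a cloud-cover value for each station.
    set.seed(1)
    dates <- seq(as.Date("1980-01-01"), as.Date("2010-12-31"), by = "day")
    d <- data.frame(station = rep(c("S1", "S2"), each = length(dates)),
                    date    = rep(dates, 2),
                    cloud   = sample(0:8, 2 * length(dates), replace = TRUE),
                    tmean   = rnorm(2 * length(dates), mean = 10, sd = 5))

    d$year <- as.integer(format(d$date, "%Y"))
    d$sky  <- ifelse(d$cloud <= 1, "clear", ifelse(d$cloud >= 7, "cloudy", NA))
    d <- subset(d, !is.na(sky))   # drop intermediate ("partly cloudy") conditions

    # Annual mean of daily mean temperature by station and sky condition, then a
    # least-squares trend (degrees per decade) for each station under each condition.
    ann <- aggregate(tmean ~ station + sky + year, data = d, FUN = mean)
    trend_per_decade <- function(x) 10 * coef(lm(tmean ~ year, data = x))[["year"]]
    clear  <- subset(ann, sky == "clear")
    cloudy <- subset(ann, sky == "cloudy")
    c(clear  = mean(sapply(split(clear,  clear$station),  trend_per_decade)),
      cloudy = mean(sapply(split(cloudy, cloudy$station), trend_per_decade)))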

  71. Kenneth Fritsch (Comment #123206)

    I should be careful to note that while the FGOALS-g2 temperature series is characterized favorably vis-à-vis the observed temperature series in my analysis above, that characterization is essentially of the weather noise.

  72. I’d really be interested in knowing a bit more of the history behind CanESM2. This is a Canadian model, yes? It’s interesting because there seems to be a pattern there. There was a bit of a controversy in the early 2000s over:

    Climate Change Impacts on the United States: The Potential Consequences of Climate Variability and Change, A Report of the National Assessment Synthesis Team, 2000. U.S. Global Change Research Program.

    That report relied very heavily on a Canadian climate model, CGCM1, which was a clear outlier among all the other models, showing the most extreme warming.

    So I’m asking whether CanESM2 is some kind of descendant of that model, or if it’s a coincidence, or possibly just a sign that our Canadian friends are a bit far out on the issue.

    No offense to Steve or Ross, of course, or any other Canadians I can’t remember at the moment. I just imagine that’s the way things are, politically, in Canada.

  73. Kenneth Fritsch (Comment #123209)

    In my thought experiment, if winter cloudiness increased and summer cloudiness decreased at a given station, then the “average cloudy day” would be more influenced by winter conditions than it was before the change. This would lead to a trend in which average annual “cloudy-day temperatures” appeared to decrease. I follow what you did. The question I was looking at is whether the different trends you found (0.3 for sunny days, 0.2 for cloudy days, excluding “partly cloudy” type conditions) could be caused by an increase or decrease in cloudy days in a particular season.

    Then I discovered the stations were all in Europe (an obvious oversight on my part)! That clears up my question about geography. Also, I looked at the trends in cloudiness by month for the different stations and didn’t find anything interesting. I guess I could look at the trends in conditions 0-1 and 7-8 that I think you used for the temperature analysis.

  74. Kenneth Fritsch (#123206)
    Thanks very much for the link. It surprises me that changing only the atmosphere and sea-ice components can have such a striking effect on TCR. [Presuming, of course, that both versions of the components are plausible representations.]

  75. HaroldW (Comment #123230) – I suspect it’s more the effect of the former than the latter. As I understand it, ice albedo feedback should not be very important going forward into a warmer climate from a climate without mile-thick midlatitude continental ice sheets.

  76. Andrew_FL (#123249) –
    I can well believe that ice albedo feedback isn’t particularly critical. I recall a paper giving a change in forcing of ~0.1 W/m^2 for the Arctic sea ice change to date, so on the order of 5% of the greenhouse-gas forcing. Not much of a feedback, in other words.
    But thinking about it now, the atmosphere component would contain the cloud feedbacks, so of course that can make a big swing in sensitivity. I was thinking that heat uptake is mostly in the ocean, which is true as far as it goes, but … well, no excuse really for writing something silly.

  77. HaroldW (Comment #123257) – Yes, but one should be careful about generalizing statements about this particular feedback; it is asymmetric and nonlinear. It’s very important for the Ice Ages, I think we know that much.

    Which is one reason (among several) I never really bought that one could determine the sensitivity to further warming from current conditions, on the basis of large changes from much colder conditions.

    BTW, for comparison, between 1987 and 2006, I estimate an increase in surface evaporative cooling on the order of ~2.42 W/m^2, which is a lot bigger and a feedback in the opposite direction.

  78. sault (Comment #123052) says:

    “The methods are validated on the basis of their skill at reconstructing omitted sets of observations.”

    I think there is a logical error in this – the assumption is that you can take any place where observations exist, and use this as a test.

    But the fallacy of this is that the real regions that have no observations, such as the polar regions, are extremes, and therefore you cannot use any non-extreme region for verification purposes – apples and oranges.

    Example: take central India and Mongolia as two regions, and extrapolate what lies between. Then compare that with the Himalayas where we actually do have observations of an extreme environment. The extrapolation isn’t going to match very well.

    So the reason that we have no observational coverage of extreme regions is exactly the same reason that extrapolations into them cannot be trusted.

  79. I always enjoy these posts, it’s skeptics revenge. Ya know, those complete moron, idiot deniers who are anti-science, anti-truth, anti-planet or whatever doing the evil business of plotting model trends with confidence intervals. I almost feel bad for the team.

    Then I think back to the Santer paper which gathered together a bunch of authors to use obvious statistical fakery (shortened timeframes) to conclude models are matching observations. I feel better now.

    And as to the analysis of noise, there is no proof presented here of what the actual distribution of the noise is, so ARIMA or whatever might not produce good CIs. If this gives comfort to any “scientist” that there is still hope for models with trends so amazingly far out of whack that they are proven wrong with basic methods, then they are simply not acting as scientists. I don’t see how a rational being can look at those plots and say “well, that is running 19:1 odds that it is a bad match (instead of 20:1 proof), so it is still ok”.

    And as to the previous comments: just because some goofball has only now started to change his tune about the models, well after they have been shown to be complete failures, does not make him or her a rational or reasonable person. While that is a superior result to the climatologically insane medicine men of AGW doom who haven’t changed their tune yet, many of us had this VERY simple concept worked out a long time ago.

    Thanks again Lucia, the post was awesome and fun as usual.

  80. Andrew_FL (Comment #123217)

    “No offense to Steve or Ross, of course, or any other Canadians…”

    None taken by this particular Canadian. My guess is that CanESM2 is an evolution of AGCM1. I doubt we have the resources to do multiple blank-slate development projects.

    You might be able to find more info using this page:

    http://www.cccma.ec.gc.ca/diagnostics/cgcm4/cgcm4.shtml

    It doesn’t discuss the history, but maybe the links will lead to something.

  81. If we removed the GISS adjustments made to the data during the analyzed time frame, would the models then be wrong as well?
