Back when Easterling and Wehner 2009 was published, I wrote a comment which was bench rejected (meaning not sent out for review). I didn’t post it, figuring we could just wait to see what temperature did. But Paul Matthews asked me to post the comment. As some know, I suffered a hard disk crash, so I’m not entirely sure this is the submission. But for what it’s worth: Comment on Easterling_Liljegren.pdf.
The comment was short (as required by word limits). It was bench rejected. I didn’t save the wording of the letter, but my impression was that the associate editor represented the position that they didn’t really accept comments at all, so I would need a full submission. So I wrote something longer (which I can post if someone wants; it’s not much longer). That was also bench rejected.
I can’t necessarily claim it is a gem. But it does discuss the problems with EW, which are very real.
The power of editors as gatekeepers of science shows up again. If they want something published, they’ll publish it regardless of the critical nature of comments on the manuscript. If they don’t want it published, they can game the system to stop it on editorial grounds.
Of course that strategy can lead to black eyes. Dana Nuccitelli may not be sporting “shiners” for his cartoon-world 97% paper, but he’s been getting them.
Since I assume your income does not depend upon it, I think more important than publishing is to sit down and write a potential paper/comment that you judge worthy of publication and, more importantly, one that you judge makes a proper argument with good evidence.
I write letters to the editors of the Chicago Tribune just to be able to demonstrate to myself that I can make a proper argument and articulate it. I have had a few letters published which in my mind were not the best I wrote, and the same goes for ideas, implementations of ideas, and patents I have received – the best were never patented.
Related to this topic is what I see as a two-headed explanation of the recent pause in warming, namely 1) the subject which you wrote about, the appearance of longish pauses in the warming in the historical record and climate model simulations, and 2) the warming of the deep oceans. What I see as lacking here is whether the people throwing out these explanations see a connection between these 2 conjectures. It is very difficult for me to believe that there is a connection between the seemingly random and chaotic swings in model temperatures and model considerations of changes in ocean warming. Or is it more like: if you have trouble accepting a given explanation, here is another – and maybe we could come up with a third one?
Perhaps they bench rejected because they felt the title meant you are commenting on Easterling and Liljegren, a paper they didn’t publish.
Lucia,
I sent an email to Easterling at about the same time, commenting (respectfully) that the model used in the paper showed much higher variability than the Earth’s surface temperature, which made the paper’s conclusions questionable. I never got a reply. I concluded that it was mainly a political advocacy article, not one really related to climate science. (This is a recurring theme in climate science it seems. 😮 )
I do wonder if well known scientists like Easterling can appreciate how much damage they do to the credibility of climate science by publishing such tripe. My guess is they are so blinded by their progressive/left/green politics that they can’t see the damage… or perhaps just don’t care any more about the technical validity of their publications. ‘Saving the world’, a la James Hansen, seems more important to them than ‘the science’.
SteveF–
The paper is a disgrace. But it seems to be accepted and cited. Yet, anyone who reads the paper and looks into the details of the arguments can only marvel at the cojones that someone writing it must have.
The lack of a clear logical description is what muddles most thinking, I find, in business, computing, and climate science alike. I suspect this follows that trend.
To follow that up let me offer the same proposal as I did on Tamino’s and was summarily dismissed for (as you can see if you pop over there).
Take any climate temperature data series you wish and strip off any metadata that says it is climate.
This is in order to make it a ‘blind’ test.
Then submit it to an audio lab, telling them that all you know is that the main ‘cycle’ is 1461Hz and you would like a summary of the power as it is distributed in time and RMS along with any sub-harmonics that are present.
Mention it is a very short tape.
Ask if a bandpass splitter circuit in digital would help the solution and mention 1/1.3371…. and see if that rings any bells. Then pass them my photobucket story url and ask them what they see.
I know what my Uncle would have said (as he taught me what this all means).
I have a lovely, peptidyl-targeted, nanosyringe drug delivery system that allows me to anchor a drug delivery vector at a cancer cell surface. We have a patent-pending and have published half of the system. I am getting tumor shrinkage at 1/20-200th of typical drug concentrations.
The NIH grant I submitted was unscored and not discussed at panel.
It is sometimes the case that it’s not what you know, but who you know.
I’m assuming it’s GRL you are talking about. One can’t complain about bench rejection of the comment. It is true that they changed their policy in 2009 to not publish comments.
As to the later bench rejection, they say:
“Editors may reject papers without review if contents do not fulfill GRL’s criteria for high impact, broad implications, innovation, and timeliness.”
I guess we’d have to see it before deciding whether that rejection was reasonable.
Nick Stokes,
“I guess we’d have to see it before deciding whether that rejection was reasonable.”
Fair enough. But we don’t have to see Lucia’s second document to judge that the Easterling and Wehner paper was technically very poor… some might even say ‘a disgrace’.
A little digging shows that Easterling got his PhD (1987) in geography from the University of North Carolina… perusing that department’s home page I get an impression of a touchy-feely-caring-about-people focus combined with an apparent lack of focus on hard science. Kind of makes sense in light of the technical quality of E&W: fluff.
Here is Easterling’s paper.
Link.
And here it is as printed.
I must say that Easterling and Wehner have an impressive list of references in their wobbly legged paper 🙂
Is Wehner the same guy texting his picture to young girls? 🙂
Nick Stokes,
Firefox warns that the link you provided is an “untrusted site” and recommends people not go there.
Nick–
Yes. Their published policy when I submitted the comment was to permit comments. It changed soon after.
That said: my comment was timely. And it was on a paper they published. Presumably if the paper was “high impact, broad implications, innovation,” but wrong, the comment showing it was wrong would be “high impact, broad implications”. It might not be innovative, but the fact that the original paper’s bungles were simple would be an odd reason not to correct them.
But the paper is a piece of crap.
Also Nick: I’m not so much complaining. I am posting the submission because PaulM requested it when I mentioned its existence on twitter. The EW paper had been discussed there, I criticized it, and DougMcNeal asked me what was wrong with it. So, if I am to explain what’s wrong with it, I need to post the submission. In the process I mention the history.
The paper is crap and if GRL has decided to have a policy where crap becomes nearly impossible to comment on or correct, more shame to them.
SteveF (Comment #117875) July 24th, 2013 at 6:06 am
“Firefox warns that the link you provided is an “untrusted site” and recommends people not go there.”
I survived. It’s lbl.gov
Lucia,
“Their published policy when I submitted the comment was to permit comments. It changed soon after.”
Coincidence, or causal? It may be hard to say for sure.
TTTM,
Extensive references do not a good paper make. In support of a terrible effort like E&W, lots of references are a bit like argument from authority… references do not address the paper’s problems.
Lots of bad papers get published in most fields. When they are really bad but with “high impact” (e.g. Steig et al., Nature) one or more papers usually get written which point out the original paper’s problems. If they have little impact, they may just be commented on (negatively) or ignored. The problem with climate science is that really bad papers which support the meme of extreme future warming, like E&W, Foster & Rahmstorf, or Steig et al., rarely, if ever, lead to a refutation by anyone working in the field… that is always left to ‘outsiders’, if it happens at all. Any paper published in climate science which suggests warming may not be “as bad as we thought” is ALWAYS attacked by people in the field, often working in concert with journal editors and publishers (see the UEA emails). This tilt in the field is as clear an indication as there can be that those involved are far too motivated by political inclinations and policy advocacy, and not motivated nearly enough by a desire to truly understand how Earth’s climate will respond to rising GHG forcing. Sad that so much public money gets spent on this.
MikeN
My title is “Comment on “Is the Climate Cooling or Warming?” by Easterling and Wehner.”
Are you suggesting GRL would think the name of the .pdf file would be taken as the ‘title’?
Nick, SteveF –
The complaint is that the page provides a security certificate for [*.]nersc.gov, but the URL is for the lbl.gov domain. This sort of thing often happens when websites are rearranged, or domains change names.
One can accept the “risk”, but an easier change is to replace the “https” in Nick’s link with “http”, as secure access isn’t required for the E&W paper. That is, use this link instead.
It’s also worth noting that with regard to the abysmal flaws in this paper, it isn’t possible to write a “novel interesting” paper to discuss the errors. The errors are pedestrian, uninteresting issues — the sorts of problems that a person grading an undergraduate lab report would just put a red x next to and mark wrong.
So if comments pointing out the errors are not possible, there is no mechanism for pointing out the errors in a scholarly paper. The reasons are:
1) You can’t write a paper on the topic of “Why E(x^2) is always ≥ E(x’^2), where x’ = x − E(x)”. (This is essentially the statement that the variance of all runs over all models is always greater than the average of the variance in runs. That is: the spread over the ensemble of different models is not ‘the weather’. A numerical sketch of this point appears after this list.)
2) You can’t write a paper on: “El Chichon erupted and models say that affects the trend, but E&W used that to claim a down trend during a period of ‘no eruption’ is ‘expected’.”
3) You can’t write a paper saying that if someone (e.g. EW) is trying to persuade people trends ≤0 are ‘expected’, they should stick to finding how often trends ≤0 occurred, and not count trends that are actually positive as negative and then decree these are evidence of how often non-positive trends are observed.
These sorts of statements, while totally mistaken in EW, are not the sorts of things that can be joined together to create an interesting stand-alone paper, except in the instance where another paper’s results and conclusions are based on mistaken claims. In which case, the comment pointing out the egregious bungling should be published.
But evidently, GRL now doesn’t permit people to publish these things. That means absolute ordure can be published in GRL and there is really no mechanism other than blog posts to “rebut”.
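The inequality in item 1 is easy to check numerically. A minimal R sketch (mine, not from any submission; the numbers are invented): the pooled spread over an ensemble of models is always at least the average within-run variance, because differences between model means add to the pooled spread.

# Minimal numerical check of item 1: pooled ensemble spread >= average
# within-run variance, because between-model mean differences add to it.
set.seed(42)
model_means <- rnorm(5, mean = 0, sd = 0.3)   # 5 "models" that disagree on the mean
runs <- lapply(model_means, function(m) rnorm(200, mean = m, sd = 0.1))

mean_within_var <- mean(sapply(runs, var))    # average 'weather' variance in runs
pooled_var      <- var(unlist(runs))          # variance of all runs over all models

c(within = mean_within_var, pooled = pooled_var)  # pooled >= within, every time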
“Our analysis consisted of fitting least-squares trends to running 10-year periods in the global surface air temperature time series for: 1. the observed record, 2. an ensemble of long control simulations, 3. an ensemble of 20th century simulations, and 4. an ensemble of simulations forced with the Special Report on Emissions Scenarios (SRES) [Nakicenovic and Swart, 2000] A2 forcing scenario for the 21st century. This resulted in probability distribution functions of decadal trends for each of the 4 sets of time series.”
Summary:
We took some sub-sampled data (which was already spatially and temporally undersampled in the first place) from thermometer records and concluded that it fits within the rather large error bands that all the theories so far proposed about global warming give.
This is all based on the assumption that no natural cause could be doing it.
The science is proved.
And you won’t be able to really tell for at least 30 years.
ECHAM’s simulation of ENSO is laughable. It has a record-breaking El Niño or La Niña almost every other year. On the rare occasions it turns up a normal El Niño, you’ll get zero trends for the next 20 years every time a preternatural Niña comes around.
SteveF–
It’s difficult to believe my submission would have caused the policy to change.
Well, the biggest problem with Easterling’s paper is hit on in your conclusions: the premise that the data support a claim of a 10+ year period of cooling is itself false.
If you begin your paper by falsifying that premise, there isn’t much else to write on.
Pointless to repeat the above. If you try to estimate from sub-sampled data when you have higher resolution data available (for the same period and place), I am always going to suspect any conclusions drawn.
Lucia,
“It’s difficult to believe my submission would have caused the policy to change.”
Why is it difficult to believe? Igor asks: Dr. Frankenstein, the villagers are gathering at the gate with pitchforks… what should we do? Answer: Lock the gate. I do think reasoned arguments like your comment on E&W are precisely the ones which ‘consensus’ climate science does not want to address. Skydragon slayers and other assorted nutcakes are easy to discredit; politically it is better to talk only about the crazies and ignore all ‘skeptics’ who present reasoned arguments. Same reason Tamino bans your comments at his blog; it is just ‘lock the gate’ again.
Accept it Lucia. Admit it. It’s OK. You broke GRL. 🙂
Carrick–
Yes. The title suggests the purpose of the paper is to rebut the contention the earth is cooling. That could have been rebutted without discussing models, history or anything.
But the authors chose to do a whole bunch of stuff – much of which is simply crap. And most of those citing the paper cite it for the crap content. The paper is thought to “show” that long term pauses are consistent with (i.e. not improbable given the random nature of ‘weather’) brisk (i.e. 0.2C/dec) warming. It shows nothing of the kind. It is just crap.
lucia (Comment #117895)
July 24th, 2013 at 4:27 pm
“It is just crap.”
Short form of my comment above I think. Still correct.
Lucia,
Don’t hold back, tell us how you really feel about the paper. 😉
You are correct of course, it is a very bad paper. But the more interesting questions are a) how such a terrible paper makes it into print, and b) why it is not laughed out of the room by mainstream climate science. The field of climate science is not well; it is infected by the disease of green/left politics.
Was that Easterling paper one of Cook’s 97%?
Lucia, thanks for this. I recall that you discussed this very misleading paper on your blog, but I do not recall hearing that you had written a comment until you mentioned it on twitter the other day. It’s a useful resource.
I think that in some ways the E&W paper is even worse than you say. The fact that they do not mention the 1982 El Chichon eruption raises two equally worrying possibilities: either they are not aware of it and its established influence on forcing, or they are aware of it but chose not to mention it.
Your figure 1 is very good and shows just how wrong the model they chose is, a fact obscured by their different scales in their figs 1 and 2 that the casual reader might not notice.
SteveF is right, the real scandal (as with climategate and Marcott) is that they got away with this unchallenged by the mainstream climate science community, while a paper blowing the whistle is bench rejected.
SteveF (Comment #117897)
July 24th, 2013 at 6:01 pm
Lucia,
“The field of climate science is not well; it is infected by the disease of green/left politics.”
To ascribe such motives to people is a political comment, not a scientific one. Being Left or Right does not make you correct. Abiding by the facts does.
RichardLH,
Noting behavior among climate scientists which appears politically motivated is not necessarily expressing a political opinion. What I object to is green/left politics influencing the field, and I think the (objective) evidence for that is strong.
Following Lucia’s comment #117885 about how it’s not possible to write an interesting paper discussing the errors, Richard Tol said something very similar this morning at BH regarding his comment on the Cook 97% paper
“I am glad that Hulme too recognizes that some research is so bad that it is beyond a constructive response; and that if such research is published, destructive comments should be in public.”
Lucia, I have looked at the RCP4.5 temperature series in the manner you discussed in your comment on the Easterling paper. What I did was compare, between the model runs and the Observed series of HadCRU4, GISS and GHCN, the SEs for trends that were adjusted for autocorrelation using Monte Carlo simulations.
Using the lower CI limit as an indicator of how easy/difficult it is to obtain a trend not statistically significantly different than zero, it is obvious how those variations can very much favor some models over other models, and additionally over the Observed series, in obtaining zero trends. The results are rather dramatic and I’ll post a link to them when I get the results in good form.
If we can use the adjusted CIs of the Observed series as an indication of what these variations should be in reality, I think it can be shown that some models (especially the ones with a propensity towards zero trends) are significantly different.
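For concreteness, an autocorrelation-adjusted trend SE via ARMA Monte Carlo might look roughly like the R sketch below. This is a reconstruction, not Kenneth’s posted code; the function name, the input 'y' (a monthly anomaly vector), and the default ARMA(2,0) order (taken from his later comments in this thread) are all assumptions.

# Hypothetical sketch of a Monte-Carlo-adjusted trend SE (not Kenneth's code).
mc_trend_se <- function(y, order = c(2, 0, 0), nsim = 10000) {
  t_idx <- seq_along(y)
  fit   <- lm(y ~ t_idx)                         # OLS trend of the series
  arma  <- arima(residuals(fit), order = order)  # noise model for the residuals
  sims  <- replicate(nsim, {
    noise <- arima.sim(model = list(ar = coef(arma)[1:2]),
                       n = length(y), sd = sqrt(arma$sigma2))
    coef(lm(noise ~ t_idx))[2]                   # trend recovered from pure noise
  })
  sd(sims)                                       # Monte Carlo SE of the trend
}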
Kenneth,
Why don’t you write a guest post when you do significant work?
SteveF (Comment #117978)
July 27th, 2013 at 3:56 pm
I would rather link the data and let others decide if it has any value. I do not expend sufficient effort to always get it right the first time through.
I would like to get some feedback from Lucia on this particular issue as she has done considerable background work in this area and she does not pull punches.
In attempts to compare the RCP4.5 CMIP5 model runs with the Observed series of HadCRU4, GISS and GHCN, I have come up against the issue of how to handle the noise in the model series. Since the models are constructed from deterministic considerations, I have noticed that the noise in models is not handled in most of the literature the same as the noise in the Observed series. I find this a bit disconcerting since the models are supposed to emulate the Observed series. Be that as it may, I have compared the RCP4.5 model runs with the Observed series using two different approaches.
In the first approach I looked at the combined RCP4.5 model run data over the instrumental period of 1880-2013 May and determined reasonable periods into which I could divide that series for comparison with the Observed series. Breakpoint analysis gave a natural divide at 1880-1963 and 1964-2013 May. The strategy here was to avoid dealing with the noise factor of the models by using a large array of model data so as to average out most of the higher frequency noise. I attempted to do this using the approach of Santer et al. (09), where the authors compared the model mean trends to the observed series trends for the surface and lower troposphere temperatures. Here the model means yield a variation which can be estimated from the standard deviations of the trends around the mean trend for all the models. The standard deviation of a given Observed series is determined from the trend standard errors, adjusted using an appropriate method: either the more direct means that Santer et al. applied, or by modeling the Observed series with an ARMA model and then doing Monte Carlo simulations to obtain the confidence intervals and thus the standard deviation. I chose the latter approach and obtained the following, with trends in degrees C per decade:
Period 1880-1963:
HadCRU4 Trend=0.0449 Stdev=0.0129 n=1
GISS Trend=0.0276 Stdev=0.0191 n=1
GHCN Trend=0.0402 Stdev=0.0147 n=1
RCP4.5 Trend=0.054 Stdev=0.0195 n=42
t values
RCP4.5 vs HadCRU4 = 0.69
RCP4.5 vs GISS = 1.37
RCP4.5 vs GHCN = 0.92
Period 1964-2013 May
HadCRU4 Trend=0.155 Stdev=0.0180 n=1
GISS Trend=0.144 Stdev=0.0145 n=1
GHCN Trend=0.155 Stdev=0.0145 n=1
RCP4.5 Trend=0.210 Stdev=0.0449 n=42
t values
RCP4.5 vs HadCRU4 = 2.84
RCP4.5 vs GISS = 4.05
RCP4.5 vs GHCN = 3.41
From these results it can be seen that for the period 1964-2013 May, a period of nearly 50 years, the RCP4.5 models yield a significantly larger mean trend than do the Observed series. It should additionally be noted that even individual runs from the same model can have large trend differences over this long period of time. Averaging out the noise for a particular model must require large numbers of runs rather than long periods of time for an individual run.
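As a check on the arithmetic: the t values in the table above are consistent with a Santer-style difference-of-trends statistic in which the spread of the 42 model trends is shrunk by sqrt(n) to give the SE of the model mean. A short R sketch of this reconstruction (mine, not Kenneth’s posted code):

# Reconstruction (an assumption, not Kenneth's code) of the tabled t values:
# trend difference over the pooled uncertainty, with the model-trend spread
# divided by sqrt(n runs) to give the SE of the model mean.
t_model_vs_obs <- function(b_mod, s_mod, n_mod, b_obs, s_obs) {
  (b_mod - b_obs) / sqrt(s_obs^2 + (s_mod / sqrt(n_mod))^2)
}
t_model_vs_obs(0.210, 0.0449, 42, 0.155, 0.0180)   # ~2.85, matching 2.84 above
t_model_vs_obs(0.054, 0.0195, 42, 0.0449, 0.0129)  # ~0.69, as in the 1880-1963 block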
In my second approach I look at individual models and noise that I treat as stochastic, and at the same time deal with the issue of models producing long time periods with zero or negative trends. The results are summarized in the two links below, which contain a table in two parts. In obtaining these results I calculated, and present in the table for the RCP4.5 model runs and Observed series: the 1964-2012 trends, the unadjusted standard errors for the trends, the t.value and r.square for the regression of the model series versus time, the standard deviations of the regression residuals, the longest period in the series where the trend was zero or less, and the p value from a Brown-Forsythe/Levene non-parametric test for equality of variances, where I used the GISS and GHCN series compared, one at a time, against the HadCRU4 and each of the RCP4.5 series. In addition I found an ARMA model that best fit all the data, ARMA(2,0), and used that model to do 10,000 Monte Carlo simulations to determine an adjusted standard error for all the series. The Box.test shows that some of the series do not produce ARMA residuals that are independent, and that result may have some effect on calculating an adjusted standard error. Some of the models could produce better Box.test scores using a seasonal component – but not all. I chose not to use a seasonal correction in order to keep things simple.
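One column of that table — the longest period where the trend was zero or less — is straightforward to compute by brute force. A hedged R sketch (my construction; 'y' is a monthly anomaly vector and the 60-month minimum window is an arbitrary choice, not something Kenneth specifies):

# Sketch: longest stretch (in months) whose OLS trend is zero or negative.
# Brute force over all windows of at least min_len points; slow but simple.
longest_nonpositive_trend <- function(y, min_len = 60) {
  n <- length(y)
  best <- 0
  for (i in 1:(n - min_len + 1)) {
    for (j in (i + min_len - 1):n) {
      len <- j - i + 1
      if (len <= best) next                        # cannot improve on best
      slope <- coef(lm(y[i:j] ~ seq_len(len)))[2]  # OLS slope of the window
      if (slope <= 0) best <- len
    }
  }
  best
}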
From these results it can be seen that the model series variances are mostly larger than, and different from, those of the Observed series. Even within the Observed series, the HadCRU4 series has a different variance than the GISS and GHCN series. What is notable is that a number of the RCP4.5 series have longer lengths of time with trends of zero or less than the Observed series. This can be seen even when the model series have a long term trend for the period 1964-2012 that is larger than the Observed series. Most of this effect is manifested by those models with larger variances and higher long term trends having a slow ramp up to the 1970s and 1980s and then climbing more steeply from there to the current time. Obviously the larger variation in those model series plays a major role in obtaining lengthy periods with zero or less trend. To that end I attempted to model those effects by linearly regressing the maximum length of zero or less trends versus the 1964-2012 trend and the series variation, as expressed either in the adjusted standard error or the standard deviation of the regression residuals. The trend and the variations of the series were determined not to correlate and thus can be used as independent predictors in the model.
Obviously the adjusted standard error would include the effects of the standard deviation of the regression residuals, but could add to the variation through the effects due to the autoregression. Below I give the adjusted R^2 values for the given regressions:
Length~Trend+Adjusted SE: Adj. R^2 = 0.55
Length~Trend: Adj. R^2 = 0.25
Length~Adjusted SE: Adj. R^2 = 0.26
Length~Trend+Stdev Residuals Time Regression: Adj. R^2 = 0.51
Length~Stdev Residuals Time Regression: Adj. R^2 = 0.05
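A minimal sketch of how those regressions could be run. The data frame 'runs_df' and its column names are hypothetical placeholders for Kenneth’s per-run table, not his actual code:

# Hypothetical: runs_df has one row per model run, with columns
#   Length (months of longest non-positive trend),
#   Trend  (1964-2012 trend, C/decade), and
#   AdjSE  (Monte-Carlo-adjusted standard error of the trend).
fit <- lm(Length ~ Trend + AdjSE, data = runs_df)
summary(fit)$adj.r.squared   # Kenneth reports ~0.55 for this combination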
These results show rather conclusively that the variances in the model series are related to the length of periods of zero or less trends. The detrended variances are actually an indication of the model/Observed series excursions away from the regression line. While these excursions will probably be less in most models when examining the predicted series past the present time, there will be excursions, and those excursions plus the long term trend should remain good predictors of the maximum length of zero or negative trends.
It would appear that the favored comparison of the modeled and Observed series would be that used in the first approach of this analysis, where a large number of model run trends can be averaged in order to reduce the “weather noise”. The second approach in this analysis does, however, present some interesting questions about how well we should expect a model series to follow the Observed series excursions from a straight regression line. If the Observed series is merely a single realization of many possible ones generated chaotically, that would fit the observation of the models, where multiple runs can differ considerably in the excursions of the series from the straight regression line. It would also put a large uncertainty into predicting future temperatures and even into hindcasting previous ones.
I might on further analysis look in detail at those models that had variations similar to the GISS and GHCN series, particularly when the series are projected into the future. Those model series appear to have trends more in line with, but somewhat larger than, the Observed series.
http://imageshack.us/a/img854/2285/cn85.png
http://imageshack.us/a/img809/936/karj.png
I have noticed that my imageshack links in my previous post would not enlarge so I am trying the links again below with a bigger original image.
http://imageshack.us/a/img541/7098/pesw.png
http://imageshack.us/a/img690/6934/cp3c.png
Kenneth
There are formal non-parametric tests that can be done to test whether any feature of models are different from each other. I need to dredge them up. (I also need to organize stuff for a formal methods paper.)
We could compare this. But if the goal is to simply show that models are different from each other, it makes more sense to do something lower order. For example:
1) Show rms from linear trend is different.
2) Show lag 1 correlation is different in different models.
3) Show trends for matched time periods are different.
And so on. Each of these can be a test statistic in its own right, and I think they are all less complicated than the computed standard error in run trends based on the Quenouille correction. That means explaining the test statistic is simpler, and one isn’t likely to get into an argument with a confused person thinking that if you test the SE based on the Q correction, that means you are either claiming or implying the ‘noise’ is AR1.
Mind you– I think there is nothing wrong with testing whether computed SEs are different across models. It would be rather odd for rms to be different and lag 1 correlation to be different while somehow, magically, the computed SE ended up identical across models. But rms and lag 1 correlation are simpler, so it seems better to test the less involved test statistics.
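An R sketch of the three lower-order statistics in the list above, computed for a single series (the naming is mine; this is not Lucia’s module, which isn’t shown here):

# Compute, for one monthly anomaly series 'y', the three statistics listed
# above: trend, rms about the linear trend, and lag-1 correlation.
series_stats <- function(y) {
  t_idx <- seq_along(y)
  fit <- lm(y ~ t_idx)
  res <- residuals(fit)
  c(trend = unname(coef(fit)[2]),             # 3) trend over the matched period
    rms   = sqrt(mean(res^2)),                # 1) rms deviation from linear trend
    lag1  = cor(res[-1], res[-length(res)]))  # 2) lag-1 autocorrelation
}
# Applied to matched periods of each model run and the observations, each
# statistic can serve as a test statistic for whether models differ.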
I have a module for these things btw — I just haven’t discussed them all.
BTW: It really would be good if you wrote some stuff up as a guest post. Scattering in comments makes it difficult for readers. But if you prefer not to guest post, that’s ok too. (I’ll now click through the images and try to make them appear in the comment for others.)
Kenneth–
Viewing the spread sheet, I guess my questions are:
1) What did you apply the levene test to? (I know, a variance. But you have a levene test for hadcrut4. So… its variance is tested to be equal to what other variance?)
2) Generally, I tend to be seeing lots of “answers” with lots of detail to explain why you think your method of finding that answer is sound. It may very well be a sound method. But I’m not sure I know what “the questions” are, so I’m a bit lost.
I find I’m much better at grasping things if I can read a clear exposition of what basic statistical question you are asking rather than seeing a lot of detail about how you went about answering it. So: What are you trying to discover? The maximum length of a trend in a model? Whether the residuals to trends are the same in models or earth? Are there several questions?
I know you might think “I’m trying to answer the question Lucia was answering. So she should know what it is.” But it would really be easier for me if you restated what the question you think you are investigating is.
lucia (Comment #118037)
August 2nd, 2013 at 11:26 am
The levene.test was applied to the residuals of the regression with time of the observed and modeled series. As it turns out, the residual distributions are not normal, and thus I used the non-parametric levene test to determine whether the variances of the GISS and GHCN residual series (which are the same by the levene.test) are the same as the HadCRU4 and the 106 RCP4.5 model run residual series. I did this by pairing the GISS and GHCN residual series with the HadCRU4 or one of the individual RCP4.5 model residual series.
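For concreteness, the pairing described might look like this in R. A sketch under assumptions: 'giss', 'ghcn', and 'model_run' are stand-ins for monthly anomaly vectors over a common period; levene.test with location = "median" (the Brown-Forsythe variant) is from the lawstat package.

# Sketch of the Brown-Forsythe/Levene comparison of detrended variances.
library(lawstat)
detrend <- function(y) residuals(lm(y ~ seq_along(y)))
r_giss <- detrend(giss)        # 'giss', 'ghcn', 'model_run' are placeholders
r_ghcn <- detrend(ghcn)
r_mod  <- detrend(model_run)
vals   <- c(r_giss, r_ghcn, r_mod)
grp    <- factor(rep(c("GISS", "GHCN", "model"),
                     times = c(length(r_giss), length(r_ghcn), length(r_mod))))
levene.test(vals, grp, location = "median")   # p value as reported in the table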
I have not intended to claim I have any particular answers – but I do have questions based on my analysis. Most of this analysis of the second approach, looking at individual models, revolves around measuring (in a non-conventional manner) the non random excursions of the series away from the straight regression trend line. Obviously one can claim that the trend should not necessarily be a straight line, but here I use that regression only as a means of comparison of the Observed and model series; and further, since it is a comparison, I am not as concerned about the non-conventional approach.
My first approach was one that I see as more conventional, where the mean of model trends and the variation around it are compared to the Observed series and its adjusted standard error in a linear regression. Obviously this same process would be appropriate for use with assumed non linear temperature series. One question with that approach is whether there is a lower limit, in the decade range, on the time period that can be used to make this comparison. Another question is how valid the variations in trends are that can arise from a model with multiple realizations and the varying excursions from the trend line for these individual model runs. If these variations are valid and due to the chaotic nature of climate, then comparing against the single earth-based realization that is our historical climate becomes an almost futile exercise in validating climate models – as that single realization could have had different manifestations just like the models.
The second approach in my analysis merely points to these differences that can occur between models and between individual model runs for a given model. A question arising out of that analysis is at what point one can state a significant difference exists between an Observed series and a model run series based on such metrics as the variances in the series residuals, or how many zero or negative trends of a given length appear in the series, or the maximum length of a zero or negative trend in the series. Or, again, are we here merely looking at some manifestation of the chaotic nature of the climate, and further, a chaotic nature captured by the models?
A more general question that these analyses bring to the fore, at least for me, is how stochastically a deterministic process like the result of climate modeling can be treated in statistical analyses. Since the models are supposed to be closely emulating the Observed series, it would seem contradictory to treat the Observed and model series differently. What I found in this analysis was that the observed series could be very readily and well fit to an ARMA model, while only some of the modeled series runs could be fit well. Some models could be fit using a seasonal ARMA component, while some others could be fit with a seasonal component that required an ARIMA model with d=1. Some others could not be fit with any variation of the ARMA or seasonal component. I am puzzled why this might be the case, although I suspect it might have to do with the series excursion from the trend line.
I should be clear that when I talk of excursions from the trend line I do not mean the large random fluctuations, but those that are less random – more those expected from an autocorrelated series or a series with long term persistence.
I should also be clear that I think we can all agree that the long term (how long?) trends in temperature series, whether linear or non linear, are driven by the increasing GHG levels in the atmosphere. Some call that the secular trend and others call it the very low frequency component of a temperature series. Those long term trends in the Observed series can be confused by low frequency cyclical events that could be purely or mostly from natural processes and from longer lasting “weather” noise. In the models we see long time period excursions of the series from the long term trend line.
We thus have these long term trends in the Observed and modeled series that could be used to determine how well the models can track the historically observed temperatures in response to increasing GHG levels. The problem in my eyes is separating that long term trend from the effects that cause longer time period excursions away from that trend, i.e. how much is that long term trend distorted by those excursions, making estimation of that long term trend a difficult proposition.
Kenneth–
Ok. This one I get.
Now for the rest:
Right now, my difficulty is that with respect to questions, you seem to be describing questions you have about the bark on trees. I am trying to find out what you are trying to learn about “the forest”. Could you state a (or the) null hypothesis your tests are designed to test? A simple statement should take less than 1 sentence. (I ask because I’m reading a lot about your “approaches”, but I don’t know the null hypothesis you want to test.)
Lucia, at this point in my analysis I do not have a null hypothesis to test because the questions that I pose would have to be addressed before doing so. The obvious null hypotheses are that (1) the models collectively and the observed temperature record are in agreement with regards to response to changing GHG levels in the atmosphere and (2) some individual models are in agreement with the observed temperature record with regards to changing GHG levels in the atmosphere. I am not at all sure how I would go about testing these hypotheses without making some large assumptions.
I also think that if the null hypothesis cannot be rejected, the resulting agreement of models and observed temperature trends would require very large confidence intervals. In other words, when climate scientists point to models that have very lengthy pauses in warming and at the same time imply that models can capture (and predict) the observed temperature series, to me, that is merely another way of saying the CIs are very wide.