David Rose of the Daily Mail wrote a post with the title Global warming stopped 16 years ago, reveals Met Office […] which has set off quite a snit. In turn, the Met Office posted their answers to questions David Rose asked the Met Office. Naturally, this spawned quite a kerfuffle; Anthony has posted twice!
I was in the middle of a “script-reorganization” so I couldn’t comment quickly. I’m at a stopping point where I can discuss things, so I’ll pause to comment a bit on claims in Rose’s article and on statements in the Met Office’s response to David’s questions. I’ll probably post again next week to engage the issue of “cherry picking”. (That discussion requires the planned script functionality extension which motivated the “reorganization” activity.)
First: I want to state claims a reader coming across Rose’s article might infer he is making. Three of these claims appear to be:
- Global warming has occurred. He writes: “… global warming is real, and some of it at least has been caused by the CO2 emitted by fossil fuels”.
- Warming may be slower than the level catastrophists (whoever these people might be) have claimed. He writes, “the evidence is beginning to suggest that it may be happening much slower than the catastrophists have claimed”
- Global warming stopped 16 years ago. This claim is in the title presented in bold to anyone who comes across the article.
Given the kerfuffle, it’s worth noting that the three claims taken collectively are entirely in agreement with this “clarifying” graphic the Met Office shows:

Unfortunately, if presented as a rebuttal or response to any question Rose actually asked, that graph is an utterly irrelevant red herring.
Because the Met Office graph is irrelevant as any sort of “rebuttal” from the Met Office, I’ve pondered a bit on how to organize my comments on these claims. I’ve decided the best thing to do might be to simply show the answers I would have given to Rose’s first two questions. The third is such a strangely worded leading question that I don’t know how it really can be given a good answer. (I think the Met Office’s answer is rather silly– but I’ve developed some sympathy for them after I tried to think of a good answer to it. Possibly, the best answer would be “I don’t understand what you are trying to ask. Could you clarify?”)
Now, on to my answer to Rose’s two questions:
Q.1 “First, please confirm that they do indeed reveal no warming trend since 1997.”
I would have understood this to mean no real warming trend since Jan 1997. So my answer would have been: No, I cannot confirm “no warming trend since 1997”. Using HadCrut4, the trend from January 1997 – Aug 2012 is positive. That is: There has been a slight amount of warming during that period.
I might have supplemented that with a graph. The one below shows the trends based on HadCrut4 and two other agencies:

(Note: the circles are the expansion of the intervals to include measurement uncertainty. I haven’t updated the measurement uncertainty for HadCrut4 — it uses HadCrut3 info.)
The HadCrut4 trend is shown in purple. At +0.05C/dec, it hovers above the solid black line indicating “no trend”. So, it shows warming. Slight warming– but still, warming.
No more blathering required for this question.
Q.2 “Second, tell me what this says about the models used by the IPCC and others which have predicted a rise of 0.2 degrees celsius per decade for the 21st century. I accept that there will always be periods when a rising gradient may be interrupted. But this flat period has now gone on for about the same time as the 1980 – 1996 warming.”
My answer to this would be that if Jan 1997 was not a cherry pick, and if HadCrut4 contains no error, then the trend since Jan. 1997 suggests temperatures are rising more slowly than suggested by the multi-model mean in the AR4. That is: The mean warming in the AR4 models is faster than is consistent with the earth's weather since Jan 1997.
I diagnose this using the trace circled in red below:

The trace outlined in red shows the multi-model mean over models that used A1B to project into the 21st century. The blue error bars indicate the ±95% uncertainty in trends estimated using a “model centric” method. That is: for each model with more than 1 run projecting into the 21st century, I estimated the variance in 188-month trends across that model's runs. I then averaged that variance over the 11 models that permitted the operation and took the square root, using that as the “multi-model” estimate of the standard deviation of 188-month trends in a particular model. The ±95% spread in weather is illustrated by the span of the dark blue uncertainty intervals inside the red oval. Notice the bottom of that span lies above the HadCrut4 188-month trend.
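For readers who want to see the mechanics, here is a minimal R sketch of that “model centric” calculation. The object runs.by.model — a list with one matrix of monthly anomalies per AR4 model, one column per run — is a placeholder for whatever the real script uses; only the averaging-of-variances logic is the point.

# Hypothetical input: runs.by.model is a list, one element per AR4 model,
# each a matrix of monthly anomalies (rows = months, cols = runs).
trend.per.decade <- function(y) {
  t <- seq_along(y) / 120                # time in decades
  coef(lm(y ~ t))[2]                     # OLS slope, C/decade
}

nmonths <- 188                           # Jan 1997 - Aug 2012

# For each model with more than one run, the variance of 188-month trends
# across that model's runs.
var.by.model <- sapply(runs.by.model, function(m) {
  if (ncol(m) < 2) return(NA)
  trends <- apply(m[1:nmonths, , drop = FALSE], 2, trend.per.decade)
  var(trends)
})

# Average the within-model variances, then take the square root:
# the "multi-model" estimate of the s.d. of 188-month trends in a typical model.
sd.weather <- sqrt(mean(var.by.model, na.rm = TRUE))
ci.halfwidth <- 1.96 * sd.weather        # approximate +/-95% spread due to weather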
This means that based on that particular choice of start year, when compared to HadCrut4, on average the AR4 models are running warm. This is true even though the HadCrut4 shows a warming trend.
At this point, I might have discussed cherry picking a bit. A case can be made that Jan 1997 is a rather inapt start point here. But… if I were the Met Office suggesting that, I know perfectly well the next question would be:
I’m asking about 15 years because that’s a claim in a peer reviewed paper. What about that old claim that 0C/dec could last up to 15 years?
These papers use a method that permits the Met Office to write misleading things like this:
So in that sense, such a period is not unexpected. It is not uncommon in the simulations for these periods to last up to 15 years, but longer periods are unlikely.
The analysis method used to justify claims like that involved taking the standard deviation of trends over all model runs, ignoring the fact that different models show different trends. Using that standard deviation around the multi-model mean shows the HadCrut4 trend falls inside that spread of runs; it also shows 0C/dec very near the lower boundary of the ±95% confidence intervals. It is on this basis that (some) people justify stating that we should “expect” to see 1 in 40 “earth” trends as low as 0C/dec even if the underlying trend matches the multi-model mean.
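For contrast, a sketch of that pooled calculation, reusing trend.per.decade and nmonths from the sketch above. The key difference is that the standard deviation is taken over all runs from all models, so model-to-model differences in mean trend get folded into the “weather” spread.

# Pool 188-month trends from every run of every model, ignoring which model
# each run came from.
all.trends <- unlist(lapply(runs.by.model, function(m) {
  apply(m[1:nmonths, , drop = FALSE], 2, trend.per.decade)
}))

multi.model.mean <- mean(all.trends)
sd.pooled <- sd(all.trends)              # mixes weather AND structural differences

# The pooled +/-95% band around the multi-model mean is wider than the
# weather-only band, which is how 0 C/decade can end up "inside the spread".
c(lower = multi.model.mean - 1.96 * sd.pooled,
  upper = multi.model.mean + 1.96 * sd.pooled)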
As I observed at the time: If you are trying to detect whether the ensemble is biased, that method is wrong. Moreover, if you are trying to estimate the variability of earth trends (or even the spread due to weather in a typical model) that method is just wrong.
It’s wrong because, with respect to deciding whether the difference between the multi-model mean and the earth realization is due to weather, we need an estimate of variability due to “weather” or “internal variability”. And real differences in mean warming from model to model have nothing to do with “weather”, or “internal variability”, or anything of the sort. That is the structural difference in mean warming in different models. Not. Weather.
An estimate of the ±95% spread of trends inside “weather” enhanced by the structural uncertainty in the model mean trends is shown to the right of the uncertainty intervals I circled above. That is: this is the spread of “all weather in all models”. Note that spread is larger than the spread of “weather” in a typical model; the bottom of that trace just grazes 0C/dec, and HadCrut4 lies inside this trace.
It is that spread grazing zero that justified the claim of “15 years” in the papers that made that claim.
So what does the earth trend falling outside the spread of “weather” but inside the spread of “weather + structural uncertainty” mean? It means
a) if we estimate weather based on models, that HadCrut4 trend is outside the range consistent with the multi-model mean trend but
b) the earth trend is inside the full range of weather enhanced by the spread due to structural uncertainty.
How can this happen? Some models show less warming than the multi-model mean; others show more warming than the multi-model mean. Given this situation, it is possible for an earth trend to fall inside the range of weather consistent with “all weather in all models” because it falls inside the spread of weather associated with models that are– on average– warming more slowly than the multi-model mean.
Now, before I end this, I’d like to observe a few things:
- In this particular case, 15 years is a cherry pick. We know that 1998 was one of the largest El Ninos seen in a long, long, long time. So, in some sense, we know that’s not a “good” year to really start. I’m not going to show results over a range of start years– if I did, I wouldn’t reject the multi-model mean for a large variety of start years. (It’s close– but no cigar.)
Rather than show a bunch of different start years, I am extending my script to create a method that is more “robust” against cherry picking. That is: It will compare the deviation in absolute anomalies along with the deviation in trends. That metric is harder to cherry pick because if you pick the start date in a “warm” year, the trend is low, but the mean difference is high. In contrast, if you pick the start date in a “cool” year, the trend is high, but the mean difference is low. I’d show that, but it requires more coding. I’ll be showing this a bit later this week. I don’t know if using the “new” method will result in a reject or accept since 1997. But since cherry picking is rampant (on all sides), I want to budget time on creating a “harder to cherry pick” method.
- My focusing on Rose’s questions means I’ve failed to slam him for “global warming stopped”. There is no evidence global warming stopped in any meaningful sense.
First: The 15 year trend is positive. That’s “slow warming” but not “stopped”.
Second: we know the trends over the past 50 years and the past century are both positive. Rose says so himself. And we know temperatures are variable. So, there is ample evidence that we have been in a warming trend; it should take quite a bit of cooling to decree we have evidence it actually stopped. Specifically, if we are estimating weather using the “model centric” method I use here, and we took the uncertainty intervals from models and placed them around ‘0’, we would need to see a 188-month trend of approximately -0.13 C/decade before we could reject a hypothesis that warming is positive. That level is the ±95% window (one sided) that would correspond to a forced trend of 0C/decade. A +0.04C/dec is certainly well above -0.13C/dec.
To be sure, it may be that when he wrote “stopped” Rose meant “paused” or “slowed”. But generally, “X stopped” gives the impression X is not about to resume at any moment. When applied to random variables, it gives the impression one believes the expected value for X has “stopped” being what it was before. The fact is: there is zero evidence that we should expect the trend in the upcoming 15 years to not be positive. Even without any consideration of models — merely resorting to extrapolation– there is every reason to expect a positive trend over the upcoming 15 years.
This was pretty rambling. But the highlights:
1) If picking 1997 was “fair”, the HadCrut4 trend falls outside the ±95% confidence window defined by the multi-model mean and “typical model weather”.
2) This suggests the multi-model mean is warmer than is consistent with earth weather.
3) The earth trend falls inside the ±95% spread of “weather + structural uncertainty” of models. This suggests some individual models are not too warm.
4) The Met Office conflates (2) and (3) in a misleading way. To be fair, I think they likely aren’t doing it on purpose. They are collectively a bit confused and have misled themselves. Nevertheless, both (2) and (3) are true.
5) There is no evidence “global warming stopped”– or at least there isn’t any evidence if “stopped” is defined as “stopped and we should not expect it to resume”.
6) There’s a good chance the Met Office is going to continue to have difficulty defending the model mean. After all, if the anticipated El Nino is either weak or non-existent, next year Rose may well announce that we’ve had 16 years of 0C/dec!
7) It really would help if they “discover” that the multi-model trend is biased high and stop trying to suggest the discrepancy can be explained by “weather” (or worse- volcanoes!)
========
Why not compare cooling rates against warming rates?
After all, if they are similar, then you can’t claim rapid warming. It looks like natural variation, rather than anthropogenic.
If it warms at 0.05C over 10 years, but has cooled more rapidly, you can’t claim anything unusual. Particularly because the GW claim has to be that cooling is not as rapid as warming. Think natural variation overlaid over a trend. Warming is more rapid than cooling.
Nick–
I don’t know what “comparing cooling rates against warming rates” means. Currently, I use a test that is kinda-sorta similar to a test with a ‘z’ that is
z=dif/s.d., with “dif” the difference in trends and ‘s.d.’ an estimate of the standard deviation of trends I expect. I compare the “z” relative to the spread I expect for “z”s. (These are dimensionless.)
I’m going to create one for trends and one for anomalies.
So z_test = average( z_trend, z_anomalies) will be compared. This is a bit more robust to cherry picking. I think it retains power. Some other methods of using both anomalies and trends don’t retain power (or at least I think they don’t.) So… less good.
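A minimal sketch of that kind of pooled statistic, with every input hypothetical (the observed and expected trends and means, and their standard deviations, would come from the actual data and model runs):

# z for trends and z for mean anomalies, then pool by averaging.
z.stat <- function(observed, expected, sd.expected) {
  (observed - expected) / sd.expected
}

# Hypothetical numbers, purely for illustration:
z.trend   <- z.stat(observed = 0.05, expected = 0.20, sd.expected = 0.08)  # C/decade
z.anomaly <- z.stat(observed = 0.45, expected = 0.55, sd.expected = 0.07)  # C

z.test <- mean(c(z.trend, z.anomaly))

# Compare z.test to the spread of the same pooled statistic computed from
# individual model runs (dimensionless), rather than to N(0,1), since
# z.trend and z.anomaly are correlated.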
For HADCRUT4 I get .033C / decade for 180 months of data using monthly anomalies.
HADCRUT3 is -.016C/Decade for the same 180 months.
For 11 years (132 months) HADCRUT4 is -.044C / decade.
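For reference, the arithmetic behind numbers like these is just an OLS fit to the monthly anomalies; a minimal sketch (anom is assumed to hold the last 180 monthly anomalies in °C):

# OLS trend of the most recent 180 monthly anomalies, expressed per decade.
n <- 180
t.dec <- (1:n) / 120                 # time in decades
fit <- lm(anom ~ t.dec)              # anom: hypothetical vector of monthly anomalies
slope.per.decade <- coef(fit)[2]
round(slope.per.decade, 3)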
As I’ve said before, 1980 to 1998 versus 2000 or so to 2012 has one variable that explains it (there may be many).
Man made SO2 dropped one Pinatubo’s worth from 1980 to around 2000. From 2000 to 2012 SO2 increased in the atmosphere.
The drop should have caused .5C of warming and the rise should have caused some cooling.
The earth heats up +0.05C/dec?? Well…butter my buns and call me a biscuit, who carez?? Only instruments are able to notice the difference…geezz
If I wanted to say global warming stopped in 1998, over what time frame should I calculate a trend to test this?
The problem I have with the trend calculation is that a warmer 1999 and 2000 makes the trend lower, while cooler early years make the trend higher, exactly the opposite of what I would expect to make my argument.
“a warmer 1999 and 2000 makes the trend lower”
The coolest two years from 1997 to 2012 are 1999 and 2000.
1997 0.390
1998 0.523
1999 0.298
2000 0.291
2001 0.433
2002 0.485
2003 0.496
2004 0.438
2005 0.534
2006 0.491
2007 0.478
2008 0.383
2009 0.489
2010 0.540
2011 0.399
2012 0.419
For amusement, 1944 was 0.153.
And 1878 was 0.032.
MikeN–
Is that question rhetorical? I’ll assume it’s not. You’d start computation in 1998. As it’s currently 2012, based on the present trend, if you compared to the spread of trends expected, you would conclude you have no basis to make that claim.
Cooler years at the beginning of the period make computed trends higher. Cooler years at the end make them lower. That’s the way trends work.
If this contradicts “your argument”, I can’t begin to guess what your argument might be.
I agree that the Met Office media blog could have answered the questions more directly. Especially the first; the trend just hasn’t been zero (not even close).
Rose’s original article was about Hadcrut 4 (though he seemed to have difficulty saying (or understanding) what it was about). There wasn’t much about models at all, though they are referred to in his second question.
Could a cold year make a zero trend? My rough calc says the anomaly would have to be about 0.22°C. Not impossible, but coldest since 1996 (next, 2000 was 0.29°C). 1996 was 0.18°C – you can see why they pick 1997.
Nick
How cold? And zero trend in models or earth?
Anyway, either way, if we pick Jan 1997 as a start year and use “weather” as that typical of models, the trend since 1997 is outside the 95%. But… 1997 is a “pick”.
But that said: the 15 year quote is “out there”. And even if it’s ENSO adjusted– that was their stupidity to “ENSO adjust”. Because someone is just going to ENSO adjust HadCrut… and….
Lucia,
“Because someone is just going to ENSO adjust HadCrut… and….”
Someone did! Results are here. And up it goes.
How cold? How cold would Hadcrut 4 have to be for the next 12 months to make a zero trend since 1997? About 0.22°C.
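One way a rough calculation like this can be set up (a sketch only; had4 is assumed to be a numeric vector of monthly HadCRUT4 anomalies starting Jan 1997, and the 0.22°C figure above is Nick's, not reproduced here):

# Find the constant anomaly over the next 12 months that would drive the
# OLS trend since Jan 1997 to zero.
trend.if.next.year.is <- function(x, y) {
  y2 <- c(y, rep(x, 12))
  t  <- seq_along(y2)
  coef(lm(y2 ~ t))[2]                # slope per month
}

required <- uniroot(trend.if.next.year.is, interval = c(-1, 1), y = had4)$root
required                             # anomaly (deg C) giving a zero trend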
Nick #104948,
I trust (hope?) those links to Tamino’s… ahem… work were intended to be humorous. ‘Adjusting’ for ENSO since 1979 in a physically reasonable way makes virtually no difference in the overall trend. A reasonable adjustment for volcanoes drops the trend since 1979 by about 15%.
Nick Stokes (Comment #104948) —
Does the source you link “call ’em fairly”? Or does he have a history of reporting findings he likes, whilst disregarding analogous findings that displease him?
Mathematical prowess and trustworthiness are two distinct attributes.
Sad that this is the state of affairs, at least as seen by some observers. But there it is.
Nick—
You do realize I think Tamino cheated on his error bars? And…. also…Even Tamino can’t ENSO correct 2012 values in Dec 2011. He can’t ENSO correct 2013 values in 2011.
Tamino’s adjustments are fairly straightforward. There’s no great influence of volcanoes since 1996. And yes, it’s open code and peer-reviewed 🙂
Full picture of trends and significance here. 0.1268 °C/decade from 1997 to end 2010. And fairly steady, so significance is not borderline and 2011 won’t make a big difference.
FWIW: I always thought the “State of Climate” paper puzzling.
a) used 1 model but with different parameterizations. The runs weren’t the AR4 case.
b) text suggests different runs would come from parameterizations with different means. So… that “weather” variability still isn’t teased out from “structural+weather” variability.
c) I don’t have any clue whether that model has high or low variability of “n-month” trends relative to typical for AR4 models.
d) ENSO corrected… how? Does that model have an ENSO? How much of that model’s n-month variability is “explained” by ENSO? And… well.. how? Did they fish out pressure over Darwin? Create an MEI? Or what? How does MEI-GMT correlation in those models compare to the earth MEI-GMT correlation?
Who knows?
It was all very sketchy– as one might expect for a bunch of papers “invited” from NOAA/Met Office by BAMS. Similar things happen with “invited” contributions from conferences– but “invited” from conferences at least has conferences rejecting some people wanting to present at the conference. In this case, agencies write sketchy versions of stuff that should just be agency reports. Reviewers give a cursory pass. And BAM! It’s all in BAMS. 🙂
I don’t really think it’s worth trying to track down the answer to all the questions. We can just wait for the earth weather to skid away from the model mean. As. It. Has. Done. since that paper was published.
Nick, there are two reasons why HADCRUT4 has a positive trend from 180 months ago.
1) 1999 and 2000 were cold.
2) They added .218C onto Jan 2007 from HADCRUT3.
H4 2007/01 0.818
H3 2007 01 0.610
His error bars may be peer reviewed and the code available, but they are still based on a cherry pick that doesn’t hold up if you don’t cherry pick the s.d., ar, and ma by selecting a start year pre-volcanic eruption. And that makes his estimate of the uncertainty now wrong.
I think you know that was discussed.
Also: Weather estimate in my graphs is based on AR4 models. ENSO correction not required.
Whether or not Tamino’s ENSO etc. correction is “right” (to whatever extent an ENSO correction can be “right”), it’s unlikely to match the “State of the Climate” method — which is not specified. You need to use their method to put trends in the context of their spread. Otherwise… apples to oranges.
Oh… I clicked a bit more. I see that your graphs end in 2010 (?). Also, rereading, by “significance is not borderline” I assume you mean to say that “the trend is clearly < 0.2C/decade and addition of 2011 (which already occurred) and 8 months of 2012 (which already occurred) is not going to change our ‘reject hypothesis models are right’”.
If so, I agree with you. You show the models are clearly too warm and adding a year or two isn’t going to make any difference to that finding.
That’s what you mean… right?
Lucia,
The bottom plot shows upper significance levels. It’s slightly confusingly titled, but the upper limit is 0.208 °C/decade. The color is a clearer indicator than the writing. That’s the OLS estimate.
Lucia
“You need to use their method to put trends in context of their spread.”
Lucia,
If you’re talking about the Peterson report, that’s what they were doing. Here’s the para of which WUWT quoted just the last part:
“Ten of these simulations have a steady long-term rate of warming between 0.15°and 0.25ºC decade–1, close to the expected rate of 0.2ºC decade–1. ENSO-adjusted warming in the three surface temperature datasets over the last 2–25 yr continually lies within the 90% range of all similar-length ENSO-adjusted temperature changes in these simulations (Fig. 2.8b). Near-zero and even negative trends are common for intervals of a decade or less in the simulations, due to the model’s internal climate variability. The simulations rule out (at the 95% level) zero trends for intervals of 15 yr or more, suggesting that an observed absence of warming of this duration is needed to create a discrepancy.”
Nick–
What’s your point?
On the spread issues:
The paper says they use models with a range of parameterizations and also says:
“Ten of these simulations have a steady long-term rate of warming between 0.15° and 0.25ºC decade–1, close to the expected rate of 0.2ºC decade–1.
That appears to be a spread in trends about the multi-model mean for the ensemble. Your quote merely reiterates the fact that there is a spread in the trend based on structural differences in models.
That’s the spread I mean when I write “You need to use their method to put trends in context of their spread.”
Of course if you have a model with a trend of 0.15C/dec we’ll see 0C/dec in runs more frequently than if we have a model with a trend of 0.2C/dec or 0.25C/dec.
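As a toy illustration of that point (the 0.10 C/decade weather standard deviation of 15-year trends is a made-up number, used only to show the direction of the effect):

sd.weather <- 0.10                   # hypothetical s.d. of 15-yr trends due to weather
p.zero.or.less <- function(mean.trend) pnorm(0, mean = mean.trend, sd = sd.weather)

sapply(c(0.15, 0.20, 0.25), p.zero.or.less)
# A model warming at 0.15 C/decade produces 0 C/decade runs noticeably more
# often than one warming at 0.25 C/decade.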
On the ENSO correction issue (which is what I was referring to in the bit you quoted):
Their spread in model trends is supposedly ENSO corrected. The details of the ENSO correction are not described. You seem to be suggesting that Tamino’s is somehow suitable. But you need to use Peterson’s method of ENSO correcting to compare to Peterson’s claimed spread.
If you can find the details of how Peterson did the ENSO correction, tell us where to find it. Or code it into R so we can see how it applies to data. And while you are at it– apply it to the model simulations they used so we can see those.
Until you can do that, I stand by
Stuff in [] added to clarify “their” in context.
The bottom plot shows upper significance levels.
Significance for the test of which null hypothesis based on what assumptions?
Lucia,
“Significance for the test of which null hypothesis based on what assumptions?”
In the first plot I show the estimated trends. In the second I show the lower 95% OLS CI for that estimate. In the third I show the upper.
Estimated trends based on what assumptions? Did you use Peterson’s ENSO correction?
CI intervals based on what assumptions?
Nick–
Clicking is a bit confusing
Is the trend through Feb 2016 -0.5470C/century? Maybe you should update to avoid confusing people.
Nick–
I’m finding your fancy stuff frustrating. I don’t give a hoot about trends ending any time other than “now”. I find it very difficult to click to get the end dates and start dates I am interested in. I also find it very difficult to tell whether an end date I chose is eligible. (Obviously, you do not have data from Feb 2016 even if that script tells me some trend.) I don’t think any choices let me pick ending in 2012.
I also don’t know if you used the AR correction (which the text says) or ARIMA (which Tamino used in his paper.) And, unless I download and try to run Tamino’s code, I can’t discover what the actual fiddle factors for his “correction” would have been with the data you have– much less how they might change for now.
But regardless– it seems to me unlikely that your ENSO correction is Peterson’s, and that’s the relevant correction for comparing to Peterson’s claimed “15 years”.
I’m also puzzled… why do your “Tamino” results for UAH seem to not match these:

Example: I clicked “UAH”. Your upper ci seems to be 0.220. But Tamino’s paper clearly shows it below 0.2.
Temperature Anomaly trend
Jan 1980 to Nov 2010
Rate: 1.3953°C/Century;
From -0.211°C to 0.220°C
For GISS:
Temperature Anomaly trend
Jul 1979 to Dec 2010
Rate: 1.6795°C/Century;
From -0.260°C to 0.269°C
Are you saying the warming since 1979 is not statistically significant?! I’m mystified.
Once more into the breach….
The center graph (GISS):
Temperature Anomaly trend
Jan 1997 to Dec 2010
Rate: 2.0985°C/Century;
From 0.008°C to 0.301°C
The lower graph
Temperature Anomaly trend
Feb 1997 to Dec 2010
Rate: 2.1037°C/Century;
From 0.009°C to 0.302°C
(Try as I may, I can’t get Jan 1997.)
The upper graph
Temperature Anomaly trend
Jan 1997 to Dec 2010
Rate: 2.0985°C/Century;
From 0.008°C to 0.301°C
These all seem to be equal. That doesn’t match Tamino at all. What gives? How do I find your estimates of the uncertainty intervals using your method?
Warming of the atmosphere has slowed in the past decade or so – that much is obvious to the untrained eye. Global warming on the other hand is unabated as the ocean (0-2000 m) continues to accumulate heat: http://imageshack.us/photo/my-images/827/ohc02000pentadalnodcnos.jpg/ , and land ice continues to melt: http://imageshack.us/photo/my-images/703/greenlandgraceicemelt20.jpg/ , http://imageshack.us/photo/my-images/36/mbmlossoflandice2011.jpg/. The atmospheric response will ultimately have its say.
Lucia,
“I find it very difficult to click to get end dates and start dates I am interested in. I also find it very difficult to tell whether an end date I chose is eligible.”
The best way is the linked plot on the right. You can click on the red line to get the start date you want. You can also use the red nudger, bottom right. Using the controls in the middle of the nudger shifts a month at a time.
Lucia,
“These all seem to be equal. That doesn’t match Tamino at all. What gives? How do I find your estimates of the uncertainty intervals using your method.”
Yes, there are some things not quite right in the written statements. The first plot is right – it gives the trend. The line below isn’t the CI but the before/after change.
The second plot gives the lower CI as reflected by the colors. I had intended that it should print the numerical value, but it actually prints the trend (I’ll fix). At least that corresponds to what the heading says it is doing.
The third plot (upper CI) also has the correct colors, and does do what I intended (prints the numerical value of CI). But I should have a more appropriate heading.
Lucia,
On the significance test used, as said in the text
“The significance test was as in the previous post – using the Quenouille (AR1) adjustment for correlation reducing dof.”
It isn’t the same as Tamino’s, from memory. He was using an “almost AR(2)” Quenouille style adjustment. I say more about that here.
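For readers not familiar with it, a minimal sketch of a Quenouille-style AR(1) adjustment (generic; y is a hypothetical monthly anomaly vector, and this is not necessarily byte-for-byte what either Nick's or Tamino's code does):

# OLS trend with an AR(1) (Quenouille) correction for serial correlation:
# the effective sample size n*(1 - r1)/(1 + r1) inflates the trend's standard error.
t   <- seq_along(y)
fit <- lm(y ~ t)
r1  <- acf(residuals(fit), lag.max = 1, plot = FALSE)$acf[2]   # lag-1 autocorrelation

n      <- length(y)
n.eff  <- n * (1 - r1) / (1 + r1)
se.ols <- summary(fit)$coefficients[2, 2]
se.adj <- se.ols * sqrt(n / n.eff)                              # wider standard error

ci <- coef(fit)[2] + c(-1, 1) * qt(0.975, df = n.eff - 2) * se.adj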
The GISS calc does seem odd. I’ll check. I was quoting from the CRU choice, which made sense.
Lucia,
The printout mechanism seems to be broken. It just seems to print the trend from the first choice you make. It updates the plots correctly. I think my statement of the upper CI for CRU above was correct – it seems to match the color. But I’ll check. Apologies for that. Hope to have it fixed tomorrow.
It looks like models 7 and 9 show a fair bit of skill. Or am I misinterpreting this?
“Global warming” is a meaningless phrase and as such it is just a political campaign based on a false belief about the feedback effects of CO2. Indeed, the warming it was based on is a psychological phenomenon much like the butterfly ink blots … people read into meaningless climate noise what they want to see and it really has nothing to do with science (unless you count psychology as science).
Lucia,
I’ve checked the code and see what is happening. The main thing is that the JS code in this version does not calculate CI’s interactively. What is printed is what the headings say – just the trends and the endpoint values. The CI’s were calculated in the R program which made the colorful triangle png’s. So the plots show them correctly, but the only information is in the colors. I was confusing it with the most recent version of the gadget which does calculate and print CI’s (with headings) – but I hadn’t put that into the Tamino post.
There is a glitch in that changing the dataset doesn’t trigger a recalc of the trend immediately. But whenever you change the endpoints, the correct trend for that dataset is calculated.
That tricked me in #109458 when I said that the upper limit was .208. In fact, I had switched from GISS and read the trend from Dec 1996. The color seemed about right, but I see on rechecking that it’s only about 0.175 °C/decade.
Lucia,
Sorry about all the messages. But I think everything is upgraded now. CI’s are calculated interactively and the trend now updates when you change data. The output has better headings, and reads, for cru Jan 97 to Dec 2010:
Temperature Anomaly trend
Jan 1997 to Dec 2010
Rate: 1.268°C/Century;
CI from 0.760 to 1.776;
Temperature range from 0.075°C to 0.252°C
CI means 95% limits on trend estimate, using Quenouille (AR1).
Nick–
Don’t worry about all the messages. I was asleep anyway. 🙂
He uses ARIMA(1,0,1) with his own idiosyncratic method of estimating ar, ma, and standard innovation of residuals based on…. some fixed early year. (In his posts it’s 1975 for GISTemp, but in the paper it can’t be for UAH.) Aspects of what he did are clarified by his code; others not so much. It doesn’t seem to be “all” the code– that is, you can’t just run it and create every single graph. The one with the uncertainty bounds for each year is a bit of a mystery.
On your things: I’ll go back and look at your post, triangles etc. As for picking dates, I understand the concept. But I just have a very, very difficult time picking the precise start and end dates I want. One of the “right left” thingies would move both the red and blue dot. The other moves one– but in neither case did it move only 1 month. And I can’t seem to get the resolution to hit the month I want with my mouse. So… I ended up with a “triple” starting in Jan/Feb/Jan.
I know the triangles and the move-on-mouse-click are sort of whiz-bang, but I think the sort of entry forms with submits at “Wood for Trees” is easier for a user who is interested in a specific range immediately– as one might wish if they were comparing what one person got to what another person got as a result for a particular year.
Overall though…. I don’t know that you should invest a lot of time updating an old post providing old Tamino results. Most people interested in those would rather just know things like the correction coefficients for MEI, lags etc. That way they could stuff them in and recompute to compare to something else.
But it was confusing that the numbers on your scripts weren’t matching Tamino’s.
Scott
They might. To really know you’d probably need to repeat over a number of forecast periods. This isn’t quite possible with climate– so modelers substitute multiple hindcast periods– which isn’t quite the same.
Lucia
Thanks for the discussion.
I thought that your graphs were showing that: “multi-model trend is biased higher” than the observed temperature trends.
Please clarify “multi-model trend is biased low”.
Models are biased high.
Lucia, small suggestion. Could you make a comma-separated-value (csv) spreadsheet of which model names correspond to which number and link it?
A fun test would be a scatter plot of model sensitivity against Z-score (difference between model and data divided by 1-sigma error).
Lucia,
There’s a system with the nudgers (arrow things). Red moves red, blue moves blue. There are 4 arrows in each direction. The centre ones move by one month, next outward by 2, next by 4 etc.
The double mover lets you keep a range of constant length, or of constant centre (to get trends about a point).
As a check I ran arima on cru 1997-2010
arima(x = y, order = c(1, 0, 0), xreg = x)
          ar1  intercept       x
       0.4860    90.1687  1.2571
s.e.   0.0677    24.5049  0.2508
It wasn’t a problem updating – I already had the code for a more recent post.
Carrick– It’s a good suggestion. I’m just trying to figure out how to do the analyses I want and make the script also spit out all the useful information as I update.
What I’d like to do is put the model names sideways on the axis. That way I don’t have to remember to a) upload a file and b) hyperlink in each post. Also, it will automatically add when I throw in the A2 models– which I eventually want to do. Do you know how to do that in R? I’m using “axis(side=1, at=1:10, labels=modelNames, cex=0.1)” but when the names are long (as in more than 2 letters long), R just auto-skips. So… I’d like to turn the text sideways. I’ll try “hadj” and “padj” to see what I get.
Right now, I’m trying to extend the test to include both mean and trend. That will be a “harder to cherry pick method of equal power to trend only method”. So.. with luck, the axis labels will appear on the graph for that.
For now, the order is:
modelNames
[1] “period” “ncar_pcm1” “ncar_ccsm3_0” “giss_model_e_r”
[5] “giss_model_e_h” “giss_aom” “cccma_cgcm3_1_t47” “echo_g”
[9] “miroc3_2_medres” “mri_cgcm2_3_2a” “mpi_echam5” “iap_fgoals”
So ncar_pcm1=1.
Do you know the climate sensitivities? I could just add a file and add that info to the axis labels (once I figure out how to turn the names sideways.)
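On the sideways-labels question a few lines up, one common base-R approach is sketched below (the short label vector is made up; las = 2 rotates tick labels perpendicular to the axis, and text() with srt gives any angle):

modelNames <- c("ncar_pcm1", "ncar_ccsm3_0", "giss_model_e_r", "giss_aom")  # example labels
x <- seq_along(modelNames)
plot(x, rnorm(length(x)), xaxt = "n", xlab = "", ylab = "score")

# Option 1: perpendicular tick labels
axis(side = 1, at = x, labels = modelNames, las = 2, cex.axis = 0.7)

# Option 2: labels at an arbitrary angle via text() in the margin
# axis(side = 1, at = x, labels = FALSE)
# text(x, par("usr")[3] - 0.1, labels = modelNames, srt = 45, adj = 1, xpd = TRUE)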
SE=0.2508 — is this the s.e. estimated for the trend after correction for ENSO? In C/Century? So +2 sigma would be 1.2571 + .5016 = 1.7587 < 2C/century? Is that what you get? (But AR1– which we can both agree is too small. I just want to know if that’s what you get.)
I suppose I could read your javascript–but
a) are you pulling MEI? Which source?
b) What’s the lag and coefficient for MEI? I could get around to replicating near the end of the week to compare what I get for trend and arima(1,0,1), doing arima(1,0,1) using my Monte Carlo method rather than Tamino’s “pick a year to freeze coefficients” method.
I do want to use my not-fully-explained method (average of dif/sigma for means and trends) — I think it’s got an advantage over having slightly different ENSO corrections in various papers (i.e. different lag, choice of MEI, ONI, etc.). So I’m coding my idea. But there will be no harm in ENSO correcting temperatures even within that idea. Especially since Peterson’s 15 years is “ENSO corrected”. We just don’t know the nuts and bolts of how he did that.
In the pooled test…. getting a result I don’t believe….. I’d better make some “reality check” plots and so on!!
whew… found the bug. That bug was going to make David Rose very happy. 🙂
Re: Owen (Oct 16 21:10),
The 0-2000m data has been discussed here and it’s highly suspect. Ocean heat content is not increasing exponentially as your graph indicates. Look at the 0-700m data and then see if you really believe that the heat content from 700-2000m could really increase that much faster than 0-700m heat content. See this graph for example. There are other anomalies in the ocean heat content data as well. This graph illustrates one problem.
And curiously the amount of CO2 increase over the last 15 years has been the most by far, and that’s when we see this flattening? Or is flattening not the politically correct term?
MrE:
Be careful. Response times aren’t immediate so two things that happen at the same time aren’t necessarily directly related. What you’d need to do is look at the emissions over a certain period in the past.
Lucia,
The javascript just uses output from Tamino’s R program, as graphed here. It doesn’t deal with MEI.
“So +2 sigma would be 1.2571 + .5016 = 1.7587? < 2/century? Is that what you get”
Very nearly:
Rate: 1.268°C/Century;
CI from 0.760 to 1.776;
By the way, one thing I find interesting is not only has CO2 been increasing faster than before, but methane started increasing again. About the same time temperatures flattened out, atmospheric methane levels flattened out. About ten years later, they started increasing again.
That means all the major greenhouse gases are increasing now (NO2 has risen regularly), not just CO2.
Nick–
Ok. I was hoping for the MEI etc. so that one could update using the correlation he got. I realize one should in principle, redo the correlation. But it would be fast to just update using the MEI — just to see.
The other thing that might be interesting is to write to the people who enso corrected in “State of the Climate”, get the 10 simulation runs both pre-enso corrected and post enso corrected and compare the spreads. but… not…today…
(Today, I’m trying to figure out if I’m getting much stronger “rejections” of models with the pooled d*s because that’s what you get or if I’m computing something wrong.)
Models can’t get ENSO right, and some models don’t even bother. Therefore, “weather” derived from model simulations can not properly account for ENSO. So if you’re going to compare models to observations, ENSO correction seems very much required.
You don’t need the error bars to address David Rose’s statement. The not-stopping trend alone means that, at the very least, you can’t say that global warming has stopped.
Lucia:
I don’t know them offhand, but I was planning on backing those out of the given temperature response and assumed forcings.
Lucia, a very fair analysis, as ever.
Me, I would have just gone:
1) 0.05C/dec. Pfui!
2) The met office says this is “not unexpected”. By whom? When did they say that in the 1990s? Pure historical revisionism.
3) A Venn diagram of Tamino and reality would not overlap at all.
btw, Nick Stokes (Comment #104952): “peer-reviewed :)”. The perfect way to describe most climate papers…
But then I’m not very fair about all this nowadays! 🙂
I do not like interjecting comments that seem off topic – even though in this case the paper I refer to here has been mentioned – but in this case I need a sanity check.
From the Grant Foster and Stefan Rahmstorf paper linked below it states the following:
“The influence of exogenous factors can have a delayed effect on global temperature. Therefore for each of the three factors we tested all lag values from 0 to 24 months, then selected the lag values which gave the best fit to the data.”
http://iopscience.iop.org/1748-9326/6/4/044022/fulltext/
It seems almost unbelievable to me that you can get away with fitting data like that in a peer reviewed paper. If one were so disposed to do that fitting, wouldn’t reviewers/editors at least want to see sensitivity tests on what happens with other selections? I could not find any such tests and I am therefore going to do some of my own. To compound this problem I see a table which leads me to believe that the lag times were different for the various temperature data sets. I can see perhaps a difference between the surface and the troposphere sets, but not within each of these two groups.
        Warming (°C/decade)       Lag (months)
        Raw        Adjusted       MEI  AOD  TSI
GISS    0.167(25)  0.171(16)       4    7    1
NCDC    0.162(22)  0.175(12)       2    5    1
HadCRU  0.156(25)  0.170(12)       3    6    1
RSS     0.149(40)  0.157(13)       5    5    0
UAH     0.141(44)  0.141(15)       5    6    0
Is this a case once again of not acknowledging the differences between in-sample and out-of-sample testing or am I missing something important here?
Kenneth–
I also have no idea how one ought to quantify the effect of hunting for the ‘optimal’ lag on the estimate of the uncertainty in the residuals. Here we have different lags for AOD for GISS/NCDC and HadCrut. Really?
For that matter, I couldn’t really tell whether Tamino just “corrected” and then computed his uncertainty based on the residuals of the corrected time series– ignoring the uncertainty in the coefficients used to account for MEI, AOD and TSI. That’s not an operation Nick Stokes would ever let me get away with. 🙂 (In fact, it wouldn’t be right. It shouldn’t be done. Even if co-linearity is small, it should be accounted for when estimating the error in the determination of the trend based on the remaining residuals.)
But then, I guess it’s not that surprising I didn’t find that. I didn’t really look. After all: The “main findings” are:
1) If you do a massive curve fit, to explain some of the “noise” you reduce the amount of noise in the series.
2) After you do a massive curve fit, the trends are more or less what they were before the massive curve fit.
3) The ends don’t look as flat anymore.
What’s really to dig into and rebut? If the procedure works the ends will stop flattening, right? (Heck, that could happen even if the procedure doesn’t work. 🙂 )
My main puzzlement is why the reviewers didn’t say, “Don’t publish because nothing interesting.”
Lucia, Kenneth,
“I also have no idea how one ought to quantify the effect of hunting for the ‘optimal’ lag on the estimate of the uncertainty in the residuals.”
In effect, the lags are regression coefficients in the model. Their effect is non-linear, but if you linearize, you’ll just get a larger
covariance matrix between regression parameters. Of course, turning that into CI’s is not straightforward. And I don’t at the moment know whether Tamino did it properly.
Re: Brandon Shollenberger (Oct 17 11:59),
It’s N2O, nitrous oxide, not NO2, nitrogen dioxide. Nitrous oxide is relatively stable and not particularly soluble in water so it’s not rapidly swept out of the atmosphere and therefore qualifies as a well-mixed ghg. Nitrogen dioxide reacts with water and oxygen to form nitric acid, so it’s highly soluble and doesn’t stick around or travel far from its sources like lightning bolts and internal combustion engines without catalytic converters. In fact, one usually gets mixtures of NO and NO2 from these sources which are collectively referred to as NOx. NOx is one of the key components required for the formation of photochemical smog along with sunlight and branched chain hydrocarbons. The dimer of nitrogen dioxide, N2O4 is used as the oxidizer in some liquid fueled rockets like the thrusters on the Space Shuttle. It’s used with a fuel like unsymmetrical dimethylhydrazine or monomethylhydrazine because they react on contact (are hypergolic) so the rocket engine doesn’t require an ignition source.
DeWitt Payne, that’s a lot of effort put into pointing out a typo. I like it!
Nick–
There is a discussion of ARMA11 uncertainty intervals in their appendix. I don’t see anything about the effect of uncertainty in the determination of the coefficients in the multiple regression. Dealing with the ‘non-lag’ type ones– like the magnitude of the multiplier on lagged-MEI– I know how to do. But I really don’t know how the fact that you could choose between a range of lag values is dealt with. Presumably someone knows. It also ought to affect any assessment of whether the correlation R=E[MEI(lag_i)*GMT] is statistically significant– after all, if you get to try a half dozen lag values the chances that one will look significant when it’s not will rise. (Mind you, I think MEI is well established as having a correlation. But, still, there has to be some statistical issue there.)
So what the Foster paper is all about is being able to say that, by fitting the lag between the 3 exogenic variables and temperature, doing it separately for the 5 temperature data sets, and further doing it exclusively on in-sample data, the start year at which the lower 2-sigma CI limit on the trend hits zero moves out: taking trends for start years from 1975 to 2010 (incrementing the start date forward by one year), the raw data hit a zero lower limit for start years around 1994-1995, while the model generated by this fitting extends that to start years beyond 2000.
Certainly given the state of climate science and its view of accepting in-sample testing on the same level of importance as out-of-sample testing, I would suppose that the results of any curve fitting of this nature and extent should be reported with the hopes that it will motivate others to work this magic on other topics.
lucia (Comment #104982)
“Aspects of what he did are clarified by his code; others not so much. It doesn’t seem to be “all” the code– that is, you can’t just run it and create every single graph. The one with the uncertainty bounds for each year is a bit of a mystery.”
Right, the code for figure 6 is not in the code provided by Tamino.
Here is my attempt at replicating that figure:
http://i45.tinypic.com/1z1cvmt.png
I use the adjusted data for this plot and nu (the effective number of degrees of freedom per datapoint) estimated from the residuals from the multiple regression. I think it looks rather close to figure 6 in F&R, but please remember that this is my interpretation; it is not an established fact that F&R made the figure in the same way.
The code to make the figure is here. Insert this code below Tamino’s in allfit.r, run it all, and the plot should appear:
### replicate figure 6, forensic coding by SRJ
# define a function for doing the regressions and plot
trendplot <- function(time, data, nu, color) {
temp <- data; last <- length(temp)
timetoplot <- time[seq(1,313,12)]
start.ndxs <- sapply(timetoplot , function(x){findstart(zgiss$t,x) } )
list <- sapply(start.ndxs, function(x) {summary(lm(temp[x:last] ~ time[x:last]))$coef[,1:2] } )
# trends = list[2,], se.trend = list[4,]
CI <- 2*sqrt(nu)*list[4,]
plot(timetoplot ,list[2,], type="b",lwd=1, col=color,ylab="Rate/year (K)",
xaxt="n",yaxt="n", xlab="Start Year",ylim=c(0,0.04) ); abline(h=0.02,lty=2)
arrows(timetoplot,y1=list[2,]+CI, y0=list[2,]-CI,length=.04,
angle = 90,code = 3, col=color, lwd="2" );
}
## get the adjusted data and apply above function in a loop
# columns with adjusted data
get <- c(2,6,10,14,18)
# collect effective degrees of freedom pr datapoint
nu <- c(nu.giss, nu.ncdc,nu.cru, nu.rss,nu.uah)
# define colors to match fig 6 in F&R 2011
color <- c("blue","green","purple","red","yellow")
# plot like tamino, R code from here: http://stackoverflow.com/questions/5479822/plotting-4-curves-in-a-single-plot-with-3-y-axes-in-r
#Set up the plot area so that multiple graphs can be crammed together
par(pty="m", plt=c(0.1, 1, 0, 1), omd=c(0.1,0.9,0.1,0.9))
#Set the area up for 5 plots
par(mfrow = c(5, 1))
# apply function to the relevant columns with adjusted data
for (i in 1:5){ trendplot(zadj$t, zadj[,get[i]], nu[i], color[i])
if (i %in% c(1,3,5)) {axis(2, at=c(0, 0.02,0.04), labels=TRUE) }
else {axis(4, at=pretty(c(0, 0.02,0.04)), labels=TRUE) }
}
axis(1, at=pretty(seq(1980,2005,5)), labels=TRUE)
“(Mind you, I think MEI is well established as having a correlation. But, still, there has to be some statistical issue there. )”
Just a glimpse at the Bonferroni correction here http://en.wikipedia.org/wiki/Bonferroni_correction would imply some major changes to the CIs given that the Foster paper selects from the combinations generated by 0 to 20 lags for 3 independent variables and 5 dependent ones. Bonferroni adjustments actually should correct for other selections, like why the limit of 20 was used and why the particular data sets for the independent variables were selected. Just looking at the number of trio-wise combinations of 20 lags (including 0 lag) on 3 variables gives, I think, 20!/(3!*(20-3)!), which equals 1140 total combinations. Based on those combinations a CI limit, or better a hypothesis probability of 0.05, becomes 1-(1-0.05)^1140, which for all practical purposes = 1. Given an understanding of the Bonferroni correction the authors may well have limited their selection criteria to, let us say, 9 lags, and then the correction would be for 10!/(3!*(10-3)!) combinations or 120 and the probability is reduced to 1-(1-0.05)^120 or 0.998. Alternatively the CIs could be increased to compensate for combinations, which in this case would be 0.05/120 and 1-0.05/120, or 0.00042 and 0.9996.
Lucia,
“But I really don’t know how the fact that you could chose between a range of lag values is dealt with. Presumably someone knows.”
I don’t know how it is done in F&R. But here’s how I think it could be done. You form a sum of squares of residuals from a stochastic model. You can write this as
F(a,e)
where a are the parameters (anything you can vary, including lag) and e is or are the random parts. The data etc are in there too, but don’t change during the minimise.
Then you differentiate wrt a, saying that, lacking info, e=0. That gives the usual trend equations
F_a=0.
The lag is in there as part of the minimisation. Since it is discrete, you have to difference instead of differentiating.
Then to get the uncertainty of parameters a, we say, well, e mightn’t be zero. So the change in a would be:
F_aa δa+F_ae*e=0
and then put in the sd of e to get the sd of a. F_aa and F_ae are second derivatives/differences
That’s what you do for linear regression, but works for non-linear too, and for delay, again differencing where variables are discrete. Including delay means basically twice as many parameters a, and a more complex F_aa, the inverse of which becomes the covariance matrix.
Kenneth has a point that it doesn’t seem right to have different lags for different indices. But you could say the same for the other regression coefficients. Since they do separate regressions for each index, there’s no way to constrain the coefficients to have common values. To do that you’d have to do a big joint regression (I’m sure no-one does); to do that you’d have to model the causes of difference between the indices.
Nick Stokes (Comment #105013)
That is real cute, Nick. You put your selections for lags in the form of the model and then let the model do the (over)fitting for you. If you did that for 20 possible lag values on three variables I would guess that you would have lag as 20 different factor values for each independent variable. I would just love to see a model like that laid out on paper.
“Kenneth has a point that it doesn’t seem right to have different lags for different indices. But you could say the same for the other regression coefficients. Since they do separate regressions for each index, there’s no way to constrain the coefficients to have common values. To do that you’d have to do a big joint regression (I’m sure no-one does); to do that you’d have to model the causes of difference between the indices.”
You seem to be assuming that the authors of the Foster paper somehow modeled the lag as a factor variable with 20 potential discrete values for each of the 3 exogenic variables. I would guess that they maximized the fit by trying all 20 combinations for each variable.
And, of course, it makes absolutely no sense to have different lags for different temperature data sets (indices). You have to have some a priori reason for selecting the lag just as they did in choosing the exogenic variables. I suppose an even better fit could have been found if other exogenic or other variables could have been selected without bound.
Kenneth,
“I would guess that they maximized the fit by trying all 20 combinations for each variable.”
Yes, that’s what I’m suggesting they should do. I think it’s better to treat lag as a sampled continuous variable. There’s no difference in principle between varying the lag and varying the proportions in which the exogenous variables are mixed in the model. Nor is there a difference in principle between having those linear regression coefficients vary between indices and having the lags vary. They are all supposed to be physically significant quantities.
I haven’t exactly been following the discussion, but you do fit to the lag in the triangulation problem. It’s highly nonlinear and the full fledged version is very slow, and there are lots of technical issues relating to biases in the results.
This case is made complicated by the very coarse resolution of the reconstructions. I also know for a fact that assuming a simple lag between temperature and ENSO is in error.
Oh..I suspect there is a way to do this. It’s just not standard because it’s not done.
I’ve thought about it for trying to figure out things when we do have multiple replicate runs for a (model/period) combination. But…. after thinking about it, whatever I could dream up couldn’t be applied to earth so not worth the effort.
Kenneth, we see this in many paleo papers as well. For example, with Yamal, they first took the correlation values of the proxies to temperatures in different seasons and 5 day groupings, picked the best fit, and declared that the proxies represented temperatures during that time period.
>Cooler years at the beginning of the period make computed trends higher. Cooler years at the end make them lower. That’s the way trends work.
If this contradicts “your argument”, I can’t begin to guess what your argument might be.
My argument is that global warming stopped in 1998. I think intuitively, colder temperatures in 1999 and 2000 should strengthen my argument, and higher temperatures in 1999 and 2000 should weaken my argument.
Ultimately I am saying that the trend calculation is a flawed way of analyzing temperature changes.
Foster and Rahmstorf test lags ranging from 0 to 24 months.
Quoting from their second paragraph:
”
The influence of exogenous factors can have a delayed effect on global temperature. Therefore for each of the three factors we tested all lag values from 0 to 24 months, then selected the lag values which gave the best fit to the data”
If you look at the code, you will see that they choose the model with the lowest value of the Akaike Information Criterion, AIC.
http://en.wikipedia.org/wiki/Akaike_information_criterion
AIC does not tell you how well the model fits the data in an absolute sense. It allows you to choose the model that gives the minimum information loss compared to using the “true” but unknown model.
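A stripped-down sketch of that kind of lag search for a single factor (temp and mei are hypothetical monthly vectors on a common grid; F&R search the three factors jointly with their own regression model, so this only illustrates the mechanics):

# Fit the regression for each candidate lag and keep the lag with the lowest AIC.
fit.for.lag <- function(lag, temp, mei, t, max.lag = 24) {
  n   <- length(temp)
  use <- (max.lag + 1):n                 # common window so the AICs are comparable
  mei.lagged <- mei[use - lag]           # MEI 'lag' months earlier
  lm(temp[use] ~ t[use] + mei.lagged)
}

lags <- 0:24
t    <- seq_along(temp) / 12             # time in years
aics <- sapply(lags, function(L) AIC(fit.for.lag(L, temp, mei, t)))
best.lag <- lags[which.min(aics)]        # lag giving the lowest AIC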
About my comment #105011
There were some bugs in my code – the bugs were in the part of the code that prepares the data to use in the code snippet I previously posted.
Here is the updated (and hopefully correct) plot of my replication of F&R figure 6:
http://i46.tinypic.com/mukg3s.png
As I see it, the figure is made by doing simple linear regression on the adjusted datasets. That means that the uncertainty bounds are larger than those from the multiple regressions.
ooops.
“Larger than” should be replaced with “smaller than” in my comment above (#105021)
SRJ–
Good! The smaller than made me blink. 🙂
Kenneth
Nick’s just doing what you do when you include a parameter in any model. You say y=mx+b, then you let the model find ‘m’ and ‘b’ that fit best (or over fit.) I don’t see that as a huge problem. It’s ok with AIC too– provided the lag is accounted for in the parameter count. It just seems that the uncertainty associated with the other regression parameter may not be included. (On the other hand, they may have computed ar, ma, sigma for the ARMA11 using a “frozen” period that gave values too high. But compensating sub-optimal choices just leave you with error bars that mean “what?”.)
Getting back to the Met Office/Rose issue and the 15 years issue– the only way Foster and Rahmstorf is interlaced with this is that F&R did an MEI correction, and the BAMS paper (special bulletin of BAMS– “State of the Climate”) had the 15 year quote about the length of time we could expect to see 0C/dec, and that was based on ENSO corrected models. But… I don’t know that their ENSO correction is “the same” as F&R– especially in the models. The paper is totally vague on what method they used to ENSO correct.
Lucia
“SRJ–
“Larger than”
Good! The smaller than made me blink. 🙂 ”
To be clear:
The standard errors calculated with lm(adjusted.temperature ~ time) on the adjusted data are smaller than the standard errors calculated in the multiple regression using time, enso, volc, and tsi as predictors.
E.g. for GISTEMP, the standard error of the trend in the adjusted data is: 0.013 K/decade
From the multiple regression F&R report a standard error of 0.016 K/decade (and this is what one gets from running their code).
Of course, simple linear regression on raw temperatures gives the largest standard errors on the trend of these 3 cases, for the case of GISTEMP it is 0.025 K/decade
The Foster & Rahmstorf paper disconnects physical plausibility from their analysis. There is no justification for wildly different lags for different indexes, and it makes no sense that the calculated solar cycle response is a) not lagged by ~1 year, and b) huge in size compared to what would reasonably be expected for a change in average surface flux of +/- ~0.085 watt/M^2 over the 11 year cycle.
.
The strong correlation that F&R “discover” is spurious. I calculated the 133 month centered rolling average for the entire Hadcrut4 series, and the 37 month centered moving average for the same series. Any solar cycle influence should be zeroed out in the 11 year moving average, but should be clear in the 3 year average (though slightly reduced in magnitude). The difference between these two averages should show the influence of the solar cycle clearly. I then compared this difference to the 11-month centered rolling average of the historical sunspot data (as a proxy for the intensity variation due to the solar cycle). This graph shows the results: http://i48.tinypic.com/f3vvqw.png.
The correlation is (surprise!) strong and (shockingly!) without lag for the period “examined” by F&R… but seems to fall apart everywhere else. There is NO physically plausible way that the solar cycle drives a large temperature variation without lag. Curve fits are rubbish… as is the F&R paper. Correlation is not causation.
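For anyone who wants to try that comparison themselves, a minimal sketch; hadcrut4 and ssn are placeholder names for aligned monthly series of temperature anomalies and sunspot numbers, which you would have to load first:
cma   <- function(x, k) stats::filter(x, rep(1/k, k), sides = 2)   # centered moving average
band  <- cma(hadcrut4, 37) - cma(hadcrut4, 133)   # keeps roughly 3- to 11-year variability
ssn11 <- cma(ssn, 11)                             # smoothed sunspot number as a cycle proxy
plot(scale(band), type = "l")
lines(scale(ssn11), col = "red")                  # eyeball the correlation and any lag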
SRJ–
So you think they did widen for multiple regression?
Yes. I think it should if the coefficients in the multi-regression are statistically significant using any remotely reasonable criteria. (Using AICc ends up imposing this. The penalty for the extra parameter kicks in. The parameter won't 'explain' unless it's significant to at least the 50% threshold, two sided– maybe higher? I've just noticed this doing things both ways a bunch of times.)
DeWitt Payne (Comment #104994)
” Look at the 0-700m data and then see if you really believe that the heat content from 700-2000m could really increase that much faster than 0-700m heat content.”
———————————-
This is what I get: http://imageshack.us/photo/my-images/407/0700and7002000mohc.jpg/ , using the pentadal data from NOAA. The 700-2000 m rate of change in the OHC anomaly is half the rate of 0-700 m.
Lucia,
I do not understand your question about widening. First, they just do a multiple regression. Here is how it can be done in R without using Tamino's subroutines, just with lm():
start <- 1979
lag1 <- 7
lag2 <- 4
lag3 <- 1
m <- lm(giss[t>start] ~ mei[t>(start-lag2/12) & t<=(end-lag2/12)]
        + volc[t>(start-lag1/12) & t<=(end-lag1/12)]
        + solar[t>(start-lag3/12) & t<=(end-lag3/12)] + t[t>start]
        + f1.sin[t>start] + f2.cos[t>start] + f2.sin[t>start])
# 'end' marks the last time in the fitted period (F&R fit over 1979-2010)
From this model you can get the standard errors. Then these must be inflated with the factor sqrt(nu.giss), estimated over 1979-2010 to get the standard errors in the paper.
All the standard errors in my previous comment were inflated in that way. Is that what you mean by “widening”?
About AIC, F&R do not include the lags as parameters when the AIC is calculated. Since they only compare models with the same number of parameters, in this case it is not a problem. I tried including the lags in the AIC testing, it changes nothing (I only checked for GISS).
One “funny” thing about the AIC() and logLik function in R is that these methods include sigma from regressions as a parameter of the statistical model. Here is a simple example:
> x <- 1:10
> y <- rnorm(10, mean = x, sd = 0.5)
> lmx <- lm(y ~ x)
> logLik(lmx)
‘log Lik.’ -6.872265 (df=3)
This is just a regression of y on x. The model is normally formulated as: y = intercept + b1 * x
and I am used to thinking of that model as having only two parameters. There is one more, sigma – the residual variance. That is included by R; as you see, R gives df=3.
Cannot update my comment. This is the call to lm() I wanted to insert:
m <- lm(giss[t>start] ~ mei[t>(start-lag2/12) & t<=(end-lag2/12)]
        + volc[t>(start-lag1/12) & t<=(end-lag1/12)]
        + solar[t>(start-lag3/12) & t<=(end-lag3/12)] + t[t>start]
        + f1.cos[t>start] + f1.sin[t>start] + f2.cos[t>start] + f2.sin[t>start])
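And for completeness, the inflation step described above is then just the following (a minimal sketch; nu.giss is assumed to have already been estimated from the residual autocorrelation over 1979-2010):
se.ols <- sqrt(diag(vcov(m)))      # OLS standard errors from the fit above
se.adj <- se.ols * sqrt(nu.giss)   # inflated for autocorrelated residuals; the entry
                                   # for the trend term is the value quoted in the paper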
Owen, be careful about the changing geographical distributions of the buoys over time. You have a big step function in 2002 that's likely to be artifactual, for example.
SRJ, as I mentioned above, the real problem with F&R is assuming a single regression coefficient with a fixed lag. That’s easily shown to be an invalid modeling assumption.
Using the wrong lags would be another.
Over the long haul you'd expect errors associated with these to balance out, but it doesn't help your interpretation for 10-year periods, when there are substantive flaws in the modeling of short-period variability.
Carrick,
I don't really have an axe to grind on the F&R paper, I just find it funny to do stuff like this in R.
” SRJ, as I mentioned above, the real problem with F&R is assuming a single regression coefficient with a fixed lag. That’s easily shown to be an invalid modeling assumption.”
How would you show that this is an invalid assumption? What would you do instead?
If I understand you right, you say it is wrong to model e.g. gistemp as (this is just the F&R model with the Fourier terms dropped for brevity):
gistemp(t_i) = b0 + b1*MEI(t_i-lag1) + b2*volc(t_i-lag2) + b3*solar(t_i-lag3) + b4*t_i
I don't see why this is wrong, except for the "all models are wrong" part.
“Using the wrong lags would be another.”
What are the correct lags then? How would you estimate them?
Put another way, determining the magnitude of systematic effects associated with modeling this system incorrectly can only be done by doing it correctly and comparing the results.
(If the noise were simpler, e.g., broadband white, you could readily estimate the effects of incorrect model design, but here the whole point of the model design is to remove large effects from ENSO, TSI etc..)
In the case of ENSO, if you look at correlation and lag by latitude band, you see that the lag is latitude dependent. What this means for regressing against global climate is your ENSO impulse response function is not a delta function with a lag of “tau0”, but a continuously varying function with non-zero lag from 0-1 year.
I suspect aerosols are even more complex, because where the forcing from them comes from changes over time (the latitudinal weighting is not constant), e.g., see major eruptions.
I know how to do this “right”, but what tamino has done isn’t close.
lucia (Comment #105023)
“Nick’s just doing what you do when you include a parameter in any model. You say y=mx+b, then you let the model find ‘m’ and ‘b’ that fit best (or over fit.) I don’t see that as a huge problem. It’s ok with AIC too– provided the lag is accounted for in the parameter count.”
Looking again at what the Foster paper authors did we have:
“The influence of exogenous factors can have a delayed effect on global temperature. Therefore for each of the three factors we tested all lag values from 0 to 24 months, then selected the lag values which gave the best fit to the data.”
The authors did not model the lag but rather tested for the best fit by comparing the results for each lag. In that case the Bonferroni correction applies. That the authors did not model with lags is evident in that they did not show coefficients for that variable as they did for the 3 exogenic ones. The authors did not apparently consider the required adjustment for the case where a Bonferroni correction applies or for what would have been required had they used lag in the model.
Overfitting a model is of course the major concern here and including 25 lag levels for each of 3 independent variables would appear to provide a classical example of overfitting.
Of interest to me now is that if one makes the lag comparisons one by one I can estimate the required probability correction. I have not gone through the calculations on how much that correction would be if one included the lag in the model, but one would assume that it should not be greatly different as, in effect, one is carrying out the same process.
Also obvious is that in a regression model treating an independent variable as a factor with multiple discrete levels is not the same as treating an independent variable with continuous change. We have to select the discrete levels to put into the model. If the level is categorical the parameters for the model increase with the number of categories, but with lags I would not think we are dealing with categories.
SRJ:
I don't really have any axes to grind here, other than just being disappointed it wasn't done more properly.
The analysis in my previous comment addresses this.
Probably the best approach is to break out the temperature reconstruction by latitude band, then regress e.g., the ENSO34 index or MEI against each band, using the lag for each band that maximizes the correlation. When you do this, you have to look at the correlations as a function of lag for each given band, and make sure this assumption is valid. At least for ENSO, it is. (Here's a typical example.)
You can do the same thing for aerosol forcings (they typically are given by latitudinal bands, e.g., see Amman 2003 for volcano forcings).
Once you've computed the residual climate response for each latitude band (that is, the mean temperature from a reconstruction minus the regressed, lagged quantities for that band), you sum over residuals making sure to use the appropriate area weighting (cos(latitude)) and normalization factor (sum over cos(theta) for the bands you've picked).
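Schematically it looks like this; all of the data below are synthetic stand-ins, just to show the mechanics of the per-band lag search and the cos-weighted recombination, so treat it as a sketch rather than the finished method:
set.seed(1)
lat  <- seq(-77.5, 77.5, by = 5)                       # zonal band centres (stand-ins)
nt   <- 384                                            # monthly, roughly 1979-2010
mei  <- as.numeric(arima.sim(list(ar = 0.9), nt))      # stand-in ENSO index
lags <- round(seq(0, 12, length.out = length(lat)))    # "true" lag varies with latitude
Tz   <- sapply(seq_along(lat), function(j)
  0.1 * c(rep(0, lags[j]), mei)[1:nt] + rnorm(nt, sd = 0.2))

best.lag <- function(y, x, max.lag = 12)               # lag that maximizes the correlation
  which.max(sapply(0:max.lag, function(L)
    cor(y[(L + 1):length(y)], x[1:(length(x) - L)]))) - 1

band.resid <- function(y, x, L) {                      # residual after removing lagged regressor
  r <- rep(NA, length(y))
  r[(L + 1):length(y)] <- residuals(lm(y[(L + 1):length(y)] ~ x[1:(length(x) - L)]))
  r
}

res.z <- sapply(seq_along(lat), function(j)
  band.resid(Tz[, j], mei, best.lag(Tz[, j], mei)))

w <- cos(lat * pi / 180); w <- w / sum(w)              # area weights, normalized
T.res.global <- as.vector(res.z %*% w)                 # recombined "global" residual series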
Carrick,
“In the case of ENSO, if you look at correlation and lag by latitude band, you see that the lag is latitude dependent. What this means for regressing against global climate is your ENSO impulse response function is not a delta function with a lag of “tau0″, but a continuously varying function with non-zero lag from 0-1 year.”
Well, couldn’t you take the average lag, and use that? As a simple approach? What is the average lag that you find?
If you let the ENSO lag vary with latitude, I would think that you should also use different temperatures for different latitudes.
And then you end up with a much larger model, with more parameters, and more complicated to fit.
I would be very curious to see how this is done right in your opinion, maybe Lucia could offer you a guest post.
DeWitt Payne,
“There are other anomalies in the ocean heat content data as well. This graph illustrates one problem.”
——————————————————
Interesting graph. I'm struck by the tight linearity of the OHC data from 1955-1995. The break is puzzling, but a linear correlation (albeit with more dispersion in the data) continues from 1996 onward. (So it seems that as we develop better methods to measure both OHC and sea level, the uncertainty in the correlation increases?)
My operating assumption (until it is proven otherwise) is that the ARGO floats produce the historically best OHC data, by employing the latest technology, and with enhanced coverage in terms of area and depth. From ARGO I see no evidence of a pause in the growing OHC anomaly, and hence in global warming, in the past 8 years.
SRJ:
Let me also address this:
It's wrong because, for example, if you're going to use global ENSO, the effect of ENSO globally doesn't have a constant lag with latitude. So you end up with an impulse response function for ENSO that you have to convolve the ENSO index with before performing the regression. Let's see if LaTeX works
$latex T_{gm}(t_n) = T_{res}(t_n) + a_{mei} \sum_{k} IPR(t_n - t_{n-k}) MEI(t_{n-k})$
Here $latex T_{res}$ is residual after regression, $latex T_{gm}(t)$ is e.g. GISTEMP, and IPR = mean impulse response function.
The mistake being made here is replacing IPR(tau) with delta_{n,k}. What this does is improperly place all of the variability associated with MEI forcing at just one lag, and that has an effect on the residual of the temperature series that is difficult to determine other than by doing it correctly and then comparing the results.
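In code, the difference is just whether you shift MEI by one fixed tau0 or convolve it with a distributed response first. A toy version, where mei stands for the index series and the exponential kernel is an assumed shape rather than an estimate:
irf <- dexp(0:12, rate = 1/3)     # hypothetical response spread over 0-12 months
irf <- irf / sum(irf)             # normalized impulse response
mei.conv <- stats::filter(mei, irf, method = "convolution", sides = 1)
# regress temperature on mei.conv (plus the other terms) instead of on
# mei shifted by a single lag tau0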
Kenneth Fritsch (Comment #105034)
“The authors did not model the lag but rather tested for the best fit by comparing the results for each lag. In that case the Bonferroni correction applies. That the authors did not model with lags is evident in that they did not show coefficients for that variable as they did for the 3 exogenic ones. The authors did not apparently consider the required adjustment for the case where a Bonferroni correction applies or for would have been required had they used lag in the model.
Overfitting a model is of course the major concern here and including 21 lag levels for each of 3 independent variables would appear to provide a classical example for overfitting.”
They use model comparison via AIC to find the “best lags”. They are not testing multiple hypotheses and not calculating probabilities so there is nothing to Bonferroni correct.
From Wikipedia:
“AIC does not provide a test of a model in the sense of testing a null hypothesis”
http://en.wikipedia.org/wiki/Akaike_information_criterion
What they do is compare the models on a relative scale, and then accept the model with the lowest score on that scale.
At the end of the day man made global warming isn’t happening. End of story. All of this fancy science maths and whatever is nice but, as I said, it isn’t happening.
SRJ
If your ENSO signal is stationary you might be able to. If the signal is nonstationary, then assuming a constant lag negates the point of the exercise, since you are effectively smearing the ENSO response over time, which is the opposite of what you want to do (which is remove it).
The average lag is about 2 months.
Ideally, you should use the time series for each latitudinal (zonal) band. Instead of just one series to regress over, you end up with e.g., separate series for -80, -70, … +80°. But then you just regress with each of these series separately, then sum them with the appropriate weighting factors to recover global mean temperature.
The model isn’t any more complicated in the sense that for each zonal band there are the same number of coefficients as before.
Carrick,
Thanks for the clarification. What about the solar forcing, is it also wrong to fit that with just a single lag?
If I had a large computer, or plenty of time, I could fit a model with the F&R approach allowing for more than one lag. Every time I add another lag I am testing many more models. F&R use a maximum lag of 24; that gives 24^3 fits to evaluate. Adding just one more lag increases that to 24^4, if you want to test all lags for all terms.
With ENSO, I could maybe just allow for 12 lags for ENSO, with the values 1:12 months. Then I actually only need to vary the lags for volcanic and solar, 24^2 regressions.
Could using 12 lags for ENSO be "a poor man's" fix to the problem you point out?
I have tried with fitting a model with 2 lags for enso. It slightly improved the fit, and especially that model captured more of the 1998 El Nino.
Owen:
IMO, the break isn't that puzzling. It owes its origin to the turning on of the ARGO network in 2002. Sacrilegiously enough, you might want to look at this.
Especially this figure.
Unlike on land, there seems to be an increase in the global temperature trend as you move to more southern latitudes. When ARGO turned on, the amount of sampling of southern locales increased.
It’s possible to side-step this with more robust reconstruction methods (e.g., EOF), but standard approaches are sensitive to a change in geographical distribution.
E.g., HADCRUT just averages over all non-empty 5×5 cells, so if the number of non-empty cells changes, so does the geographical weighting of cells. If all cells had the same warming that wouldn't be an issue. Since there is a latitudinal dependence on trend, changing the mean latitude of the measurements introduces a bias.
Hope this is clear enough.
SRJ,
I don't know if this helps, but the decay from a solar perturbation is ~15.5 years. It is confusing with the ENSO and solar cycle overlap.
https://lh6.googleusercontent.com/-fRgSOyDdr6E/UH7G8nPx5GI/AAAAAAAAE8c/UB8TMwI5R3w/s912/16%2520year%2520recurrences.png
That compares two 187 month periods. When ENSO and solar have complementary symmetry you get the initial larger El Nino pulse. At least, that is what I have found so far.
SRJ
In principle yes. Latitudinal dependence of TSI is modulated by season of course, and if TSI is non-stationary, then this will introduce spurious frequency content into your global temperature.
I think you can just fit a coefficient for each lag. There's a related method for computing the pole-zero structure of the transfer function (the Fourier transform of the impulse response function) that does something similar.
I'm afraid I'd have to test it on synthetic data first.
SRJ–
Yes. R widens if you use their multiple regression.
Unlike some bloggers, I tend to hate communicating primarily through code where one has to figure out the variable names, sort out functions etc. I prefer documentation in mathematical equations, explicit words and graphs. I prefer code as a supplement only.
Anyway, the result was I was just thinking of their graphs– they have a graph of the "corrected" temperature and then discuss widening error bars. So I didn't know if they fit their (corrected Temp) as a function of time and then widened those, or if they did the multiple regression and widened those. So… sounds like the latter– which is likely fine.
SRJ
Can't one hypothetically have zero lag as 'not a parameter'? Or not? (I don't actually know. But with multiple regression there is generally a choice of something like
y=mt+b
and you do consider the case of m=0 as having 1 fewer parameter.)
Kenneth–
Ok. Yes. I see what you mean. I’m not sure Bonferroni is right because the results of tests at each lag are correlated. But some sort of adjustment likely is needed to account for the fact that one hunted around. I don’t know what the “right” way to do this is. But it seems to me that ordinarily there is some sort of “null” — which might be “use without lag”. In AICc, used for most things, that would be something with one less parameter in the system. If one used “statistical significance”, you check whether the parameter is “statistically significant”.
SRJ
I think a single lag physically makes no sense in a multiple regression. There are limiting 'hypotheticals' to consider:
1) Suppose the solar forcing underwent a step function. In this case, if the earth was a simple system with one physical time constant, the temperature of the earth would not be a step function with a lag relative to the sun. Rather, what happens is that when the sun's forcing rises, the "equilibrium" temperature for the earth changes; call that Teq. The actual measurable temperature would rise in a way that caused the difference to decay exponentially: so, if Tdif=T-Teq, the magnitude of Tdif decays exponentially. This would not look like a specific lag in a linear regression.
2) The sun's energy is constant for a long, long time, and then starts rising linearly. In this case, there is a small period of adaptation, after which the earth's temperature will rise linearly. The response would not track like a "lag".
3) The sun's energy is a perfect sinusoid for a long, long time. (It really has to be perfect.) In this case, the earth's temperature would look like a lag.
These are all oversimplifications of what the earth's response would look like. It's not clear to me that one is likely to find situations where hunting for a "number of months" lag between the earth's response and solar would be fruitful. But it might be ok if one was looking for the "main" ~11-year solar cycle. Unfortunately, it's going to be pretty darn difficult to get a good value over 30 years– you don't have many cycles.
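If it helps, here is a little one-lump sketch of the first two hypotheticals; tau, the time grid and the forcings are made-up numbers, so it is only schematic:
tau <- 5; dt <- 0.1; t <- seq(0, 60, by = dt)
respond <- function(F) {                  # dT/dt = (F - T)/tau, integrated by Euler steps
  T <- numeric(length(t))
  for (i in 2:length(t)) T[i] <- T[i - 1] + dt * (F[i - 1] - T[i - 1]) / tau
  T
}
step.F <- ifelse(t > 10, 1, 0)            # case 1: step in forcing
ramp.F <- pmax(t - 10, 0)                 # case 2: forcing rising linearly after t = 10
# The step response relaxes exponentially toward the new equilibrium rather than
# reproducing the step at a fixed delay; the ramp response only settles into a
# parallel, "lagged-looking" ramp after the adjustment period.
plot(t, respond(step.F), type = "l"); lines(t, step.F, lty = 2)
plot(t, respond(ramp.F), type = "l"); lines(t, ramp.F, lty = 2)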
lucia (Comment #105048)
Without getting into the latitudinal approach Carrick is suggesting, I would think that the proper approach by the authors of this paper would have been to establish the lags that prior evidence showed fit best for each of the exogenic variables and then apply those lags. When the authors use different lags for different temperature sets and choose from a wide range of lags, it appears that they are merely casting about and fitting data, and that leads to overfitting. I would think that the authors would have to show that the lags selected were statistically different, when correlated with temperature, from the other choices of lags.
Kenneth–
I suspect you could hunt around and find a variety that way too!
But yes, permitting themselves different lags is an issue.
It seems to me one could try to find the lags that minimized the residuals over all observational sets. (Or at least have one set for "land" vs. "satellite", so over all 3 land-based series for land.) It would require some thought and custom coding, but presumably you can minimize residuals summed over 3 series to find a lag. It might reduce the appearance and risk of overfitting.
Maybe I am being pigheaded here in thinking about using a lag variable as categorical, but I see from the link and excerpts below that the dividing line between continuous and categorical is not always clear. I am thinking that if one considers the variable categorical that the number of parameters increases with levels within a category and thus decreases the degrees of freedom more than if the variable would be considered continuous.
http://www-users.cs.umn.edu/~ludford/stat_overview.htm
“The correct statistical test for an experiment largely depends on the nature of the independent and dependent variables analyzed. For the purpose of choosing a statistical test, variables fall into two classes: Categorical and Continuous. Categorical variable values cannot be sequentially ordered or differentiated from each other using a mathematical method…
With understanding of the basic model for choosing a statistical test, we can add relevant details to the model. First, we need to address two additional types of variables, ordinal and interval.
First, ordinal variables are similar to continuous variables; they can be ordered sequentially. They are also similar to categorical variables because they (perhaps) cannot be differentiated from each other using a mathematical method. For example, education level is an ordinal variable. The levels of educational achievement (high school, some college, undergraduate degree, etc.) can be sequenced in the order in which they are achieved, and when defined as such, cannot be differentiated from each other mathematically. So the question is, using the simple model for choosing a statistical test, is an ordinal variable Categorical or Continuous? The answer depends on how the researcher defines the variable. When education levels are defined as high school, some college, undergraduate degree, etc., the levels are categorical, and the researcher should choose a test for categorical data. The researcher could, however, define education level in a slightly different way. If the researcher instead defined education level as years of full-time education, then the variable takes on the characteristics of a Continuous variable, and the researcher should choose a statistical test for a Continuous variable.
Interval variables also exhibit characteristics of Categorical and Continuous variables. Interval variables fall into equally spaced ranges. For example, an experimenter collects salary levels using the following ranges:
• $10,000 – 20,000
• $20,000 – 30,000
• $30,000 – 40,000, etc.
The values can be numerically sequenced, so they are similar to Continuous variables. Because the ranges are equally spaced, though, an unnatural restriction is placed on the values, and thus they are similar to Categorical values. When it comes to choosing a statistical test, there is no hard and fast rule for defining interval data as Categorical or Continuous, and the researcher should use his/her discretion in making the choice. Granularity of ranges is a reasonable guide for deciding how to define the data. For example, when intervals are granular, the researcher may decide to define the variable as Continuous, and for coarser intervals, Categorical.”
SRJ:
Is there really any reason to set the maximum lag to 24? I get doing it for thoroughness, but for practical purposes, I don’t think it’d make any difference to set it to 12. That’d save tons of computation time, and you could always change it to 24 temporarily to test your results.
lucia:
Could you explain this? I’m probably just missing something obvious, but I don’t see why that wouldn’t track like a “lag.”
Brandon–
Here: http://rankexploits.com/musings/2008/why-should-the-mean-temperature-to-increase-linearly-with-time/
That’s the response of a simple 1-lump system to forcing that was zero for a long time and then started increasing linearly.
Kenneth, sorry if this is a bit terse (time crunched today).
The problem with assuming a single lag is that you introduce errors from the unmodeled portion of the exogenous quantity you’re trying to subtract off that can actually lead to larger variability than was there previously. Rarely will you actually get any real attenuation out of the band that you selected for.
To make this maybe easier to follow, let’s take a frequency “w”.
dT(t, w) = A2 sin(w (t - tau2)) - B2 sin(w (t - tau1))
Here A2 is the amplitude in your signal “Tgm(t)” and B2 is the amplitude of that signal that you are subtracting off Tgm(t).
"tau2" is the actual lag at that frequency and tau1 is the version that came from your minimization algorithm.
dT only goes to zero if tau1 = tau2 and A2 = B2. Otherwise, you'll get an amplitude A with A^2 = A2^2 + B2^2 - 2 A2 B2 cos((tau1 - tau2) w)
What this means is that the amplitude at this frequency in your residual can actually end up larger than the signal that was there before you (frankly) corrupted it with this procedure.
This isn’t as big of an issue for long-term trend estimation, but if you want to look at shorter periods (e.g. 10-years), you can’t say that you know the trend with any more certainty than you did before, even though it might look smoother on that time scale, without having a way of estimating the effect of the unmodeled portion of your quantity.
The only way I know to estimate that is to “do it right”. And once you’ve done it right, there isn’t any point for this exercise, to do it wrong (it is almost as easy to produce time series in zonal bands and regress on them as on the total series).
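A quick numeric check of that point, with numbers pulled out of a hat:
w    <- 2 * pi / 12        # a 12-month cycle, say
A2   <- 1; B2 <- 1         # equal amplitudes
tau1 <- 0; tau2 <- 4       # fitted lag vs. actual lag, in months
sqrt(A2^2 + B2^2 - 2 * A2 * B2 * cos((tau1 - tau2) * w))   # about 1.73, larger than A2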
Of course, for these simple physics, the “lag” if solar is a perfect– single– sinusoid is well understood. We could come up with a value.
But you can imagine that if the solar forcing was
F = a sin(w time) + b sin(W time)
with ‘w’ a frequency of 1/12 years and W a frequency of 1/120 years (just to pull numbers out of a hat). The “lag” could be a rather complicated looking thing.
OK, Global Warming is still there but has slowed down (only noticeable by instruments though), in that case can we speak of “AGW-Lite”?
Brandon Shollenberger (Comment #105053)
October 18th, 2012 at 12:44 pm
“Is there really any reason to set the maximum lag to 24? I get doing it for thoroughness, but for practical purposes, I don’t think it’d make any difference to set it to 12.”
It is just to match F&R's method. It would make sense to set it lower, especially since the maximum lag reported by F&R is 7 months. The problem is that if you set the maximum lag at e.g. 8, and find a lag of 8 months, you won't know if the fit would be improved with a lag of 9 months.
Lucia and Kenneth,
I am no expert on model comparison but I do not think that one needs to correct for the number of models tested when using AIC. AIC gives you the best model, relative to the others you consider. You will always end up accepting one model, because it is better than the others (it has a lower AIC). But it might still give a really bad fit to the data.
SRJ, Carrick,
Carrick – I agree with all your comments so far on the subject of F&R.
One way to demonstrate the problem with F&R is to (a) postulate an LTI (linear time-invariant) response and invert all of the temperature variants to flux (easy enough with an LTI), (b) run the regression on the original plus inverted flux data, (c) forward model the result, (d) look at the difference!
You cannot intelligently run MLR on a combination of forcing data at different frequencies. It is a negation of intelligent physics. Adding scaled temperature data to the mix turns it from really stupid to really really stupid.
To make this mudpie perfect, all you need to add to the stupidity is an inference that because the residual variance of the variable of interest is reduced by adding a bunch of proxy TS variables to the MLR mix, then we can see it is obviously working. Duh.
The more I think about it, the more I am inclined to say that a lag variable in a regression should be categorical. It surely would have nonlinear effects and I would doubt that it has a monotonic relationship with temperature.
lucia, thanks. For some reason, when you said the response wouldn’t track like a lag, I was thinking you meant after the period of adaptation. You can imagine why I was confused at the idea the linear portion of a response to a linear rise wouldn’t track like a lag.
Is it fair to say a single lag only works if the forcing function doesn’t change?
Kenneth,
“In that case the Bonferroni correction applies.”
I think you’re focusing on what would be needed if you really wanted the confidence intervals on the lags. But they don’t – the actual value of the lags is not their prime result. They want to know the effect on the temperatures.
It might turn out that the lag optimum is poorly characterised – perhaps there is no dependence on lag at all. That might need investigation but doesn’t necessarily hurt their temperature analysis.
SRJ:
That makes sense. I was just thinking if processing time was an issue, reducing the maximum lag would be a good way to help mitigate it. That’s especially true since I know I usually have to run a program multiple times to get kinks worked out.
Worst case scenario, you just increase the maximum and test if it changes anything.
Carrick,
I have spent a little time researching a point that you raised on a previous thread, and it is very relevant to this thread, namely the assumption that ENSO is a “zero-sum” game over the short term – the temperature oscillations are high frequency and integrate to zero over the relatively short term.
There are a (huge) number of well-researched papers showing that ENSO modulates cloud formation over the Pacific, and that this cannot be treated as a simple temperature-dependent feedback. By definition, the various ENSO indices are calculated to cycle around zero. (They are calculated relative to short-term averages.) The "mainstream" argument is that since ENSO involves (only) an internal re-distribution of heat, it cannot be responsible for long-term temperature trajectories. Clearly, this argument is invalidated if there is a propensity for El Ninos during certain oceanic phases (and La Ninas during opposing phases) and these result in a significant reduction (increase) in cloud albedo and/or cloud LW effects.
I have decided that, on the evidence, I cannot eliminate the possibility that ENSO is actually a major indirect control on LOW frequency temperature change.
This is now making me very cautious when considering any and all attempts to “eliminate ENSO” from the temperature response, which will probably please Bob Tisdale.
Carrick, I did not find your comment terse – at least not Carrick terse. I have no issues with your latitudinal approach. I merely have been concerned about selecting and applying lags, how much uncertainty that adds to the result and whether if one were to model the lag it would be a continuous or categorical variable. I would suppose my concerns are the same whether applied globally or by latitudinal zones.
SRJ
Doesn't AIC include a penalty for the number of parameters? Or is that only AICc? I always need to look it up.
I guess what I’m thinking is that including/excluding a lag is effectively another parameter. Clearly– if you can vary it– it acts in some ways like a parameter. So there *should* be a test to see whether the ‘optimum’ choice is necessarily statistically significant. Maybe it’s only AICc that accounts for the difference in number of parameters used?
Anyway– not a biggie for me. I’m mostly paying attention to other things today! (Got a d* graph coming. It’s my “harder to cherry pick” method. Nick is not going to like the answer from 2001. I was expecting to get “fail to reject”… but… not… I’ve got to look for some other years. Look for bugs too.)
In the light of the objections from Carrick and others about multiple regression with lagged variables, this paper might also have issues:
http://journals.ametsoc.org/doi/abs/10.1175/JAS-D-12-0208.1
I haven't read it yet, so I am not sure if they do lagged regression of ENSO and AMO. If they do, the same criticisms as for F&R apply also to this paper.
Maybe we should collect the various comments on that issue in this thread and submit a comment 🙂
Nick:
As long as you never need an error bar, that is.
lucia (Comment #105066)
October 18th, 2012 at 3:46 pm
“Doesn’t AIC include a penalty for the number of paramters? Or is that only AICc? I always need to look it up.”
They both include a term that penalizes the number of parameters. AICc also accounts for finite sample size. In the case of testing models all with the same number of parameters (as F&R do in the published code), AIC and AICc give identical evaluations (relative rankings).
” I guess what I’m thinking is that including/excluding a lag is effectively another parameter. Clearly– if you can vary it– it acts in some ways like a parameter. So there *should* be a test to see whether the ‘optimum’ choice is necessarily statistically significant. Maybe it’s only AICc that accounts for the difference in number of parameters used? ”
F&R do not consider different number of parameters, all their models have 9 regression parameters (10 if you include sigma, like R does) and three lags.
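For reference, AICc is easy to compute from an lm fit; base R has AIC() but not AICc, so a little helper using the usual small-sample correction looks like this:
aicc <- function(fit) {
  k <- attr(logLik(fit), "df")            # parameters counted by R, including sigma
  n <- nobs(fit)
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}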
Nick Stokes (Comment #105062)
“I think you’re focusing on what would be needed if you really wanted the confidence intervals on the lags. But they don’t – the actual value of the lags is not their prime result. They want to know the effect on the temperatures.”
If you have no confidence in the level of the lags then, of course, you would not use lagged exogenic variables. I think we all agree that there are lags involved. Did the authors mention what their results would be without using lags?
SRJ, thanks for the link. Unfortunately it’s ahead of print so I’d have to pay hard cash to see it. I’ll wait. 😛
Paul_K, I don't know that I buy into Tisdale entirely, but there is a physical basis for ENSO (and other large scale currents) doing more than just passively [*] redistributing energy.
(To clarify, a process that is passive is one that doesn’t need an energy source. An active process is one that does. The rules for what is “allowed” by a process like ENSO is very different if you view it as an active system.)
SRJ (Comment #105067)
“The deduced net anthropogenic global warming trend has been remarkably steady and statistically significant for the past 100 years.”
The import would not be how they handle lags, but rather that if that last sentence in the abstract holds up in the main body of the paper, it would provide a view of anthropogenic warming occurring much earlier than most now estimate, and evidently with greater early intensity.
Kenneth Fritsch:
Especially interesting because full scale models have deduced that pre-1970 warming… not so much. I'm not sure how you pull information content out of a very low-frequency phenomenon like the AGW warming trend…
Unless they conflate warming with AGW warming. In which case, wonder what they think about PETM?
Put another way, they’re not off to a good start. Except for people who like talking points. Good for talking points.
SRJ (Comment #105039)
"They use model comparison via AIC to find the "best lags". They are not testing multiple hypotheses and not calculating probabilities so there is nothing to Bonferroni correct."
AIC, which, by the way, is not a test of statistical significance and cannot be used in hypothesis testing, will score candidate models based on the goodness of fit while penalizing for the complexity of the model. That in itself does not deal with the issue that Bonferroni does in testing multiple hypotheses.
AIC will consider the relative number of parameters in a model but in the case of testing lags I am assuming that the Foster paper authors, as described there, did these tests with a different lag each time and could have, as you suggested, obtained an AIC score. That process, however, says nothing about the repeated testing required to find that model/lag with the best AIC score. The AIC process is penalizing the parameters used in the models being compared and not the number of tests required to obtain the best lag.
Carrick (Comment #105073)
I saw a reference to a paper in a local newspaper article where the amount of methane in the atmosphere over a millennial time period had been measured, from ice cores as I recall, and which claimed that the effects of anthropogenic methane would have had a noticeable effect on global temperatures. There were not many details in the newspaper and I do not recall where the paper was published.
Kenneth, I would say the “best evidence” comes from GCMs and they include methane along with other GHGs.
I am comfortable with their conclusion that there was no discernible effect of humans on global mean temperature until circa 1970. I think the argument is that we really have great uncertainty even post 1970 on the forcings driving climate.
How could just looking at one forcing (when you have to combine them to get a total picture) even be used as the basis for broad-sweeping statements like this?
Carrick,
“I am comfortable with their conclusion that no discernible effect of humans on global mean temperature until circa 1970. I think the argument is that we really have great uncertainty even post 1970 on the forcings driving climate. ”
I am not nearly so comfortable with this. The only way you can reach that conclusion is if aerosol effects offset ~50% of GHG forcings from 1850 until circa 2010. I have seen no evidence that this is the case, and further, it is a stretch to suggest that aerosols have (perfectly) managed to offset a very large fraction (circa 50%) of GHG forcing over the last 100+ years.
.
Let me offer an alternative interpretation: the aerosol offsets are wrong (two times too high), and the climate sensitivity is much lower than models suggest…. eg 1.9C per doubling at equilibrium.
Carrick: “no discernible effect of humans on global mean temperature until circa 1970”
I doubt 6 Pinatubo’s worth of SO2 put into the atmosphere by humans had no effect.
Bruce:
My discussion was specifically about measurable anthropogenic effect on global mean temperature, as modeled by the GCMs. (I think if you reread my comment you’ll see that plainly spelled out.) You can certainly measure effects from increased SO2, even if the net effect on global mean temperature over part of that interval is in the noise floor.
As I said “I would say the ‘best evidence’ comes from GCMs,” as summarized here. If I had to list something as “baseline” it would be the work of a lot of scientists, such as Isaac Held, Tapio Schneider and V. (Ram) Ramanathan, rather than anybody on this thread, myself certainly included.
SteveF—I’d say there are possibly models that are consistent with your lower level, but my concern is that you can compute two primary effects, direct CO2 forcing and water vapor feedback and they come in higher than the sensitivity I think you would favor.
I honestly don’t understand atmospheric chemistry well enough to know whether sulfates could be a factor of two off or not. Atmospheric aerosols have always made my eyes glaze over (as in totally bore me) when I’ve heard it discussed. I’ll have to defer arguments over its validity to others.
Bruce,
“Carrick: “no discernible effect of humans on global mean temperature until circa 1970″
I doubt 6 Pinatubo’s worth of SO2 put into the atmosphere by humans had no effect.”
You DO realize that the effect of SO2 in the lower troposphere and in the stratosphere is not the same net effect?? Unless you have a mechanism for those 6 Pinatubos of anthropogenic SO2 to have been injected into the stratosphere, I would say you have misled yourself with a meaningless magnitude.
Carrick: “I honestly don’t understand atmospheric chemistry well enough to know whether sulfates could be a factor of two off or not.”
They can according to James Hansen, see his chart 4:
http://www.columbia.edu/~jeh1/2009/Copenhagen_20090311.pdf
And when you include black carbon..
Beneath the figure he even writes: "The aerosol forcings are very uncertain, and their effect on clouds even more uncertain – the error bars are subjective and may be too small."
Is this huge uncertainty controversial?
Niels, since he’s going back to 1750, a factor of two is plausible.
I usually focus on 1950 forwards, because we know there are data corruption issues prior to that, and I don’t think economic data were as certain. There were certainly large changes in forcings and a decent period to study 1950-1980, to compare with 1980-2010.
I would expect that anthropogenic emissions as well as atmospheric CO2 content were pretty well known from then on. I don't know the date from which we have direct measurements of atmospheric sulfates, but I would suspect it would be post 1960.
I would suspect it's believed that we know sulfate atmospheric concentration levels to much better than a factor of two for data post 1980.
A bit OT, I see Aqua has stopped updating as of 10/7. Maybe it has completely failed.
The abstract of the paper on millennial anthropogenic methane levels in the atmosphere and its effect on historical temperatures to which I referred in my previous post is here.
http://www.nature.com/nature/journal/v490/n7418/full/nature11461.html
I believe Watts had a post on the background of the article also.
Anonymous, Maybe you could point at a paper comparing the effect of 20 megatons of SO2 injected into the stratosphere in a thin layer and the same amount near the thermometers.
Stratospheric SO2 actually has some warming potential.
“The aerosols increase the reflection of radiation from the Sun back into space and thus cool the Earth’s lower atmosphere or troposphere; however, they also absorb heat radiated up from the Earth, thereby warming the stratosphere.”
http://volcanoes.usgs.gov/hazards/gas/s02aerosols.php
Handwaving is not a proper response to 7 Pinatubo’s worth of SO2 in 1980 dropping down to 6 Pinatubo’s by 2000 … and then climbing in response to China’s coal power expansion.
Carrick,
Could it be that your latitudinal lag analysis is the result of how perturbations of equatorial upwelling propagate poleward?
I view ENSO and equatorial upwelling as being closely correlated. During La Nina, the upwelling and surface cooling are at a maximum and during El Nino they are at a minimum. I can imagine that ENSO would impact the meridional overturning circulation. Personally, I prefer to stick to the lag in global temperatures.
I do agree though that the “impulse response function is not a delta function with a lag of “tau0″, but a continuously varying function…”. It’s just that I haven’t seen any convincing observational evidence. Maybe it’s out there and I just haven’t seen it.
BillC,
Either that or they just don’t think the figures are reliable any more.
The funny thing is, the graph still says it is being updated every day, but that might be automatic.
Carrick,
as we discussed, I have tried to modify F&R's analysis to allow the forcings to have an effect at multiple lags. To GISTEMP, I fitted a model with lags of 1-12 months for MEI, volcanic and solar. This model has a ridiculously large number of parameters, 43 regression coefficients and 36 lags – 79 total. It has higher R^2 and higher AICc than F&R.
Here is a plot with GISTEMP data, F&R’s model and the many-lags-model:
http://i47.tinypic.com/fd6qe8.png
The many-lags-model does a slightly better job of capturing the effect of the 1998 Nino, and also the 1993 volcanic eruption. Not surprising, with so many more parameters it should be better.
In the many-lags model the parameters have much larger error bars.
Here is a plot of the coefficients for each lag for the 3 considered forcings, plotted with 1-sigma errorbars:
http://i48.tinypic.com/34q8n7m.png
I think that this approach is useless. Maybe it would be better to allow each forcing to have multiple lags, but not as many as 12.
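For anyone curious, the many-lags design matrix can be built with embed(); this is only a sketch, where temp, time, mei, volc and solar are assumed to be aligned monthly vectors and the real fit would also include the Fourier terms:
lag.block <- function(x, K = 12) {              # columns are x lagged by 1..K months
  m <- embed(x, K + 1)[, -1, drop = FALSE]
  colnames(m) <- paste0(deparse(substitute(x)), ".l", 1:K)
  rbind(matrix(NA, K, K), m)                    # pad so rows line up with x
}
X   <- cbind(lag.block(mei), lag.block(volc), lag.block(solar))
fit <- lm(temp ~ time + X)                      # lm() drops the NA-padded rows by default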
SRJ,
Now: Freeze it! Then, run it again after we have 10 more years of data, without adjusting the coefficients, and see how it does.
John N-G would be proud.
BillC, or train it on a 30-year period prior to the period you’re interested then freeze it.
SRJ, did you produce the coefficients for it? I’m interested in seeing whether they are realistic or not.
AJ:
Yes I think that’s exactly it.
Carrick (Comment #105091)
“SRJ, did you produce the coefficients for it? I’m interested in seeing whether they are realistic or not.”
Yes the coefficients are plotted in the second figure in my post above.
I have uploaded a textfile with them here:
http://textuploader.com/?p=6&id=GtKCl
Lag is in months; column names are mei.coef for the coefficients and mei.se for the standard errors. Similar for volc and solar.
I have only provided the coefficients for the lagged terms in that file. The remaining coefficients, with their standard errors, are:
(Intercept)  -1.047e+02   (s.e. 51.93)
trend         1.715e-02   (s.e. 0.0017)
f1.cos        2.560e-02   (s.e. 0.0197)
f1.sin        2.597e-02   (s.e. 0.0196)
f2.cos       -5.389e-03   (s.e. 0.0189)
f2.sin        2.169e-02   (s.e. 0.0188)
Standard errors are corrected with a factor of sqrt(nu) = 2.07, determined from the residuals of the model.
Carrick,
“SteveF—I’d say there are possibly models that are consistent with your lower level, but my concern is that you can compute two primary effects, direct CO2 forcing and water vapor feedback and they come in higher than the sensitivity I think you would favor.”
I’m surprised at you! This dog don’t hunt.
I can do the same sums and I get a value of 1.84 degC for a doubling of CO2, from Planck, lapse rate and WV feedback. Since all of these effects are considered to be near instantaneous, we should then expect to be able to see the same feedbacks in the historical data (without having to wait for a 600 year ocean effect). So what do we see from the observational data:
– match to temperature using GCM forcing data – 1.2 to 1.5 deg C
– match to net flux data – 1.1 to 1.8 deg C
– match to pinatubo data – 1.4 deg C
The observational data suggests that either the theory needs to be modified or there is a persistent negative feedback which is underestimated.
SRJ,
“The many-lags-model does a slightly better job of capturing the effect of the 1998 Nino, and also the 1993 volcanic eruption.”
It’s amazing what you can do with science nowadays. When I was a lad this eruption used to start in June 1991.
I am not at all certain what SRJ was attempting to model here, but the discussion has touched on some salient points: the use of multiple hypotheses in selecting a best model, the oft-neglected differences between the validity of in-sample and out-of-sample tests, and the connection between these two points, where the use of one can negate the importance of the other.
Given the Foster paper's selection of lags, and SRJ somehow thinking that using the AIC scores for candidate models can get around the statistics that apply to multiple hypotheses testing, it is interesting that when the same selection methods are instead included in a model (more or less, I should say, since I am not sure of the exact model used here) we can appreciate the degrees of freedom lost in this selection process.
Others here have posted on this thread noting that a method of reserving some of the time period for testing a model that was produced using another time period can and should be used to validate the model. I think since this method is often applied, the search for a best model using multiple hypotheses and what it does to the statistics is often given short shrift. That neglect is not so harmful when indeed a validation period is used but unfortunately a validation is not always part of the model testing – as we see it was not with the Foster paper approach.
Others here have pointed to true out-of-sample testing where a model is tested on data that the producers of that model could not have had access. Splitting the data into two time periods still has the risk that the tester has peaked at the validation period results. There are perhaps some randomized cross validation methods that might make peaking a less likely violation of the validation process.
Paul_K
Oh… I'm sure I've screwed up dates in the past too. You're right, it's 1991.
Carrick,
Re: “impulse response function is not a delta function with a lag of “tau0″, but a continuously varying function…â€
Suppose that my hypothesis is that tau is constant. Could this be refuted by examining the lags to forcings of different frequencies? For example, let me assume that the ~70 day lag of SSTs to the annual signal is the truth. This equates to a tau of ~5 months. In a one-box model, the lag can't exceed tau. Can you think of a lag that is obviously >5 months? ENSO appears to have a lag of 4 months. The lag on the 11yr solar cycle appears to be <5 months.
What about the ~100,000 glacial cycle?
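For the record, the one-box arithmetic behind the ~5 month figure is just the phase lag of a first-order system driven by a sinusoid, lag = atan(w*tau)/w:
omega <- 2 * pi / 365.25            # annual forcing, per day
lag   <- 70                         # assumed SST lag to the annual cycle, in days
tau   <- tan(omega * lag) / omega   # invert lag = atan(omega * tau) / omega
tau / 30.44                         # tau in months: about 5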
SRJ (Comment #105088)
It appears from the graph that your model with 1/2 the number of lags that the F&R paper sorted through gives a result close to that of F&R with, as you note, a higher AIC score for your model. Your method, however, with its large number of parameters yields some very large CIs making the results, it would appear, rather useless.
I am still not certain of the rationale behind your model (with regard to treating the lag variable as continuous versus categorical) but it appears that, in effect, what your model does is better account for the extra parameters required to test all, or in your case half, of the lags that the Foster paper did. Your CIs are, therefore, more representative of the selection process than the Foster paper's. The Foster paper simply makes multiple tests before modeling and then never accounts for the number of those tests and the effect of that number on the final CIs.
Paul_K,
Thanks for the correction. I was just a kid at the time of that eruption.
Kenneth Fritsch (Comment #105096)
“I am not at all certain what SRJ was attempting to model here”
In discussion with Carrick, the idea came up that one could try to include multiple lags for the forcings. The idea is to avoid the problems that Carrick points out, with respect to using just one lag for each forcing.
Kenneth Fritsch (Comment #105099)
“It appears from the graph that your model with 1/2 the number of lags that the F&R paper sorted through gives a result close to that of F&R with, as you note, a higher AIC score for your model. Your method, however, with its large number of parameters yields some very large CIs making the results, it would appear, rather useless.”
I am not sorting through lags. I fit one single model with 12 lags for each forcing. The lags are 1 to 12 months.
Foster and Rahmstorf test models for all lags from 0-24, i.e. they are actually comparing 25^3 models, all with the same number of parameters (3 lags and 9 regression coefficients) and then choosing the one with lowest AIC.
And yes, the model I fit is useless – just look at the solar forcing. The coefficients for lags 4 and 6 are negative. We would expect that solar forcing increases temperature, so this is kind of silly.
Also note, my model has a higher AIC than F&R, hence AIC is in favor of F&R.
Next idea is to fit with a smaller number of lags for each forcing. I will try that later.
SRJ:
This has turned into a very fascinating discussion. I think the whole notion of using multiple regression to fit ENSO or solar to temperature – whether it be by choosing the lag with the best fit or by using multiple lags – is just plain model misspecification if one accepts a significant tau. IIUC, since both solar and ENSO are quasi-stationary over certain time scales, wouldn't it be necessary to use an integral of solar or ENSO (incorporating tau) to model temperature?
SRJ, "The coefficients for lag 4 and 6 are negative. We would expect that solar forcing increases temperature, so this is kind of silly."
No, that is kinda like the real world. The problem with solar is that it is absorbed in the atmosphere, the surface skin layer, the upper mixing layer (0-50 meters) and the lower mixing layer (50 to ~100 meters). Each layer has a different time constant and the system solar is impacting is weakly damped. Then to really make it fun, the decay curves of the weakly damped system change with the distribution of the energy and the interaction with other harmonics. I think that is why it is called chaotic 🙂
Layman Lurker (Comment #105101)
” IIUC, since both solar and ENSO are quasi stationary over a certain time scales wouldn’t it be necessary to use an integral of solar or ENSO (incorporating tau) to model temperature?”
Maybe this paper by Kristoffer Rypdal is interesting in that context:
http://www.agu.org/pubs/crossref/2012/2011JD017283.shtml
it is not free, but you can email the author for a copy.
AJ,
” The lag on the 11yr solar cycle appears to be <5 months."
It only appears that way because it is a spurious correlation over the last 30 or so years. There is no reasonable physical explanation for the lag for solar forcing being much shorter than volcanic; tau increases with the period of applied forcing because the ocean has a wide range of response rates over depth; it fairly well screams that the F&R analysis is nuts.
SRJ (Comment #105100)
Also note, my model has a higher AIC than F&R, hence AIC is in favor of F&R.
Perhaps you could post the AIC scores for your model and F&R. Also, could you post the code that F&R used to sort for the lags and the AIC scores for their sorting process? It would be no surprise that F&R get a better score, because they did their lag sorting by incorrectly not penalizing the degrees of freedom for their sorting efforts in comparing a large number of models, whereas you modeled the lags and thus are going to be penalized by the AIC scoring for all the parameters you used in your model compared to F&R's individual models. All this amounts to is pointing to the error that F&R made in not penalizing the degrees of freedom for all the model comparisons they made.
“Next idea is to fit with a smaller number of lags for each forcing. I will try that later.”
And you realize that you are, or at least should be, expending some degrees of freedom by making a second model/test.
As an aside, I am piqued after noting that I have been using peaked where I meant peeked.
Bruce,
“Handwaving is not a proper response to 7 Pinatubo’s worth of SO2 in 1980 dropping down to 6 Pinatubo’s by 2000 … and then climbing in response to China’s coal power expansion.”
Yup, handwaving is not the proper response. Where are the studies showing the effects?
Your contention seems to be that it will cause WARMING, except even the silly GISS and HADCRUT over the last 10 years have not shown any warming, or cooling either. Funny how everyone has a pet aerosol or GHG that will cause something and the earth doesn't seem to care.
Maybe your SO2 is part of UHI? 8>)
Kenneth Fritsch (Comment #105105)
“Perhaps you could post the AIC scores for your model and F&R.”
I did. They are given in the legend of the plot, follow the link.
F&R AICc = -531.8527
many-lags-model AICc =-494.311
” Also could you post the code that F&R used to sort for the lags and AIC scores for their sorting process.”
It is not a sorting process. They just test every single lag from 0 to 24 for each forcing, then take the model with the lowest AIC. All their models have the same number of regression coefficients (9) and lags (3), so the number of degrees of freedom does not change.
I have uploaded the relevant part of the code here:
http://textuploader.com/?p=6&id=UjvKI
“And you realize that you are, or at least should be, expending some degrees of freedom by making a second model/test.”
From what I have read about AIC, the degrees of freedom do not enter the calculation explicitly. AIC will penalize models with more parameters, i.e. they need to have a larger likelihood to get good AIC scores.
Anonymous, SO2 has climbed since 2000, but not as much as it dropped from 1980 to 2000. And the changes are not all in the same place.
“At one measurement site [De Bilt], summer sunshine increased from 1985-2010 by 15 Watts per square meter”
“since the mid 1980s a significant increase in visibility has been noted in western Europe (e.g. Doyle and Dorling, 2002), and there are strong indications that a reduction in aerosol load from anthropogenic emissions (in other words, air pollution) has been the dominant contributor to this effect, which is also referred to as ‘brightening’.”
http://www.staff.science.uu.nl/~delde102/CleanerAirBetterViewsMoreSunshine.pdf
http://hockeyschtick.blogspot.ca/2012/05/new-paper-finds-large-increase-in.html
SRJ (Comment #105107)
For now I’ll forgo what has become more of a philosophical discussion on determining the proper adjustments for model selection using AIC, and rather ask you a few more questions. First though I want to thank you for directing me to F&R code. I have not examined it in detail yet but I plan to as time permits and run some sensitivity tests and perhaps test the model on earlier data.
I was curious about the F&R AIC scores for the models that were compared. I know there would be a large number of scores, but I would think it important to determine what separation in scores were obtained for at least the top 20 models. Do you have that information handy for posting?
Also do you judge that looking at multiple models with AIC scoring could result in over fitting a model? In other words do you judge that AIC scoring can compensate/penalize for the number of models scored?
Do you see any problems with F&R using different lags in modeling with the three surface temperature data sets for the same variable and then claiming in the paper about how well the data sets provided similar results?
Do you think the F&R authors were remiss in not splitting the data into a model training period and a model testing period?
Kenneth Fritsch (Comment #105109)
“First though I want to thank you for directing me to F&R code. I have not examined it in detail yet but I plan to as time permits and run some sensitivity tests and perhaps test the model on earlier data.”
The TSI data only go back to 1976, so even for the surface indices it is hard to go back further. Then you would need to use proxy data for TSI rather than the satellite measurements.
“I was curious about the F&R AIC scores for the models that were compared. I know there would be a large number of scores, but I would think it important to determine what separation in scores were obtained for at least the top 20 models. Do you have that information handy for posting?”
I have just checked a few examples. It was a long time ago, so I can’t remember the details. You could tweak the lagtest function to also output the AIC score for each model. E.g. you could make a data frame with the lags and the AIC:
data.frame(lag1, lag2, lag3, AIC(zfit$model) )
“Also do you judge that looking at multiple models with AIC scoring could result in over fitting a model? In other words do you judge that AIC scoring can compensate/penalize for the number of models scored?”
I am no expert on AIC, but the number of models tested does not enter the AIC calculation. Googling around I found this introduction to AIC:
http://avesbiodiv.mncn.csic.es/estadistica/senseaic.pdf
where this statement is given:
“Thus, the concept of significance becomes superfluous with the AIC.” (page 10)
Also, this book should be very informative on this matter:
http://books.google.dk/books?id=BQYR6js0CC8C&redir_esc=y
SRJ (Comment #105110)
SRJ, I believe you forgot to answer two of my questions about the F&R paper.
I am aware of what these articles on AIC that you can Google say, and unfortunately they do not consider the issue we are discussing. Likewise, if you Google the multiple hypotheses problem, it will not discuss AIC.
I can conjure up a thought experiment where I could use multiple hypothesis tests to select a best model and then repeat that same process by pooling all the model results together and selecting the best model by AIC score. I do not believe one can get around the multiple hypothesis problem here by using a huge pool of models and selecting the best with AIC scoring. Actually the F&R case would work nicely in my thought experiment.
In fact, the F&R use of AIC does not need the parameter penalty, since all the models being compared have the same number of parameters. One could do null hypothesis testing instead. What would it tell you if the top ten, more or less, AIC scoring models could not be shown to be statistically significantly different? I think that might be the first look I do with the F&R code.
Kenneth
Model comparison (using AIC) is a different paradigm from null hypothesis testing. AIC does not tell you if a model is significantly better than others. It just gives you the model that minimizes information loss. See e.g. slide 18 here:
http://www4.ncsu.edu/~shu3/Presentation/AIC.pdf
Or take a look at the paragraph “NHT is inferior to ITMC for model selection” at page 6 here:
http://research.eeescience.utoledo.edu/lees/papers_pdf/Stephens_2005_JAE.pdf
Here is another quote that emphasizes that AIC and hypothesis testing are different approaches:
“Note that these calculations are based on information theory, and do not use the traditional “hypothesis testing” statistical paradigm. Therefore there is no P value, no conclusion about “statistical significance”, and no “rejection” of a model.”
http://www.graphpad.com/guides/prism/5/user-guide/prism5help.html?reg_how_the_aicc_computations_work.htm
Paragraph 8 here is also relevant:
“Secondly, there are no problems with multiple hypothesis tests, as it is quite reasonable to compare several AICs for a given data set.”
http://tinyurl.com/9uzbwb6
I think you are mixing up two different paradigms.
You are however correct that if the AIC values are close it might be problematic to choose between models (see paragraph 8 at my last link). With respect to F&R it would be interesting to see how much the AIC differs if we use the GISS lags for NCDC and HADCRU, the NCDC lags for GISS and HADCRU, etc.
It might be the case that the AIC’s are so close that it is hard to choose the lags.
SRJ–
But I think you can regain aspects of ‘hypothesis testing’ by comparing the AIC of the case with no lag with the AIC with lag– and computing the AIC “with lag” adding ‘1’ to the number of coefficients. Because the lag is a coefficient and, to some extent, the cases ‘with lag’ are using 1 more coefficient than the base of ‘no lag’. This is true even if someone thinks of the no lag case as having the same number of parameters as the lag case. If the ‘no lag’ case has a better AIC than ‘lag’ when AIC is computed with +1 added to the number of parameters, that sort of replicates the notion of a ‘significance’ test.
I know someone would have to custom-add the parameter count to R– but it should be do-able.
Been away all weekend and missed the fun. 😉
Paul_K, regarding this, if I get a chance I’ll read through your more extended notes which I assume to be related. It’s always tough to figure out where people are coming from when they don’t agree with standard calculations, but you aren’t sure what literature they’ve read and what they haven’t. So I don’t know if our disagreement is because of something I haven’t seen or considered, or something you haven’t seen or considered. Short comment on your calculation of sensitivity: Standard calculation for direct CO2 sensitivity is around 1.1°C/doubling, standard calculation for H2O gives a number with some uncertainty, but it’s generally agreed to be larger than the direct CO2 sensitivity, which puts it at at least 2.2°C/doubling (theoretical). As an aside, I don’t think a realistic calculation of “real” ECS is even possible without at least using a model that includes Walker cell circulation.
SteveF Regarding this comment, There is sub-annual as well as multidecadal lagged contributions to the response of climate to solar forcings. If you want to talk multidecadal, you’d certainly have to limit yourself to an integrated (low-passed) version of TSI. Anomalization does not and cannot remove all of the response to TSI, because TSI is not stationary. In fact, it is pretty messy (or not possible?) to get actually TSI response correct using anomalized data (you are dealing with a partial cancellation between changing TSI forcing and an algorithm that subtracts a fixed portion off of it, with the amount remaining after subtraction depending on the algorithm as well as changing available data over time)…It’s probably better to use unanomalized data if you really want to model TSI (something like Roman/Jeff/Nick/Steven’s code where you include TSI). I’ve decided I don’t want to model TSI. 😉
SRJ, regarding negative coefficients—they happen and they are physical. If you start out with a delta-function like impulse response function, once you frequency limit it, you’ll get a sinc-like response function, with negative as well as positive coefficients. It wouldn’t be “silly” even if the sum of coefficients were negative, as long as you are starting with anomalized data, because all that means is the baseline year anomalization method simply “over-subtracted” the response from TSI for the period you were regressing against. It would be “silly” only if you regressed against non-anomalized data and found a net negative response to TSI.
Language correction:
It wouldn’t be “silly” even if the sum of coefficients were negative, as long as you are starting with anomalized data, because all that means is the [interval used for the baseline] for the anomalization method simply “over-subtracted” the response from TSI for the period you were regressing. [Or put another way, the contribution of TSI was larger during the baseline period than during the period you’re regressing over.]
SRJ (Comment #105119)
I think you miss my point completely or perhaps you do not understand the two parts of AIC testing.
In AIC you use both the number of parameters and the maximized likelihood function. I know that AIC in that form cannot be used for hypothesis testing. From the links below you can readily see how AIC and likelihood ratio testing are linked. In the case of F&R they could have used hypothesis testing with the likelihood ratio test, since the models used by F&R are either nested (lag 0 to lag x) or have an unchanged number of parameters (lag xi to lag xj). In effect the only difference is that AIC pools the results, and F&R selected the best model from that pool without compensating for the multiple hypothesis testing that that represents.
What I am saying is that under certain conditions, such as the F&R case, I can test the models either by likelihood ratio (hypothesis testing) or by AIC, and the calculations are nearly equivalent since the only change in parameters is from lag 0 to lag x. The only difference is that with AIC the results are pooled and the best model selected, but without the penalty that would be applied in the near-equivalent hypothesis testing.
I think the reason for not seeing any discussion of these differences in the literature is that the claims made by AIC are different from those of hypothesis testing. AIC makes no claims other than that, given the difference in parameters in the models compared, one model of the potentially many attempted fits the data best. AIC does not claim to deal with the number of models compared, while hypothesis testing does, or at least can with an adjustment like Bonferroni.
http://en.wikipedia.org/wiki/Akaike_information_criterion
“In the general case, the AIC is
AIC =2k-2ln(L)
where k is the number of parameters in the statistical model, and L is the maximized value of the likelihood function for the estimated model.”
http://en.wikipedia.org/wiki/Likelihood-ratio_test
“In statistics, a likelihood ratio test is a statistical test used to compare the fit of two models, one of which (the null model) is a special case of the other (the alternative model). The test is based on the likelihood ratio, which expresses how many times more likely the data are under one model than the other. This likelihood ratio, or equivalently its logarithm, can then be used to compute a p-value, or compared to a critical value to decide whether to reject the null model in favour of the alternative model. When the logarithm of the likelihood ratio is used, the statistic is known as a log-likelihood ratio statistic, and the probability distribution of this test statistic, assuming that the null model is true, can be approximated using Wilks’ theorem.
In the case of distinguishing between two models, each of which has no unknown parameters, use of the likelihood ratio test can be justified by the Neyman–Pearson lemma, which demonstrates that such a test has the highest power among all competitors.[1]
Each of the two competing models, the null model and the alternative model, is separately fitted to the data and the log-likelihood recorded. The test statistic (often denoted by D) is twice the difference in these log-likelihoods:
D = -2ln(likelihood for null model) +2ln(likelihood for alternative model)
The model with more parameters will always fit at least as well (have a greater log-likelihood). Whether it fits significantly better and should thus be preferred is determined by deriving the probability or p-value of the difference D. Where the null hypothesis represents a special case of the alternative hypothesis, the probability distribution of the test statistic is approximately a chi-squared distribution with degrees of freedom equal to df2 − df1 .[2] Symbols df1 and df2 represent the number of free parameters of models 1 and 2, the null model and the alternative model, respectively. The test requires nested models, that is: models in which the more complex one can be transformed into the simpler model by imposing a set of constraints on the parameters.[3]
For example: if the null model has 1 free parameter and a log-likelihood of −8024 and the alternative model has 3 degrees of freedom and a LL of −8012, then the probability of this difference is that of chi-squared value of +2•(8024 − 8012) = 24 with 3 − 1 = 2 degrees of freedom. Certain assumptions must be met for the statistic to follow a chi-squared distribution and often empirical p-values are computed.”
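As a quick check of the arithmetic in that last example, the tail probability can be computed directly in R (these are just the numbers from the quote, nothing specific to F&R):
D <- 2 * (8024 - 8012)                  # likelihood ratio statistic from the quoted example
df <- 3 - 1                             # difference in free parameters
pchisq(D, df = df, lower.tail = FALSE)  # p-value, roughly 6e-6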
lucia (Comment #105122)
October 21st, 2012 at 5:20 pm
“SRJ–
But I think you can regain aspects of ‘hypothesis testing’ by comparing the AIC of the case with no lag with the AIC with lag– and computing the AIC “with lag” adding ‘1’ to the number of coefficients. Because the lag is a coefficient and, to some extent, the cases ‘with lag’ are using 1 more coefficient than the base of ‘no lag’. This is true even if someone thinks of the no lag case as having the same number of parameters as the lag case. If the ‘no lag’ case has a better AIC than ‘lag’ when AIC is computed with +1 added to the number of parameters, that sort of replicates the notion of a ‘significance’ test.”
I know someone would have to custom-add the parameter count to R– but it should be do-able.”
F&R do not include the lags in the parameter count in their AIC calculation. I have included them for GISS; it didn’t change the chosen lags.
But for the cases where one of the lags was 0, I did still include it as a parameter in the AIC.
I did this instead of AIC():
(2*(10+3)-2*logLik(zfit$model))
If one of the lags considered is 0, with your suggestion this should change to:
(2*(10+2)-2*logLik(zfit$model))
etc., for 2 or 3 lags being 0. This can easily be implemented with a few if statements added to the loop code.
Kenneth,
I think we will have to agree to disagree.
Just one remark about AIC vs. likelihood test. Even with a constant number of parameters, they are different:
“If all the models in the candidate set have the same number of parameters, then using AIC might at first appear to be very similar to using the likelihood-ratio test. There are, however, important distinctions. In particular, the likelihood-ratio test is valid only for nested models whereas AIC (and AICc) has no such restriction”.
http://en.wikipedia.org/wiki/Akaike_information_criterion
Lucia,
I just added these lines to the AIC loop in F&R code:
if (sum(lagvec==0)==3) {k <- 10}
if (sum(lagvec==0)==2) {k <- 11}
if (sum(lagvec==0)==1) {k <- 12}
if (sum(lagvec==0)==0) {k <- 13}
It is not elegant, but it works to give the number of parameters as the 10 from the regression plus the number of nonzero lags.
I then calculate AIC as:
(2*k - 2*logLik(zfit$model))
For GISTEMP, the best fitting lags are still 4,7,1 – just as in the paper.
Then the question is, is the AIC for that choice really better than for e.g. 2,6,1?
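One standard way to put a number on “really better” (a common AIC convention, not something F&R did) is to convert AIC differences into relative likelihoods and Akaike weights; a sketch with placeholder AIC values:
aic <- c("4,7,1" = -531.9, "2,6,1" = -531.2, "4,8,7" = -530.5)   # placeholders, not F&R's numbers
delta <- aic - min(aic)             # AIC difference from the best model
rel.lik <- exp(-delta / 2)          # relative likelihood of each lag choice
round(rel.lik / sum(rel.lik), 2)    # Akaike weights; near-equal weights mean the data cannot distinguish the lags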
Thanks SRJ– With respect to the hypothesis testing: I know that with nested models, ‘significance’ for an added parameter always corresponded to a better AIC when I’ve done it for ice prediction fits with a zillion candidates (all *marginally* but none based on any actual phenomenological theory.) So, I was interested in seeing if they did have a better AIC than lag 0. (Obviously, not interested enough to check it myself.)
I do know, from just sending randomly created data with known ARIMA(p,0,q) through a subroutine that finds the ‘best’ ARIMA(p,0,q) with p<4, q<4 using AICc, that for short periods we get lots of mis-identifications-- that is, incorrect p and q. Lots. And those are cases where I know the data were generated using ARIMA. And of course, the consequences of mis-identification are worse than those of getting the parameter estimates wrong.
So, I know I have qualms about mis-identification when someone has a fit with (time, mei, solar, volcanic, intercept, ar, ma, lag1, lag2, lag3). (OK... I know tamino doesn't use ARIMA to get the ar and ma, but he gets them afterwards.) In that respect, I know where Kenneth is coming from! But it does seem to me that if AIC is the chosen way, we just discover whether a lagged value gets a sufficiently better AIC to keep it relative to not keeping it. Can you mis-identify? Sure. Absolutely.
(FWIW: I tend to suspect solar *is* mis-identified. I suspect that solar does not have a simple lag and that if we had a long, long, long string of data where the only forcing varying was solar, we would find that a temporal lag simply doesn't work. It's just unphysical. Given the way the solar forcing behaves, if you had replicate samples, I think you'd find different lags for different lengths of analysis windows! To some extent, the same is true for volcanic eruptions. But because volcanic eruptions tend to be quick events with aerosols injected and then dropping out, it might look kinda-sorta- ok in a regression. These might be fun to explore with synthetic data--- but I'm doing something else now. 🙂 )
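For anyone who wants to try the synthetic-data version of this, here is a bare-bones sketch in base R (the ARMA(1,1) parameters and the series length are just assumptions, and I use plain AIC rather than AICc for brevity):
set.seed(1)
n.trials <- 200
hits <- 0
for (i in 1:n.trials) {
  x <- arima.sim(model = list(ar = 0.5, ma = 0.3), n = 120)   # "true" model is ARMA(1,1)
  best <- c(-1, -1); best.aic <- Inf
  for (p in 0:3) for (q in 0:3) {
    fit <- try(arima(x, order = c(p, 0, q)), silent = TRUE)   # some orders may fail to converge
    if (!inherits(fit, "try-error") && AIC(fit) < best.aic) {
      best.aic <- AIC(fit); best <- c(p, q)
    }
  }
  if (all(best == c(1, 1))) hits <- hits + 1
}
hits / n.trials    # fraction of short series where the true orders are recovered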
SRJ (Comment #105136)
“If all the models in the candidate set have the same number of parameters, then using AIC might at first appear to be very similar to using the likelihood-ratio test. There are, however, important distinctions. In particular, the likelihood-ratio test is valid only for nested models whereas AIC (and AICc) has no such restriction”.
SRJ, once again you miss the point. Indeed AIC is not restricted to nested models, but in the case I am citing the models are either nested or have the same number of parameters, and thus the AIC and likelihood-ratio tests are very nearly the same.
Do you judge that F&R could have properly used likelihood-ratio testing on their models? I know that is what I’ll do when I run their code. In that manner I’ll be able to determine whether the F&R lags selected are significantly different than the other candidates.
Lucia,
By the way, it just occurred to me why F&R do not use the lags in computing the AIC. The fitting is done in a subroutine, lagfit.
It takes a data frame and a vector of lags as input. From this it creates lagged versions of the appropriate columns in the data frame. Then lm() is called on the data frame with the lagged variables.
So the lags are not used directly in the regression, they are used to make lagged versions of the variables prior to calling lm().
In that respect it makes sense to use AIC without the lags as parameters.
But when testing a number of lags, one might want to include the lags in the calculation of the AIC. It appears that doing this does not change the best fitting lags.
That’s what I was curious about. It’s only comparing “no lag” to “lag” that I was wondering about.
I just wanted to follow up on my comment about residual solar forcings.
If you start with unanomalized data, the entirety of the annual cycle (“seasons”) comes from solar forcing, and there is a lag between solar forcing and peak temperature for a zonal band, but it varies with latitude.
If you start with anomalized data, you’ve supposedly removed that annual cycle, but in reality you haven’t, for reasons that people can argue over. I can demonstrate this by computing the spectra for e.g., HADCRUT versus GISTEMP:
Figure.
Note that the amplitude at 1 yr^-1 is within a factor of two of the peaks associated with ENSO—that is, the residual annual component from solar forcing is nearly comparable to that from ENSO.
Also note as an aside that the spectral peaks for GISTEMP and HADCRUT are very similar in frequency, but GISTEMP seems to produce estimates for magnitude that are systematically lower than HADCRUT… I would speculate this is due to the greater surface coverage of GISTEMP. The multiyear peaks are due primarily to atmosphere-ocean oscillations, and increasing the total land coverage, esp. Antarctica, “dilutes” the effect of the oscillations relative to HADCRUT.
I agree with Lucia’s comments about it not being a simple lag. Physically one might expect an interaction term between TSI and ENSO (and one between ENSO and volcanic forcings). While neglecting these may be OK, it’s not clear how you can come to that conclusion without at least considering the order of magnitude of the corrections introduced by modeling the problem more correctly.
Wrong link. Right file.
Carrick,
http://rankexploits.com/musings/2012/pinatubo-climate-sensitivity-and-two-dogs-that-didnt-bark-in-the-night/#comment-105142
and
http://rankexploits.com/musings/2012/pinatubo-climate-sensitivity-and-two-dogs-that-didnt-bark-in-the-night/#comment-105149
Weird.
BillC, just more proof that great minds think. 😉
One other follow up comment is there’s no reason to expect the multidecadal scale response of climate with respect to TSI to be invariant, if for no other reason than ice albedo changes affect the relationship.
SRJ (Comment #105144)
I have just now started to go through the F&R code and the relevant code here is:
for (lag1 in 0:24){
for (lag2 in 0:24){
for (lag3 in 0:24){
lagvec = c(lag1,lag2,lag3)
zfit = lagfit(zfeed,lagvec,ndx.beg)
if (AIC(zfit$model) < bestaic){
bestaic = AIC(zfit$model)
bestlag1 = lag1
bestlag2 = lag2
bestlag3 = lag3
}
It appears that the lagfit function is applied and then the best AIC score so far is retained after each trial comparison; that way you finish with the single best AIC score, and the others calculated are not retained.
Note that the "for" statements are incremented from 0 to 24. The no-lag (0) case is treated just as the other lags are. It is very much the same as doing a likelihood ratio test using multiple hypotheses, except here F&R compare the score of the model under test to the previous best model AIC score, i.e. they do not use probability in their comparison. Lag is not counted as a parameter but rather you have the mei, volc, solar, tau and the four parameters for the AC of the temperature, tau, I believe, as f1.cos, f1.sin, f2.cos, f2.sin.
The last sentence in my post above should have said:
Lag is not counted as a parameter but rather you have the temperature, mei, volc, solar, tau and the four parameters related to tau as f1.cos, f1.sin, f2.cos, f2.sin.
Question: Is the variable, tau, in F&R a time constant?
Tau is a vector with the time variable, with 1990 as zeropoint.
It is defined in the beginning of the script.
The parameter for tau, given by lagfit, is the estimate of the linear time trend.
It is possible to count the lag as parameters in the AIC calculation, so when one lag is 0, you have 10+2 parameters. See my comment
#105138
It does not change the result of the lag choosing.
No.
SRJ (Comment #105172)
I am more interested in what F&R did than in what you do after the fact. I have run my modified F&R code for GISS for 0 through 10 lags and collected the AIC scores. I need to look at likelihood ratios for the best candidates next, and then do the same for the other temperature series.
First I really should confirm that I selected the same lags for the best AIC scores as F&R did.
I am going to post a link to the top 10 AIC scores for the five temperature sets used in F&R, along with a brief discussion. I wanted to have the likelihood ratio test tabled with the AIC scores, but I found I’ll have to do more modifying of the F&R code to do that. Actually I have to take apart the lagfit function to obtain the lm inputs required in the correct form.
Meanwhile, I now think I understand what tau represents in the model. What exactly do the 4 sine and cosine functions of tau represent? I can guess at this point in my understanding of the model, but if someone here knows, I would appreciate a reply. Also, can anyone link to models that are similar to F&R in using the sine and cosine functions of tau?
I have linked below the top ten AIC scoring models, with the lags for the three variables listed, for all 5 temperature data sets, along with a summary of the top AIC scoring model from F&R. At this time I only considered lags from 0 through 10.
First, I can verify from the tabled results that I obtained the same top scoring lags for all 5 data sets as obtained by F&R. Secondly, it is apparent that the top ranked AIC scores would be regarded as not showing a lot of separation in most typical AIC score evaluations. Finally, the differences from data set to data set with regard to lag become clearer when looking at the AIC scores in the top 10 rather than simply looking at the top score, particularly so for the TSI variable.
Looking for statistically significant differences between the top AIC scoring models with the likelihood ratio test, as I intend to do, will further resolve the top ranking models.
I should also note that my analysis here does not mean that I endorse the F&R approach; it only asks, given that approach, what more detailed analyses will reveal.
http://imageshack.us/a/img545/140/toptenaicfr.png
Kenneth Fritsch:
Are you sure you’d have to take the function apart? I’d think, if nothing else, you could dump whatever you wanted into a variable with just a couple of lines inside the function, then read the variable after the function finished.
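For what it’s worth, the quick-and-dirty R idiom for that is superassignment; a generic sketch (the names are made up, not F&R’s internals):
f <- function(dat) {
  fit <- lm(y ~ x, data = dat)   # stand-in for the fit buried inside lagfit()
  debug.fit <<- fit              # copy the intermediate object out to the workspace
  AIC(fit)                       # the normal return value is unchanged
}
d <- data.frame(x = 1:20, y = rnorm(20))
f(d)
summary(debug.fit)               # inspect the stashed fit after the call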
They’re there to represent seasonal cycles not fully removed by anomalization.
By the way, you should try substituting the various lags you extracted into the later F&R code. The AIC scores are so close there’s little reason to pick one set over another. Or if you want to be bad, try using lags from one data series in another (though don’t use satellite lags for surface temperature ones or vice versa). Incidentally, your table shows it’s kind of cheeky for F&R to say:
Their AIC scores show they cannot realistically conclude solar forcing has a one month lag as opposed to a ~7 month lag.
By the way, while I agree with F&R that anomalizing temperature data does not remove all seasonal effects, I think it’s kind of crazy to “remove it” by including four extra parameters in a model with nine parameters. The extra parameters are going to force the model to fit better than it should.
I would think any leftover effect from the seasonal cycle should be constant across all analyses. I would not think the lags we pick for forcings should change our view of anomalization. Despite that, these are the components calculated for seasonal effects with the lag F&R used for GISS:
This is with lags of 4/8/7 (the AIC score for which is less than 1% lower):
I would think the correct approach, having decided anomalization doesn’t fully remove seasonal effects, would be to apply a correction to the data (or to go back and redo the anomalization with a modified approach). I would not think the correct approach would be to throw a bunch of parameters into a model and allow them to fluctuate by 50%.
I don’t know that it actually matters, but it seems incredibly opportunistic to add extra parameters in a model when they can fluctuate like that.
Brandon Shollenberger (Comment #105228)
Brandon, thanks much for the explanation of the sine/cosine variables. I was thinking overfitting myself and I think that point can be tested.
I found that one single statement was all I needed to obtain the information required to do a likelihood ratio test. The problem is that the likelihood ratio test from R that I used (the lrtest function in library(lmtest)) requires nested models. Otherwise the degrees of freedom are 0. I can obtain chi-square statistic values from the function, but I do not know how to assign the proper degrees of freedom.
As an alternative test I am using ks.test (the Kolmogorov-Smirnov test) in R to compare the fitted.values from the lm function for the top AIC scoring model to the 9 below it, and reporting the p.value. So far this would appear to indicate (assuming what I am doing is proper) that what you say is correct. The test did force me to look at the individual fitted model results, and they do not vary much within the top 10 models for each temperature data set. I might have to look at lower ranked models than the top 10.
Hi Kenneth
Nice work.
I would be interested in seeing how you have applied the lrtest() function, and also your code for the Kolmogorov-Smirnov-test
Brandon, a better approach might be to simply low-pass the temperature series (and TSI) before regressing the two series against each other. I think I mentioned that above as a means to avoid the residual seasonal component.
You’ll still “hit the wall” though, because the temperature response to TSI doesn’t obey a simple linear relationship to TSI (nor to lagged TSI).
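To make the suggestion concrete, a minimal sketch of “low-pass, then regress” (temp and tsi are assumed monthly series; the 25-month moving average is just a crude stand-in for a proper filter):
lp <- function(x, k = 25) stats::filter(x, rep(1/k, k), sides = 2)   # centered moving average
temp.lp <- lp(temp)                          # low-passed temperature (assumed series)
tsi.lp  <- lp(tsi)                           # low-passed TSI (assumed series)
ok <- !is.na(temp.lp) & !is.na(tsi.lp)       # drop the NAs the filter leaves at the edges
summary(lm(temp.lp[ok] ~ tsi.lp[ok]))        # regression on the low-frequency components only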
I am going to use log likelihood in a manner I do not see used in the literature, but which is required, in my mind, to handle the indirect way F&R handle changes in their model. I think that the log likelihood ratio between 2 models is calculated the same way regardless of the parameter differences. My problem with the R function lrtest is that the script looks for a direct change in parameters between the models being compared – which, of course, is zero using the direct approach. In actuality, we are changing the lags of the three lagged variables, so I can construct a model using the lags for the best scoring AIC model as a null and then treat the changes to lower ranking models as adding a lag parameter to each of the 3 variables that changed. That provides me with degrees of freedom for the chi-square statistic that I can derive from the lrtest function in R. From there I can calculate a p.value for the probability of the change in lags being statistically significant.
I cannot get that probability by comparing the fitted.values and/or residuals of the models from the lm function using the ks.test function. That comparison, while not applicable to the significance of adding parameters or changing parameter values, is of interest when one wants to look at how much the change truly affects the end result of the model. It is surprising how little the end result changes from some of these parameter changes, even when the change itself can be shown to be highly significant.
Kenneth Fritsch:
No problem. I was thrown off by the same part of the code.
Remember, don’t just look at the results. Also look at the calculated coefficients. The more those vary, the more choice of lag matters. If the results are affected less than the coefficients, that suggests the model is creating (somewhat) spurious results.
Carrick:
I’m not sure that’d make sense for their modeling. If nothing else, wouldn’t you need to do the same thing to all series rather than just those two?
Brandon, if you wanted to model the response of the long-period forcing component of TSI, it makes sense.
For example, 2-box models can be reduced to:
T(t) = a0 + c_S F_S(t) + c_L F_L(t)
where F_S(t) and F_L(t) are exponentially filtered versions of the total forcings with constants tau_S and tau_L.
That said, for any process with a lag that is less than one year, you really need to break it up into zonal series.
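A minimal sketch of that reduced form, assuming a monthly total-forcing series called forcing and a temperature series temp, with purely illustrative time constants:
expfilter <- function(f, tau) {
  a <- exp(-1 / tau)                                  # monthly decay factor for time constant tau
  as.numeric(stats::filter((1 - a) * f, a, method = "recursive"))   # exponential (one-box) filter
}
F_S <- expfilter(forcing, tau = 4)     # "fast" box, ~4 months (assumed)
F_L <- expfilter(forcing, tau = 120)   # "slow" box, ~10 years (assumed)
summary(lm(temp ~ F_S + F_L))          # recovers a0, c_S and c_L by regression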
It seems that history needs to repeat itself. The posts and commentaries in this series are well worth the read.
Carrick:
If it were just between temperature and TSI, I’d agree. But if you apply a filter to some data series but not others, I don’t think the model makes sense. That’s especially true since you’d be filtering the series everything is being modeled against (temperatures).
In effect, you’d be modeling two series with high and low frequency data and a series with low frequency data against a series with just low frequency data. That’s what doesn’t make sense to me. I could see filtering data to examine low and high frequency components separately, but I think* you have to be consistent with what you’re looking at for a model like this. As in, you can look at high, low or both, but you can’t mix those choices.
*I’m, at best, an amateur so I could easily be wrong.
Brandon, if you want, the low-pass filter is just a poor man’s low frequency climate response function. It wouldn’t make sense to apply it to e.g. the ENSO34 index, because by definition it’s zero (the ENSO34 index is detrended).
In practice, all of the long-term functional dependences are different (TSI, ENSO, aerosols, etc) because their latitudinal dependence is different.
Anyway it’s a lot messier than F&R made it out to be to get right.
The other comment I wanted to make was one most of us realized when we were discussing the (linked) two box model stuff–and that is you probably shouldn’t attempt to use the two-box model to predict climate sensitivity.
Carrick,
“Anyway it’s a lot messier that F&R made it out to be to get right.”
Agreed. But will you use all the good work you have put into zonal variability and response to ENSO to produce something better than F&R? Clearly there is room to make improvements and point out the folly in much of F&R. If not you, who? If not now, when? 😉
In the link below, I have added 3 columns to the table I linked previously, showing the top ten ranked AIC scores from F&R for the 5 temperature data sets for changing the number of lags on the variables MEI, AOD and TSI. The columns headed LL and KS give the probabilities (assume <= 0.05 for rejecting the null hypothesis) for, respectively, the null hypothesis that changing the lags from the top rated AIC model to any other model ranked below it does not produce a significantly different model (LL), and the null hypothesis that the residuals (fitted values minus observed) of the top AIC rated model and any of the models ranked below it come from the same population (KS). My probability estimates were determined as I described in previous posts on this thread.
The table results show that not many of the top ten models for various lags produce statistically significantly different results. The results also show that, when comparing the top ten models, we see different patterns arising from the different data sets. What may not be particularly relevant to this analysis is the comparison of regression residuals, but I found it interesting in a general way in showing that the statistical test for adding/changing a model parameter can be significant while the resulting residuals (and fitted values) are very little different. I agree with Brandon that showing the changes in the model coefficients could be instructive, and I might do that next.
I do not recall whether F&R used the AIC rating when comparing the final 9 variable model to models with a reduced number of variables. Does anyone here know? That might be something that could be readily accomplished with slight modifications of the F&R code. The modifications of the F&R code for making the estimations presented in the linked table are listed below. I do not show the F&R code that leads up to the point of my modifications. The entire F&R code is linked here, with the following caveat from Tamino:
"Unfortunately wordpress won’t allow me to upload a zip file. So I’ve pulled a little trick. I renamed the file “allfit.zip†to “allfit.xls†in order to fool the wordpress software into believing that it’s an Excel file. You will need to rename this file, back to the name “allfit.zipâ€, in order to unzip it and access the programs and data."
http://tamino.files.wordpress.com/2011/12/allfit2.xls
http://imageshack.us/a/img515/7632/frlagprobs.png
Here is the code that I should have included in my previous post:
#Interchange giss,ncdc,cru,rss,uah
library(lmtest)
lrtest
AIC1=matrix(data=NA, ncol=4, nrow=11^3)   # one row per lag combination (0:10 for each of the 3 lags)
zfeed = data.frame(uah,mei,volc,solar,tau,f1.cos,f1.sin,f2.cos,f2.sin)
ndx.beg = findstart(t,1979) # beginning index (at t=1979)
n=0
for (i in 0:10){
lag1=i
for (j in 0:10){
lag2=j
for (k in 0:10){
lag3=k
lagvec = c(lag1,lag2,lag3)
zfit = lagfit(zfeed,lagvec,ndx.beg)
n=n+1                        # increment first so the lag 0,0,0 model is stored in row 1
AIC1[n,1] = AIC(zfit$model)
AIC1[n,2]=lag1
AIC1[n,3]=lag2
AIC1[n,4]=lag3
}
}
}
Ord=order(AIC1[,1])
T10=(AIC1[Ord,])[1:10,]
lag1=T10[1,2]
lag2=T10[1,3]
lag3=T10[1,4]
lagvec = c(lag1,lag2,lag3)
Z1 = lagfit(zfeed,lagvec,ndx.beg)
Zlm1=lm(Z1$model)
Res1=as.vector(residuals(Zlm1))
Gis10R=rep(NA,10)
GisRatio=rep(NA,10)
DegF=rep(NA,10)
for(i in 1:9){
lag1=T10[i+1,2]
lag2=T10[i+1,3]
lag3=T10[i+1,4]
lagvec = c(lag1,lag2,lag3)
ZX = lagfit(zfeed,lagvec,ndx.beg)
AX=AIC(ZX$model)
ZlmX=lm(ZX$model)
ResX=as.vector(residuals(ZlmX))
Gis10R[i+1]=ks.test(Res1,ResX)$p.value
LLRatio=lrtest(Zlm1,ZlmX)
Chi=as.numeric(na.omit(LLRatio$Chisq))
DF=0
if(abs(T10[1,2]-T10[i+1,2])!=0){DF=DF+1}
if(abs(T10[1,3]-T10[i+1,3])!=0){DF=DF+1}
if(abs(T10[1,4]-T10[i+1,4])!=0){DF=DF+1}
GisRatio[i+1]=pchisq(Chi,df=DF,lower.tail=FALSE)
DegF[i+1]=DF
}
TT10=cbind(T10,GisRatio,Gis10R,DegF)
Carrick, in that case it really doesn’t make sense to me. If you low-pass the temperature series, that should remove most, if not all, of the effect from ENSO. If you then regress ENSO against it, you’re necessarily going to get a weak response. I don’t see how you can hope to do a quantitative analysis of forcings if you first remove part of the signal you’re interested in.
If you do a low frequency regression to see what it shows, how would you interpret the results if something with no low-frequency signal is said to have an effect?
Kenneth Fritsch, I looked at this sort of thing a few months back. I’ll see if I can find what I did and try to get back to you tomorrow. I’d do it today, but I’m being dragged out to a bar. Woe is me!
Actually Brandon, what I was saying to do was low-pass the temperature series and TSI before regressing.
You wouldn’t do that for ENSO.
(This is pretty common practice by the way: That is, selecting out the frequency band you’re interested in before computing correlation coefficients.)
I understood that Carrick, but I don’t see how that helps any. F&R’s model fits everything at once. That means you can’t regress TSI against a low-passed temperature series and regress ENSO against the full series.
I could see how what you’re suggesting would work if one used a different approach to modeling the data altogether. I just don’t see how it could work with their model.
Unfortunately, I didn’t take that detailed of notes when I looked at this topic before, so I’d need to redo a lot of my earlier work to say much. That said, here are a couple thoughts.
First, HadCRUT and RSS are different from the others. One easy example is their calculated lags for TSI are notably different than the others. All five data sets give similar lags for TSI if you only look at the top model, but once you start going through more models, you see RSS and HadCRUT give more consistent results. The rest flip between ~1 and ~7. It’s interesting to consider why that could happen.
Second, there is a notable pattern that shows up in parameters. To demonstrate, I went ahead and extracted the calculated fits for the top 10 models using the GISS set. The linear trend (tau) varied by less than 5% across all of them. Every other parameter fluctuated by more. ENSO’s parameter fluctuates by as much as 10%. Volcanic, a little less. Solar, twice as much. Personally, I’d worry when the parameter I’m testing for varies by less than any other.
As a simple way of testing my concern, I would check how fluctuations in parameters line up with one another. As it happens, when one parameter is assigned more weight, other parameters will respond in (fairly) consistent ways. For a quick demonstration, here is what happens when I compare the percent fluctuation for fits for Solar, f1.cos and f1.sin (using NCDC).
Solar is black, cos is green, and sin is red. As you can see, any time solar is given more weight, cos’s weight is increased and sin’s is decreased. The same holds true for f2.cos and f2.sin. And to show it’s not just some correlation between solar and seasonal cycle, here is a comparison of linear trend (black) and solar (blue).
My interpretation of what I’ve seen is F&R’s search for the “right” lags is largely irrelevant. It’ll reduce the “noise,” but otherwise, it won’t change anything.
Quick correction. I believe the graphs above were generated with the GISS data, not the NCDC. I’ll have to check which variables got set to what later to be sure. It shouldn’t matter since as far as I can tell, similar things happen in all the data sets, but I should still make sure to get it straight.
By the way, the more I think about it, the less impressive the F&R paper is. You can get largely similar fits just by putting a temperature, ENSO, volcanic and solar series in a data frame with a linear time variable and then running the lm command in R.
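A minimal version of that idea, with assumed monthly series temp, mei, volc, solar and a time variable in years:
d <- data.frame(temp, time.yrs, mei, volc, solar)           # assumed series of equal length
fit <- lm(temp ~ time.yrs + mei + volc + solar, data = d)
summary(fit)     # the coefficient on time.yrs is the "adjusted" linear trend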
Brandon, I looked at the regression model coefficients for the top ten AIC scoring models for lag variations for all 5 temperature data sets, and I saw nothing different from what they produced, i.e. very similar regression residuals and fitted values. The coefficients varied little from model to model, and all were well within the 95% confidence intervals.
I went on to look at the differences between the 0 lag models and the best scoring AIC model with its lags, and here I see very significant differences from adding the lags for all 5 data sets. The p.values for the five data sets for the differences in the residuals of the best AIC scored model and the 0 lag model varied from approximately 0.22 for the satellite sets to 0.74 to 0.99 for the surface sets. While the p.values for the satellite results were low, the null hypothesis that the residuals came from the same population could not be rejected.
I am ready to leave the analysis of the lags from the perspective of what F&R did and what further analysis of what they did reveals. I conclude that the F&R model approach requires lags on the MEI, volcanic and solar variables depending on the temperature data set, but that the number of lags required is unclear when comparing the statistical significance of the change in lags between the compared models. That result would appear to me to beg for some a priori evidence for plugging lags into a model, rather than selecting/sorting as F&R did.
Looking at the best scoring AIC models for the 5 data sets showed me that some of the model coefficients for the 4 sine and cosine variables (seasonal effects) include zero when calculating the lower and upper CIs for the coefficients. Those were, for GISS: f2.cos; for NCDC: f2.sin; for HadCRUT: f1.cos and f2.cos; for UAH: f2.sin and f2.cos. All other best scoring AIC model coefficient CIs did not include zero.
I think I’ll take a quick look at an incremental AIC scoring of the variables in the F&R model.
First of all, be lenient with me: I’m French, and my interest in climatology was at first political, even if I have some scientific background. I found this blog because it is recommended by Stephen McIntyre, and I really appreciate the way you talk about Global Warming, the very good explanations you give and the courteous tone you use in your exchanges.
I dare to post a comment here because, even if David Rose cherry picked the starting date, it is true that it has been said that a 15 year period was statistically significant enough to say something relevant about a trend. It is also true that, for the layman I am, catastrophism is the rule in the media, with the support of the scientific community, the Met Office in particular.
Having said that, I have some questions for you, and I would be very pleased if you could answer them or give me some clues.
On the first question you say there is a warming, a slight warming, but a warming. How can such a small warming be statistically significant? If the null hypothesis is: “warming stopped during the last 15 years”, how can you reject it with such a wide uncertainty (if I understand the meaning of the whiskers)?
Concerning the models I fall into the same problem: the error margin is so wide that in fact we cannot predict anything at all, so what is the usefulness of such models? I have another question linked to this one. On Wikipedia it is said that the IPCC agrees on this CO2 forcing formula: delta(F) = 5.35 ln(C/C0) W/m^2
and a sensitivity value of 0.8 K/(W/m^2), giving 3 degrees of warming for a doubling of the CO2 concentration, starting with a C0 value of 300 ppm.
Using the GISS temperature record and the NOAA CO2 concentration history I get a prediction of a 1.13 C increase since 1961, whereas the real delta is 0.4 C. I know I may have misinterpreted the way to use these data, but in either case such information on Wikipedia is disastrous for climatology.
I would also like to know what you think about the Roman remnants found in places revealed by the melting of alpine glaciers. In other words, why does climatology not make more use of archaeological findings? Don’t you think these studies have been overlooked?
I hope I have not been too boring.
Tibor
The warming from that start date is not statistically significant. That’s not the same thing as there being no warming. Warming does become statistically significant from earlier start dates (and likely some later ones also.)
If that was someone’s null, you couldn’t reject it. But the question is: Why should that be anyone’s “null” requiring any sort of rejection? Strictly speaking one can pick anything as a null and test it. Selecting it as a null doesn’t mean anyone has to believe it. In this case, it seems rather likely that some have picked the “theory” that “warming stopped” based on looking at the data. When that is done, you can no longer test the theory based on the same data you used to develop the theory.
That isn’t quite so. The error margins are wide– but narrow enough to permit people to predict something. But I think it is wise to identify the limits of models.
I think you may be confusing the equilibrium sensitivity with an instantaneous sensitivity. There is a response time involved. If the CO2 doubles in one day, it takes many, many years for the temperature to reach the amount corresponding to the doubling of CO2. The temporal component of this issue is greatly argued. But one thing is true: you can’t estimate the amount of temperature rise from 1961 to now based on the CO2 rise from 1961 to now. It just doesn’t work that way.
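To make the response-time point concrete, here is a toy one-box illustration in R (the 3 C equilibrium value is the one quoted in the question above; the 15 year time constant is purely illustrative, since the real system has several time scales):
ecs <- 3.0                                   # equilibrium warming for doubled CO2 (deg C), from the comment above
tau <- 15                                    # assumed single response time constant, in years
yrs <- 0:100
T.transient <- ecs * (1 - exp(-yrs / tau))   # warming after an instantaneous doubling
T.transient[yrs == 20]                       # about 2.2 C after 20 years, still short of the 3 C equilibrium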
Thanks a lot Lucia, I’ll dig further on the last point, but your explanation is great and I should have thought of that.
On the second point, you are also very clear.
Cheers.
I am not convinced by the answer to the first point but it is also a question of belief. (concerning the use of likely)
If I may, on the last point, do you have a pointer to understand the time it takes to reach equilibrium for a given increase of CO2?
Sorry I have the answer, have a very nice day.
Kenneth Fritsch, I feel kind of stupid now. I decided to test my belief that the choice of lags (as long as they’re reasonable) doesn’t have much influence. To do so, I used this code:
Setting zfeed equal to each of the five data sets showed a fairly consistent picture for temperatures. With that concluded, I decided to try removing the seasonal parameters (set the range to 1:5). Again, there was nearly no change.
Next up, I decided to see which parameters had how much influence. To do so, I changed the range to exclude one parameter at a time (keeping the seasonal parameters excluded). Then I excluded two. Then I excluded all of them. That’s when I realized the reason F&R’s linear trend is so constant is their model amounts to nothing but:
The range 349:732 is to pull out only the data F&R looked at (I think I got the range right). With that one line, I get basically the same result F&R got. That means the F&R paper is basically just fitting a linear trend and then removing “noise.” I don’t know why it took me this long to realize that. I mean, I’ve practically said that’s what F&R did.
Brandon Shollenberger (Comment #105463)
Brandon, what I have done is take the top AIC scoring model from F&R (their final model) and compare it to the best AIC scoring model I can derive by removing a variable from the F&R best model. In all cases except where I was testing lags, I went back to find the optimum lags for the reduced-variable model. I then did a log likelihood test (lrtest in R) and a Kolmogorov-Smirnov test (ks.test in R) to assess, respectively, whether the model is significantly degraded by removing the given variable, and whether the residuals from the two compared models could have come from the same population. I have explained the null hypotheses for these tests in previous posts on this thread. The higher the probability for the KS test, the less the reduced-variable model was affected overall. The lower the probability for the LL ratio test, the more likely it is that removing the variable degraded the model fit.
The results are summarized in the table linked below. They show that while one can legitimately rationalize the variables used in the F&R final model by AIC scoring or, better, log likelihood testing, that does not complete the picture of what effect each variable has on the overall model fit. As you noted, Brandon, the tau variable, or trend, has an overwhelming effect on the model fit.
While there are variations depending on which of the 5 temperature data sets is used for modeling, the variables Lag, Seasonal (all the sin/cos variables tested together), Solar and Volcano do not significantly change the overall fit when removed from the best model. MEI shows significant or near-significant effects when removed from the best model with the HadCRUT, RSS and UAH temperature data sets.
I think that F&R were primarily attempting to show that, if all the variables affecting temperature are included in a model, the recent warming shows an upward, nearly straight-line trend and the differences between temperature data sets can be reconciled. I judge that, on more detailed analysis, that picture is not so clear, particularly when looking at the differences in the KS testing of the overall effect of a particular variable between temperature data sets.
Lag appears more important for the fitting of the UAH and RSS satellite data sets. The NCDC data set appears rather unique in requiring a seasonal adjustment. The HadCRUT data set needs a solar adjustment more than the other 4 data sets. UAH appears to need a volcanic adjustment more than the other 4 data sets. GISS appears to fit considerably better without an MEI adjustment than the other 4 data sets. All data sets need the trend adjustment provided by using the tau variable.
It is surprising to me that evidently subtle differences between temperature data sets can dramatically affect the modeling requirements. It would appear to me that facing these differences head-on might be a more productive use of these analyses than using them for making broader, more general statements.
I am also pondering whether the F&R model (without considering completely different alternative approaches to their modeling) handles the ARIMA aspects of these temperature series. F&R are in effect attempting to partition out the trend part of the series while treating the remaining part of the series as merely interfering noise. I would think there should be more direct ways of accomplishing that end.
http://imageshack.us/a/img685/1709/frvariableremoveeffect.png
Kenneth Fritsch, my computer crashed on me, so I lost what I was working on. However, what you say is true. The inconsistencies across data sets should suggest a problem to examine. Instead, they’re effectively hidden by the lag fitting. It winds up being little more than a method for the “optimum squiggle removal.”
As for what you say about autocorrelation, that aspect has been bugging me. Wouldn’t the fact the data is autocorrelated affect the regression?
Brandon, or anyone else still reading this thread: I have read through the F&R paper in detail, and I think I understand and can replicate their calculations, with one exception. It appears to me that the major claim of F&R is that their modeled trend, as estimated by the variable tau in the model, reduces the CIs for the trend slope by approximately 1/2 for the 5 temperature data sets versus the trend estimated directly by regression of the temperature on time.
The standard errors for tau in F&R are calculated directly from the model, while the standard errors for the trend derived by direct regression of temperature versus months from 1979-2010 are corrected for autocorrelation using the correction:
sqrt(1 + 2*p1/(1 - p2/p1)), where p1 and p2 are the first and second coefficients of the acf of the regression residuals.
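That correction is easy to compute directly from the regression residuals; a sketch, where fit is assumed to be the simple trend regression (an lm of temperature on time):
r  <- acf(residuals(fit), lag.max = 2, plot = FALSE)$acf
p1 <- r[2]                                   # lag-1 autocorrelation (r[1] is lag 0, which is 1)
p2 <- r[3]                                   # lag-2 autocorrelation
nu <- sqrt(1 + 2 * p1 / (1 - p2 / p1))       # SE inflation factor from the formula above
nu * summary(fit)$coefficients[2, "Std. Error"]   # adjusted standard error of the trend slope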
I keep getting standard errors for tau and the direct regression slope that are nearly equal. I will look harder for my error here, and I may have made one, but I also do not understand why tau and the corrected trend would have large differences in standard errors, since the tau variable dominates the model, i.e. the other variables are making only small adjustments, and the R value for the direct regression is at a high value of 0.72 (for GISS). The F&R model response correlates with the observed temperature with a value of 0.83 (for GISS).
Kenneth Fritsch (Comment #105553)
I found my error in my calculations above and now replicate the standard errors that F&R calculate for the trend estimated from direct regression of temperature versus time, adjusted for autocorrelation for an arima(1,0,1) model.
But now I have the problem/question that the tau (trend) standard errors from the F&R regression are not adjusted for autocorrelation, as the trend standard errors from the direct regression are. Why would that be, unless it can be assumed that the F&R model does that? But how would it make that compensation? Obviously, the estimated SEs for tau come from the coefficient for tau in the F&R model, but I do not see how that would handle autocorrelation.
Kenneth–
To determine the “true” standard errors in F&R, now that you have the method they used to get the taus etc., why don’t you
a) generate synthetic trends with the fitting parameters, noise etc. they claim fit. I know you need to generate arima.sim errors, but I think you also need to generate extra errors for the “volcanic”, “mei” and “solar” series. You should be able to do this under the assumptions he used to calculate them. If he just used a multiple regression in “R”, then if the correlation parameters are stated, you should just be able to generate an “mei” with the correct correlation– and so on. (Hmm… might take some thought though. You want the time series for volcanic and mei to look something *like* the time series for mei and volcanic eruptions.)
b) use their method to find the ‘m’. Do it fully– as if you don’t know the tau, coefficient on mei etc. Then get the ‘m’ from the regression.
Repeat 100 times. Check to see whether the sample mean ‘m’ matches what you used to create the series. See what variability in trends you get. Either their error bars are correct or they aren’t. But they are supposed to be the variability in trends for something with those noise properties if you didn’t know them in advance.
There are some details to be thought out. But at least initially, you can just use simple functions for mei/volcanic/solar– maybe make them red noise or arma(1,1), but correlated with temperature. Then see what you get.
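A bare-bones version of that Monte Carlo, with just a trend plus arma(1,1) noise and no mei/volcanic/solar terms yet (all parameter values below are placeholders):
set.seed(2)
n <- 384                           # months, roughly 1979-2010
t.yrs <- (1:n) / 12
m.true <- 0.017                    # assumed trend, deg C per year
trends <- replicate(100, {
  noise <- arima.sim(model = list(ar = 0.9, ma = -0.5), n = n, sd = 0.1)   # assumed noise model
  y <- m.true * t.yrs + noise
  coef(lm(y ~ t.yrs))["t.yrs"]     # trend recovered from this synthetic realization
})
c(mean(trends), sd(trends))        # sd(trends) is the "true" standard error of the recovered trend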
lucia (Comment #105559)
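A bare-bones version of the Monte Carlo lucia describes might look like the sketch below. The exogenous series here are crude red-noise stand-ins, and every numerical value (trend, coefficients, noise parameters) is an assumption chosen only for illustration, not something taken from F&R.
set.seed(1)
n      <- 384                      # months, 1979-2010
t.yr   <- (1:n)/12                 # time in years
m.true <- 0.017                    # assumed "true" trend, deg C per year
b.mei  <- 0.07; b.vol <- -2.0; b.sol <- 0.05   # assumed coefficients
m.hat  <- numeric(100)
for (i in 1:100) {
  # crude stand-ins for the exogenous factors
  mei   <- arima.sim(n = n, list(ar = 0.90), sd = 0.5)
  vol   <- arima.sim(n = n, list(ar = 0.95), sd = 0.01)
  sol   <- arima.sim(n = n, list(ar = 0.98), sd = 0.3)
  noise <- arima.sim(n = n, list(ar = 0.83, ma = -0.41), sd = 0.11)
  y <- m.true*t.yr + b.mei*mei + b.vol*vol + b.sol*sol + noise
  # refit the multiple regression and keep the recovered trend
  m.hat[i] <- coef(lm(y ~ t.yr + mei + vol + sol))["t.yr"]
}
mean(m.hat); sd(m.hat)             # do recovered trends center on m.true, and with what spread?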
Lucia, I think your suggestion is a good one, particularly for me, since my computer and I like the brute force approach (I do not do emoticons). I'll need to consider how best to do it.
I have just now calculated the residuals from the F&R model as applied to the GISS data set and determined that an arima(1,0,1) model provides a very good fit (from the acf). The F&R SE correction is about 2, while that for the direct regression is about 3. If I applied that correction of 2 directly to the tau coefficient SEs from the F&R model, the corrected tau SEs would be slightly larger than those for the direct regression. I am not sure, with multiple variables, how to apply the correction.
Kenneth Fritsch, am I understanding you correctly that the error margins are larger for the fitted values than they are for a simple linear regression?
That seems… weird.
Brandon– It can happen though.
lucia, I'm not completely surprised at something like that happening. What mostly surprises me is the idea that it would happen and F&R would still publish their results as they did.
Brandon, I am unsure exactly how to adjust the tau coefficient SEs using the F&R correction, assuming an arima(1,0,1) fit to the model residuals. I would think, since tau dominates the model, most of the adjustment would go to the tau variable, but perhaps not all.
I am surprised that F&R do not deal explicitly with this issue when discussing Table 1 in their paper, where they compare the SEs for the direct regression, which are adjusted for autocorrelation of the residuals assuming an arima(1,0,1) model, to the SEs determined for the tau (trend) variable in their model, which are presented without adjustment. I found that the model residuals fit an arima(1,0,1) model (for the GISS data set) very nicely, and using the F&R formula for autocorrelation I obtain a correction factor of 2 for the SEs. As it turns out, the F&R correction for an arima(1,0,1) model is very sensitive to the ratio of the first and second acf coefficients (p1 and p2) of the residuals, and not so much to the absolute values of those coefficients.
At this point I do not see why the tau coefficient SEs from the F&R model are not adjusted for arima(1,0,1) autocorrelation.
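On the sensitivity point: a quick numerical illustration (the p1 and p2 values below are arbitrary) shows how, holding p1 fixed, the correction grows rapidly as the ratio p2/p1 approaches 1.
nu <- function(p1, p2) 1 + 2*p1/(1 - p2/p1)
# p1 fixed at 0.4; only the ratio p2/p1 changes
sqrt(nu(0.4, 0.10))   # ratio 0.25 -> about 1.4
sqrt(nu(0.4, 0.20))   # ratio 0.50 -> about 1.6
sqrt(nu(0.4, 0.30))   # ratio 0.75 -> about 2.0
sqrt(nu(0.4, 0.38))   # ratio 0.95 -> about 4.1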
I am posting here what I plan to do with regard to simulations to determine the CIs for the trend in the F&R model, more or less as a placeholder.
I plan to use the F&R model lags and coefficients for the GISS temperature data set to adjust the observed GISS temperature by subtracting out the MEI, solar and volcano effects, and then use that adjusted data to find the proper arima model for simulation. I can include the trend, and even the F&R seasonal parameters (if required), in my overall model. I'll run the simulations 10,000 times on this final model and determine the CIs for the trend. I plan to do the same with the model derived using the observed GISS data and the trend.
I also plan to use this approach to test the F&R model on the early time period of the F&R data not used in their modeling. I'll remove the solar variable, as that data is not available over the entire time period, for my comparison of the two early/late time periods.
Kenneth Fritsch (Comment #105610)
“At this point I do not see why the tau coefficient SEs from the F&R model are not adjusted for arima(1,0,1) autocorrelation.”
F&R do adjust the coefficients for both the simple linear regression and for the multiple regression (what you call the F&R model).
It is done with this code in the case of simple linear regression:
# xfit: the simple linear regression of GISS temperature on time; slope and its raw SE
rate.giss = xfit$coef[2]; rate.giss
se.rate.giss = summary(xfit)$coef[2,2]; se.rate.giss
# lag-1 and lag-2 autocorrelations of the regression residuals
q = acf(xfit$res)$acf[2:3]; q
theta = q[2]/q[1]; theta
# correction factor nu = 1 + 2*p1/(1 - p2/p1); corrected SE = raw SE * sqrt(nu)
nu = 1+2*q[1]/(1-theta); nu
se.giss = se.rate.giss*sqrt(nu); se.giss
and this code does it for the multiple regression
####################################
# compute correction to std.err.rate
####################################
# lag-1 and lag-2 autocorrelations of the multiple-regression residuals
q = acf(zgiss$res)$acf[2:3]; q
theta = q[2]/q[1]; theta
nu.giss = 1+2*q[1]/(1-theta)
nu.giss; sqrt(nu.giss)
# raw coefficients and SEs from the multiple regression, then inflate by sqrt(nu)
coef.giss = zfit$model$coef; coef.giss
se.coef.giss = summary(zfit$model)$coef[,2]; se.coef.giss
se.giss = se.coef.giss*sqrt(nu.giss); se.giss
I checked the calculation, and my numbers match Table 1 exactly. So F&R do correct their SEs for autocorrelation in both the simple and the multiple regression.
SRJ (Comment #105660)
You are no doubt correct here, as I am nearly through with my calculations on this matter. What I saw were the coefficient for tau and the CIs for that coefficient, which were the same as reported in Table 1. With my approach above I can replicate the model regression, and I will report my findings soon.
Here they are for GISS: trend slope = 0.1709 degrees C per decade, uncorrected SE = 0.000653 (per year), and corrected SE = 2.004*0.000653 = 0.00131 per year, or 0.0131 degrees C per decade. F&R reported a corrected SE of 0.016 and a trend slope of 0.171.
SRJ: What correction factor did you obtain for the arima(1,0,1) autocorrelation? I obtained 2.004. I want to see whether the difference is in the uncorrected SE or in the correction factor used.
Next I plan to do my simulations for the simple and F&R regression and estimate CIs from there to compare to those calculated by F&R.
For F&R's multiple regression of GISTEMP the correction factor is sqrt(nu.giss) = 2.00428.
The uncorrected SE for tau is 7.937048e-04.
The corrected SE for tau is then sqrt(nu.giss)*se.uncorrected = 2.00428*7.937048e-04 = 0.001590807.
Rounded to 2 digits, this is exactly what F&R report.
SRJ: I went back and reviewed the F&R method for deriving SEs for tau, and I see how they did it. I was looking at coefficient CIs that were already adjusted for autocorrelation and did not realize it.
I'll have to reconcile why my uncorrected SE CIs are smaller than F&R's. In my approach I simply subtract from the observed GISS temperature anomalies all the F&R model variable effects except for the trend. I can duplicate the F&R trend slope, but I obtain a smaller unadjusted CI.
I’ll see what I obtain from simulations.
I did my 10,000 arima.sim simulations using the ar and ma values found from arima, and obtained trend slopes and SEs for the regression of observed GISS versus time of slope = 0.167 degrees C per decade and SE = 0.027, while for the F&R model regression I obtained a trend slope = 0.171 degrees C per decade and SE = 0.012. It appears that the F&R correction for autocorrelation with an arima(1,0,1) model undercorrected the observed simple regression and overcorrected the F&R model result.
In other words, the reduction in variation from the F&R model is better than the authors claimed, at least by way of Monte Carlo simulations. Of course, the other side of that coin is that if the SEs become sufficiently small, it becomes easier to show statistically significant differences between the satellite and surface temperature trends. The F&R authors pointed specifically to the difference between satellite and surface trends not being significant. I am curious now what that comparison will show when I Monte Carlo all the temperature data sets and look at the differences between the combined surface and combined satellite data.
Kenneth:
Good, but to clarify: you created runs with
ar=?
ma=?
known slope of m=?
Found the slope using… what? lm(y~time)? And the multiple regression?
Because I'm not sure precisely what the difference is between what you mean by "F&R model regression" and "GISS observed regression". Is the former fitted with a multiple regression in R and the latter a simple regression in R?
lucia (Comment #105695)
The ar and ma coefficients were 0.831 and -0.412 and the sd was 0.146 for the residuals from the simple regression of GISS observed versus time, while for the F&R regression the values for ar, ma and sd were respectively, 0.647, -0.343 and 0.118.
My simulation was from this code generally:
# Len, SD, Slope and Intercept come from the earlier lm/arima fits;
# Gobs and GAD are the observed GISS monthly series set up earlier (same 1979-2010 time base)
MatN=matrix(NA, nrow=10000, ncol=Len)
for(i in 1:10000){
  # ARMA(1,1) noise plus the fitted linear trend
  Arm=arima.sim(n=Len, list(ar=0.8307, ma=-0.4115), sd=SD) + as.vector(time(Gobs)*Slope+Intercept)
  MatN[i,]=Arm
}
# refit a simple trend to each synthetic series and keep the slope
Q1=rep(NA, 10000)
for(j in 1:10000){
  Q1[j]=coef(summary(lm(MatN[j,]~time(GAD))))[2,1]
}
CIM=mean(Q1)
CIL=quantile(Q1, probs=0.025)
CIU=quantile(Q1, probs=0.975)
The data used to determine the ar, ma, sd, slope and intercept for the simple regression against the monthly time period 1979-2010 were derived from the observed GISS data, fitting with lm and then fitting the residuals to an arima model.
In the case of the F&R regression, I obtained the starting-point data (handled as I would the observed GISS data) by subtracting out the effects of the mei, solar, volcano, and 4 seasonal variables from the observed GISS data. You can do this using the F&R data and the coefficients from the F&R model, and then anomalizing all the data for the 1979-2010 period, roughly as in the sketch below. In this manner I can essentially reproduce the result shown graphically in the F&R paper. I then handle this series in my simulations in the same way as I did the observed GISS data for the simple regression.
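A sketch of that adjustment step, reusing the hypothetical names from the earlier sketch (zfit is the F&R-style multiple regression, mei.l / vol.l / sol.l the lagged regressors), might be:
b <- coef(zfit)
# subtract the fitted exogenous contributions, leaving the trend in place
giss.adj <- giss - (b["mei.l"]*mei.l + b["vol.l"]*vol.l + b["sol.l"]*sol.l)
# (any seasonal terms in the model would be subtracted the same way)
giss.adj <- giss.adj - mean(giss.adj)          # re-anomalize over 1979-2010
# fit the trend to the adjusted series and model its residual noise
afit <- lm(giss.adj ~ t.yr)
arima(residuals(afit), order = c(1, 0, 1))     # ar, ma and sigma^2 for arima.sim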
I suspect that the differences between the CIs derived from my simulations and those that F&R found are simply due to the limitations of the correction equation F&R used for autocorrelation.
I actually found that a slightly better fit to the residuals comes from an arima(2,0,0) trend-stationary model, and I might simulate that model also.
My AIC scores for the arima(1,0,1) and arima(2,0,0) trend-stationary models were -549.67 and -549.67, respectively, for the simple regression, while for the F&R regression these scores were -601.03 and -603.83, respectively. The arima(2,0,0) model has ar1 and ar2 coefficients of 0.422 and 0.253 for the simple regression, and 0.284 and 0.172 for the F&R regression.
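For anyone replicating this, the AIC comparison amounts to something like the following, where res stands for the residual series from either regression:
arima(res, order = c(1, 0, 1))$aic   # ARMA(1,1) noise model
arima(res, order = c(2, 0, 0))$aic   # AR(2) noise model; the lower AIC wins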
Using the arima(2,0,0) model for my 10,000 simulations produced for the simple regression a trend slope = 0.167 and CI = 0.024, while the F&R regression produced trend slope = 0.171 and CI = 0.0115 with all units in degrees C per decade.
From these calculations you can see that the arima(2,0,0) model gives marginally smaller CIs. I next want to look at the observed UAH data with the simple and F&R regressions, and then look at the trend slope and CIs of the difference between GISS and UAH from the F&R regression.
Per my previous post on this thread, I adjusted the observed GISS and UAH temperature series for the period 1979-2010 using the results of the F&R multivariable regression. In effect the F&R adjustment reduces the variability of the trends in the observed data while changing the trend slope itself very little. I calculated a difference series (GISS adjusted minus UAH adjusted), then a trend slope for the difference, and then CIs in the manner reported previously on this thread. For the 10,000 simulations I used the best-fitting arima model, which by AIC scoring was arima(0,0,2) trend stationary, with ma coefficients of 0.2287 and 0.1372.
The difference series had an estimated trend slope = 0.0301, lower 2.5% CI = 0.0138 and upper 97.5% CI = 0.0468, with all units in degrees C per decade. Since the CI range does not include 0, we can say that, counter to the point made in the F&R paper, the recent warming in the lower troposphere is less than at the surface when we compare, by difference, the adjusted GISS and UAH temperature series. As time permits I intend to compare other combinations of the 5 temperature data sets used in F&R.
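A sketch of that difference-series calculation, assuming giss.adj and uah.adj are the F&R-adjusted monthly series for 1979-2010 (names hypothetical), might run:
d     <- giss.adj - uah.adj
t.yr  <- seq_along(d)/12                          # time in years
dfit  <- lm(d ~ t.yr)
mafit <- arima(residuals(dfit), order = c(0, 0, 2))   # the MA(2) noise model chosen by AIC
slopes <- replicate(10000, {
  # simulated MA(2) noise plus the fitted trend, then refit the trend
  sim <- arima.sim(n = length(d), list(ma = coef(mafit)[c("ma1", "ma2")]),
                   sd = sqrt(mafit$sigma2)) + fitted(dfit)
  coef(lm(sim ~ t.yr))["t.yr"]
})
quantile(slopes, c(0.025, 0.975))                 # CI for the trend difference, deg C per year (x10 for per decade)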
I had time to look at several combinations of differences between the F&R-adjusted GISS, NCDC, HadCRUT, RSS and UAH temperature series, as noted in my previous post on this thread. Below I report the results as the trend slope (TS) and the 2.5% and 97.5% confidence limits, as well as the 5.0% and 95.0% limits for a couple of combinations. All units are in degrees C per decade.
Two-sided null hypothesis that the trends are not different
GISS-UAH: TS = 0.0302; CI 2.5% = 0.0138; CI 97.5% = 0.0467
NCDC-UAH: TS = 0.0343; CI 2.5% = 0.0131; CI 97.5% = 0.0558
HadCRUT-UAH: TS = 0.0293; CI 2.5% = 0.00913; CI 97.5% = 0.0491
GISS-RSS: TS = 0.0143; CI 2.5% = -0.00125; CI 97.5% = 0.0296
NCDC-RSS: TS = 0.0182; CI 2.5% = 0.00072; CI 97.5% = 0.0359
HadCRUT-RSS: TS = 0.0133; CI 2.5% = -0.00231; CI 97.5% = 0.0292
(GISS+NCDC+HadCRUT)/3-(UAH+RSS)/2: TS = 0.0231; CI 2.5% = 0.00732; CI 97.5% = 0.0392
One-sided null hypothesis that the surface series trends for GISS and HadCRUT are not greater than the RSS trend
GISS-RSS: TS=0.0143; CI 5.0% = 0.00097; CI 95.0% = 0.0271
HadCRUT-RSS: TS=0.0133; CI 5.0% = 0.00018; CI 95.0% = 0.0265
From the foregoing we have strong counter-evidence to the F&R paper's claim that the lower troposphere cannot be shown to be warming significantly more slowly than the surface, using the F&R-adjusted temperature series. How much of the difference between the results of F&R's methods and mine is due to differencing the series, and how much to the way the CIs were calculated, can be determined with further analysis.
There is no doubt that the F&R adjustment of the temperature data sets reduces the variability within these series such that small differences in trend slopes can be found statistically significant. I am not convinced that the F&R approach is entirely correct or proper. I need to go back and determine what happens when I use the same lags for all the surface data series and, while different from the surface sets, the same lags within the satellite sets. I am not certain that a seasonal adjustment is required for all data sets either. I also need to see the results of using the F&R model on the data they supplied prior to the start of their model period (1979, plus a few months earlier to account for the lags used). The comparison would be without the solar variable, as monthly data for that variable are not available going back in time.