As some of you know, I’m trying to convert to using R as my base plotting tool. It has some advantages over Excel, but I have to admit that I’m not always sure I have done the exact analysis intended. Today’s plot is an attempt to show:
- The least squares trend fit computed using Ordinary Least Squares (OLS), shown with ±2σ, where σ represents the standard error in the mean trend computed using red-noise-corrected ordinary least squares.
- The best fit trend computed assuming residuals from the trend with time are ARMA(1,1). The illustrated line shows only the trend with its associated ±2σ standard error in the mean trend, as spit out by R.
All fits are to GISSTemp data. Before over-interpreting this: I am sufficiently uncertain about the ARMA(1,1) fit, and especially about how to interpret the ±2σ uncertainties from the ARMA(1,1) fit, that I am planning to run some Monte Carlo with synthetic data to verify that what I do really, truly does, on average, reproduce the mean trend from a batch of synthetically generated data and gives appropriate uncertainty intervals.
But… in the meantime, anyone who wants to read the skanky parts of the R script and tell me if they can tell straight off whether I am using R correctly to fit data under the assumption of ARMA(1,1), and whether I am correctly fishing out the stated uncertainty in the trend, have a look. Let me know! (If possible, for today, please limit advice to using R as a black box rather than suggesting more complicated analyses that I could do after I become an R master.)
Script: First bit from Kelly O’Day; last bit by me. ReadGISSTempR_ARIMAFit
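For anyone who doesn’t want to open the script: the bit I’m asking about boils down to something like the sketch below. This is a minimal illustration, not the script itself; ‘anom’ and ‘t_yrs’ are stand-ins for the GISS monthly anomalies and the decimal-year times.

# Fit a linear trend with ARMA(1,1) residuals via stats::arima(),
# then pull out the trend coefficient and its reported standard error.
fit <- arima(anom, order = c(1, 0, 1), xreg = t_yrs, include.mean = TRUE)
trend    <- coef(fit)["t_yrs"]                 # slope, C per year
trend_se <- sqrt(diag(fit$var.coef))["t_yrs"]  # reported standard error of the slope
cat(sprintf("trend = %.4f +/- %.4f C/yr (2 sigma)\n", trend, 2 * trend_se))

The question is whether that reported standard error is the right thing to double for the ±2σ lines on the graph.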

The confidence curves should be curves and not straight lines that cross in the middle, I think. A confidence range of zero at the center implies that there is no uncertainty in the intercept of the fit. There should be two sets of curves too, one for the line and one wider for the data (prediction interval). Dunno how to do this in R, though.
The relevant equations for confidence and prediction intervals are here.
What the curves should look like are here.
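Something along these lines might produce the two OLS bands, though I haven’t actually tried it (‘anom’ and ‘t_yrs’ standing in for the data and time; this only covers the OLS case, not the ARMA fit):

ols <- lm(anom ~ t_yrs)
new <- data.frame(t_yrs = seq(min(t_yrs), max(t_yrs), length.out = 200))
conf <- predict(ols, new, interval = "confidence")   # band for the fitted line
pred <- predict(ols, new, interval = "prediction")   # wider band for individual data
plot(t_yrs, anom, pch = 20, col = "grey")
lines(new$t_yrs, conf[, "fit"])
lines(new$t_yrs, conf[, "lwr"], lty = 2); lines(new$t_yrs, conf[, "upr"], lty = 2)
lines(new$t_yrs, pred[, "lwr"], lty = 3); lines(new$t_yrs, pred[, "upr"], lty = 3)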
DeWitt–
That would be true if I were trying to show uncertainty bounds for the individual data.
The real problem is trying to show uncertainty in the trend (i.e. best fit slope) on that sort of graph. Ideally, I would just show this with a bar and whiskers graph. The only thing I intend to show is the uncertainty in the mean trend.
The uncertainty in the estimate for the intercept is important if we are creating a projection, as are the uncertainties in the monthly data points. But that’s not what I’m trying to compute or show right now, and my purpose in creating this graph is not to forecast. (I know lots of people use ARMA to forecast, but that’s not what I’m trying to do.)
(Maybe I’ll have to add traces for all those other things to the graph just so I can get people to disregard them and then focus on the only bit I’m trying to understand in R today. The bit I want to understand: have I interpreted the uncertainty in the slope, i.e. the trend, correctly? That is:
y_i = m t_i + b + u_i
Right now, I only want to know if what I did really gets “m” and the uncertainty in “m” when we assume u is ARMA(1,1).)
DeWitt–
Right now, my only question is: Did my computation permit me to correctly compute the equivalent of the “criss-cross” lines on the graph above?
I understand what the other stuff is– but clearly, I can’t correctly construct the more complete stuff unless I have the “criss-cross” ones correct. So… my graph only contains the stuff required to present my question.
Re: lucia (Feb 22 15:52),
If you calculate the curves then you’ll be able to tell if the criss-cross lines are inside or outside the bounds. The calculation for OLS should be easy. There are multiple questions for ARMA, though. First, is twice the standard error really 95%, or do you need to correct for the reduced degrees of freedom from the autocorrelation? The same then applies to the number of data points in the square root. You should have all the sums of squares for the rest of the calculation.
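For reference, the usual quick AR(1)-style correction looks roughly like this in R (a sketch only, with ‘anom’ and ‘t_yrs’ as stand-in data): shrink the effective sample size by (1 − r1)/(1 + r1), where r1 is the lag-1 autocorrelation of the residuals, then rescale the OLS standard error.

ols  <- lm(anom ~ t_yrs)
r1   <- acf(residuals(ols), plot = FALSE)$acf[2]             # lag-1 autocorrelation of residuals
neff <- length(anom) * (1 - r1) / (1 + r1)                   # effective number of independent points
se_ols  <- summary(ols)$coefficients["t_yrs", "Std. Error"]  # raw OLS standard error of the slope
se_corr <- se_ols * sqrt(length(anom) / neff)                # inflated ("red corrected") standard error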
DeWitt–
At least with OLS, you compute the curved lines based on the criss-cross lines, not the other way around.
The ARIMA question: My impression is that R is supposed to have already taken account of the correct number of degrees of freedom when it reports the standard error in the mean trend. So, my question has to do with what R does.
I’ll be running Monte Carlo tomorrow, so I guess I’ll know that way. But if someone has run R a lot and has specific knowledge about this– I’d like to know. (I don’t, for example, know if the standard errors are “known” to be under- or over-estimated with ARIMA the way lag-1 autocorrelations are known to be biased for small samples, etc. That’s why… if anyone volunteers stuff like that, I’d like to know so I can find it faster.)
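The check I have in mind is roughly along the following lines. It’s a sketch, not the script I’ll actually run, and the ARMA coefficients and trend below are arbitrary, chosen only to exercise the machinery.

# Simulate ARMA(1,1) noise around a known trend, refit with arima(),
# and count how often the +/- 2 sigma interval covers the true slope.
set.seed(1)
n       <- 120                         # ten years of monthly data
t_yrs   <- (1:n) / 12
true_m  <- 0.02                        # true trend, C per year
nrep    <- 1000
covered <- logical(nrep)
for (i in 1:nrep) {
  noise <- arima.sim(n = n, model = list(ar = 0.5, ma = 0.3), sd = 0.1)
  y     <- true_m * t_yrs + noise
  fit   <- arima(y, order = c(1, 0, 1), xreg = t_yrs)   # may occasionally warn about convergence
  m     <- coef(fit)["t_yrs"]
  se    <- sqrt(diag(fit$var.coef))["t_yrs"]
  covered[i] <- (true_m > m - 2 * se) && (true_m < m + 2 * se)
}
mean(covered)   # should land near 0.95 if the reported standard errors are about right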
Try something like Igor or Origin first. They are much better suited to preparing plots and fairly good on simple statistics. Igor has the slower learning curve but IEHO it is more powerful.
Eli–
R can make beautiful plots. I have no idea why you are suggesting Igor or Origin. I’m asking a very specific question and it’s neither “please suggest a statistics package” nor “find me something to make pretty graphs”. If you don’t know the answer to the question I ask, that’s fine. But R seems suitable for what I’m doing.
lucia (Comment#70442),
.
I suspect that Eli believes you ask the wrong questions pretty much all the time…. which perhaps explains his frequently orthogonal responses. 😉
Igor’s also got a $600 price tag.
Where’s the contest?
I don’t know the answer, but I’ll steal it when you post it next. 🙂
ARIMA is on my ‘to-do’ list.
Ron–
I’m pretty sure what I’m doing is correct. It’s just that the documentation for R varies in quality the way you’d expect for something free, and also, there are a few pages discussing ‘issues’ with interpreting trends for ARIMA in R specifically.
I’ll be running the Monte Carlo today because I need to at least verify that the module really does return the answers I seek.
Lucia
I suggest you submit your question to the R Help forum. I find that many of the R package authors track the forum to monitor questions about their packages.
link
You will get responses from professional statisticians.
“The best fit trend computed assuming residuals from the trend with time are ARMA(1,1).”
Lucia, my post is not an attempt to answer your question about the statistics and R, as I judge you want some expert opinion.
I was at first confused by your use of ARMA(1,1) on the trend residuals, but I see that taking the residuals from the trend line, in effect, detrends the series, and thus the middle differencing term from, for example, ARIMA(1,1,1) is not required. You have a combined AR(1) and MA(1) model of the trend residuals.
I would have the same questions as DeWitt had about showing the 2 sigma limits.
Unsolicited, but I have found that most temperature series (I have been working with all the observed data sets, many of the climate models, and some proxies) fit an ARIMA(0,1,2) model. Some other poster made this same essential claim on these blogs not too long ago. ARIMA(1,1,1) also fits most series well. If the series, or the part of a series analyzed, shows no trend, then the middle differencing term is not required and models like ARIMA(1,0,0) and ARIMA(0,0,1) can fit the data well.
What I have learned from my work to date is that analyzing series that are not detrended can lead the trend to greatly influence the value of AR1.
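A quick way to see the effect in R (numbers arbitrary, for illustration only): a deterministic trend left in the series inflates the apparent lag-1 autocorrelation considerably.

set.seed(2)
n     <- 360
t_yrs <- (1:n) / 12
y     <- 0.02 * t_yrs + rnorm(n, sd = 0.1)            # trend plus white noise
acf(y, plot = FALSE)$acf[2]                           # lag-1 ACF of the raw series: large
acf(residuals(lm(y ~ t_yrs)), plot = FALSE)$acf[2]    # lag-1 ACF after detrending: near zero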
Kenneth–
R lets you specify a trend and/or differencing. There are some R pages discussing a certain level of “confusingness” in how one should interpret the output based on which you pick and what R considers to be “the intercept”. (The person who wrote the blog post was a faculty member teaching students R, and wrote to the person who wrote ARIMA. Evidently, they have differences of opinion on whether or not the organization of ARIMA is “confusing”.)
I’m dealing with which fit looks “best” as a separate issue.
For the analysis I want to do, I need a trend term in the test. This will be true whether or not the analysis concludes we would reject the trend for a particular temperature data set.
I’m not sure what you are saying here. Are you saying that:
If you don’t remove the apparent trend and analyze as if the data are ARIMA(1,0,1) with no trend, then you will get a different numerical value for the lag-1 autocorrelation than the one you get if you first remove the apparent trend? Yes. Of course.
Are you saying something else?
lucia (Comment#70455)
Yes, of course.
But deeper than that, I wonder whether AR1 could cause an apparent trend in the series, and whether, by removing the trend before measuring AR1, one is getting a wrong estimate. Also, if the AR1 calculation is affected by a trend (by not removing it), then is the correction for degrees of freedom using AR1, calculated without removing a trend, correct?
Of course, I am sure this is all clear if one thinks about it for a while. AR1, I would guess, is whatever it is for the series measured. The dependence of a result on previous ones does not matter whether it comes by way of a trend or some other phenomenon. It is just that I think there might be confusion for amateurs like myself, who might confuse a trend effect on AR1 with something else.
Lucia, please consider the source here, but isn’t a bar and whisker plot used to show deviations from a normal distribution and outliers? Your graph above is showing some very normal distribution limits.
Also, don’t modelers use and attempt to minimize the RMS error when evaluating the model fit to the observed data? What you want to do, I think, is determine the odds that, by chance, your model and the observed results will produce the (small) difference measured. Monte Carlo it and then do the theoretical work.
Ken–
Yes. If the “real” trend is 2 C/decade, and the “apparent” trend is only 0.0638 as indicated by OLS, then the deviation from the “real” trend– manifesting as a too-low trend– is likely due to some huge autocorrelation or something. Likewise, if the real trend is 0 C/decade, but the apparent trend is 0.0639 as shown by OLS, then the positive trend is likely the result of autocorrelation.
Depends on what statistical model you are using….
I think sometimes there isn’t just one answer for every possible application. You need to read through the full methodology you want to apply to do whatever it is you ultimately want to do. Usually, the “right” answer depends on what the question is.
On your 2nd and 3rd questions– you seem to be making some assumption about my end goal here and are asking questions based on that assumption. But I actually don’t know what you are assuming I am doing, and so I don’t understand quite what you are asking me (or even why).
lucia,
IIRC, ARIMA modeling with integration order not equal to zero in R is tricky. It doesn’t deal with trends (a constant term in the first difference) correctly. There’s a tutorial page somewhere that explains the workaround. Again, IIRC, you difference the data before calculating the AR and MA coefficients and constant. I also strongly disagree that an integration order of one in any way, shape, or form is an accurate model for temperature time series. It may give you a good fit in the short term, but it’s useless for prediction and trend testing as it will very likely overestimate the variability and make trends not significant. Fractional integration would be better. Unfortunately, the fArma package is incomplete. It doesn’t calculate statistics on the fit parameters, last I checked.
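If I remember the workaround right, it amounts to something like the sketch below (‘anom’ is a stand-in series): arima() silently drops the constant once the differencing order is nonzero, so you difference the data yourself and fit the ARMA part with a mean, which then plays the role of the per-step drift.

fit_nodrift <- arima(anom, order = c(1, 1, 1))        # include.mean is ignored when d = 1
danom     <- diff(anom)                               # difference the data yourself
fit_drift <- arima(danom, order = c(1, 0, 1), include.mean = TRUE)
drift     <- coef(fit_drift)["intercept"]             # average change per time step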
DeWitt–
When you write “IIRC” do you mean “If I Read Correctly” or something else?
Yes. I’ve seen the tutorial page warning about using R with differencing, trends etc. That’s the page I was referring to when I wrote
Could you elaborate on what you mean when you write this?
(For what it is worth, I am not fitting trends to forecast. I’m not sure precisely what method you think will overestimate variability, etc.)
“But I actually don’t know what you are assuming I am doing, and so I don’t understand quite what you are asking me (or even why).”
I’ll go to R now and look at what ARMA does and how it produces confidence intervals for a trend. I would do better at this point in my understanding by reading what you do next and looking at how non-normal residuals of a trend affect the uncertainty of the trend value.
Re: lucia (Feb 23 15:26),
IIRC = If I Remember Correctly. I didn’t bookmark the page and I’m too lazy to try and find it again.
That would be an ARIMA model (a,1,b) where a and b are arbitrary integers. In other words, a unit root. Is that sufficient to remind you of my extreme dislike of the existence of unit roots for temperature time series? The tests are applied to data that very likely has non-linear deterministic trends. That’s a no-no IMO.
I’m pretty sure that
a) I’ve asked the R list and what I’m looking up is *supposed* to give the uncertainty intervals based on theory.
b) Before Xmas, I did some Monte Carlo and think they might, nevertheless, be too small.
Since I may be misremembering on both points, I’ve got a list of “to dos” and one is to run the Monte Carlo and check. I have a couple of other to-dos, but I’m gearing up to run the Monte Carlo currently, and I’ll report.
You won’t know the big picture of *why* I am testing out certain modules in R or what the ultimate goal is until later– because it’s just way too difficult to explain without examples. And I can’t show the examples until I’m sure I am using R correctly. So… for the time being, if a question sounds like a question about R as a black box, that’s what it is. I have to use some data– but the fact that I might use GISS, or Hadley etc. is due to that data being handy. My pushing that data through that box to find a specific number isn’t meant to imply anything in particular.
(I could show graphs with uncertainty intervals for GISS computed using ARIMA(3,0,1), with some coefficients fixed at zero– this happens to be the case with the best Akaike criterion from 2001-2011. And it happens to give teensie-beensie error bars which could ‘suggest’ that we know the trend very, very well. What would that graph mean? No idea– yet.)
DeWitt
Are you referring to cases where people have applied tests to data from 1900-now to try to test the long term trend?
DeWitt, do you have some examples of FARIMA producing better fits than ARIMA with temperature time series, where you can show the reduced fitting error?
What I have found is that various ARIMA models can be fit to the same data with basically the same (small) error. What I have read is that if differencing (in ARIMA) is required to make the series stationary, sometimes AR and MA parameters are required to compensate for positive or negative autocorrelation produced by the differencing.
If FARIMA can fit better, it would allow me to reduce those errors further – a reduction that seems very limited and at times small using ARIMA. I have currently been attempting to compare observed, modeled, and proxy temperature time series for fits to an ARIMA model.
Does fractional differencing truly get around the existence of non-linear trends in a series?
Obviously, if one knows whether the series has a linear trend (or a number of linear trends with breakpoints) or a non-linear trend, all one needs to do is remove the trend and then do the ARMA modeling. The key is to know what kinds of trends exist in the series before removal. Also, for purposes of model fitting or calculations, wouldn’t the matter be what best approximates the fit? If, as Lucia is doing in her analysis, one is looking at a short time period, even a non-linear trend can look linear over that span.
Re: Kenneth Fritsch (Feb 23 16:15),
That’s the problem with fArma: it doesn’t give you the statistics, and I don’t know how to do it myself. I have a project on the back burner to use FARIMA to generate pseudo-proxy series for the Mann-O-Matic. Mann only used AR(1), and then only with a fairly small coefficient. A quick and dirty experiment says that you get hockey sticks at a far higher percentage with FARIMA noise than with AR(1), and a lot of the proxy series generate a fractional difference between 0.1 and 0.5. That also minimizes negative AR coefficients. Does that mean anything? Dunno.
Probably not.
Re: lucia (Feb 23 16:00),
Yes. Starting with Beenstock and Reingewertz.
The link below would appear to bear on what we are discussing in this thread.
http://www.iemss.org/iemss2004/pdf/volatility/kalltren.pdf
“The trend is assumed to be a slowly varying deterministic component caused e.g. by human impact like global warming. Its detection is difficult since it might be superimposed by natural variability also present on large time scales. In an innovative approach a polynomial trend component and a stochastic model part are combined. With the stochastic model long-term and short-term correlations in time series data are considered. A reliable test for a significant trend can be performed via three steps: First, a stochastic fractional ARIMA model is fitted to the empirical data. In a second step, wavelet analysis is applied to separate the variability of small and large time-scales, assuming that the trend component is part of the latter. A comparison of the overall variability to that restricted to small scales results in a test for a trend. For the analysed series no significant trend could be found under the assumption of the models presented. The extraction of the large scale behavior by wavelet analysis provides valuable hints concerning the shape of the trend.”
Steve F: Basically yes, although given Lucia’s lead off
————————
I’m trying to convert to using R as my base plotting tool.
————————
Maybe not so much
You use a plotting program to plot and a statistics program to do statistics. You can pass data between them. Unfortunately, the best plotting programs are expensive. So are good socket wrenches.
Eli:
I think Lucia is looking to produce turnkey scripts that people can download and run, without assuming they have several thousand dollars to dump into proprietary software that isn’t fully multiplatform (i.e., runs on Mac, Windows, and Linux).
As to what makes sense, it very much depends on context whether Igor would be a good choice or not. (For many, Excel does what they need it to do.)
DeWitt, thanks for the reply, as I need these inputs for my analysis and comparisons of observed, modeled, and proxy temperature series. I believe FARIMA, of which ARIMA is a special case, is applied to series that might or do contain long-term memory. Isn’t there another method, called NARMA, used for series with non-linear trends?
At some point these methods get beyond my immediate comprehension and become a dangerous black box for the likes of an amateur like myself. I have found that getting down and dirty by applying these methods does get you away from at least some of the black-box approach.
Test to see if my email is moderated. (I had a disk crash… so I’m offline for a while.)
Eli is sure dumb for a smart guy.
But since he has Igor maybe Lucia can pass him data and he can be her ‘plot boy’
plot monkey? plot slave?
steven mosher
February 24th, 2011 at 10:57 am
Eli is sure dumb for a smart guy.
But since he has Igor maybe Lucia can pass him data and he can be her ‘plot boy’
plot monkey? plot slave?
plot bitch
Semi-off topic bleg:
I would like to know what 30-year period since 1880 showed the greatest warming trend, and what the amount of warming was during that 30-year period.
If somebody can find that information in two or three mouse clicks, or can point me to where I can see it without eyeballing, I would be greatly appreciative.
Many thanks
Tom,
It depends a bit on which surface temperature record you use. For NCDC it’s 1974-2003, with a trend of 0.190 C per decade. 1916 to 1945 has the highest trend during the early part of the century, but lags behind the modern period a bit at 0.156 C per decade.
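If you want to compute that sort of thing yourself, a rough R sketch (assuming a data frame ‘df’ with a ‘year’ column in decimal years and an ‘anom’ column of monthly anomalies; those column names are just placeholders, not the layout of my spreadsheet):

window <- 360                                   # 30 years of monthly data
starts <- 1:(nrow(df) - window + 1)
trends <- sapply(starts, function(i) {
  rows <- i:(i + window - 1)
  coef(lm(anom ~ year, data = df[rows, ]))["year"] * 10   # trend in C per decade
})
best <- which.max(trends)
df$year[best]    # start of the steepest 30-year window
max(trends)      # its trend, C per decade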
Tom,
As a follow-up, here is an Excel sheet with all the 30 year trends for NCDC, GISTemp, and HadCRUT3v: http://www.2shared.com/document/j12Uj9js/30_year_trends.html
Thanks, Zeke. Would you like to offer an opinion on the status of my wager with Joe Romm last year? I took .15 this decade…
Tom,
What were the parameters of the bet? That the 1991-2020 trend would be 0.15? Or just the 2010-2020 trend?
Also, because it’s a neat visualization, here are the running 30-year trends for each major temp series: http://i81.photobucket.com/albums/j237/hausfath/Screenshot2011-02-24at93929AM.png
Hi Zeke, it was 2010-2020. Obviously, I took the under. I’m saving all my quarters in a Joe Romm box, but would obviously hope to use them to clean my dirty laundry instead. Sort of a moral equivalence kind of thing…
Tom,
For what it’s worth, it looks like the multi-model mean (A1B) predicts a warming trend of 0.223 for the 2010-2020 decade. If I may ask, how did you two agree on the number of 0.15?
That number would be on the low end for models in the upcoming decade, but even besides that, consider what was projected for the 2000-2030 trend (~0.23). If one believes that the deviation between this past decade’s observed trend and the average model projection is the result of weather/noise (as presumably Joe Romm does), oscillating around some mean that will balance out over, say, 30 years, even a decade’s warming of 0.2 from 2010-2020 would put that 30 year trend seemingly out of reach. It seems to me like you could have pushed Joe up to at least 0.2?
Regardless, I suppose if you believe this same deviation was primarily due to models strongly overestimating the warming, you may be confident in the 0.15 bet 🙂
Hi Troy,
The figure was set by Romm. I was ticked off because it was so low–where’s the courage of his convictions, anyhow–but it looked to me as though some natural cycles were changing to exert downward pressure in this decade, so I thought I’d take a chance.
I’m just trying to catch a downward tick in the sawtooth waveform.
Tom Fuller (Comment#70531) February 24th, 2011 at 1:30 pm
“The figure was set by Romm. I was ticked off because it was so low–where’s the courage of his convictions”
.
Say what? When you consider the courage of his convictions, you may want to consider his moderation policy. He is a coward. I offered him a bet once as well. First he responded with outrageous terms. When he figured out I wasn’t very well known, he said he had no interest in a bet with me. It is all politics for him.
DeWitt, the library(fracdiff) in R seems to be able to return statistics for fitting an ARFIMA model.
http://rss.acs.unt.edu/Rdoc/library/fracdiff/html/00Index.html
Alternatively, couldn’t you find the best-fit differencing parameter, d, with a fracdiff function, then take that fractionally differenced series, determine the best fit for the AR and MA parameters with arima(p,0,q), and then do the statistics on that model result?
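In sketch form, the two-step idea would be something like this (‘anom’ a stand-in series; I have not verified that it is statistically equivalent to a full ARFIMA fit):

library(fracdiff)
fd    <- fracdiff(anom, nar = 1, nma = 1)   # joint ARFIMA(1,d,1) estimate
d_hat <- fd$d
fanom <- diffseries(anom, d = d_hat)        # remove the long-memory component
fit   <- arima(fanom, order = c(1, 0, 1))   # ARMA fit with the usual statistics
fit$var.coef                                # covariance matrix of the estimates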
Re: Kenneth Fritsch (Feb 24 16:13),
Dunno, haven’t tried it. As I said, that project is on a very slow back burner. If it were that easy, though, it would probably have been implemented in fArma.
Lucia
You can simplify the GISS data retrieval code to just 2 lines of script by sourcing my RClimate.txt file. Here is the shortened version:
source("http://processtrends.com/files/RClimate.txt")
dfm <- func_GISS()
You could then substitute HAD, NOAA, RSS, or UAH after "func_" to get the other 4 major series, to test your ARMA model on those series.
As an alternative, you could download my CTS.csv file, which has GISS and 19 other monthly climate time series in one file. Here is the link.