The ARMA(1,1) noise correction.

In today’s post, I’m going to describe how I correct for the serial correlation in my recent comparison of multi-run means to observations of surface temperatures from Hadley.

Data are only approximately AR(1).

To estimate uncertainty intervals for a computed trend, we must first assume we know something about the process that generated the noise. One of the ways we learn something about that process is by plotting the correlogram.

When trying to diagnose whether a time series is “AR(1)” or “ARMA(1,1)”, it’s convenient to plot the log of the lagged correlations as a function of lag. In both cases, when we have an infinite number of data points, the log of the lagged correlations will fall on a straight line. For AR(1), the intercept at lag zero will be zero; for ARMA(1,1), the intercept will be non-zero.
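
For anyone who wants to make this sort of diagnostic plot themselves, here is a minimal sketch in Python/numpy. The series it uses is a made-up red-plus-white example standing in for the actual Hadley-minus-model-mean differences, and the coefficients (0.8 for the red part, 0.7 for the extra white noise) are purely illustrative:

import numpy as np
import matplotlib.pyplot as plt

def lagged_correlations(x, max_lag):
    # Sample autocorrelations r_1 ... r_max_lag of a 1-D series.
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x * x)
    return np.array([np.sum(x[k:] * x[:-k]) / denom for k in range(1, max_lag + 1)])

# A made-up stand-in series: AR(1) ("red") noise plus white noise.
rng = np.random.default_rng(0)
n, rho, extra_white = 600, 0.8, 0.7
red = np.zeros(n)
for t in range(1, n):
    red[t] = rho * red[t - 1] + rng.normal()
series = red + extra_white * rng.normal(size=n)

r = lagged_correlations(series, max_lag=6)
lags = np.arange(1, len(r) + 1)
keep = r > 0                        # the log only makes sense for positive estimates

# AR(1): log(r_k) vs. k is a line whose intercept at lag zero is zero.
# Red + white (ARMA(1,1)-like): same slope, but a negative intercept (the log of alpha).
plt.plot(lags[keep], np.log(r[keep]), 'o-')
plt.xlabel('lag')
plt.ylabel('log(lagged correlation)')
plt.show()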

A plot of the log of the lagged correlations for differences between Hadley and the model mean of the A1B runs computed using data beginning in 1950 is shown below:

Examination of longer time series does indicate the data look like ARMA(1,1), or the sum of a ‘red’ (i.e., AR(1)) noise process and a ‘white’ noise process. (Read the earlier post for discussion of the similarities.)

Just treating the nonlinearities in the signal results in uncertainty intervals that are too large; ignoring the possibility that the noise is ARMA(1,1) rather than AR(1) results in uncertainty intervals that are too small. So, as we move away from simpler models that neglect both effects, it makes sense to alter the method to account for both factors rather than just the factor any individual “side” might prefer to include.

Earlier, I discussed how I deal with non-linearities in the signal. Today, I’ll discuss how I specifically deal with ARMA(1,1).

Formula for estimating the effective number of samples.

The method of adjusting the confidence intervals from ordinary least squares is to compute the effective number of samples, Neff, based on the known number of samples in the time series, N, and then substitute Neff for N in the formula used to compute the uncertainty in the trend.

The method to estimate Neff based on N was discussed in previous blog posts, with the formula for estimating the effective number of samples provided in equation (4) here. That post discusses the shape of the correlogram for a process that is the sum of a “red” noise process and a “white” noise process. The method is based on knowledge of two coefficients: 1) the lag 1 correlation for what I think of as the “red” noise part of the process, ρR,1, and 2) a coefficient I denoted as α, which accounts for the “white” noise part of the process.
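
Equation (4) itself is in that earlier post, so I won’t repeat it here. Purely to give a feel for the size of the correction, the sketch below codes up the textbook effective-sample-size approximation for a correlogram of the form α ρR,1^k for k ≥ 1. This is a standard result for correlated data and is not necessarily identical to the formula I actually use, so treat it as orientation rather than a recipe:

def n_eff(n, rho_R1, alpha):
    # Textbook approximation: n_eff = n / (1 + 2 * sum_{k>=1} r_k),
    # where r_k = alpha * rho_R1**k, so the sum is alpha * rho_R1 / (1 - rho_R1).
    return n / (1.0 + 2.0 * alpha * rho_R1 / (1.0 - rho_R1))

print(n_eff(120, rho_R1=0.8, alpha=0.85))   # 120 samples shrink to roughly 15 effective samples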

I’ve run a few Monte Carlo cases generating time series with known ρR,1 and α. It appears that if we know those values, using that formula results in a method where the stated confidence level is reproduced. (I haven’t run tons, but those I have run indicate the method would work.)
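
Generating those synthetic series is straightforward. Here is a sketch, in Python, of a red-plus-white generator with specified ρR,1 and α; the coefficient values in the example are illustrative, not the ones estimated from the data:

import numpy as np

def red_plus_white(n, rho_R1, alpha, rng):
    # One realization of an AR(1) ("red") process plus white noise, scaled so the
    # lagged correlations are approximately alpha * rho_R1**k for k >= 1.
    # Assumes 0 < alpha <= 1 and |rho_R1| < 1.
    e = rng.normal(size=n)
    red = np.zeros(n)
    for t in range(1, n):
        red[t] = rho_R1 * red[t - 1] + e[t]
    var_red = 1.0 / (1.0 - rho_R1**2)             # stationary variance of the AR(1) part
    var_white = var_red * (1.0 - alpha) / alpha   # chosen so alpha = var_red / (var_red + var_white)
    return red + np.sqrt(var_white) * rng.normal(size=n)

rng = np.random.default_rng(1)
x = red_plus_white(360, rho_R1=0.8, alpha=0.85, rng=rng)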

Unfortunately, when we are dealing with finite amounts of data, we don’t know those values. So, we must estimate ρR,1 and α from the data.

Ideas for how to estimate.

There are many possible ways to estimate the coefficients. Since our goal is to perform a t-test rather than to do a curve fit to predict next year’s temperature, the main requirement is a method of estimating ρR,1 and α which, when used with equation (4) in the previous blog post, creates 95% confidence intervals when we claim we have 95% confidence intervals. We’d also like it to be simple.

We can test whether we have achieved our goal by running some Monte Carlo analyses.
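
Schematically, such a check looks like the sketch below (Python, reusing the red_plus_white and n_eff helpers sketched above). The way Neff enters the trend formula here, through the residual variance and the t-distribution degrees of freedom, is one plausible way to apply the substitution rather than an exact transcription of my code:

import numpy as np
from scipy import stats

def trend_and_halfwidth(y, neff):
    # OLS trend plus a 95% half-width computed with neff in place of n.
    n = len(y)
    t = np.arange(n, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (intercept + slope * t)
    s2 = np.sum(resid**2) / (neff - 2.0)             # residual variance with adjusted dof
    se = np.sqrt(s2 / np.sum((t - t.mean())**2))
    return slope, stats.t.ppf(0.975, neff - 2.0) * se

# Trend-free noise, so a correct 95% interval should exclude zero about 5% of the time.
rng = np.random.default_rng(2)
n, trials, rejects = 120, 2000, 0
for _ in range(trials):
    y = red_plus_white(n, rho_R1=0.8, alpha=0.85, rng=rng)
    neff = n_eff(n, rho_R1=0.8, alpha=0.85)          # or estimates computed from y itself
    slope, half = trend_and_halfwidth(y, neff)
    rejects += abs(slope) > half
print(rejects / trials)                              # rejection rate; near 0.05 if the correction works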

I tried a number of simple methods based on either

  • Estimating ρR,1 and α based on ratios of the first three lagged correlations of the observed process.
  • Fitting a linear regression through the log of the first three or four lagged correlations.

Testing by running Monte Carlo, I found the following method achieved the required goal, at least near the values of ρR,1 and α suggested by the data (a short code sketch of the recipe follows the list):

  1. ρR,1 is computed as the maximum of a) the measured correlation at lag 1, b) the ratio of the second to the first lagged correlation, and c) the ratio of the third to the second lagged correlation.
  2. α is computed as the ratio of the measured first lagged correlation to ρR,1.
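
Translated into code, the recipe is short. This is only a sketch; in practice one would want guard rails for near-zero or negative lagged correlations:

import numpy as np

def estimate_rho_alpha(y):
    # Recipe from the list above: rho_R1 from the first three lagged correlations,
    # alpha from the ratio of the lag-1 correlation to rho_R1.
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    denom = np.sum(y * y)
    r1, r2, r3 = (np.sum(y[k:] * y[:-k]) / denom for k in (1, 2, 3))
    rho_R1 = max(r1, r2 / r1, r3 / r2)
    alpha = r1 / rho_R1
    return rho_R1, alpha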

I tested this method with synthetic data for 10-year and 30-year processes with coefficients near those exhibited by the data and found the confidence intervals are pretty good. (That is: running with 64,000 samples, the confidence intervals I got were consistent with the method returning correct confidence intervals, or with being very slightly too tight.) I tested other methods, like using the linear fit to log transforms of the correlograms, and those gave confidence intervals that were too wide.
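
As a rough sense of scale for “consistent”: if the 64,000 Monte Carlo cases are treated as independent and the true rejection rate were exactly 5%, the observed rate would scatter around 5% with a standard error of about sqrt(0.05 × 0.95 / 64,000) ≈ 0.0009, or roughly ±0.2% at the two-sigma level. So observed rejection rates between about 4.8% and 5.2% are what “correct coverage” looks like at this number of samples.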

Summary

Basically, my method of estimating the uncertainty intervals is to assume the data exhibit the form of correlogram one expects for ARMA(1,1) processes, estimate the coefficients from the correlogram, and then apply a correction to estimate the effective number of samples.

The method appears to return suitable confidence intervals for time series with correlograms similar to those we are analyzing; I have not tested it over the full range of all possible values of α and ρR,1. It’s worth noting that, logically, the method must fail if the time series corresponds to an ARMA(1,1) case whose correlogram exhibits α > 1. That sort of series arises if we are analyzing data from a process that looks “red” when sampled extremely rapidly, but which is averaged over some time period that is a significant fraction of the integral time scale of the process. In this case, my method will result in uncertainty intervals that are always too large.

For those with concerns that the method cannot provide appropriate uncertainty intervals for shorter time series, I’ll be happy to show results by computing the parameters ρR,1 and α starting in the ’50s and applying those to the 9- and 10-year time series, which are the shortest I discussed last week. The switch does alter the size of the uncertainty intervals: for one set of observations the uncertainty intervals get larger, for the other they get smaller, but (unless I find an error in my computations) the switch does not alter the “reject” or “accept” diagnosis for those time spans.

11 thoughts on “The ARMA(1,1) noise correction.”

  1. OK.
    It is interesting to be able to reject the models (at 95% confidence) as an accurate representation of reality. But I think a more interesting question is “what is the best estimate of the true climate sensitivity?” The models predict a temperature increase based on an average sensitivity of ‘X’; since the measured increase is less than that predicted based on sensitivity ‘X’, can we not estimate a climate sensitivity ‘Y’ (as a fraction of ‘X’) which represents a best guess estimate of the true climate sensitivity?
    .
    I know that an accurate calculation will be much more complicated than a simple ratio, but it would still be nice to have a first-order approximation.

  2. SteveF–
    Sure. That’s a more interesting question. What analysis do you propose one do to answer it?

    Sometimes the most interesting questions can’t be easily answered based on available data.

  3. Lucia,

    I hope you enjoyed the Moose lodge dinner (and that you are not texting from an iPhone or something). I made linguine with spicy clam sauce, a Caesar salad, a nice French bread, and a bottle of J Lohr cabernet. Quite tasty, even if I had to clean up the kitchen.

    I do not like answering a question with a question, but sometimes it is helpful. What if the average model prediction sat dead center on the measured temperature trend with uniform scatter around the measured trend? Tamino and company would be very happy, and we would have to assume that the models represent a good estimate of reality. Since the models do not sit centered on the measured trend, can’t we calculate how much displacement of the average model trend would be needed to sit centered on the measured trend? In other words, if the models over-predict warming by 40%, is not a sensitivity of ~60% of the average model a reasonable first guess for the true sensitivity?

  4. SteveF–
    We can know the average model projected high. But we can’t know if that is due to sensitivity, improper conduction of heat to the lower levels of the ocean, improper applied forcings or whatever. So, I don’t think you can scale the average sensitivity by the amount of overshoot to estimate the true sensitivity.

  5. I think that it is good to correct the degrees of freedom to account for some noise hypothesis. However, I have begun to use an alternative approach.

    Consider a short timeseries with some memory. Points in the interior of the time series will share information with neighboring points. Points at the end of the series will also share some information with their neighbors, but as they have fewer neighbors, they hold a greater proportion of the total information in the series. Now suppose that you want to do a linear fit. The end points in the series should constrain the fit more than other points. The consequence is that the optimal fit is no longer the traditional one which minimizes the least square residuals with equal weight over the entire series. In extreme cases (very long memory), the best linear fit would probably be the one which connects the end points.

    One method of finding the best fit is to estimate the error covariance matrix (C). That can be easily done for most noise processes simply by making a monte carlo. Then you find your model parameters (slope, intercept) by minimizing the misfit = residuals^T * C^-1 * residuals. One additional benefit is that it is possible to account for a non-stationary null hypothesis (e.g. old measurements are worse than new).

  6. Aslak–
    I think with least squares, the end points do influence the fit more than the interior points. Try this:
    1) find the best coefficients using least squares, look at the trend “m”.
    2) just change the magnitude of the data at the two end points. Recompute the trend, “m2”. Compare to “m”.
    3) Now go back to the original data. Just change the magnitude of two points near the center and compute m3.

    You should see that fiddling with the end points made a bigger difference than fiddling with interior points.
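
    In numpy, the experiment looks something like this (I nudge the points in opposite directions so the two changes don’t cancel; the series and trend are made up):

    import numpy as np

    rng = np.random.default_rng(3)
    t = np.arange(30, dtype=float)
    y = 0.02 * t + rng.normal(size=t.size)      # made-up series with a small trend

    m = np.polyfit(t, y, 1)[0]                  # 1) baseline least-squares trend

    y_ends = y.copy()                           # 2) nudge the two end points
    y_ends[0] -= 1.0
    y_ends[-1] += 1.0
    m2 = np.polyfit(t, y_ends, 1)[0]

    y_mid = y.copy()                            # 3) nudge two points near the center instead
    y_mid[14] -= 1.0
    y_mid[15] += 1.0
    m3 = np.polyfit(t, y_mid, 1)[0]

    print(abs(m2 - m), abs(m3 - m))             # the end-point change moves the trend far more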

    The last part sounds interesting, but I don’t understand what you mean by this:

    One method of finding the best fit is to estimate the error covariance matrix (C). That can be easily done for most noise processes simply by making a monte carlo.

    What do you do the monte carlo on?

    Do you have a reference so I can read what you mean?

  7. @Lucia: Sorry, I don’t have a good single reference. My main references for this are papers by Mosegaard, Tarantola and Sambridge. Especially their work on “inverse monte carlo”, which is basically a “markov chain monte carlo” method (the second name gives more search results).

    But in principle you can do something like this (untested matlab syntax):

    N = length(timeseries);
    Nmc = 1000;                % number of Monte Carlo noise realizations
    Cmc = zeros(N, N, Nmc);
    for ii = 1:Nmc
        nr = noiseprocess();   % one N-by-1 realization drawn from the assumed noise model
        Cmc(:,:,ii) = nr*nr';  % outer product
    end
    C = mean(Cmc, 3);          % Monte Carlo estimate of the error covariance matrix

  8. Aslak,
    “Consider a short timeseries with some memory. Points in the interior of the time series will share information with neighboring points. Points at the end of the series will also share some information with their neighbors, but as they have fewer neighbors, they hold a greater proportion of the total information in the series.”

    I’m not sure about this. It seems to me that causation requires that any point in a time series representing a process with ‘memory’ can only be influenced by trailing data points, not by future points. So all data points ought to be equal in the information they contain. If the process being described is a continuous one (where the first data point does not correspond to the beginning of the process), then pre-series data is of course not visible, but the early data points will still contain information shared with those pre-series data points.

  9. Aslak–
    Thanks. I think I generally understand how you are minimizing now.

    Right now, my goal is to just discuss uncertainty in the trend. But yes, it would be better to get the trend that minimizes the uncertainty given the noise model. (That’s what I think you are doing.)

    Certainly, if we were trying to extrapolate based on a fit, that would seem advantageous.

    What I do know is that right now, I have a method where, under the frequentist definition, the 95% confidence interval rejects 5% of the time or less. It’s not perfect but it works. So, that gives a basis for evaluating whether a model mean is ok, or probably off.

    It might be that your method gives smaller error bars in general. In which case, it would have a lower “beta” error at the same “alpha” error, and then should be used instead of what I do. (That’s pretty much my rule when I am presented with two suggested statistical methods: Always pick the one with the lower type II error at the same type I error. That is, if you know which has the lower error!)

Comments are closed.