How does averaging affect the variance of GMST? (Boring Post 1 in series I promised)

Friday, I promised you all several boring posts, and now I’m going to deliver!

Today’s post will contain equations relating the variance of an averaged process to the underlying instantaneous process with exponentially decaying autocorrelation. The discussion will ignore the existence of the mean trend. (This simplification involves no loss of generality when we later use these functions to relate statistical properties of monthly averages to annual averages.)

Because I haven’t installed proper math functionality at the blog, I will not include any derivations, only results. But if I don’t write up the results, I won’t remember them two weeks from now when someone posts a comment asking me what equations I used in a spreadsheet!

Continuous AR(1) process

Let us imagine that weather noise v(t) is a continuous stationary process in t, with mean zero (i.e. 〈v(t)〉 = 0), variance σ² = 〈v(t)v(t)〉, and autocorrelation described by the function ρ(s) = 〈v(t)v(t+s)〉/σ² = exp(−|s|/τ).

(Note: 〈〉 brackets are used to indicate “ensemble average” of any quantity inside the brackets. So, 〈Y〉 is the “ensemble average of Y”.)

The parameter τ is a time constant, describing the period of time over which the correlation between the noise at time “t” and that at time “t+s” drops by a factor of 1/e. In the case of AR(1) noise, the quantity τ corresponds to something called the “integral time scale” for the weather noise. A large time constant corresponds to weather that is, on average, quite persistent. So, for example, if τ were very large (say, 10 years) and the earth’s surface was unusually warm today, one would expect the surface to remain unusually warm for on the order of 10 years. In contrast, a short time constant would correspond to weather varying rapidly. So, if τ were equal to 1 hour, today’s relative warmth might dissipate in an hour or so.
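In lieu of a derivation, here is a minimal Python sketch (made up for illustration; nothing here comes from my spreadsheet, and the function name is mine) of how one might generate discrete samples of such a process: sampling a process with autocorrelation exp(−|s|/τ) at spacing dt gives an AR(1) series with lag-one coefficient exp(−dt/τ).

```python
import numpy as np

def simulate_exp_noise(n_steps, dt, tau, sigma, seed=0):
    """Discrete samples of a zero-mean process with variance sigma**2
    and autocorrelation exp(-|s|/tau); each step is an AR(1) update."""
    rng = np.random.default_rng(seed)
    phi = np.exp(-dt / tau)                     # lag-one correlation at spacing dt
    innovation_sd = sigma * np.sqrt(1.0 - phi**2)
    v = np.empty(n_steps)
    v[0] = rng.normal(0.0, sigma)
    for i in range(1, n_steps):
        v[i] = phi * v[i - 1] + rng.normal(0.0, innovation_sd)
    return v

# For example: 100 years of monthly samples with tau = 0.44 yr and unit variance
v = simulate_exp_noise(n_steps=1200, dt=1.0 / 12.0, tau=0.44, sigma=1.0)
```

With τ = 0.44 years and monthly steps, the lag-one coefficient works out to exp(−(1/12)/0.44) ≈ 0.83, i.e. fairly persistent month to month.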

Variances of process averaged over time T.

Suppose an agency measures ‘v’ and reports monthly or annual averages. How do the variances of these averaged measurements relate to the variance of the underlying process, σ²?

Well… if I’d properly installed math functions, I’d show you the derivation. But, instead, I’ll just define the meaning of symbols and show the result.

Let’s call the variance of ‘v’ averaged over blocks of length ‘T’ σ_T². If we average ‘v’ over windows of length ‘T’, the variance of this averaged quantity is related to σ² as follows:

(1) σ_T² = σ² [2τ/T] [1 + (τ/T){exp(−T/τ) − 1}]

It is possible to show that this equation has the following correct limiting properties:
1) σ_T² → σ² as T/τ → 0 (averaging over a window much shorter than the correlation time barely reduces the variance).
2) σ_T² ≈ 2σ²τ/T, i.e. a constant divided by T, as τ/T → 0, as required for white noise.
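A quick numerical check of equation (1) and these two limits, as a sketch (the function name is mine, chosen just for this post):

```python
import numpy as np

def var_ratio(T, tau):
    """sigma_T**2 / sigma**2 from equation (1):
    (2*tau/T) * [1 + (tau/T) * (exp(-T/tau) - 1)]."""
    r = tau / T
    return 2.0 * r * (1.0 + r * np.expm1(-1.0 / r))

print(var_ratio(T=1e-6, tau=1.0))   # ~1.0: averaging over T << tau barely reduces the variance
print(var_ratio(T=1e6, tau=1.0))    # ~2e-6, i.e. ~2*tau/T: the white-noise-like 1/T falloff
```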

Sneaking in just a little of the debate over testing the AR4 2C/century projection

Since the purpose of this exercise is to test whether the AR(1) process Gavin considers “closer” to the models is consistent with the one I obtain by fitting data, I’ll now do a bit of fiddling to convert the process Gavin described into a monthly process. (Mind you, I’m under the impression Gavin doesn’t believe the models are AR(1), but I still think it’s interesting to test whether the AR(1) process that he thinks is closer to the models falls within the range of possible AR(1) processes for the data.)

So, let’s turn to the process Gavin described as being closer to representative of the models, which he described as follows:

However, for the case that is closer to what I did with the models, I calculated 9000 7-year AR(1) time-series with an underlying trend=2 degC/century and with p=0.1.

After I asked him, Gavin said that he used a standard deviation of σ_YEAR = 0.1 C for the annual series.
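To make the comparison concrete later on, here is a sketch of how I read that recipe, in Python. I’m assuming “p=0.1” means the lag-one autocorrelation of the annual values and that σ_YEAR = 0.1 C is the standard deviation of the annual weather noise; the seed and variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
n_series, n_years = 9000, 7
rho, sigma_year, trend = 0.1, 0.1, 0.02   # lag-one corr., sd (C), trend (C/yr = 2 C/century)

t = np.arange(n_years)
slopes = np.empty(n_series)
for k in range(n_series):
    noise = np.empty(n_years)
    noise[0] = rng.normal(0.0, sigma_year)
    for i in range(1, n_years):
        noise[i] = rho * noise[i - 1] + rng.normal(0.0, sigma_year * np.sqrt(1.0 - rho**2))
    slopes[k] = np.polyfit(t, trend * t + noise, 1)[0]   # OLS trend in C/yr

print(slopes.mean() * 100.0, slopes.std() * 100.0)       # mean and spread in C/century
```

The distribution of these OLS trends is what eventually gets compared against the data.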

To test whether Gavin’s AR(1) process reproduces “weather noise” or trends consistent with the data, I’m going to test, against the data, the hypothesis that his entire process describes the weather. However, to do the test, I must translate his process from annual to monthly.

If Gavin’s “closer” process were not averaged, and ρ described the autocorrelation of the continuous process at a 1-year lag, then to determine the time constant of the process prior to averaging, we would use:

(2) ρ(s) = exp(−|s|/τ)

Evaluating for s = 1 year and ρ = 0.1, we get τ ≈ 0.44 years. (You can double check: 0.1 ≈ exp(−1 year / 0.44 years). 🙂 )
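In code, that inversion is one line (a sketch; the variable names are just illustrative):

```python
import numpy as np

s, rho = 1.0, 0.1             # lag in years, and the lag-one correlation
tau = -s / np.log(rho)        # invert rho = exp(-s/tau)
print(tau)                    # ~0.434 years
print(np.exp(-s / tau))       # sanity check: recovers 0.1
```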

For this case, the variance of the annual averaged quantity is found using equation (1), and we obtain σ_YEAR² = 0.53 σ², which is to say that by averaging over a year, the variance of the “weather noise” drops by about half compared to the continuous process.

In contrast, if we average over a month, the variance of the “weather noise” will be σ_month² = 0.92 σ², which is quite a bit larger.

So, it turns out that if τ = 0.44 years, the ratio of the standard deviation of the monthly averaged measured value to the annual standard deviation is σ_month/σ_YEAR ≈ 1.3.

So, based on what you’ve seen so far, you might imagine that to translate Gavin’s process to monthly values, I should use σ_month = 0.13 C.
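Here is the arithmetic behind those numbers as a short sketch (re-deriving the var_ratio function from equation (1) above; the third digit depends a little on how τ is rounded):

```python
import numpy as np

def var_ratio(T, tau):
    """sigma_T**2 / sigma**2 from equation (1)."""
    r = tau / T
    return 2.0 * r * (1.0 + r * np.expm1(-1.0 / r))

tau = -1.0 / np.log(0.1)                     # ~0.434 years
r_year = var_ratio(T=1.0, tau=tau)           # ~0.53
r_month = var_ratio(T=1.0 / 12.0, tau=tau)   # ~0.9
scale = np.sqrt(r_month / r_year)            # ~1.3
print(scale * 0.1)                           # naive sigma_month, ~0.13 C
```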

But this won’t be quite right because, as I noted before, I found the time constant τ by ignoring the fact that Gavin’s ρ_YEAR corresponds to an annual averaged process. This means I can’t use equation (2) to obtain a precise value of τ for the continuous process. I need to do a bit more fiddling!

How far has this boring post gotten us?

So far, I have an equation that lets me find the σ_month that corresponds to σ_YEAR if I know τ. I have an approximate way to estimate τ from the annual averaged values, but I happen to know it’s not quite right.

So, in the next boring post in this series, I’ll describe how to find the τ that matches the annual averaged process. I might also discuss a few other issues related to using averaged data instead of instantaneous data (including why being forced to use averaged data can be a pain in the ***).

After that: on to Monte Carlo and hypothesis tests! 🙂

6 thoughts on “How does averaging affect the variance of GMST? (Boring Post 1 in series I promised)”

  1. Lucia,

    “Boring post.” I read it and I was trembling with excitement. “There must be some nasty stuff involved” I thought. No doubt, but alas I got stymied in the fourth and fifth paragraphs.

    What is “e♣-s/τ♣”? That is, what is the “♣” supposed to represent?

    And what do you mean by “ensemble average of Y?” If Y is a time series Y(t), then the average is simply the average over the periods of the sample. “Ensembles” in statistical analysis are collections of either different realizations of a single data generation process or realizations of different DGPs, usually differing in the variation of the parameters. And the average of the ensemble would then itself be a time series with the averaging across the various realizations. So what is the difference between the ordinary average of Y — Sum( Y(t)| t=1 to t=N)– and the ensemble average?

    Sorry to be the dummy on the block, but someone has to.

  2. Those are supposed to be || for absolute values. I guess those don’t show up on your browser. Does anything else break on your screen?

    Lucia

  3. On the ensemble average:
    “Ensembles” in statistical analysis are collections of either different realizations of a single data generation process or realizations …

    Yep. Ensembles are collections of…. And the “ensemble average” of any measurable quantity “Y” is the average over all possible realizations in the ensemble.

    So, yep, “ensemble” average and “time” averages are distinct.

    If I had equation-writing ability loaded, I would explain more. You’d see both the ensemble averaging and the time averaging symbols together. But, until I get the equation writing, I can’t explain. (I’m not going to cut and paste a zillion pictures, and format that!)
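    In the meantime, here is a tiny numerical illustration (made up just for this comment) of the distinction: generate many realizations of the same process, then average across realizations (the ensemble average) versus along time (the time average).

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    Y = rng.normal(size=(500, 120))   # 500 realizations, 120 time steps each

    time_avg = Y.mean(axis=1)         # one value per realization: average over time
    ensemble_avg = Y.mean(axis=0)     # one value per time step: average over realizations
    ```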

  4. Did Gavin specifically state he was talking about AR(1) processes? I thought his example was of sinusoidal “weather” (like the ocean oscillations, TSI, etc) and you’re addressing something different here?

    One problem is they’re not strict sinusoids so you can’t pull them out with Fourier analysis (which I’d been trying at some point back); they’re more chaotic than that, but they are real. But I don’t think AR(1) (which I’m not terribly familiar with anyway) describes the kind of long-term slow non-periodic oscillation that’s the main issue here in the talk about “weather”, does it?

  5. Arthur:
    In this comment Gavin discusses an AR(1) process that is supposed to be “closer” to the models.

    He does not say the models result in AR(1) behavior, or that AR(1) is a good representation of the behavior in the 2000-2030 range of predictions. (Or when volcanos don’t erupt etc.)

    What Gavin said was:

    However, for the case that is closer to what I did with the models, I calculated 9000 7-year AR(1) time-series with an underlying trend=2 degC/century and with p=0.1. The distribution of the resulting OLS trends is N(2.0,2.2), but the range of the s was from 0.16 to 3.8 (mean was 1.7). The chances of getting within 25% of the s.d. in the trend distribution (i.e. an s between 1.65 and 2.75) is just under 50%. The chance of getting something less than half as big, is ~15%. Therefore there is a significant uncertainty in what the real s is given only one measurement of it.

    You can calculate the ‘falisification’ rate as you define it though: the fraction of cases where m+2*s < 2. …. this turns out to be 5%. So you aren’t doing too badly. :-)

    However, I’m testing the process Gavin described because he used it to justify the plausibility of his 8-year uncertainty intervals, while presenting it as some sort of match to the model data.

    Unfortunately, if we compare the “weather noise” in Gavin’s process to the measured “weather noise”, using the AR(1) process Gavin suggests is “closer” to the models, it turns out we need to reject Gavin’s AR(1) process. The reason is that the statistics for the monthly weather are an enormous outlier.

    I’ve tried to translate the AR(1) to monthly in two different ways, and posted one. But, FWIW, both falsify.

    There are things I didn’t look at:
    1) I didn’t add measurement noise to my simulations.
    2) I didn’t look at the model runs themselves to see if AR(1) is even a reasonable description in the first place. I just wanted to test what happens with the one Gavin suggested first. (Outcome: we must reject it.)

    So, there is still stuff to do. But Gavin’s argument in that paragraph breaks down under scrutiny because that particular AR(1) process must be rejected when compared to data. The test does not involve the magnitude of the trend, only the amount of variability in the “weather noise” and its time constant.
