Accounting for Measurement Uncertainty.

How many of you think the measurements of GMST are imprecise? As in, they might contain errors of some sort? I do. 🙂

Nevertheless, up to now, I have been performing hypothesis tests under the assumption that the time series data for global mean surface temperature (GMST) are AR(1) distributed. I have been doing this despite the fact that, as far back as December, I discussed the idea that even if the actual temperature process is AR(1), measurement uncertainty results in data that obey a slightly different process.

Of course, everyone must make assumptions about a statistical model; in some sense, it’s fine to make assumptions about the data rather than the underlying process. However, to permit consistent treatment of annually averaged and monthly data, it is preferable to make assumptions about the underlying process itself, and then overlay things like measurement uncertainty and integration. Making assumptions about the underlying process is also preferable if we ever transition to discussing phenomenology.

So, now I’m going to begin to assume the GMST itself is AR(1), but that the data from Hadley, GISS, NOAA, UAH and RSS include measurement uncertainty. When I analyze this new process and perform trend analyses to test the IPCC projections, I calculate larger uncertainty intervals. (Based on estimates of the noise, they won’t be much larger, and they don’t change the “reject/fail to reject” conclusions of July’s “hypothesis test” post. If they had, I wouldn’t have posted it!)

However, the difference in uncertainty intervals is a bone of contention. Also, it may make a difference to hypothesis tests in future months. So I do want to discuss the effects of measurement uncertainty, and going forward I will be adding a column that accounts for it.

Uncertainty intervals to account for measurement uncertainty.

In terms of statistical analysis, this post addresses the question:

How do we estimate uncertainty intervals for a trend in the global mean surface temperature (GMST) as a function of time, if we assume the expected value (or underlying climate trend) for the GMST increases linearly and the “weather noise” is a Gaussian AR(1) process, but that, for various and sundry reasons, measurements are imprecise? To make this problem tractable, I decided I would further assume the measurement errors due to imprecision are normally distributed and uncorrelated from month to month. That is, I assume the errors can be treated as Gaussian white noise with variance σ_e².
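In symbols, a minimal statement of this data model (my notation for the trend coefficients a and b; everything else follows the definitions above) is:

m(t) = a + b·t + w(t) + e(t),  with  w(t) = ρ_R,1 · w(t−1) + z(t)

where a + b·t + w(t) is the GMST itself (linear climate trend plus AR(1) “weather noise”), z(t) is the Gaussian white innovation driving the AR(1) process, and e(t) is the Gaussian white measurement error with variance σ_e².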

Because AR(1) processes are referred to as “Red Noise” processes, I’ll be calling this method of estimating the uncertainty intervals “OLS corrected for Red+White noise”. Later this week, I’ll post the results of this month’s hypothesis test using these uncertainty intervals.

Effect of adding white noise on the correlogram for a Red Noise process

I discussed the effect of lack of measurement precision on the lagged correlations for an underlying AR(1) (i.e. Red Noise) process back in December 2007. If we assume the global mean surface temperature (GMST) for the earth is an AR(1) process, and the lack of precision associated with measuring the system superimposes white noise on that process, the autocorrelation for the measurement process should vary as:
(1) ρ_m(t) = α e^(−t/τ)

where ρ_m(t) is the autocorrelation of measured values as a function of time, t is time, τ is the integral time scale for the AR(1) process, and α is the ratio of the variance of the residuals from a fit to the GMST itself (i.e. σ_R²) to the variance of measurements of the process (σ_m²). That is, α = (σ_R/σ_m)². (Note: I’m using “R” to indicate the weather noise for GMST is “red”, “m” to indicate “measurements”, and “e” to describe the errors in the measurements. Using the assumptions described previously, σ_R² + σ_e² = σ_m².)
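For readers who want the one-line justification of (1): because the measurement errors are uncorrelated with the weather noise and with each other, only the red process contributes to the lagged covariance for t > 0,

cov[m(t₀), m(t₀+t)] = cov[R(t₀), R(t₀+t)] = σ_R² e^(−t/τ)

and dividing through by the measurement variance σ_m² gives equation (1).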

Using simple multiplication, followed by averaging, it’s easy to show that if the GMST were sampled at even increments of time Δt, and an infinite number of samples were collected, the lag ‘n’ correlations for the measurements would obey the relations:

(2a) ρ_m,n = α (ρ_R,1)^n for n > 0

and

(2b) ρ_m,0 = 1 for n = 0

where ρ_R,1 = e^(−Δt/τ) is the lag 1 autocorrelation for the underlying red noise process sampled at intervals of Δt. (In the context of this blog post, that would be the lag 1 autocorrelation for the “weather noise” itself.)

(Note that, if the process is sampled continuously, but integrated, the correlogram will also be described by (2a), but the magnitude of α will be affected.)

Simulations

Since many people don’t believe mathematical derivations (and I don’t trust myself never to make mistakes), I ran a simulation with 10,000 “months” of AR(1) “weather” with “white noise” added, and compared the simulation to the theory. I compared the lagged correlations for the first 5 lags to the theoretical values for those lags, and obtained good results. (The value of the lag 1 correlation for the measured values matches the lag 1 for the Merge 3 data in this month’s hypothesis test post. The amount of noise was selected to make this a stringent test.)

Figure: Correlogram based on 10,000 months of simulated data.
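For anyone who wants to reproduce the check, here is a minimal sketch of the simulation in Python. The parameter values (rho, the two noise levels, and the trend) are illustrative stand-ins, not the exact values behind the plot; I’ve set the two noise variances equal, so α = 0.5, in the spirit of the “stringent test” described above.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10_000           # "months" of simulated data
rho = 0.5            # lag 1 autocorrelation of the AR(1) weather noise (illustrative)
sigma_R = 0.1        # std dev of the red (weather) noise, deg C (illustrative)
sigma_e = 0.1        # std dev of the white measurement noise, deg C (illustrative)
trend = 0.02 / 12.0  # 2C/century, expressed per month

# AR(1) weather noise: w[t] = rho * w[t-1] + z[t], scaled so var(w) = sigma_R^2
z = rng.normal(0.0, sigma_R * np.sqrt(1.0 - rho**2), n)
w = np.zeros(n)
for t in range(1, n):
    w[t] = rho * w[t - 1] + z[t]

# "True" GMST = linear trend + red noise; the measurement adds white noise
months = np.arange(n)
gmst = trend * months + w
measured = gmst + rng.normal(0.0, sigma_e, n)

# Compare lagged correlations of the detrended measurements with equation (2a)
resid = measured - np.polyval(np.polyfit(months, measured, 1), months)
alpha = sigma_R**2 / (sigma_R**2 + sigma_e**2)
for lag in range(1, 6):
    emp = np.corrcoef(resid[:-lag], resid[lag:])[0, 1]
    print(f"lag {lag}: simulated {emp:+.3f}   theory {alpha * rho**lag:+.3f}")
```

With 10,000 points, the simulated lags should land within a few hundredths of the theoretical curve, which is the sort of agreement shown in the correlogram above.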

Assuming I knew the best fit parameters, how would I estimate the uncertainty in the trend?

So, now I’m at the point where I can estimate uncertainty intervals based on a data model. But how do I estimate them?

In an article discussing Hurst, Tamino suggests that if we know all the lagged autocorrelations for a process, and we have a time series with “N” data points, we can first use OLS to determine a trend. To obtain the uncertainty intervals for the trend, we can use the normal method, but substitute the effective number of degrees of freedom, N_effective, for “N”. To compute the effective number of degrees of freedom, we use the relation:

(3) N/N_effective = 1 + 2 Σ ρ_m

where the sum is computed over m from 1 to infinity, and ρ_m is the lag ‘m’ correlation for the statistical model.

For the model described above, we can show that for the red/white process, with known parameters, the ratio becomes:

(4) N/N_effective = [1 + (2α − 1) ρ_R,1] / (1 − ρ_R,1)
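Equation (4) follows from (3) by substituting the correlogram (2a), ρ_m = α (ρ_R,1)^m, and summing the geometric series over m from 1 to infinity:

N/N_effective = 1 + 2α Σ (ρ_R,1)^m = 1 + 2α ρ_R,1/(1 − ρ_R,1) = [1 + (2α − 1) ρ_R,1] / (1 − ρ_R,1)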

Note that when there is no measurement uncertainty, α = 1, and the result is

(5) N/N_effective = (1 + ρ_R,1) / (1 − ρ_R,1)

which Tamino recommended as the correction for Red Noise in August 2007, when he showed the GMST had a statistically significant trend using 91 months of data from 2000 through July 2007. It’s possible to show that if α < 1, as occurs when there is measurement noise, equation (4) always results in larger uncertainty intervals than naively applying (5) to the measured lag 1 correlation. So, of course, we must use (4) when testing IPCC models, making them more difficult to reject, while those proving global warming get to use (5), making it easier to defend the accuracy of their proofs based on expertise. 🙂
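To get a feel for the size of the effect, here is a small Python sketch comparing the two corrections. The values of alpha and rho1 are illustrative placeholders, not estimates from any data set:

```python
import math

def n_over_neff_red_white(rho1: float, alpha: float) -> float:
    """Equation (4): red + white correction with known alpha and rho_R,1."""
    return (1.0 + (2.0 * alpha - 1.0) * rho1) / (1.0 - rho1)

def n_over_neff_red(rho1: float) -> float:
    """Equation (5): pure red-noise correction (alpha = 1)."""
    return (1.0 + rho1) / (1.0 - rho1)

rho1, alpha = 0.6, 0.7  # illustrative placeholders

# Naively applying (5) to the *measured* lag 1 correlation (which equals
# alpha * rho1 by equation 2a) under-corrects relative to equation (4):
naive = n_over_neff_red(alpha * rho1)
full = n_over_neff_red_white(rho1, alpha)
print(f"eq. (5) on measured lag 1: N/N_eff = {naive:.2f}")
print(f"eq. (4), red + white:      N/N_eff = {full:.2f}")

# Trend standard errors scale as sqrt(N/N_eff), so the intervals widen by:
print(f"interval inflation: {math.sqrt(full / naive):.2f}x")
```

The gap between the two grows as alpha falls, i.e. as measurement noise makes up a larger share of the residual variance.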

How can we determine α and ρ_R,1?

I’ve done enough fiddling with Monte Carlo simulations to determine that estimating decent values of α and ρ_R,1 based on 90 months of data is not easy! (Moreover, I don’t know if there is an “official” way to do this. Luckily, I can count on critics to suggest methods, which I will then go ahead and try! 🙂 )

As far as I can tell, the key difficulty is estimating α. Once that is estimated, estimating ρ_R,1 is no more difficult than estimating ρ_m,1 from the measurements! (This isn’t necessarily very precise, but the effect of the uncertainty in this estimate is easy to compensate for when doing the uncertainty analysis!)

So, going forward, I’ll be estimating α several ways (a sketch of the correlogram-fit approach appears after this list):

  1. I’ll estimate the amount of measurement uncertainty using estimates from the data centers themselves. Currently, based on data from Hadley, I estimate the standard error (aka measurement uncertainty) for the monthly data is s_N = 0.44C. (I found the ±95% confidence intervals since 1990, averaged them, and estimated the standard error based on that.) Once this is determined, it’s possible to show that α = (s_m² − s_N²)/s_m², where s_m is the sample standard deviation of the residuals to the regression for the measurements.
  2. I’ll estimate the measurement noise from the standard deviation, month by month, across the data from GISS, Hadley, NOAA, UAH and RSS.
  3. I’ll estimate α from a linear regression of the lag ‘n’ correlations up to the first negative one, using data since 1993. I’ve been doing this in all my files. (Atmoz found “issues” with the early satellite data, and I picked my start date based on that.) Based on this, the uncertainty in measurements is 0.66C. (This method can either over- or under-estimate the measurement uncertainty, for a variety of reasons.)
  4. I’ll estimate α from a linear regression to the autocorrelations up to the first one that goes negative, applied to the “merge 3” data. Monte Carlo simulations show this method results in a large amount of scatter, but it’s currently giving s_N = 0.77C. This is a larger value for the uncertainty than the other methods give, and it results in the largest uncertainty intervals for my data. So, I’m using it as the upper bound. (In the future, I’ll also modify this to use weighted least squares, to compensate for the transform into natural logs before fitting.)
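As a sketch of what the regression in methods 3 and 4 might look like (my reading of the procedure; the correlogram values below are synthetic placeholders): since equation (2a) gives ln ρ_m,n = ln α + n ln ρ_R,1, fitting a straight line to the log of the positive lagged correlations yields α from the intercept and ρ_R,1 from the slope.

```python
import numpy as np

def fit_alpha_rho(acf: np.ndarray) -> tuple[float, float]:
    """Fit ln(rho_m,n) = ln(alpha) + n*ln(rho_R,1) to the lagged correlations.

    acf[n] is the lag-n autocorrelation of the regression residuals
    (acf[0] == 1). Only lags before the first non-positive value are
    used, since the log is undefined past that point.
    """
    n_max = 1
    while n_max < len(acf) and acf[n_max] > 0.0:
        n_max += 1
    lags = np.arange(1, n_max)
    slope, intercept = np.polyfit(lags, np.log(acf[1:n_max]), 1)
    return np.exp(intercept), np.exp(slope)  # (alpha, rho_R,1)

# Synthetic correlogram consistent with alpha = 0.6, rho_R,1 = 0.5:
acf = np.array([1.0] + [0.6 * 0.5**n for n in range(1, 8)])
alpha, rho = fit_alpha_rho(acf)
print(f"alpha = {alpha:.2f}, rho_R1 = {rho:.2f}")
```

This is also where the weighted-least-squares modification mentioned in item 4 would enter: the log transform inflates the scatter of the small correlations at large lags, so down-weighting them should reduce the scatter in the fitted α.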

Conclusion

I think I have proven, once again, that I can write some truly boring posts. However, I’ll be using these results, so I needed to post them.

Still, just in case you want to see whether or not “others” are thinking about this process, I suggest you read a comment by Tamino, in which he pretty much uses the statistical model described in this post. He describes some results:

So I computed the trend rate from 1998 to the present (a wee tiny bit more than the last decade), for both ENSO-corrected data series, using the complete formula for autocorrelation correction to linear regression, modeling the AR coefficients as rho_j = rho_1 * alpha^(j-1) (which gives a much better fit to the AR coefficients than the AR1 model rho_j = (rho_1)^j). The results: for ENSO-corrected HadCRU: +0.0014 +/- 0.0172 deg.C/yr (2-sigma); GISS: +0.0134 +/- 0.0180 deg.C/yr. The 95% confidence intervals are, for HadCRU: -0.0158 to +0.0186; GISS: -0.0046 to +0.0314. Both ranges include the oft-quoted modern trend rate +0.018 deg.C/yr.

Given that I’m always testing the IPCC 2C/century projection, I was interested in something Tamino didn’t highlight. It is this:

  1. After correcting for “El Nino” (to knock the 1998 temperature down), and
  2. expanding the uncertainty intervals using this model,
  3. the trend for HadCrut since 1998 would falsify 2C/century! (The 2C/century rate, i.e. 0.02C/yr, lies above HadCRU’s upper bound of +0.0186C/yr. GISS doesn’t falsify 2C/century, since 0.02C/yr falls inside its interval. But then, that’s the same result I get.)

Granted, the IPCC AR4 projection doesn’t, by any stretch, apply until after 2000. Still… it seems that now that the temperature trends are not quite so “up”, some people are abandoning the methods they used to create tighter uncertainty intervals last August. 🙂

Update: For future reference: the Wikipedia article on measurement uncertainty.

15 thoughts on “Accounting for Measurement Uncertainty.”

  1. Seems to be quite a conundrum. Tighter uncertainty bands are desired to support that the trend is a certain value early in the prediction period. But after nature doesn’t behave as intended, what to do? Look for reasons to widen the uncertainty while walking that tightrope of maintaining confidence (of other people, not statistical confidence) that the trend is well-defined and that the models are in the “fail to reject” category. But the wider uncertainty undermines the purpose, as 0 and negative slopes become part of the 95% confidence interval, which reduces the impetus of politicians or the citizenry to take action.

  2. JamesH:

    Ahhh.. but the purpose of getting the uncertainty intervals isn’t supposed to be political. Right? 🙂

    Notice that, unlike some who just want to remind us uncertainty intervals “are bigger”, I am trying to quantify how much larger the uncertainty intervals become. I want to connect any estimates to something even those who don’t know loads of statistics can test and understand.

    For example: Does it make sense to say the amount of ‘measurement noise’ is near the level suggested by Hadley? Or to suggest it’s much, much larger in order to create large uncertainty intervals? Does it make sense to suggest the “weather” is an AR(1) process? We can at least look at the data to see if that assumption is clearly wrong, and if yes, change it. Or we can look at model data. (Which is a next step.)

    JohnV also looks at things quantitatively. Though we often disagree on the effect of the solar cycle, at least we can verbalize our disagreement, and then go hunt for evidence and present it. (I’m hoping JohnV does start posting, as we’ll get better information faster if he does!)

    “Ahhh.. but the purpose of getting the uncertainty intervals isn’t supposed to be political. Right?”

    I know, I was being a bit sarcastic as I watch this unfold and try to learn what I can about how to approach and understand these analyses that you’re sharing with us.

  4. Dear Ms. Liljegren,

    On June 29th, you posted a graph showing James Hansen’s 1988 climate change predictions plotted on the same scale as observed temperature changes from the Hadley Center and GISS.

    I sent a letter to Andrew Revkin, science writer for the New York Times, enclosing your graph and suggesting he publish it. After a couple of e-mails back and forth, he said he was interested but that he would need your permission.

    If you would not mind your June 29th graph being published in the New York Times, would you reply to me? I will forward your reply to Mr. Revkin. Alternatively, I could step out of the middle of this exchange and you could contact Mr. Revkin directly.

    Please let me know if this meets with your approval.

    Thank you for your interesting and informative calculations. You often apologize for “boring” posts but be assured that if a topic interests you, it will interest those of us who follow your web log faithfully.

    Sincerely yours,

    Dale R. McIntyre, PhD
    Bartlesville, OK

  5. I think I have proven, once again, that I can write some truly boring posts. -lucia

    Boring? Never.
    Overwhelming, but in a good way? Maybe.
    Awe-inspiring? Often.

    Besides, real men find chicks with a gift for math irresistible –which is probably why I am married to a math babe.

    Cheers.

  6. Lucia,

    As I understand the “weather noise” plus “measurement noise” model, I would write it as:
    1) Y(t) = B0 + B1 * X(t) + w(t) + u(t),
    where:
    Y(t) is the temperature variable (avoiding the notational problem of ‘T’ for temperature and ‘t’ for time.)
    X(t) is the trend variable, usually equal to ‘t’, time (or a whole set of exogenous variables),
    u(t) is the measurement noise and assumed to be normal with zero mean and variance V1,
    w(t) = z(t) + rho* w(t-1), where z(t) is the innovation (i.e. the i.i.d. “noise” with no serial correlation, zero mean, variance V2)

    Because u(t), the measurement noise, is not part of the autoregressive process in w(t), its existence messes with the estimation of the variance of the weather noise. Now here is the problem: the measurement noise doesn’t simply mess up the estimate of the weather variance, it also messes up the estimates of the B1 and rho coefficients. If w(t) were not autoregressive, then the addition of u(t) would be equivalent to increasing the variance of w(t) by the variance of u(t): not desirable, but not apt to lead to biased or misunderstood estimates. However, u(t) is part of Y( ), and Y(t-1) is a regressor when we correct for autocorrelation or simply estimate the autoregressive coefficient in a nonlinear or maximum likelihood format. And yes, that applies to Cochrane-Orcutt, though the coefficient on trend should not be biased, because the regressor there, time, is strictly exogenous and deterministic. If there is measurement error in the regressors (a mismeasurement “from the right”) the regression coefficients are biased downward in absolute value: they move closer to zero. [Note: this point is a little subnote in regression analysis that anybody who does practical work with regressions should know. Hausman, Jerry, “Mismeasured Variables in Econometric Analysis: Problems from the Right and Problems from the Left”, Journal of Economic Perspectives, Vol 15, No. 3, Autumn 2001, pp 57-67 is a clear, simple explanation of the effects of measurement errors on regression models. At http://faculty.smu.edu/millimet/classes/eco7377/papers/hausman.pdf]

    The upshot of all this is that the AutoCorrelation Function (ACF) of equation 1) should NOT approach the theoretical ACF (AR1 = rho, AR2 = rho^2, etc.). Now this is not to say that you can’t make estimates of the effects of the measurement errors, especially if you are willing to assume an AR(1) process. Sorry to say that I don’t have such a procedure, even after thinking about it for a while.

    Could you explain your Monte Carlo in a bit more detail? Because right now there appears, at least to me, to be something wrong with your conclusion there.

  7. Martin–

    Which conclusion or Monte Carlo do you want me to explain?

    I generated a time series for the monthly temperatures as red noise. Then I added white noise to that. So, I did:

    Y(t) = B0 + B1 * X(t) + w(t) where Y(t) is the temperature.

    Then, I created the measurement “Z(t)” by adding white noise:
    Z(t)= Y(t) + u(t).

    So, the Z’s end up with the sort of equation you describe.

    I generated 10,000 of these, then calculated the lag 1, 2, 3 and 4 coefficients. I compared the lags to what they should be based on the simple theory (equation 2a). They match.

    The trend for the long series matched the one I used in the simulation; there wasn’t any problem. So, I don’t know which part of the conclusion you want me to explain.

    It’s true that my ability to estimate the trend based on small amounts of data degrades– but that’s what’s supposed to happen. That’s why I’m looking at this.

    Thanks for the reference.

  8. Lucia,

    I was referring to the near equality of the autocorrelations between the theoretical values and the empirical values. The theoretical values are just the integer powers of rho, and the empirical values are the correlogram values for the Z(t) in your comment (the same as the Y(t) in my comment). I am saying that if they are close, something is wrong, e.g. the variance of u(t) is very small compared to that of w(t).

    I will build an Excel Monte Carlo to show what I mean, but it will have to wait a few days as I recover from the return of my family and the advance of carpenters.

  9. Martin–

    The theoretical values are the integer powers multiplied by a constant, α. If you extend the straight line through the correlations, it doesn’t go through 1 at t=0. The offset matches the value of the α I happened to use.

    I picked this case so the variance of u(t) is nearly equal to the variance of w(t). Otherwise, I’d just be testing red noise or white noise. When the variance of u(t) << w(t), the extension of the fit through the correlogram goes through ρ=1 at t=0. When u(t) >> w(t), the autocorrelation drops to nearly zero at all lags away from zero.

  10. Lucia
    Insightful to follow your statistical probings.

    You and your readers might find useful the following links on Measurement Uncertainty: Uncertainty of Measurement Results at NIST

    The information on evaluating and expressing measurement uncertainty within this reference is adapted from NIST Technical Note 1297. The publication TN 1297, prepared by B.N. Taylor and C.E. Kuyatt and entitled Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results, is in turn based on the comprehensive International Organization for Standardization (ISO) publication, Guide to the Expression of Uncertainty in Measurement. Users may access TN 1297 online in a pdf version or in an html version, or order it at no charge for postal delivery. They may also purchase the ISO Guide.

    Steven D. Phillips, Keith R. Eberhardt, & Brian Parry, “Guidelines for Expressing the Uncertainty of Measurement Results Containing Uncorrected Bias”, Journal of Research of the National Institute of Standards and Technology, Volume 102, Number 5, September–October 1997, pp 577-585, http://nvl.nist.gov/pub/nistpubs/jres/102/5/j25phi.pdf

    It appears you are modeling the Type A uncertainty (e.g., standard deviation).

    Type B – Bias errors:
    However, you may wish to include the Type B or bias type uncertainty.

    Urban Heat Island
    While you have probably been following this, may I recommend exploring for bias due to the Urban Heat Island effect.
    I have seen several graphs showing that the “global warming” trend is a strong function of the size of the metropolis or county where the measurement is taken.
    E.g., see Fig. 15 in “Environmental Effects of Increased Atmospheric Carbon Dioxide” by Arthur B. Robinson, Noah E. Robinson, and Willie Soon, citing data from:
    51. Goodridge, J. D. (1996) Bull. Amer. Meteor. Soc. 77, 3-4; Goodridge, J. D. (1998) private comm.
    52. Christy, J. R. and Goodridge, J. D. (1995) Atm. Envirn. 29, 1957-1961.

    Anthony Watts et al. at surfacestations.org are further categorizing the quality of the meteorological stations from 1 to 5, with increasing degradation of the measurements due to proximity of parking lots, air conditioners, etc. They have surveyed 44% of sites so far. See the presentation of 8/29/07:

    “Most stations have estimated bias errors larger than 1°C, and most are 2°C.”

    On top of those static bias errors, there appear to be UHI trend bias errors, e.g., from increasing population, adding parking lots and air conditioners, and moving the sensors nearer buildings.

    See also Climate Audit: “Trends in Peterson 2003”, Steve McIntyre, Saturday, August 4th, 2007, and “Urban vs Rural temperature trends”.

    This Type B Urban Heat Island bias would be within the temperature measurement series that in turn make up the larger temperature trends you have been using. Formally identifying and quantifying that Type B bias and publishing it would throw a very big rock into the Global Warming pond.
