This post is mostly for Paul_K; he and I discussed one of my claims in my previous article on the NOAA anomaly. Specifically, I claimed to compute two possible test variables from a linear fit to observed data, which in the current post I will call d*a and d*m, and said the errors in these test parameters were uncorrelated. I probably worded it in a confusing way, and I’d like to clarify for Paul_K. After clarification, it may turn out Paul_K has a suggestion, but for the time being we seem to be talking at cross purposes. To clarify, I’m going to discuss some observations based on trendless synthetic data.
Suppose I generate a series of 137 months of Gaussian white noise with standard deviation 1 and mean zero, and call that “Yi”. I can now compute the mean of Yi in the normal way. We know the true mean of this process is 0, but instead we will get some sample mean value Ybar, which we recognize as being in error: it differs from 0, the value we would recover if we repeated the experiment over and over and averaged over the ensemble of all possible experiments. Let’s call this error εY.
Next, I can also compute the best fit trend through
Yi=m i + b + ui
where the ‘i’ values are (1, 2, 3, …, 137), m is the best fit trend, b is the intercept at i=0, and ui are the residuals. The best fit values of m and b are those which minimize the sum of the squares of the 137 residuals.
Because I’ve generated the Yi as Gaussian white noise, we know the true value of m for the process is m=0. Nevertheless, when I generate 137 data points, I’ll nearly always get a non-zero value for the trend based on the sample, m. The difference between the sample value of the trend, m, and the true trend is the error in the trend. Let’s call this εm.
My claim is that the errors εY and εm are uncorrelated. I also make a separate claim about what happens if we do the least squares fit to the form
Yi=m (i – imean) + a + ui
where imean is the mean value for the indices of the data. So, if the indices were 1 through 137, imean=69.
We’ll find that the value of ‘a’ is equal to the sample mean of the Yi; I leave showing this as an exercise to the reader. 🙂 But it’s obvious that if ‘a’ is equal to the sample mean of the Yi, then the error in ‘a’ is just the error εY. So, I am now claiming that the error in the determination of ‘a’, the intercept in the second form of the fit, is uncorrelated with the error in the determination of the trend ‘m’.
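(Here’s a minimal R sketch of that check. It is not the posted script, and the seed is arbitrary: fit the centered-index form to one realization of trendless white noise and compare the fitted constant to the sample mean.)

set.seed(1)                 # arbitrary seed, just for reproducibility
n  <- 137
y  <- rnorm(n)              # Gaussian white noise, mean 0, sd 1
i  <- 1:n
ic <- i - mean(i)           # centered index, so the constant is evaluated at i = imean
fit <- lm(y ~ ic)
coef(fit)["(Intercept)"]    # the fitted 'a' ...
mean(y)                     # ... equals the sample mean of the Yi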
To provide an empirical demonstration of this when the residuals are white noise, I created 10^5 137-month sequences of white noise and computed the sample mean and trend for each series. This resulted in 10^5 pairs of trends and means. I then normalized each with the estimate of the uncertainty based on the least squares fit. This results in scaled errors with mean equal to zero and standard deviation equal to 1. I then computed the cross correlation between the scaled errors for the mean and the trend, which for the most recent case tested happened to be -0.0004674765, and also made a scatter plot of the two:

The more or less circular appearance is typical of uncorrelated data.
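For anyone who wants to reproduce a plot like this, here is a minimal R sketch along the same lines. It is a stand-alone illustration, not the MonteCarlo d_star script itself; the seed is arbitrary and the replicate count is smaller than in the post for speed.

set.seed(42)                         # arbitrary seed
nrep <- 1e4                          # increase (e.g. to 1e5) for a smoother picture
n    <- 137
ic   <- (1:n) - mean(1:n)            # centered month index
z <- replicate(nrep, {
  y   <- rnorm(n)                    # trendless Gaussian white noise: true mean and trend are 0
  est <- summary(lm(y ~ ic))$coefficients
  est[, "Estimate"] / est[, "Std. Error"]   # scaled errors in the mean ('a') and the trend ('m')
})
cor(z[1, ], z[2, ])                  # cross correlation: should be near zero
plot(z[1, ], z[2, ], pch = ".",
     xlab = "scaled error in mean", ylab = "scaled error in trend")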
Since the main purpose of this post was to clarify a point for PaulK, and I think this might be sufficient for the clarification, I’m going to stop here so we can continue our conversation on my attempt to develop a sort of “portmanteau” test involving the combination of the trend and the constant term from a least squares fit. I think such a test would be a bit more difficult to cherry pick and, under some circumstances, can be more powerful than testing either the trend alone or the constant term alone. (There are circumstances where I think it may be worse; specifically, if the baseline is too short in time, it can become less powerful. But testing power is something that can be done after we figure out whether I’m missing something that makes my portmanteau test untenable.)
R File: This R file (MonteCarlo d_star.) was used to generate the plot. The demonstration with AR1 is also included. So are various cryptic and obscure comments.
Lucia,
The ‘circularity’ of the plot (and perfect lack of correlation) would be better if the x and y scales matched each other physically… that is, make the x-dimension of the graph a little smaller.
But I have to admit that after seeing your graph I am convinced of two things:
1) Your argument is correct.
2) Based on the similarity between your graph and many graphs from published climate science papers, you may have a future in climate science. 😉
SteveF– I thought about fiddling with the axes. Then, I thought “nah”. 🙂
Paul asked me to show this with synthetic data that contains a trend, so that will be coming tomorrow. (I’ve started dinner….)
Gosh Lucia, I feel privileged!
I have no problem with anything you have done so far.
Mmmm, I think I can also anticipate your result tomorrow, at least in part. I had missed the effect of your mean-centering the x’s.
Doing a quick bit of simple algebra, your expectation of the intercept for any single trial is the sample mean of the y’s. And the expectation of this intercept for any trial GIVEN a value of m turns out to be independent of m because of the mean centering, so you are right – no correlation between the errors in m and the errors in the intercept.
I will look at the second moment tomorrow, when I think I might have a new question… I really appreciate the clarification.
Paul_K:
Doesn’t this assume that the residuals of the fit are uncorrelated?
Carrick–
I think the errors in the trend and the errors in the ‘intercept’ still turn out uncorrelated from each other when there is temporal autocorrelation. But…. I’m not absolutely certain.
If you look at the R script, you’ll see I also compute for AR1. That doesn’t resolve every possible question– but I think it can be shown analytically that the errors in the trend and in the ‘intercept’ are uncorrelated but it’s a PITA, and it’s actually easier to just code up and check a few cases to see if I can detect any correlation before doing the algebra! (Lazy, yes…). (Bear in mind– ‘intercept’ is the wrong word as it’s only the intercept if you center the independent variable.)
Now there is something that is not uncorrelated: the trend computed in period 1 and the trend computed in the following period are not uncorrelated if the errors exhibit serial autocorrelation. That’s why I’m not sure that errors in m and the intercept will remain uncorrelated if the residuals of the fit show temporal autocorrelation.
Lucia, upon reflection, I think this is a pretty general result.
If we let E = sum((y_k – a – b x_k)^2,{k,1,n})
with sum(x_k,{k,1,n}) = 0
then the correlation coefficient between a and b is proportional to
E_a_b
(second derivative of E with respect to a and b). However,
E_a_b = 2 sum(x_k,{k,1,n}) = 0.
So the result seems pretty general, at least for the unweighted least-squares fit.
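A quick numerical version of this in R (a sketch, not from the posted script; the seed is arbitrary): with a centered regressor, the off-diagonal entry of the coefficient covariance matrix from an unweighted OLS fit should vanish, up to rounding.

set.seed(3)
n  <- 137
xc <- (1:n) - mean(1:n)          # sum(xc) = 0 by construction
y  <- rnorm(n)
vcov(lm(y ~ xc))                 # off-diagonal entry is ~0, so cor(a, b) = 0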
BTW, for the weighted least squares, given
E = sum(W_k (y_k – a – b x_k)^2,{k,1,n})
you’d have to require:
sum(W_k x_k,{k,1,n}) = 0
(i.e., the weighted mean of x = 0) for the correlation coefficient between a and b to vanish.
This assumes W_k is uncorrelated with x_k.
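A corresponding numerical check for the weighted case (again a sketch, with arbitrary positive weights): shift x so its weighted mean is zero, and the off-diagonal of the coefficient covariance again vanishes.

set.seed(4)
n  <- 137
w  <- runif(n, 0.5, 2)                    # arbitrary positive weights
xw <- (1:n) - weighted.mean(1:n, w)       # sum(w * xw) = 0 by construction
y  <- rnorm(n)
vcov(lm(y ~ xw, weights = w))             # off-diagonal entry is ~0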
OT
http://stevemosher.wordpress.com/2011/06/21/rghcnv3-a-new-package/
a new package for R. i gotta finish testing but i think i got the last bug squashed.
O.K. Lucia, now do the data that reflects the temperature series.
Start the simulation with white noise of standard deviation 2 and, as the series runs, reduce the noise until the standard deviation is 1 for the last point. In the actual data series this is the case.
The assumption that error is Gaussian is worrying to me. In many data sets the error is lowest at the midpoint of a data set and more noisy at the two ends.
Re: DocMartyn (Comment #77703)
June 21st, 2011 at 5:24 am
I am fairly sure that (just) changing the sd of the error term will not affect Lucia’s result at all for a simple linear fit. (I still haven’t done the math for serial correlation.)
Lucia is effectively fitting:
y – ybar = m (x – xbar)
whence the constant term a = ybar even if the sd of the error term is varying.
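A quick R illustration of that point (a sketch; the way the sd declines is just one arbitrary choice, roughly in the spirit of DocMartyn’s suggestion): even with heteroskedastic errors, the constant term from the centered fit comes out equal to the sample mean.

set.seed(5)
n   <- 137
ic  <- (1:n) - mean(1:n)
sdv <- seq(2, 1, length.out = n)          # noise sd shrinking from 2 to 1 over the series
y   <- rnorm(n, sd = sdv)                 # heteroskedastic, trendless noise
c(constant = unname(coef(lm(y ~ ic))[1]), ybar = mean(y))   # the two agree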
Mosher–Kewl!!!
Doc–What you are discussing is heteroskedasticity, which is a different problem from non-Gaussian errors. I’m aware that the real data from 1880 to now are heteroskedastic. The question of whether the two parameters are uncorrelated and the issue of heteroskedasticity are two different things. (Not being Gaussian is a third distinct thing.)
The issue of heteroskedasticity might need to be considered on a case by case basis, as might the non-Gaussian nature of the residuals. For example: heteroskedasticity owing to changes in the number of sampling stations doesn’t matter so much if tests are restricted to testing the forecasts. I’ll worry about that in applications where it might matter. Otherwise, no.
Lucia did you understand my comments, or do you think they missed the mark?
They seem to be pretty general and are a result of the fact that your basis functions (1 and x) are made symmetric and antisymmetric respectively by the recentering operation. You’ve made them orthogonal to each other, and the character of the data you are projecting them onto won’t affect this (at least for unweighted OLS).
Feel free to tell me I’m completely off base.
Re: Carrick (Comment #77709)
For what it’s worth, I get the same result as you do. It is interesting that this only holds (apparently) when you do the recentering bit. If sum(x_k,{k,1,n}) = 0 does not hold, a and b are correlated, in general.
I think that is correct, julio.
As I mentioned above, I believe it is due to the fact that the basis functions are orthogonal to each other, after you apply the shift.
Carrick–I think I understand and agree. You are finding the result is general, so it’s not just that I happened to pick the particular “noise” models I’m testing. That’s what I hoped the result would be.
Julio– Yes. If it’s not re-centered, the proof needs to involve (x-x_bar), so it is effectively recentered. This is one of the issues with understanding the estimate of the uncertainty in the calculated ‘intercept’ term too. It’s much easier to understand lots of things about curve fitting if all the variables are normalized to have zero mean.
Now I have to look at power. I think testing my combined metric gives a more statistically powerful test in some situations. (I also think it will be less powerful if someone picks a very short baseline.) I know the method has the advantage that it is more difficult to cherry pick the result you ‘like’.
Lucia, glad I wasn’t just losing my mind.
/wipes_sweat
Actually, I use a similar technique for fitting to a sinusoid.
Write as:
y = a cos(w t) + b sin(w t)
(w = 2*pi*f is specified)
Transform t so that the domain of t is symmetric between -T/2 and T/2.
Now you have an even and an odd function, so the derivative matrix E_i_j is diagonal. Repeat and rinse if you have “N” sinusoidal functions you are fitting to simultaneously. (The trick only works in this case if the frequencies are commensurate with each other.)
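Here is a small R sketch of that trick (an illustration only; the frequency, times, and amplitudes are made up): with the time axis shifted to be symmetric about zero, the cos and sin regressors are orthogonal, so the normal-equations matrix, and hence the coefficient covariance, is essentially diagonal.

set.seed(6)
n  <- 200
tc <- seq(0, 10, length.out = n) - 5      # time shifted to the symmetric domain -T/2 .. T/2
w  <- 2 * pi * 0.7                        # specified angular frequency, w = 2*pi*f
y  <- 1.5 * cos(w * tc) - 0.8 * sin(w * tc) + rnorm(n, sd = 0.3)
X  <- cbind(cos(w * tc), sin(w * tc))
crossprod(X)                              # off-diagonal ~0: an even times an odd function sums to ~0
vcov(lm(y ~ 0 + X))                       # coefficient covariance is (near) diagonal as well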
Well, Lucia you are our Stats Witch and I demand you do my bidding; or something to that effect.
The plot is actually very nice. In real data you would find asymmetries, as there is typically far less variance on one axis compared with the other; you should get a lens.
I always think of points as black to white circles or lenses which describe the Gaussian. The real pig is non-Gaussian distributions, or changes in signal to noise ratios during data-sets.
Hi Lucia,
So I think that we now agree that:
a) In general, for a model of the form y = a + bx, the errors in intercept and the errors in trend are negatively correlated (except when xbar = 0).
b) In general, for a model of the form y = a + b(x-xbar), the intercept, a, will always be equal to ybar for that sample, and the errors in the intercept will not be correlated with the errors in trend.
OK so far (?)
Paul
DocMartyn,
I wonder if being an atheist precludes being a witch. Just askin’.
Carrick, Doc, Lucia,
Just a note, given Carrick’s analysis and Doc’s question. It is important to distinguish here between, on the one hand, the mechanics/properties of a simple linear (OLS) fit and, on the other hand, its validity.
The data may be autocorrelated, the error terms may be non-normal, auto-correlated or heteroskedastic. It makes no difference. The mechanics of a simple linear OLS fit say that the line will always pass through (xbar, ybar), so Lucia’s result (no correlation between the errors in intercept and trend) will ALWAYS work for data mean-centred on x, for this type of model.
It is a completely separate question whether the choice of the simple linear model is the correct one for the data. From some chicken-scratching that I have done, I do not believe the result can be generalised to all ARIMA models, but there may be some specific functional forms of models, such as AR1, where Lucia’s finding is still valid. (I have not succeeded in proving this yet.)
Paul
Re: lucia (Jun 21 06:40),
test code is cranking away.. arrg. I should have done TDD
Anyway, once it finishes I have decided to include demo code in the package
as opposed to writing a vignette ( arrg latex).. need to research how that is done.
The final package is very small, but it has all the core stuff.. I’ll add more
( ocean, metadata, etc ) when I get better at writing the docs.
Making R manuals is a PITA. laborious
Doc
I normalized by the estimate of the standard errors in the trend and intercept. That’s why there is the same variance on both axes.
On the heteroskedasticity/Gaussian issue: No one is saying you are wrong to discuss those or be concerned about them when addressing some questions with the data. It’s just that the current topic happens to be whether the two parameters in the linear fit are uncorrelated. It’s a very focused question. Heteroskedasticity and non-Gaussian error can be important, but just not to this specific question. Since we are all interested in this specific question, we are discussing it and ignoring things that don’t matter to it. Sometimes you have to compartmentalize.
PaulK:
Paul_K, I agree… the correlation for OLS is bounded:
-1 <= rho(a,b) <= 0.