Every now and then, someone who doesn’t like what I do at my blog suggests I should be using annual average data instead of monthly data. They have correctly noticed that tests based on 8 years of annual average data give different results than tests using 96 months’ worth of data, and, if they like the results with annual average data better than those using monthly data, they suggest the results based on annual average data are somehow “better”.
Well…. erhm….
Today, I’ll show that, at least for cases where the residuals are “gaussian white noise”, using monthly data gives better results than using annual average data. Or at least, statistical analyses based on monthly data are “better” than those using annual average data if you think the following are “good”:
- Having smaller variability in trends across all possible samples drawn from the population of all possible 8-year (or 96-month) trends. (That is: you want to choose the method that maximizes the signal-to-noise ratio given the same amount of data.)
- Having a lower total error rate while matching the confidence level at which you reject the null hypothesis. (That is: you want to make fewer mistakes overall.)
On the other hand, if you really, really, really want to cling to your belief in an incorrect null hypothesis, you can go around telling people to always use annual average data instead of monthly data.
There are a number of ways to show analyses based on monthly data are better than those using annual average data, but I know people don’t like equations much. So…. I generated synthetic data. 🙂
The test
To test whether computing trends using monthly data or annual average data is “better”, I did the following for monthly data:
- Created a series of 96 months of data by adding a trend of 0.02 C/year to noise for each month. The noise was gaussian and white with a standard deviation of 0.1 C and zero mean.
- Computed the trend and the standard error in the trend using the standard formulas in any undergraduate text. (The noise is known to be white and normal, so the standard method applies. )
- For 94 degrees of freedom, the cut-off for the 95% confidence interval is 1.986, so I checked whether the trend fell outside the range (0.02 C/year ± 1.986 × the standard error). If it did, I rejected the trend of 0.02 C/year. This represents a test of whether or not I reject a correct null hypothesis. Because this data is generated synthetically, when a run rejects 0.02 C/year, I know that rejection is a type I error.
- To estimate the type II error rate for this sort of test, I chose a knowingly incorrect hypothesis of 0.015 C/year and tested how often the runs generated with a trend of 0.02 C/year would reject 0.015 C/year. When a run fails to reject 0.015 C/year, that is a type II error.
- I repeated this 40,000 times. (There is no particular reason for this choice.)
- Based on the 40,000 runs I computed a) the standard deviation of the trends, b) the rate at which I rejected 0.02 C/year (i.e. the rate of type I error) and c) the rate at which I failed to reject 0.015 C/year (the rate of type II error).
I then repeated the procedure, but this time generating 8 years’ worth of annual averages. To create annual average data, I used white noise with a standard deviation of 0.02889751 C, which I obtained by taking the ratio of 0.1 C to the square root of 12. Naturally, I used the cut-off of ±2.45 standard errors associated with 6 degrees of freedom. (A code sketch of both versions of the simulation appears below.)
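For anyone who would rather read code than prose, here is a rough sketch of the procedure in Python. It is an illustration, not the code actually used for this post; the function name and the numpy/scipy calls are just one way to set it up, but the logic follows the steps above.

```python
import numpy as np
from scipy import stats

def run_trials(n_points, dt, sigma, true_trend=0.02, wrong_trend=0.015,
               n_trials=40_000, seed=0):
    """Monte Carlo estimate of trend spread and type I / type II error rates
    for an OLS trend fit to a known trend plus gaussian white noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_points) * dt                     # time in years
    tcrit = stats.t.ppf(0.975, df=n_points - 2)      # 1.986 for 94 dof, 2.45 for 6 dof
    trends = np.empty(n_trials)
    type1 = type2 = 0
    for k in range(n_trials):
        y = true_trend * t + rng.normal(0.0, sigma, n_points)
        fit = stats.linregress(t, y)
        trends[k] = fit.slope
        if abs(fit.slope - true_trend) > tcrit * fit.stderr:    # reject the true trend: type I
            type1 += 1
        if abs(fit.slope - wrong_trend) <= tcrit * fit.stderr:  # fail to reject 0.015: type II
            type2 += 1
    return trends.std(), type1 / n_trials, type2 / n_trials

# Monthly: 96 points of noise with sd 0.1 C.  Annual: 8 averages with sd 0.1/sqrt(12) C.
# Each call prints (trend standard deviation, type I rate, type II rate).
print(run_trials(96, 1.0 / 12.0, 0.1))
print(run_trials(8, 1.0, 0.1 / np.sqrt(12)))
```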
Results:
Because both cases are set up to have the same nominal type I error rate, I wanted to rate the two methods based on a) which has the lower standard deviation in trends and b) which has the lower overall error rate. Here’s what I found:
- The standard deviation in trends was 0.0044 C/year and 0.0045 C/year for the analyses based on 96 monthly values and 8 annual values respectively. Smaller standard deviations mean less noise. While the difference is slight, if you happen to have 8 years’ worth of data and you want the most precise estimate possible from it, you should do the analysis based on monthly data. Averaging the 12 months’ worth of data to create annual averages before fitting the trend will increase the noise in your final result.
- The type I error was 4.8% and 5.2% for the analyses based on monthly data and annual data respectively. With an infinite number of samples these should both be 5%, but I only used 40,000 synthetic runs, and both values are within the expected range given that number of runs. (A quick back-of-the-envelope check of that range appears after this list.)
Based on type I error, the two methods are tied, as dictated by the protocol for selecting the confidence interval. (If someone really wants me to, I can tweak my program to run zillions more cases and count.)
- The type II error for the incorrect “null” hypothesis of 0.015 C/year was 79.9% and 84.0% for the analyses based on monthly data and annual data respectively. This means the total error rate was 84.7% and 89.2% based on monthly data and annual data respectively.
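As a quick sanity check on that “expected range”: the binomial standard error on a 5% rate estimated from 40,000 runs can be computed directly. A back-of-the-envelope sketch (not part of the original analysis):

```python
import math

p, n = 0.05, 40_000                    # nominal rejection rate and number of runs
se = math.sqrt(p * (1 - p) / n)        # binomial standard error, about 0.0011
print(f"roughly {p - 2*se:.2%} to {p + 2*se:.2%}")   # ~4.78% to 5.22%; both observed rates fit
```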
So, using monthly data beat out using annual average data based on this test.
Now, some will wonder: is this a general result? It is when the residuals are gaussian white noise: fitting trends to monthly data will always be better than fitting to annual average data. For some cases, fitting trends to monthly data will be much better; for others, it will only be a little better. It depends on how noisy the data are, how many years’ worth of data you have, and which alternate hypothesis you test. However, statistically speaking, fitting to monthly data is always at least as good as averaging the data to create annual averages and then fitting to get the trend.
Could there be situations where fitting to annual data might be better? Maybe.
I haven’t tested data with temporal auto-correlation. I suspect the answer will come out that monthly data are better for nearly any type of noise except noise with an annual cycle you cannot remove. (The anomaly method for temperatures is supposed to remove that.)
As far as I can see, unless you suspect your data contain an annual cycle that you can’t remove by defining anomalies and you are not using a multiple of 12 months, there are good reasons to expect monthly data will generally give better results than analyses based on annual average data.
If you think you have a case where annual average data will give better results when estimating trends, you should run some tests on synthetic data created using the specific type of noise model you think describes your random process and check. Otherwise, if you make it a practice to use annual average data when monthly data are available, I think you are running a serious risk of unnecessarily making more mistakes than the data currently available require.
Or…. as I said before, if you want to cling to your belief in a null hypothesis, and you think the people listening to you don’t know analyses based on monthly data contain less noise, then go right ahead and do the analysis based on annual average data. Why let anything come between you and your confidence in your own favorite theories? 🙂
“Naturally, I used the cut-off of ±___ Standard errors associated with 6 degrees of freedom.”
Typo. I guess you meant to write 2.45.
I would say it’s also better to use monthly data because you have more degrees of freedom than when using annual data. Thus you get tighter CIs.
Lucia:
I think “much better” occurs if you use a simple 12-month unweighted running average to capture the annual trend. As I’ve commented in another thread, unweighted running averages are little more than a very poor-quality low-pass filter.
I think what is happening in this case is that some of the high-frequency noise is leaking through your filter into the 12-month averages and artificially inflating the effective noise floor. This is easy enough to confirm by computing spectral periodograms… if I’m right, you should see an inflated noise floor for frequencies approaching 0.5 year^-1 in the annually averaged data, compared to periodograms of the original monthly data.
If you use e.g. a 4th-order forward/backward Butterworth filter (and keep only every 12th point, of course), I would predict you would find little difference between the two approaches.
That all said, again my opinion here, I think there is in general [*] no justification for averaging monthly data into annual values before fitting a trend. At best you would get no difference in the results, and unless you are very careful, you’ll end up inserting extra noise through the intermediate and unnecessary data manipulation.
[*] I use “in general” in the sense of a set of measure zero, for the math types who always look for confounding exceptional cases to every statement. 🙂
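For anyone who wants to try the check Carrick describes, here is a rough sketch using scipy. The cutoff frequency and filter settings are guesses chosen only to illustrate the comparison, not necessarily the ones he has in mind.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)
months = np.arange(96)
y = 0.02 * months / 12.0 + rng.normal(0.0, 0.1, months.size)   # monthly trend + white noise

# 12-month unweighted running mean (the "poor quality low-pass filter")...
running = np.convolve(y, np.ones(12) / 12.0, mode="valid")

# ...versus a 4th-order Butterworth applied forward and backward, then decimated.
# Cutoff chosen at 1 cycle/year; Nyquist for monthly data is 6 cycles/year.
b, a = signal.butter(4, 1.0 / 6.0)
smoothed = signal.filtfilt(b, a, y)
annualized = smoothed[::12]                  # keep only every 12th point

# Periodograms (frequencies in cycles/year) to look for noise leaking in near 0.5 /year.
f_raw, p_raw = signal.periodogram(y, fs=12.0)
f_run, p_run = signal.periodogram(running, fs=12.0)
f_butter, p_butter = signal.periodogram(smoothed, fs=12.0)
```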
Sorry if it appears that I’m being picky.
“For 96 degrees of freedom, the cut-off for the 95% confidence interval is 1.986,…”
I think you meant to write 94 degrees of freedom.
Chad:
Well there’s only one right answer, so it’s hard to argue that you’re being picky here!
Are you suggesting that Lucia may need to fix her analysis?
😛
Carrick,
No, I’m not suggesting that. I checked the cut-off for the 95% confidence interval and it is 1.986. Just pointing out a typo.
Hey Lucia, I ran a similar simulation using the same parameters. However, I used red noise instead of white noise. I used GISSTEMP to get an autocorrelation coefficient of 0.450. In addition to using OLS, I also used the Nychka correction for serial correlation. Here are the results.
Type I – OLS 21.76 %
Type II – OLS 67.24 %
Type I – Nychka 4.06 %
Type II – Nychka 91.29 %
I know the point of this analysis is to compare monthly to annual type errors, but I didn’t run anything for annual data because, given so few data points, a) estimating rho would have a great deal of uncertainty and b) even if rho were exactly known, using it to generate red noise would be less than adequate. I checked by looking at the autocorrelation plot for the generated noise: the AR(1) coefficient was well off the mark yet fell well within the 95% bounds. The rho I calculated for the annual data was negative, but not significant (95%).
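For reference, here is a rough sketch of the kind of red-noise run Chad describes. This is illustrative code, not his; it applies the simpler (1−rho)/(1+rho) effective-sample-size correction, and the later Nychka variant would simply add 0.68/sqrt(n) to the estimated rho (see the commented line).

```python
import numpy as np
from scipy import stats

def ar1_noise(n, rho, sigma_innov, rng):
    """AR(1) 'red' noise: x[t] = rho * x[t-1] + white innovation."""
    x = np.empty(n)
    x[0] = rng.normal(0.0, sigma_innov / np.sqrt(1.0 - rho**2))   # stationary start
    eps = rng.normal(0.0, sigma_innov, n)
    for i in range(1, n):
        x[i] = rho * x[i - 1] + eps[i]
    return x

rng = np.random.default_rng(2)
n, rho = 96, 0.45
t = np.arange(n) / 12.0
y = 0.02 * t + ar1_noise(n, rho, 0.1, rng)          # same 0.1 C innovation as the white-noise case

fit = stats.linregress(t, y)
resid = y - (fit.slope * t + fit.intercept)
rho_hat = np.corrcoef(resid[:-1], resid[1:])[0, 1]  # lag-1 autocorrelation of residuals
# rho_hat += 0.68 / np.sqrt(n)                      # the later Nychka variant
n_eff = n * (1.0 - rho_hat) / (1.0 + rho_hat)       # effective number of data points
se_adj = fit.stderr * np.sqrt((1.0 + rho_hat) / (1.0 - rho_hat))   # inflated standard error
# Repeating this many times and counting rejections against the t critical value
# (with n_eff - 2 degrees of freedom) gives type I / type II rates like those above.
```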
Carrick–
When you use filtering, you aren’t assuming the underlying trend is linear, right? So, you aren’t fitting a trend, right?
This post simply discusses the choice if you are going to fit a least squares line afterwards. You say there is in general no justification for averaging monthly data before fitting a trend:
That opinion would be confirmed by my computations. There is no reason to average first and then perform the least squares fit. That result ends up worse.
Chad– By Nychka, you mean doing OLS and adjusting the number of effective data points by multiplying by (1-rho)/(1+rho), right? (Rho is the lag 1 serial correlation in the sample?)
Or do you mean the suggestion published in a later Nychka internal report that includes a further correction of +0.68/sqrt(N) added to “rho”?
I’ve done both with the real lag 1 autocorrelation = 0.45. If I use enough runs, the first Nychka method’s type I error converges to a value just a little too high. The second one converges with type I errors too low.
Like you, I find that if we drive both with the same innovation of 0.1 C and just add autocorrelation, the type II errors increase quite a bit.
You were wise not to compare the annual average and monthly cases with autocorrelation. As a rough approximation, if monthly average data have a lag 1 autocorrelation of 0.45, the annual data will have a serial correlation of 0.45^12. Unfortunately, this is very rough, because that conversion of the lag-1 autocorrelation only works if the monthly and annual data were point values taken at specific times (say 5 am, May 1, then another reading exactly 30.1 days later…. for monthly data). Averaging a process whose autocorrelation decays exponentially with time actually results in an ARMA(1,1) process, and the parameters are affected by the correlation in the continuous process and the size of the averaging window. You have to do the math to figure out how the monthly and annual average parameters relate. (I’ve done the math. I could run the two cases, but I figure those people willing to believe me will think showing the white noise case is enough. Those who wish to cling to using annual averages will do so!)
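A quick numerical illustration of that last point, as a sketch with arbitrary parameter choices: simulate a long AR(1) monthly series, average it into 12-month blocks, and compare lag-1 autocorrelations.

```python
import numpy as np

rng = np.random.default_rng(3)
phi, n_years = 0.45, 50_000
n = 12 * n_years

# Long AR(1) monthly series (unit innovations; the scale doesn't matter for correlations).
x = np.empty(n)
x[0] = rng.normal(0.0, 1.0 / np.sqrt(1.0 - phi**2))
eps = rng.normal(0.0, 1.0, n)
for i in range(1, n):
    x[i] = phi * x[i - 1] + eps[i]

def lag1(z):
    return np.corrcoef(z[:-1], z[1:])[0, 1]

annual = x.reshape(n_years, 12).mean(axis=1)   # 12-month block averages
print(lag1(x))        # close to 0.45, the monthly lag-1 autocorrelation
print(lag1(annual))   # much smaller, and not simply 0.45**12: averaging changes the structure
print(phi ** 12)      # what point samples taken exactly 12 months apart would give
```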
Chad,
Thanks for catching the glitch. I actually wrote the draft with ___ and ___ in places, then cut and pasted numbers in. I missed that one! You were correct that I did not use ____ × the standard error as my number, and that was a boo-boo. 🙂
Lucia- I used the correction found in that unpublished paper by Nychka. The one with the 0.68/sqrt(n) term.
But why stop there?
Daily data should be better.
Switching to hourly (or any sub-day period) would mean you get to deal with adjusting for day/night patterns.
But daily should clearly be better unless there’s a tremendous number of missing records. And there are several techniques for re-weighting if a particular period is underrepresented.
Lucia,
Is there an available graphical illustration of your test and result for us “visual folks”?
Chad– Then you and I get the same sorts of answers. I find the original method gives a few too many false rejections; the correction over-corrects and gives too few.
Alan S Blue–
Daily data would be better. However….I can’t get that. Whatever you get, if there is a cycle, you do need to get that cycle out.
Given the type of noise, I suspect the incremental improvement from daily data to hourly data would be tiny. Obviously, for those not mathematically inclined, reductio ad absurdum is likely to work:
Would it be more accurate to find the trend using
a) 20-year averages spaced 20 years apart, or
b) annual averages spaced every year, or
c) monthly data spaced monthly, or
d) daily data spaced daily,
etc.
If you go to every second, fitting the data is computationally intensive. But we all have computers now. 🙂
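For the curious, the b)–d) comparison can be worked out analytically for white noise over the 8-year record in the post. A sketch, assuming the noise stays white at every resolution (which is generous for daily data) and using the 0.1 C monthly standard deviation from above:

```python
import numpy as np

def slope_se(n_points, dt_years, sigma):
    """Theoretical OLS slope standard error for white noise with sd `sigma`
    at evenly spaced times (no simulation needed when the noise is white)."""
    t = np.arange(n_points) * dt_years
    return sigma / np.sqrt(np.sum((t - t.mean()) ** 2))

sigma_month, span_years = 0.1, 8      # monthly sd and record length from the post
for label, per_year in [("annual", 1), ("monthly", 12), ("daily", 365)]:
    n = span_years * per_year
    dt = 1.0 / per_year
    sigma = sigma_month * np.sqrt(per_year / 12.0)   # averaging (or splitting) independent months
    print(label, round(slope_se(n, dt, sigma), 5))
# Prints roughly 0.0045, 0.0044, 0.0044 C/year: annual averaging costs a little precision,
# and going finer than monthly buys almost nothing when the noise really is white.
```

The annual and monthly values line up with the 0.0045 and 0.0044 C/year from the simulations in the post.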
Gardy La Roche….
Actually, it is possible to show what’s going on with type II error graphically. It’s tedious to make the graphs, but maybe I’ll do it next week. This post might help:
http://rankexploits.com/musings/2008/falsifying-is-hard-to-do-%CE%B2-error-and-climate-change/
But I can make graphs that specifically apply to the current post next week and explain it a bit more. Connecting it to the current post would probably help because the old one is tailored to a different question.
Alan:
Yes, daily is clearly MUCH better than monthly with the same assumptions as for the monthly-yearly comparison.
The problem is that the white noise hypothesis is clearly not reasonable for daily data, which are significantly autocorrelated.
One could argue that it is not very reasonable for monthly and yearly data either, but the autocorrelation is not as obvious as it is for daily data.