This post is a continuation of the discussion in Part 1 and will be equally rambling. There is a bleg at the end. (If you can suggest a test for the equality of two variances in R, I'd appreciate it. 🙂 Update: I found this: "We can use the F test to test for equality in the variances, provided that the two samples are from normal populations." The next rambling post will use that. 🙂 )
Now, for the ramble:
In part 1, I’d ended the discussion by comparing the distributions of
- the variance in trend “m” computed from 10,000 pairs of OLS fits to time series consisting of white noise, denoted $latex var_2(m)$
- the mean over two runs of the estimated variance in m from the same 10,000 pairs of OLS fits, denoted $latex \langle sm^2 \rangle_2 $
In this post, I added the subscript “2” to indicate that the variance and mean are over 2 runs.
The distribution of the former was highly skewed. The distribution of the latter was somewhat skewed. But we saw that the means of the two quantities were approximately equal: $latex \langle var_2(m) \rangle \approx \langle \langle sm^2 \rangle_2 \rangle $
One of my claims is that $latex \langle sm^2 \rangle_n $ is an unbiased estimator of $latex var(m)_n$ for any value of "n", and so as the number of cases tested increases from 10,000 to infinity, we should find that the two variances are equal: $latex var(m)_2 = \langle sm^2 \rangle_2 $. This claim is not unique to me: it's actually something that is known based on purely mathematical arguments. So, I'm not really going to test this claim per se.
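(To make the claim concrete, here's a minimal sketch of the kind of Monte Carlo check involved. The series length of 120 and the use of `vcov()` to pull out $latex sm^2 $ are choices made for illustration, not the exact setup from Part 1.)

```r
# Monte Carlo check: fit OLS trends to 10,000 white-noise series and
# compare the empirical variance of the slopes, var(m), with the mean
# of the estimated slope variances, <sm^2>.
set.seed(123)
n.obs  <- 120        # placeholder series length
n.reps <- 10000
t      <- 1:n.obs
m.hat  <- numeric(n.reps)
sm2    <- numeric(n.reps)
for (i in 1:n.reps) {
  fit      <- lm(rnorm(n.obs) ~ t)   # white noise, zero true trend
  m.hat[i] <- coef(fit)[2]
  sm2[i]   <- vcov(fit)[2, 2]        # sm^2 for this fit
}
var(m.hat)   # empirical var(m)
mean(sm2)    # <sm^2>; should be close to var(m.hat)
```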
What I’m going to do instead is to create 10 “models” worth of synthetic data consisting of pairs defined as follows:
- The average over 9 'periods' of $latex var(m)$ computed over 2 runs. I'll denote this $latex \langle var(m)_2 \rangle_9 $. The brackets denote that the quantity has been averaged once over 9 periods; the variance itself was computed over 2 runs.
- The average over 9 'periods' of $latex \langle sm^2 \rangle_2 $, which was previously averaged over a pair of $latex sm^2 $ values from 2 runs. I'll denote this $latex \langle \langle sm^2 \rangle_2 \rangle_9 $; the double brackets denote that the quantity has been averaged twice: once over 2 runs and subsequently over 9 periods.
This will provide me with 10 pairs of $latex (\langle var(m)_2 \rangle_9, \langle \langle sm^2 \rangle_2 \rangle_9) $.
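(For concreteness, a sketch of how one "model" of such pairs could be generated. The series length is a placeholder and `one.model` is just a name used for this sketch; the actual script is the one I'll be uploading.)

```r
# One "model": 9 'periods' x 2 runs of white noise, zero true trend.
# For each period: fit OLS trends to the two runs; var(m)_2 is the
# sample variance of the two slopes, <sm^2>_2 is the mean of the two
# estimated slope variances. Both are then averaged over the 9 periods.
one.model <- function(n.obs = 120) {   # n.obs is a placeholder length
  t <- 1:n.obs
  v <- numeric(9)
  s <- numeric(9)
  for (p in 1:9) {
    m   <- numeric(2)
    sm2 <- numeric(2)
    for (r in 1:2) {
      fit    <- lm(rnorm(n.obs) ~ t)
      m[r]   <- coef(fit)[2]
      sm2[r] <- vcov(fit)[2, 2]
    }
    v[p] <- var(m)      # var(m)_2 for this period
    s[p] <- mean(sm2)   # <sm^2>_2 for this period
  }
  c(v9 = mean(v), s9 = mean(s))  # <var(m)_2>_9 and <<sm^2>_2>_9
}
```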
I will shove those into a t-test to see whether the t-test tells me to reject the hypothesis that $latex \langle var(m)_2 \rangle_9 = \langle \langle sm^2 \rangle_2 \rangle_9 $. Because the distribution of $latex \langle var(m)_2 \rangle_9 - \langle \langle sm^2 \rangle_2 \rangle_9 $ can be shown to be skewed, I'll also perform one-sided tests with p=0.025 to see whether the test is too strict or too liberal on one side relative to the other.
The goal of these tests is to see whether the t-test is "horribly bad" in the sense of not providing false rejections at the correct rate.
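(In R, the two-sided and one-sided paired tests look something like this, reusing `one.model()` from the sketch above:)

```r
# Generate the 10 "models" worth of paired averages, then run the
# two-sided and one-sided paired t-tests on them.
pairs10 <- t(replicate(10, one.model()))   # 10 rows of (v9, s9) pairs
v9 <- pairs10[, "v9"]
s9 <- pairs10[, "s9"]
t.test(v9, s9, paired = TRUE)                          # two-sided, p = 0.05
t.test(v9, s9, paired = TRUE, alternative = "less")    # H1: v9 - s9 < 0
t.test(v9, s9, paired = TRUE, alternative = "greater") # H1: v9 - s9 > 0
```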
For those wondering why I am testing, the reason is that when I create the 10 data pairs as discussed above, I know, without even looking, that neither the distribution of $latex \langle var(m) \rangle $ nor the distribution of $latex \langle \langle sm^2 \rangle \rangle $ is normal (i.e. Gaussian). This might not matter since I will be doing a paired t-test, but it's easy to see that the distribution of the difference between the two quantities is also not normal. All are skewed.
At the same time, one of the assumptions for the t-test is that the data being tested are randomly drawn from a population that is normally distributed. Clearly, my data are not drawn from a population that is normal. For purists, this is a big problem.
Nevertheless, t-tests are applied to data that are not normally distributed all the time, and claims that the test is "robust" to non-normality abound. As a practical matter, the relevant question is: How much difference does the deviation from normality make, given the distribution of the data actually being tested?
Now here goes!
How skewed are the distributions:
| $latex \langle var(m)_2 \rangle_9$ | $latex \langle \langle sm^2 \rangle_2 \rangle_9 $ |
| --- | --- |
| (histogram) | (histogram) |
| mean = 1.218e-05 | mean = 1.201e-05 |
| theoretical mean = 1.188e-05 | theoretical mean = 1.188e-05 |
| median = 1.125e-05 | median = 1.201e-05 |
If you examine the properties of both $latex \langle \langle sm^2 \rangle_2 \rangle_9 $ and $latex \langle var(m)_2 \rangle_9 $, it's easy to see that $latex \langle var(m)_2 \rangle_9 $ is highly skewed, while $latex \langle \langle sm^2 \rangle_2 \rangle_9 $ is approaching normal. This should mean that a t-test will be deficient. But, for reasons yet to be revealed, I will be applying a t-test to this "sort" of data. So, I want to know what sorts of results it would return if we happened to believe every run from every model was drawn from a population with the same mean trend and the same variance. That is: what sorts of results would we get if we threw this into a t-test and all variability in trend was due to "weather"? (I don't think either assumption is true. I will also be applying other tests, and I'll be asking readers to suggest R functions and tests for heteroskedasticity at the end of the post.)
Rejection Rates for the t-Test
To determine how the skewness affected the false-positive (rejection) rate of the t-test, I wrote a function that first generated 10 $latex (\langle \langle sm^2 \rangle_2 \rangle_9 , \langle var(m)_2 \rangle_9) $ pairs and then applied a t-test of the null hypothesis $latex \langle \langle sm^2 \rangle_2 \rangle_9 = \langle var(m)_2 \rangle_9 $ to that set of 10 pairs of data, using p=0.05. Of course, I generated the data so that the null hypothesis is true.
If the t-test "works", it will reject in 5% of the cases. If it does not "work", it will reject either too often or too rarely, and we would want to know this.
To complement this, I also performed two one-sided tests with p=0.025.
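(A sketch of the rejection-rate loop, again reusing `one.model()` from above; the trial count of 2,000 is a placeholder, not the actual number I ran:)

```r
# Repeat the whole exercise many times to estimate how often each
# t-test falsely rejects, given that the null is true by construction.
n.trials <- 2000   # placeholder trial count
p2 <- numeric(n.trials)
pl <- numeric(n.trials)
pg <- numeric(n.trials)
for (k in 1:n.trials) {
  d  <- t(replicate(10, one.model()))
  v9 <- d[, "v9"]
  s9 <- d[, "s9"]
  p2[k] <- t.test(v9, s9, paired = TRUE)$p.value
  pl[k] <- t.test(v9, s9, paired = TRUE, alternative = "less")$p.value
  pg[k] <- t.test(v9, s9, paired = TRUE, alternative = "greater")$p.value
}
mean(p2 < 0.05)    # two-sided rejection rate; should be near 5%
mean(pl < 0.025)   # one-sided (less) rate; should be near 2.5%
mean(pg < 0.025)   # one-sided (greater) rate; should be near 2.5%
```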
The results were:
- The t-test rejected 4.9% when the alternative was $latex \langle var(m)_2 \rangle_9 - \langle \langle sm^2 \rangle_2 \rangle_9 < 0 $; this is roughly twice the expected rate of 2.5%.
- The t-test rejected 1.2% when the alternative was $latex \langle var(m)_2 \rangle_9 - \langle \langle sm^2 \rangle_2 \rangle_9 > 0 $; this is roughly half the expected rate of 2.5%.
- The t-Test rejected at an overall rate of 6.05% when the alternative was $latex \langle \langle sm^2 \rangle_2 \rangle_9 \neq \langle var(m)_2 \rangle_9 $. This exceeds the expected rate of 5%.
Summary:
If a collection of $latex (\langle var(m)_2 \rangle_9 , \langle \langle sm^2 \rangle_2 \rangle_9 ) $ pairs were created where we know the two quantities are equal, the rejection rates of the t-test are imperfect, but still close enough to provide a somewhat useful test of the equality for sets of data that might be generated "in the wild" (for example, possibly from model data). But it would certainly be nice to find a better test (and I have been thinking about them. 🙂 )
Bleg: Can anyone point me to an R test for heteroskedasticity? I want to think through how to apply that so I can do better, purer, clearer tests of the underlying principle, $latex var(m) = \langle sm^2 \rangle $, for a particular collection of data. I know both $latex var(m) $ and $latex \langle sm^2 \rangle $ are variances and should be equal. Can anyone suggest the best test? Then, I can improve my R skills and apply a more reasonable test! 🙂
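(Per the update at the top of the post: the classical F test for equality of two variances is built into R as `var.test`. A minimal sketch with made-up samples:)

```r
# F test for equality of two variances (stats::var.test). Valid when
# both samples come from normal populations. x and y are made-up data.
x <- rnorm(100)
y <- rnorm(100)
var.test(x, y)   # H0: var(x)/var(y) == 1
```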
PS: Thanks to the people who provided R tips. Also, I’ll be uploading the R script. (WordPress doesn’t let me use the convenient “upload” feature. Drat!)


The Breusch-Pagan test is available in package lmtest as bptest, but that's for the residuals of a linear regression fit. Or there's ncv.test in the car package, which is apparently also a B-P type test.
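(For anyone following along, a minimal sketch of `bptest` on a toy fit; the data here are invented so the test has something to detect:)

```r
library(lmtest)   # provides bptest()
# Toy regression with heteroskedastic noise (spread grows with x),
# so the Breusch-Pagan test has something to detect.
x   <- 1:100
y   <- 2 + 0.5 * x + rnorm(100, sd = 0.05 * x)
fit <- lm(y ~ x)
bptest(fit)   # small p-value suggests heteroskedastic residuals
```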
Hi Lucia,
Interesting, but I’m not sure where this effort is going.
http://www.inside-r.org/pretty-r
I would suggest the "car," "DistributionUtils," or "moments" packages off the CRAN subpage on the R Project website. You can also search the packages list for the keyword; heteroscedasticity is a pretty common term in the list. If you download from CRAN you'll want to capture the PDF for the package as well, since it is the only manual available. You will also want to check the "Depends" and "Suggests" lists for the packages, since they list other packages that are required or will improve results.
I just finished an R session in which I was working with temperature series and using the acf and pacf functions to obtain multiple orders of autocorrelation and partial autocorrelation. I wanted to tabulate the results so I could put the information in a big graph of my own design.
To get the information I used "for" statements with something like ACF = acf(X)$acf[i], and so naturally I wrote:
PACF=pacf(X)$pacf[i] and kept getting error statements and never filled my matrix of values.
I will not tell you how long it took me to do an unlist statement on pacf and find that I should have written:
PACF=pacf(X)$acf[i]. Now who should I blame (cuss at) for that? After the fact, the blame has to go to me for not doing the analysis earlier. So now it is to bed without a snack (already had dinner).
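(For the record, a minimal sketch of the quirk: `pacf()` returns an object of class "acf", so the values live in the `$acf` component even for the partial autocorrelations.)

```r
# pacf() returns an object of class "acf"; the values are stored in
# the component named "acf", not "pacf".
X <- rnorm(200)              # toy series
p <- pacf(X, plot = FALSE)
class(p)                     # "acf"
names(p)                     # includes "acf"; there is no "pacf"
PACF <- numeric(10)
for (i in 1:10) PACF[i] <- p$acf[i]   # works; p$pacf[i] would be NULL
```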
Kenneth, I've had similar day-long frustrations. I blame the universe.