Falsifying is Hard To Do! β error and climate change.
Recently, I posted a test one could apply to IPCC projections to see if future data falsifies a specific consensus prediction for global warming. I said that if the slope of GMST obtained by OLS was 0 C/century or less for the 10 years from 2001 to 2010, inclusive, then the predicted trend of 2.0 C/century or greater would be falsified to a confidence of (1−α) = 95%.
The value “α” (alpha) just listed is the probability that one might decree that a null hypothesis is false when it is true.
However, some readers are likely to ask a different question. What’s the probability that I will fail to decree a null hypothesis false when it is false? Is that equal to α?
That is: if someone keeps predicting they will falsify, and they fail to do so in 10 years, do they have egg on their face?
The answer to the second question is: Not really. You’ll have egg on your face if you fail to falsify in 23 years!
Why 23 years? Well, you have egg on your face when you fail to falsify at a point where the β (beta) error is less than or equal to your α error. In this case, that’s 5%.
It turns out that if the IPCC predicts 2.0 C/century or greater, that prediction is falsified to the (1−α) = 95% confidence level if the measured 10-year trend is 0.0 C/century or less. But if the real trend is 0 C/century, the β error associated with a 10-year trend is 50%.
So… even if the real trend in GMST is 0.0 C/century, and you test “hypothesis 1”, that the trend is 2.0 C/century or higher, there is a 50% chance “hypothesis 1” will not be falsified in 10 years!
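Under the normal-distribution assumption used throughout this post, the 50% figure can be checked directly. Here b is the measured 10-year OLS trend, c the falsification cutoff, σ_b the spread of 10-year trends and Φ the standard normal CDF. Note that σ_b is back-calculated from the stated 0 C/century cutoff; it is not a number taken from the earlier post.

$$
\begin{aligned}
H_0 &: m \ge 2.0\ \text{C/century}, \qquad \text{falsified if } b \le c,\\
c &= 2.0 - z_{0.95}\,\sigma_b = 0 \;\Rightarrow\; \sigma_b = \frac{2.0}{1.645} \approx 1.22\ \text{C/century},\\
\beta &= P(b > c \mid m = 0) = 1 - \Phi\!\left(\frac{c - 0}{\sigma_b}\right) = 1 - \Phi(0) = 0.5 .
\end{aligned}
$$

Because the cutoff sits exactly on the real trend, half of all possible 10-year trends land on each side of it.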
But after 23 years, the β error for this test drops below 5%.
So, if you still fail to falsify 2.0 C/century after 23 years, pack it in. Seriously! (Some of you will notice that Lubos Motl uses something like 20 years as his deadline for falsification. I don’t know the particular post.)
It was hard to falsify “no warming” too!
Today, let’s pretend it’s 1978 and some scientist advances a hypothesis that over the upcoming three decades, warming will take place at an average rate of 2.0 C/century.
Now, back in 1978 it’s plausible to say that the general public needed to be convinced that such a prediction could possibly be correct. So, ‘the consensus’ hypothesis might plausibly be: There is no warming.
In a large sense, the argument wouldn’t have been about how fast warming might be, but whether or not there was warming at all. In this particular case, the skeptical Joe Q. Public might decree that he would assume the following null hypothesis was true until it was disproven, or falsified, to a confidence level of 95%:
H0: m ≤ 0 C/century
where m is the real trend in GMST, which we estimate from the data by ordinary least squares (OLS).
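Since the whole test hinges on the OLS trend, here is a minimal sketch of how that trend can be computed from annual GMST anomalies. The function name and the toy data are hypothetical; this is not the post's dataset or code.

```python
def ols_trend_c_per_century(years, temps):
    """Ordinary least squares slope of temps vs. years, in C/century.

    years: calendar years (e.g. 1979..1988); temps: annual anomalies in C.
    """
    n = len(years)
    if n != len(temps) or n < 2:
        raise ValueError("need at least two matching (year, temperature) pairs")
    t_bar = sum(years) / n
    y_bar = sum(temps) / n
    sxx = sum((t - t_bar) ** 2 for t in years)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(years, temps))
    return 100.0 * sxy / sxx  # slope is C/year; times 100 gives C/century

# Hypothetical, made-up anomalies: a noise-free ramp of 0.02 C/year.
years = list(range(1979, 1989))
temps = [0.02 * (y - 1979) for y in years]
print(round(ols_trend_c_per_century(years, temps), 6))
```

Real anomalies would of course scatter around any such line; that scatter is what the red and blue curves describe.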
So now our poor 70s scientist is faced with a problem. He believes that m = 2.0 C/century. But no one believes him.
But then, Mr. Scientist does some calculations and shows that if, after 10 years, the trend in temperature measured using OLS is 2.0 C/century or more, he will have falsified the hypothesis of a trend of 0 C/century or less.
To illustrate this to others, he tells them that if they are correct that m ≤ 0 C/century, and previous statistics for the variability of weather data based on measurements since 1880 apply, then after 10 years the measured trends will follow a normal distribution, as illustrated by the red curve in the figure to the left.
Then he points out that, if they are correct and m ≤ 0, there is only a 5% chance that the measured trend will be greater than 2 C/century.
That is: if weather varies as observed since 1880, there is a 5% chance that the measured trend will fall to the right of the dashed vertical line in the figure.
In that case, he tells them, if the trend measured from 1979-1988 inclusive is 2 C/century or higher, he will have falsified their hypothesis to the 95% certainty level.
Most people will agree to accept this argument, particularly if he proposes it before the data trickle in. (Those who understand the assumptions get to quibble for a few technical reasons. But people tend to be willing to accept more assumptions when the test is against future data.)
But what if it turns out the measured value is too low?
So now our poor scientist sits and waits 10 years. He waits and waits, and then, after 10 years, it turns out the measured trend is less than 1 C/century.
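The scientist's argument can be sketched with the standard library's normal distribution. This assumes, as the post does, that 10-year trends are normally distributed; the spread of about 1.2 C/century is back-calculated from the statement that 2.0 C/century is the 95% cutoff, not taken from the earlier post, and `falsifies_no_warming` is a hypothetical helper name.

```python
from statistics import NormalDist

ALPHA = 0.05
CUTOFF_10YR = 2.0  # C/century: the dashed line, the scientist's stated cutoff

z_95 = NormalDist().inv_cdf(1 - ALPHA)         # one-sided 95% point, ~1.645
sigma_b = CUTOFF_10YR / z_95                   # implied spread of 10-year trends
red_curve = NormalDist(mu=0.0, sigma=sigma_b)  # trends expected if m = 0

def falsifies_no_warming(measured_trend):
    """True if a 10-year OLS trend (C/century) rejects m <= 0 at 95%."""
    return measured_trend >= CUTOFF_10YR

tail = 1 - red_curve.cdf(CUTOFF_10YR)  # red-curve area right of the dashed line
print(f"implied sigma_b = {sigma_b:.2f} C/century, tail area = {tail:.3f}")
print(falsifies_no_warming(2.3), falsifies_no_warming(0.9))
```

The tail area is α by construction: the cutoff was chosen so that only 5% of "no warming" outcomes land beyond it.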
The public says: Sorry Charlie! Even though you predicted 2.0 C/century, that's still consistent with no trend.
Sitting here in 2008, we know that there actually was a warming trend during the second half of the century. Yet, poor Mr. 70s Scientist-man was sitting there looking wrong in 1988.
What happened?
Look at the blue curve!
If you glance at the figure of the normal distributions, next to the red curve, you’ll see a blue curve.
The blue curve is the distribution of trends we would have expected to see if the scientist had been correct. Notice that the scientist predicted m = 2.0 C/century.
Now notice that, even if Mr. Scientist had been correct about the 2.0 C/century trend, then after waiting 10 years, half the time (as indicated by the area under the blue curve to the left of the red dashed line) the measured trend would have been lower than 2.0 C/century. The other half of the time, it would have been higher than 2.0 C/century.
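The areas under the red and blue curves can also be checked by brute force: simulate many 10-year records, fit an OLS trend to each, and count how often it crosses the 2.0 C/century line. This is a sketch under loudly stated assumptions: annual weather scatter is treated as independent Gaussian noise with a standard deviation of 0.11 C (an assumed value, chosen so that roughly 5% of "no warming" records exceed the cutoff), and serial correlation is ignored.

```python
import random

N_YEARS = 10
CUTOFF = 2.0    # C/century: the dashed line in the figure
SIGMA_E = 0.11  # C: ASSUMED white-noise year-to-year weather scatter
TRIALS = 20000

T_BAR = (N_YEARS - 1) / 2
SXX = sum((t - T_BAR) ** 2 for t in range(N_YEARS))

def trend_c_per_century(ys):
    """OLS slope of ys sampled at t = 0..n-1, converted to C/century."""
    y_bar = sum(ys) / len(ys)
    sxy = sum((t - T_BAR) * (y - y_bar) for t, y in enumerate(ys))
    return 100.0 * sxy / SXX

def fraction_above_cutoff(real_trend, rng):
    """Share of simulated 10-year records whose OLS trend exceeds CUTOFF."""
    hits = 0
    for _ in range(TRIALS):
        ys = [real_trend / 100.0 * t + rng.gauss(0.0, SIGMA_E)
              for t in range(N_YEARS)]
        if trend_c_per_century(ys) > CUTOFF:
            hits += 1
    return hits / TRIALS

rng = random.Random(1978)
alpha_hat = fraction_above_cutoff(0.0, rng)     # red curve: no warming
beta_hat = 1 - fraction_above_cutoff(2.0, rng)  # blue curve: m = 2 C/century
print(f"falsely 'prove' warming when m = 0: {alpha_hat:.3f}")
print(f"fail to falsify when m = 2:          {beta_hat:.3f}")
```

The first number estimates the α (false positive) rate and the second the β (false negative) rate; with this many trials they come out near 5% and 50%.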
So, if Mr. Scientist had proposed a 10 year test to “prove” his hypothesis about a +2.0C/century temperature trend between 1979-1988, he would have been “dis-proven” half the time. Or so it would seem.
I use the quotes on “proven” and “dis-proven” intentionally, because failing to disprove the null hypothesis doesn’t prove it’s right!
Why do we accept the null hypothesis if it’s “not disproven”?
We accept the null hypothesis as “correct” not because this particular statistical test proved it, but because we believed it to be correct before we ran the test!
Basically, we are stubborn and don’t change our minds unless we are shown convincing evidence.
Now, let’s suppose our hypothetical 70s scientist honestly believed the real trend was 2.0 C/century. After his theory ‘failed’ the 10-year test, he likely still believes he’s right. After all, with only 10 years of data, he knew there was a 50% chance that he couldn’t convince the public they were wrong.
He then “embraces the error bars” and begins to explain β (beta) error.
The probability that one can’t disprove a null hypothesis even if it’s wrong is called β error. In the example above, β = 50%. That is, 50% of the time, Mr. 70s scientist-man would be unable to disprove the rival “null” hypothesis even if it were wrong and he was right!
But the scientist knows failing to disprove a hypothesis doesn’t prove the hypothesis. That’s why a scientist who feels sure of his theory collects more data.
Statistics tells us that if we take more data, those normal distributions, illustrated by the blue and red curves above, get taller and narrower. And, if enough data are collected, eventually, if the real trend is 2.0 C/century, the data will become inconsistent with 0.0 C/century.
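Why the curves narrow can be sketched with the textbook OLS result, assuming equally spaced annual points and independent (white-noise) weather scatter with standard deviation σ_e. This is a simplification: it ignores the serial correlation discussed in the comments below.

$$
\sigma_b=\frac{\sigma_e}{\sqrt{\sum_{i=1}^{n}(t_i-\bar t)^2}},\qquad
\sum_{i=1}^{n}(t_i-\bar t)^2=\frac{n(n^2-1)}{12},\qquad
\frac{\sigma_b(n)}{\sigma_b(10)}=\sqrt{\frac{10\cdot 99}{n(n^2-1)}}\approx\left(\frac{10}{n}\right)^{3/2}.
$$

So, under these assumptions, doubling the record from 10 to 20 years shrinks the spread of measured trends to roughly 35% of its 10-year value.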
How long could this take?
It’s natural to wonder… “How long might a scientist have to wait to disprove an incorrect null hypothesis?”
Well, of course, if the scientist is wrong, he’d have to wait forever to disprove the null hypothesis of no warming. If he’s a crackpot, he will!
But, if the scientist were correct that the real trend is m=2.0C/century of warming, then in 16 years, the probability that he will fail to falsify the null hypothesis is only 3%.
So, if the trend during the 70s and 80s had been 2.0 C/century, as our hypothetical Mr. 70s scientist-man thought, there was a roughly 50% chance weather would cooperate and he’d have ‘dis-proven’ no warming in 10 years.
There would have been a better than 95% chance he’d have dis-proven “no warming” after 16 years.
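A rough β-versus-record-length table can be sketched by combining the normal assumption with the white-noise narrowing of the trend distribution. Assumptions, stated loudly: 10-year trends have the spread implied by a 2.0 C/century cutoff, residuals are uncorrelated, and the helper names are hypothetical. Under these assumptions β for a real trend of 2.0 C/century first drops below 5% at 16 years (to about 4.5%), in line with, but not identical to, the 3% quoted above.

```python
from statistics import NormalDist

ALPHA = 0.05
Z = NormalDist().inv_cdf(1 - ALPHA)  # one-sided 95% point, about 1.645
# ASSUMPTION: 10-year trends scatter so that 2.0 C/century is the 95% cutoff.
SIGMA_B_10 = 2.0 / Z

def sigma_b(n_years):
    """Spread (C/century) of an n-year OLS trend, white-noise scaling."""
    return SIGMA_B_10 * (10 * 99 / (n_years * (n_years ** 2 - 1))) ** 0.5

def beta(real_trend, n_years):
    """Chance of failing to falsify m <= 0 when the real trend is real_trend."""
    s = sigma_b(n_years)
    cutoff = Z * s  # smallest measured trend that falsifies m <= 0
    return NormalDist(mu=real_trend, sigma=s).cdf(cutoff)

for n in range(10, 21, 2):
    print(f"{n:2d} years: beta = {beta(2.0, n):.3f}")
```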
In some sense, then, if we are even-handed about interpreting both β and α error, and the guy failed to disprove the “null hypothesis” in 16 years, the public might actually say, “We consider his claim of 2.0 C/century to be falsified. After all, we were willing to give up our assumption of 0.0 C/century or lower with α = 5%, so maybe he ought to give up 2 C/century with β = 5%.”
(Of course, there is no actual statistical principle requiring us to be even-handed, but it seems sort of fair. In reality, individuals can continue to believe anything they like for as long as they like. But at this point, any scientist who didn’t revise his claim would be considered a bit odd, and the public would stop listening to him and decide he is “Mr. 70s-crackpot-man”.)
Could the scientist revise his claim?
Well… of course! He could claim that there is warming, but it was less than 2.0 C/century during that period for any number of reasons. He could devise a new hypothesis, based on a slightly revised theory. (This is normal in science.)
Naturally, whatever new theory he devises, he will come up with something that is not inconsistent with the data he just collected.
This is also normal in science. Who would devise a theory that was inconsistent with already existing data?
Maybe the new claim will be 1.0 C/century, or 1.1 C/century. Or something. Or maybe he’ll claim something happened, and persist with 2.0 C/century, or even ramp up to 6.4 C/century.
Naturally, if the theory is revised in a way that is known to be consistent with previous data, we must test the predictive power of the new claim against new data.
Still, no matter what the Mr. 70s-scientist claims, as long as he quantifies his claim, it turns out that with each claim, there is an expected β error, as a function of α and the number of years. We can always test the claim, and we can always sort of figure out when he’d better revise or be considered a crack-pot.
It’s perfectly easy to run numbers and create a chart showing the β error for a test as a function of the number of years. I did so, assuming α = 5% and that weather varies as I described in my earlier blog post.
If you look at the chart to the left, you’ll see that if the real trend were m = 1.1 C/century, there would be an 80% probability that a 10-year trend would be “not inconsistent with m ≤ 0 C/century at the 95% confidence level”.
If the 70s scientist wanted to avoid looking wrong, he should have looked at this chart and picked a time period in which he would falsify the null hypothesis at least 95% of the time. So, if a scientist really believes m = 1.1 C/century, then he should plan on waiting up to 23 years. (He might get lucky and prove himself right sooner, but he can’t plan on it.)
And of course, if the scientist was unable to disprove “no warming” in 23 years… well, then it’s probable that warming either (a) wasn’t happening or (b) was slower than 1.1 C/century.
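The chart's practical question, how long you must wait before β falls to α, can be turned into a small search. This sketch uses the same simplified assumptions as a normal-approximation scoping calculation (normal trends, white-noise narrowing, and a spread back-calculated from a 2.0 C/century cutoff at 10 years); it is not the calculation behind the chart, and the helper names are hypothetical. It lands close to, but not exactly on, the chart: for m = 1.1 C/century it gives β of about 0.77 at 10 years (versus 80%) and 24 years to reach β ≤ 5% (versus 23).

```python
from statistics import NormalDist

ALPHA = 0.05
Z = NormalDist().inv_cdf(1 - ALPHA)
SIGMA_B_10 = 2.0 / Z  # ASSUMED calibration: 10-year cutoff = 2.0 C/century

def beta(real_trend, n_years):
    """Chance of failing to falsify m <= 0 when the real trend is real_trend."""
    # 990 = 10 * (10**2 - 1): white-noise scaling of the trend spread from 10 years
    s = SIGMA_B_10 * (990 / (n_years * (n_years ** 2 - 1))) ** 0.5
    return NormalDist(mu=real_trend, sigma=s).cdf(Z * s)

def years_needed(real_trend, max_beta=ALPHA, max_years=200):
    """Smallest record length (years) with beta <= max_beta, or None."""
    for n in range(3, max_years + 1):
        if beta(real_trend, n) <= max_beta:
            return n
    return None

for m in (2.0, 1.5, 1.1, 0.5):
    print(f"m = {m} C/century: about {years_needed(m)} years")
```

Setting `max_beta` equal to α is the "even-handed" rule from above: stop when the risk of missing a real trend is no larger than the risk of a false alarm.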
So, what did happen?
In reality, we all know what actually happened. Any hypothetical late-’70s scientist who suggested warming would happen over the next few decades did manage to falsify “no warming”.
Ultimately the temperature did rise during that period, and it rose fast enough to disprove “no warming” during that period.
So, even though the late 70s and early 80s didn’t look promising for the theory of AGW, starting around the mid-80s, temperatures rose briskly.
A skeptic can argue about the cause, but the statistical reality is that the temperature did rise. The rise was predicted before it happened. Moreover, given reasonable estimates of weather variability from past thermometer measurements, the trend does seem statistically significant.
Because the data are largely consistent with predictions of warming made at least as early as the 70s, today, the claim that at least some warming happened and will continue to happen is largely accepted by scientists and the public.
In that sense, it is the consensus claim.
Because at least some warming is now the consensus claim, those who believe there is no true trend, and that all of this is due to natural variability, are now on the flip side of the “falsification” equation.
At this point, to convince most people there is no real warming trend, those who take this contrarian view are in the position of Mr. 70s scientist-man: they must falsify the consensus. That’s difficult: falsification is hard to do.
But if a theory is false, it will falsify… eventually. Weather being what it is, disproving the prevailing consensus generally takes… well… 20-30 years.
Of course, if the consensus is right, you’ll keep failing to disprove it forever!
15 Responses to “Falsifying is Hard To Do! β error and climate change.”



Dan Hughes March 5th, 2008 at 7:09 am
From Press et al., Numerical Recipes, Chapter 13, page 454 of the first edition of 1986:
But we need people who will traffic with stats. I’m just glad that I’ve not had to be one of them.
lucia March 5th, 2008 at 7:59 am
Dan–
I’m only doing undergraduate stuff from books I used sophomore year in college!
One of the reasons hypothesis testing isn’t math itself is that you need to say what the hypothesis is.
When doing tests on past data, this permits soooooo much cherry picking. It often doesn’t even seem like cherry picking to the people doing the analysis. But the people you are trying to convince always recognize the cherry picking. (Why start hurricane counts in 1970? Or assume ‘X’? Or look at temperatures since precisely 1998?)
In some ways, stating the test hypothesis and describing how to do the test before collecting data encourages clarity. Doing this before data come in means we don’t have each person suggesting a hypothesis they actually cherry picked. We also don’t end up with people insisting on precision for the steps that they know in advance help their point and going all loosie-goosie on steps that, if implemented fully, hurt their point.
I am totally imprecise on red noise. I also do no tests for normal data or normally distributed residuals etc. I’m doing this with the precision one would generally do to estimate how much data you will need before figuring out the budget for doing an experiment.
These sorts of scoping calculations are actually done rather routinely for lab or field work! You might not do this formally, or document it, but you will do it on the back of an envelope. Who wants to start an experiment only to discover that the uncertainty is so large that even if an effect is real, you probably can’t prove it in less than 1,000 years?
David B. Benson March 5th, 2008 at 2:26 pm
These questions provide examples where the Bayesian factor method is likely to lead to sharper results:
E.T. Jaynes
“Probability Theory: the logic of science”
lucia March 5th, 2008 at 3:07 pm
David–
That’s interesting. Could you elaborate? What do you mean by sharper?
BTW– since many people argue by rhetorical question, I want to make sure new visitors know that mine aren’t.
Most applied statistics I’ve done have been for laboratory experiments. When designing a lab experiment, we usually try to take enough data to avoid having to tease out results using the sorts of techniques econometricians or paleo-people have to use. We aren’t stuck with waiting for the earth to go around the sun and weather to happen.
I do know that in this area, which is politically contentious, I want to explain what the answer is if someone insists that “no warming” must always be the null hypothesis because they think that’s some sort of inviolable rule. I also want to explain what the answer is if someone else insists that “x C/century” must be the null hypothesis because that’s what some governing body says. At various blogs, there are repeated arguments that amount to arguing over what one must accept as the null hypothesis.
Bayesian gets used to mean that we assign some sort of probability to some original hypothesis, right?
Don Fontaine March 5th, 2008 at 5:10 pm
Lucia,
Type I and II errors are much more clearly described at Wikipedia http://en.wikipedia.org/wiki/T.....pe_I_error
I think you got it wrong…. alpha gives the probability of type I error. Beta gives probability of type II error.
Don
Don Fontaine March 5th, 2008 at 5:29 pm
Lucia,
Sorry, I typed and edited too slowly; I meant to say…
Type I and II errors are much more clearly described at Wikipedia http://en.wikipedia.org/wiki/T.....pe_I_error
I think you got it wrong…. alpha gives the probability of type I error. Beta gives probability of type II error.
You say “The value “α” (alpha) just listed is the probability that one might decree that a null hypothesis is false when it is true. ” This sounds like a false negative, type II error.
Type I error (measured by alpha)is:
Type I errors (the “false positive”): the error of rejecting the null hypothesis given that it is actually true; e.g., rejecting the hypothesis of no difference (that is, you conclude there is a difference) when there is, in fact, no difference.
Type II errors (the “false negative”): the error of failing to reject the null hypothesis given that the alternative hypothesis is actually true; e.g., failing to reject the hypothesis of no difference (that is, you conclude that there is no difference) when there is in fact a difference.
Don
lucia March 5th, 2008 at 5:41 pm
Hi Don,
I think Wikipedia and I agree:
Wikipedia:
Me
Wikipedia’s word <==> my words.
“Reject the null hypothesis” <==> “Decree the null hypothesis false”.
“When it is true” <==> “When it is true”.
By both Wikipedia’s and my choice of words, alpha error is a false positive. Commonly, the null hypothesis is the equivalent of “nothing happened”.
So, “Rejecting nothing happened” is finding “Something happened”. If nothing actually happened (that null hypothesis is true) then you got a false positive.
Alan D. McIntire March 5th, 2008 at 5:59 pm
1. I became aware of autocorrelation on reading Climate Audit. I’m sure either side of the alpha-beta argument could counter with the argument that year-to-year temperature fluctuations are not completely random, but dependent on previous years’ temperatures. You yourself argued in a previous thread that Schwartz underestimated his time constant, so you were obviously giving a simple statistical test which you do not truly believe in.
http://rankexploits.com/musing.....-suggests/
Hans von Storch addressed this problem here:
http://coast.gkss.de/staff/storch/pdf/misuses.pdf
Storch gives examples of “detrending” data, but that was for AR(1) models. I’m not sure how it would work against more complex models.
From what I understand of his article, you’ve got to run it against truly random data: Schwartz would say that every 5th year constitutes random data, I suppose you would say every 8th year.
2. As I said before, I was blissfully ignorant of autocorrelation until reading Climate Audit. I get the impression that if a process is truly random, the residuals should follow a random distribution.
I get this idea for an autocorrelation test from an old textbook: “Introductory Mathematical Statistics”, by Erwin Kreyszig. It gives a test for “runs”: checking the runs of heads or tails (or, in this case, temperature fluctuations above or below the trend line). For 2m events, m heads and m tails, the mean number of runs should be m + 1. The variance in the number of runs should be m(m−1)/(2m−1). This normal approximation is supposed to work well for 2m over 20. I presume that with strong autocorrelation, the number of ‘runs’ of temperature above or below the trend line would be significantly different from m + 1.
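The runs idea corresponds to the standard Wald–Wolfowitz runs test, which also handles unequal numbers of points above and below the line (its mean reduces to m + 1 when there are m of each). A minimal sketch, with a hypothetical function name and made-up residuals:

```python
from math import sqrt

def runs_test(residuals):
    """Wald-Wolfowitz runs test on the signs of trend-line residuals.

    Returns (observed runs, expected runs, z). Zero residuals are dropped.
    A large negative z (too few runs) suggests positive serial correlation.
    """
    above = [r > 0 for r in residuals if r != 0]
    n1 = sum(above)       # points above the trend line
    n2 = len(above) - n1  # points below the trend line
    if n1 == 0 or n2 == 0:
        raise ValueError("need residuals on both sides of the trend line")
    n = n1 + n2
    runs = 1 + sum(1 for a, b in zip(above, above[1:]) if a != b)
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    if var <= 0:
        raise ValueError("need more residuals on both sides of the trend line")
    return runs, mean, (runs - mean) / sqrt(var)

# Made-up residuals: a long stretch above the line, then a long stretch below.
print(runs_test([0.1] * 12 + [-0.1] * 12))
```

A sign pattern that flips every year gives too many runs (large positive z); long warm and cool stretches give too few.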
The expected number of runs of exactly n heads (or of n tails) from flipping a fair coin is about 1/(2^(n+2)) times the number of flips, so in a sequence of 1024 flips you should get about 1024/8 = 128 runs of 1 head, 128 runs of 1 tail, 64 runs of 2 heads, 64 runs of 2 tails, etc. As a second possibility, maybe a chi-squared test would work to test for autocorrelation in this case.
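The N/2^(n+2) rule of thumb can be checked against the exact expectation, which adds small corrections for runs touching either end of the sequence, and against one simulated sequence. A sketch with hypothetical helper names; the simulated counts will vary with the random seed.

```python
import random

def expected_head_runs(n_flips, k):
    """Exact expected number of head-runs of length exactly k (fair coin)."""
    if k > n_flips:
        return 0.0
    if k == n_flips:
        return 0.5 ** n_flips
    interior = (n_flips - k - 1) * 0.5 ** (k + 2)  # tail, k heads, tail
    edges = 2 * 0.5 ** (k + 1)                      # run touching either end
    return interior + edges

def count_head_runs(flips, k):
    """Count maximal runs of exactly k heads (True) in a flip sequence."""
    count, length = 0, 0
    for f in flips + [False]:  # sentinel tail closes a run at the very end
        if f:
            length += 1
        else:
            if length == k:
                count += 1
            length = 0
    return count

rng = random.Random(0)
flips = [rng.random() < 0.5 for _ in range(1024)]
for k in (1, 2, 3):
    print(k, round(expected_head_runs(1024, k), 2), 1024 / 2 ** (k + 2),
          count_head_runs(flips, k))
```

Comparing observed and expected counts across run lengths is where a chi-squared goodness-of-fit test could come in.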
I’m sure there are better tests already; I’d just like to know if my two off-the-top-of-my-head ideas make sense, or if these off-the-top-of-my-head ideas are like dandruff: small and flaky. Alan McIntire
lucia March 5th, 2008 at 7:20 pm
On the first part of your comment: Yes. I’ve tried to comment on the fact that I’ve neglected serial correlation in the residuals in the current (and most recent) blog post. Serial correlation in the residuals matters.
I’m actually looking at fixing that up right now, but the reason I’m comfortable neglecting it with annual data is that the correlation isn’t very big for annual values. (It should be looked at, but it’s not all that huge and doesn’t affect the estimate of how long it should take to falsify or validate. The residuals for monthly data are highly correlated.)
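A quick way to check how big the serial correlation is: compute the lag-1 autocorrelation r1 of the residuals. A widely used rough correction (not one used in this post) then treats the n residuals as worth about n(1 − r1)/(1 + r1) independent points, which widens the trend distribution and lengthens the time needed to falsify. A sketch with hypothetical names and made-up residuals:

```python
def lag1_autocorrelation(residuals):
    """Sample lag-1 autocorrelation r1 of a residual series."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    den = sum(d * d for d in dev)
    if den == 0:
        raise ValueError("residuals are constant")
    return sum(a * b for a, b in zip(dev, dev[1:])) / den

def effective_sample_size(n, r1):
    """Rough AR(1) adjustment: n_eff = n * (1 - r1) / (1 + r1)."""
    return n * (1 - r1) / (1 + r1)

# Made-up annual residuals (C), for illustration only.
res = [0.12, 0.05, -0.03, 0.08, -0.10, -0.02, 0.04, -0.07, 0.01, -0.05]
r1 = lag1_autocorrelation(res)
print(f"r1 = {r1:.2f}, n_eff = {effective_sample_size(len(res), r1):.1f} of {len(res)}")
```

For small positive r1, typical of annual means, n_eff stays close to n; monthly residuals with r1 near 1 collapse it sharply.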
I will be writing more about this though.
On the second bit: It’s not the serial correlation in the residuals I criticized Schwartz for. (Although that does matter.) What I criticized him for was looking at the data in an odd way that made it difficult to separate actual measurement error from real weather variability. His instrument measurement error does not obey conservation of energy. (Tamino also neglected to recognize this distinction, and that’s why Tamino’s simulated temperatures don’t look like temperature measurements.)
Funny you should mention Kreyszig. I’m getting beta and alpha error from… “Advanced Engineering Mathematics” by…. Erwin Kreyszig
I know what correlation is. Before I bowed out of research to be a slug, I wrote things like The influence of a mean fluid velocity gradient on the particle-fluid velocity covariance. However, I’m not used to dealing with these sorts of time series. In turbulence experiments, you either design to sample so fast that you can pick up the full shape of the autocorrelation function, or you sample so slowly that you make sure your data points are uncorrelated.
In between makes data analysis difficult and error prone. It’s best to avoid it.
Inadequate Reasons to For Suggesting the Falsification of IPCC Projections Doesn’t Apply. | The Blackboard March 17th, 2008 at 10:23 am
[...] So, why can others find strings where the IPCC trends are not to falsified? Well, for short tests, the major difficulty is β error is large. In fact, with less than 10 years data, if the trend were 0C/century, we expect β=50% of all 10 year trends calculated based on annual averages would fail to falsify a hypothesis that the underlying trend is 2 C/century or greater. (After 15 years, that β~5%). I discuss this here. [...]
Niche Modeling » Surface Temperatures - estimating the SD of the trends March 18th, 2008 at 12:44 pm
[...] What are the exact conditions under which global warming statements can be falsified? Over at the blackboard, Lucia has been giving this controversial topic well deserved attention. After all, it is pretty [...]
Comparing IPCC Projections to Individual Measurement Systems. | The Blackboard March 25th, 2008 at 10:32 am
[...] variability. This elevates β error, without reducing α error. I discussed β error previously and explained that if a null hypothesis is actually wrong it can take many, many years of data to [...]
Niche Modeling » Recent Climate Observations Compared to Predictions by Rahmstorf et.al. - a review March 25th, 2008 at 1:32 pm
[...] no significance tests quoted for the trend!. Unlike other examinations of IPCC projections here and here, no attempt has been made to determine if the trends are due to climate variability. As reported, [...]
Geoff Larsen March 26th, 2008 at 2:37 am
Lucia, as you are aware, David Stockwell in his blog Niche Modelling (see comment 1344 above) has done some nice detective work in finding the provenance of the smoothed temperature trend in Rahmstorf et al.
http://landshape.org/enm/
He sources the smoothing technique to a paper by Michael Mann, “On smoothing potentially non- stationary climate series”, GRL, 2004. Mann advocates the use of a “minimum roughness” constraint for the end of a time series.
http://holocene.meteo.psu.edu/.....nGRL04.pdf
Steve McIntyre has some things to say about this technique, in another context.
“As I noted in the earlier post, Mann’s “minimum roughness” constraint, when translated from inflated Mannian language, boils down to a reflection of the series both horizontally and vertically around the final value”.
“When I wrote a little routine to implement Mannomatic smoothing, I noticed something really funny. I know that it seems bizarre that there can be humor in smoothing algorithms, but hey, this is the Team. Think about what happens with the Mannomatic smooth: you reflect the series around the final value both horizontally and vertically. Accordingly with a symmetric filter (as these things tend to be), everything cancels out except the final value. The Mannomatic pins the series on the end-point exactly the same as Emanuel’s “incorrect” smoothing”.
http://www.climateaudit.org/?p=1681
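McIntyre's point about the end-point can be demonstrated in a few lines: pad the series by reflecting it about its final point in both time and value, then apply any symmetric filter at the last point. Each padded value pairs with its mirror image to sum to twice the final value, so the smooth returns the final value exactly. This sketch follows McIntyre's description as quoted, not Mann's own code; the function names, filter weights and temperatures are hypothetical.

```python
def pad_min_roughness(series, m):
    """Extend `series` by m points, reflecting it about its final point both
    in time and in value: x[N-1+k] = 2*x[N-1] - x[N-1-k], for k = 1..m."""
    if m >= len(series):
        raise ValueError("series too short for this filter half-width")
    last = series[-1]
    n = len(series)
    return series + [2 * last - series[n - 1 - k] for k in range(1, m + 1)]

def smoothed_end_value(series, weights):
    """Symmetric-filter smooth evaluated at the last point of the series.

    weights = [w0, w1, ..., wm] multiply offsets 0, +/-1, ..., +/-m.
    """
    m = len(weights) - 1
    padded = pad_min_roughness(series, m)
    i = len(series) - 1
    total = weights[0] * padded[i]
    for k in range(1, m + 1):
        total += weights[k] * (padded[i - k] + padded[i + k])
    return total / (weights[0] + 2 * sum(weights[1:]))

temps = [0.10, 0.25, 0.18, 0.30, 0.41, 0.35, 0.52, 0.44, 0.60, 0.39]  # made up
w = [5, 4, 3, 2, 1]  # a symmetric triangular filter, half-width 4
print(smoothed_end_value(temps, w), temps[-1])
```

The two printed numbers agree (up to floating-point rounding) whatever symmetric weights are chosen, which is why a cool final year would pull the smoothed curve's end-point straight down to it.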
Looking at the series in Rahmstorf et al, which ends in 2006, this appears to be the case. I don’t think this solves our problem re the relationship between the series trend & the IPCC chart, however if 2008 should turn out to be quite cool it would be interesting to see this chart updated at the end of the year, using this technique.
IPCC Projections Continue to Falsify | The Blackboard April 21st, 2008 at 1:00 pm
[...] Comments: Falsifying is Hard To … [...]