Yes, There is Statistically Significant Warming… Since the FAR!
Yesterday, I tried to decipher Tamino’s method of proving statistically significant warming since 2001 and applied it, incorporating data that have come in since he ran his test. The statistical significance vanished. “Poof!” Of course, that happens with short data series: even when hypotheses are wrong, we sometimes can’t prove they are false.
However, it is my philosophy that when we apply a test, we must apply and interpret it even-handedly. So, having applied “The Method Tamino Seems to Use” (as best I can decipher it), I thought I would show the results using different “start” dates. (That’s what Tamino did here.)
Today, using various start dates, and assuming the August 2007 ‘Tamino Method’ is, or ever was, valid, I wanted to determine the answers it gives to two questions:
- Can we exclude the possibility that the underlying trend is 0C/century at 95% confidence?
- Can we exclude the possibility that the underlying trend is 2C/century at 95% confidence?
Why these two questions?
I test the 2C/century projection for a simple reason: it is the central tendency projection in the most recent IPCC report, the AR4.
Testing 0C/century is a bit harder to justify. I suspect almost no-one believes the long term underlying trend is 0C/century. (I don’t.)
So, if some blogger suggests that excluding 0C/century counters a widely held claim, that blogger likely enjoys jousting with strawmen. However, 0C/century is important as a benchmark. When someone points to a recent high trend and suggests it indicates some sort of runaway warming, or that the IPCC TAR might have been underpredicting warming, it is always worth testing whether 0C/century would even have been excluded based on the data.
So, now that I’ve convinced myself that I will convince even first-time readers this test is worth doing, here is a graph showing the results of the test:

Trends are calculated using OLS, starting with the year indicated and ending with data through April 2008. Uncertainty intervals are computed using a method described by Tamino and also in Lee & Lund 2004, Biometrika 91, 240–245. The trend is fit to a merged average of temperatures from GISS, HadCrut, RSS, NOAA and UAH MSU, through April 2008.
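For readers who want to reproduce this kind of calculation, here is a minimal Python sketch. It assumes evenly spaced monthly anomalies, fits the slope by plain OLS, and widens the white-noise standard error by sqrt((1+ρ)/(1-ρ)), where ρ is the lag-1 autocorrelation of the OLS residuals. That inflation factor is the common AR(1) approximation; Lee & Lund give more refined corrections, and this is not claimed to be Tamino’s exact code. The function name and the 1200 months-per-century scaling are illustrative assumptions.

```python
import math


def ols_trend_ar1_interval(y, steps_per_century=1200, z=1.96):
    """OLS trend with an AR(1)-widened ~95% interval, scaled to units per century.

    Assumes y holds evenly spaced values (e.g. monthly anomalies in C) and that
    the residual noise is roughly AR(1). Returns (trend, lower, upper, rho).
    """
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    stt = sum((t - t_mean) ** 2 for t in range(n))

    # Plain OLS slope and intercept: the slope itself ignores autocorrelation.
    slope = sum((t - t_mean) * (yt - y_mean) for t, yt in enumerate(y)) / stt
    intercept = y_mean - slope * t_mean
    resid = [yt - (intercept + slope * t) for t, yt in enumerate(y)]
    ss = sum(r * r for r in resid)

    # White-noise standard error of the slope, with n - 2 degrees of freedom.
    se_white = math.sqrt(ss / (n - 2) / stt)

    # Lag-1 autocorrelation of the residuals (they have zero mean under OLS).
    rho = sum(resid[i] * resid[i - 1] for i in range(1, n)) / ss

    # Effective sample size ~ n (1 - rho) / (1 + rho), so the SE grows accordingly.
    se = se_white * math.sqrt((1 + rho) / (1 - rho))

    lower, upper = slope - z * se, slope + z * se
    return (slope * steps_per_century, lower * steps_per_century,
            upper * steps_per_century, rho)
```

A 0C/century or 2C/century hypothesis is then “excluded” when it falls outside (lower, upper).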
Interpretation of Graphs
Using the method Tamino endorsed in August 2007, which is described briefly in Lee & Lund:
- Starting with data from Jan 2001 (the year the TAR was published), the IPCC AR4 projection of 2C/century for the first three decades of the 21st century is rejected at 95% confidence. The best fit trend since 2001 is -0.5 C/century. Based on data since 2001, trends greater than 1.7 C/century and those lower than -2.6 C/century are excluded at 95% confidence.
- Starting with data from Jan 2000, 2C/century is not excluded.
That said: Jan 2000 is very near a relative minimum in the data, and starting there results in a maximal trend. Also, other than being a ’round year’, there is very little in favor of selecting this year to test a projection in the AR4 (Fourth Assessment Report). It is one year before publication of the Third Assessment Report (TAR), and data up to Jan 2001 are included in graphs of hindcast comparisons in the AR4. However, since 2000 is a round number, people will want to see the result. That’s why I show it.
- Starting with data from 1995 or 1996, the years the SAR was first published and later finalized, 0C/century cannot be excluded at the 95% confidence level.
There is a warming trend, but it is not statistically significant. The 2C/century AR4 projection for the 21st century is also not excluded. However, this is irrelevant, as the AR4 projections do not predict 2C/century for that time period.
- Starting in 1990, the year the First Assessment Report (FAR) was published, 0C/century is excluded at the 95% confidence level. With regard to IPCC reports, this indicates that the warming trend since publication of the FAR is statistically significant.
This is an important result, as it shows an empirical record of statistically significant warming even if we exclude data that predate the first IPCC report.
Readers are reminded: failure to falsify is of little consequence, as this is a feature of statistical tests. When data are limited, one frequently cannot exclude hypotheses even if they are wrong.
The best fit trends and uncertainty intervals are shown below.
| Time frame | 95% interval for trend m | Comment |
|---|---|---|
| Jan 1990-April 2008 | 0.8 C/century < m < 3.1 C/century | 0C/century falsified. |
| Jan 1995-April 2008 | -0.4 C/century < m < 3.2 C/century | |
| Jan 1996-April 2008 | -0.6 C/century < m < 3.4 C/century | |
| Jan 2000-April 2008 | -1.4 C/century < m < 3.8 C/century | |
| Jan 2001-April 2008 | -2.6 C/century < m < 1.7 C/century | 2C/century falsified. |

Note to the curious: I still prefer Cochrane-Orcutt, which is designed for series whose noise is AR(1). Today’s exercise is here to answer those who ask things like “Why don’t you use OLS?” See, I look at both.
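The two questions in this post reduce to checking whether a hypothesized trend falls outside each interval. A small Python sketch that encodes the table’s intervals (the names are illustrative) makes the bookkeeping explicit; the same check works for any baseline, such as the 0.25C/century Chris H proposes in the comments.

```python
# 95% trend intervals in C/century, copied from the table of results.
INTERVALS = {
    "Jan 1990-April 2008": (0.8, 3.1),
    "Jan 1995-April 2008": (-0.4, 3.2),
    "Jan 1996-April 2008": (-0.6, 3.4),
    "Jan 2000-April 2008": (-1.4, 3.8),
    "Jan 2001-April 2008": (-2.6, 1.7),
}


def excluded(trend, interval):
    """True when a hypothesized trend lies outside the (lower, upper) interval."""
    lower, upper = interval
    return trend < lower or trend > upper


def excluded_hypotheses(hypotheses=(0.0, 2.0)):
    """Map each time frame to the hypothesized trends its interval rules out."""
    return {
        frame: [h for h in hypotheses if excluded(h, interval)]
        for frame, interval in INTERVALS.items()
    }


if __name__ == "__main__":
    for frame, ruled_out in excluded_hypotheses().items():
        print(frame, ruled_out or "nothing excluded")
```

Run as-is, only the 1990 start excludes 0C/century and only the 2001 start excludes 2C/century, matching the Comment column.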
Caveats
Are there caveats? Tons. As I previously stated, these results and conclusions assume the method Tamino applied to similar data is now, or ever was, valid when applied to this data.
It’s possible the uncertainty bounds are too small. Tamino and Gavin have suggested these uncertainty intervals are too small. If so, it may be that 0C/century cannot be excluded since 1990 and 2C/century cannot be excluded since 2001. However, Tamino used them when they suited his case, and has not quantified how much larger any uncertainty bands ought to be. Gavin’s estimate is based on a multi-model ensemble spanning a range of parameterizations. Even for any individual model, we can’t know whether it correctly reproduces the variability of the real earth’s weather noise. In contrast, this empirical method estimates the weather noise from weather that occurred on the real, physical earth.
It’s also possible the official temperature records are flawed. Recent papers published in Nature suggest that Sea Surface Temperatures measured during WWII and shortly thereafter may soon be revised; an interview in Nature suggested there may also be some revisions to more recent data.
Finally, any particular result of a statistical analysis could be an outlier, even if the method is valid. Statistical tests are always associated with a specific confidence level. The confidence level for this test is 95%, so we expect to falsify even true hypotheses roughly 5% of the time.
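The 5% figure can be illustrated with a small Monte Carlo sketch in Python: generate series whose true trend is exactly zero, apply an OLS slope test at ±1.96 standard errors, and count how often zero is “falsified”. It assumes white noise (real temperature noise is autocorrelated, which is why this post widens its intervals), and the trial count, series length, and seed are arbitrary choices.

```python
import math
import random


def slope_and_se(y):
    """OLS slope of y against 0..n-1 and its white-noise standard error."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    stt = sum((t - t_mean) ** 2 for t in range(n))
    b = sum((t - t_mean) * (yt - y_mean) for t, yt in enumerate(y)) / stt
    a = y_mean - b * t_mean
    s2 = sum((yt - a - b * t) ** 2 for t, yt in enumerate(y)) / (n - 2)
    return b, math.sqrt(s2 / stt)


def false_rejection_rate(trials=2000, n=60, z=1.96, seed=42):
    """Fraction of zero-trend series in which a zero trend is wrongly rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]  # the true trend is exactly 0
        b, se = slope_and_se(y)
        if abs(b) > z * se:
            rejections += 1
    return rejections / trials
```

With these defaults the rate should land near 0.05, slightly above it on average because 1.96 is the large-sample cutoff rather than the Student-t value for 58 degrees of freedom.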
In the meantime, I’ve used “official” data and applied the same method to test 0C/century trends that I apply to 2C/century trends.
Comments Closed: If you would like them re-opened, Contact Lucia


Comments
George Ismael (Comment#3239) June 5th, 2008 at 10:05 pm
Hi Lucia.
The bottom line of the table should be labeled Jan 2001 – Apr 2008, I think.
Keep up the great work!
MarkR (Comment#3240) June 5th, 2008 at 10:40 pm
Lucia
Given the questioning of the data, would it be possible to do the analysis according to Watts Gallery Category 1 top quality US station data?
This would give an idea of the effects given the use of (possibly) more reliable data.
I have a feeling that using the Watts Category 1 type data would show the IPCC forecasts have been statistically disproved for the Continental USA, and that the trend may even be negative.
Very difficult to explain for an area which is said to be the leading producer of man-made CO2.
Nick Stokes (Comment#3241) June 5th, 2008 at 11:08 pm
Lucia,
My past objections to your falsifications haven’t been to your statement that a .2C/decade linear progression for 2001-2008 has been falsified, but to your claiming that this is an IPCC prediction. It isn’t. You’re now mostly being careful to describe it as an IPCC “central tendency”, which is strictly correct, but meaningless except insofar as it sounds like a projection. And you certainly haven’t rejected “the IPCC AR4 projection of 2C/century for the first three decades of the 21st century”. You can’t, on present data.
But there is another objection, which goes to the basis of all these tests: are the residuals iid normally distributed? My guess is that they are not; there seem to be too many outliers in the past data. There are standard tests. This is different from the issue of autocorrelation, for which you correctly use C-O. Normality is important, because you are testing whether something derived from the residuals is within the limits of the normal. If they really belong to a more fat-tailed distribution, say, then you’ll be falsely falsifying.
MarkR (Comment#3242) June 6th, 2008 at 12:42 am
Chris H (Comment#3243) June 6th, 2008 at 2:50 am
I just had a look at the CET (central England temperature) series that starts in 1659, and the OLS trend over 300 years from 1659 through to 1958 is 0.23C/century (1659 through to 2007 increases this to 0.24C/century, so the end year is not too critical). Given that this is a long-term trend that predates significant anthropogenic CO2 emissions (ACO2E), shouldn’t you be using 0.25C/century as the baseline instead of 0C?
Interesting that 1959 reappears again (my birth year as well btw)
fred (Comment#3244) June 6th, 2008 at 3:15 am
Nick Stokes -
One can rescue the IPCC all right, by claiming that they never did predict the warming being tested for. You can also rescue the models by claiming that the error bars are too narrow and that they are consistent with various degrees of cooling.
However the problem merely shifts to another part of the wood. This is characteristic of a socially important hypothesis becoming critically inconsistent with observed facts. The result is that the price paid for preserving the hypothesis is too high. If we preserve the IPCC by widening the error bars, we thereby make the projections useless as indicators for immediate action – they become right, but at the price that they predict almost nothing because they exclude almost nothing, and so they no longer imply act now before it is too late.
If we take the tack that the IPCC did not forecast a given level of warming, fine, this saves the IPCC, but it does not save the prediction. It is interesting to test the prediction, regardless of who made it, and Lucia is showing that, no matter who made it, 2 degrees C per century is incompatible with recent observations. So, if there is minimal chance of this level of warming, what starts to look very pale and sickly is the proposition that catastrophe faces us if we do not act now. The IPCC may look better but the hypothesis doesn’t.
This stuff is going to continue and get worse. The strategies needed to save either the IPCC, prominent AGW proponents, or predictions of catastrophic warmings are going to get more convoluted and more extreme and the collateral damage to the hypothesis from using them more and more pronounced.
Another classic case of this in action was Tamino’s attempt at a defence of the MBH PCA method. It was downright embarrassing when you finally grasped what he’d had to do to try to defend it. The endless adjustments to the instrument record are in the same category. The sea temp adjustments are now moving into this category.
A few more years of static or cooling temps, and this will be a chapter in textbooks on the philosophy and methods of science, how hypotheses are formed, tested, defended and finally discredited. It has not happened yet, more data is needed, but we are well on the way.
Lucia, by the way, what happened to R? Is Excel just too easy and familiar, and time too short? It would be very understandable if so. I have not made a lot of progress with it myself.
Bob B (Comment#3246) June 6th, 2008 at 6:38 am
“If we preserve the IPCC by widening the error bars, we thereby make the projections useless as indicators for immediate action – they become right, but at the price that they predict almost nothing because they exclude almost nothing, and so they no longer imply act now before it is too late.”
Very nicely put. You could replace “IPCC” with any of the climate modelers’ pet models.
lucia (Comment#3248) June 6th, 2008 at 7:07 am
Alan Blue–
Is there good data back to 1800? I think HadCrut only goes back to 1850.
MarkR–
Is Anthony’s station data processed and posted?
NickStokes
2C/century is an IPCC projection. 2C/century happens to be the central tendency, so I often discuss this particular metric and state that value. I have also looked at the uncertainty intervals. Cochrane-Orcutt, which is better suited to this type of data, falsifies the central tendency. Here is the graphical representation using data up to Feb.
With OLS, the empirical trend is a bit higher, and the lower uncertainty intervals don’t falsify. I’ve never said otherwise.
Nick– If you would like to do the hypothesis test another way, feel free to do so, and explain it by showing an example with real GMST data. Afterwards, I’ll be happy to reapply it to all tests in the table.
One of my points with this post is this:
This is the exact same test that bloggers like Tamino say proves there is (or was) statistically significant warming. It is the test that frequent commenters like Ray Ladbury link to when insisting that claims of no statistically significant warming since 2000 should be ignored.
Whatever flaws the test has (and it has many), those flaws apply equally to the proof that 0C/century is excluded at 95% confidence and to the proof that 2C/century is excluded.
lucia (Comment#3249) June 6th, 2008 at 7:33 am
The other question is this: Why were the uncertainty bars in the IPCC graphics and text so much smaller than the ones now being used to claim the IPCC projections are high?
If the IPCC as a whole thought these wider uncertainties were meaningful, they should have communicated them. They did not. We don’t know the basis for their choice.
What seems most likely is that, after consideration, the IPCC thought the uncertainties in the figure above were the correct ones to communicate to the public. This is why I think testing the IPCC projection must be done using the figures and values in their documents, not by fishing out data from an archive, reprocessing it, figuring out what they should have or might have said, had they only been wiser, and then testing that.
rex (Comment#3250) June 6th, 2008 at 8:27 am
May 2008?
Nick Stokes (Comment#3251) June 6th, 2008 at 8:32 am
Lucia,
Again I agree, generally, that the proposition that the temperature is following a linear trend at .02C/year between 2001 and 2007/8 is, according to your reasonable analysis, falsified. On your IPCC plot shown here, I note that you’ve drawn the lines out to 20+ years, which certainly makes them look out of range. But that’s not what you found. They should only be 7 years long, which looks much less convincing. The AR4 says that the colored area corresponds to 1 sd, not 95% (which would be about twice as wide). That reduces the apparent discrepancy even more.
Figure 10.5 in the AR4 gives much better detail of the range than Fig 10.26, and also makes it very clear that they are not making a straight line projection.
So it’s not the test re 0.02C/yr that I would do another way. As I said once before, I think RP Jr’s plot using your mean is quite a good way of assessing the IPCC projection. I will, if I get time, test your residuals for normality.
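The “about twice as wide” remark is easy to check if one treats the band as a normal spread, an assumption Nick questions later in the thread. A stdlib Python sketch (function names are illustrative):

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(0.0, 1.0)


def coverage(k_sd):
    """Probability that a normal variable falls within +/- k_sd standard deviations."""
    return STANDARD_NORMAL.cdf(k_sd) - STANDARD_NORMAL.cdf(-k_sd)


def half_width_in_sd(level):
    """Half-width, in standard deviations, of the central interval with this coverage."""
    return STANDARD_NORMAL.inv_cdf(0.5 + level / 2.0)


if __name__ == "__main__":
    print(f"+/-1 sd covers {coverage(1.0):.3f}")            # prints 0.683
    print(f"95% needs +/-{half_width_in_sd(0.95):.3f} sd")  # prints 1.960
```

So a ±1 sd band covers roughly 68% of a normal spread, and the corresponding 95% band is about 1.96 times as wide.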
lucia (Comment#3252) June 6th, 2008 at 8:51 am
Vincent: UAH has reported their May temperatures, but GISS, Hadley, NOAA and RSS have not yet done so. I’ll add May when all agencies report.
Nick–
Yes. The lines on my cartoon graphic are too long. Unfortunately, the IPCC only conveys their uncertainties graphically, not in words. (The central tendency is in words, and I quoted the paragraph back in March– they say 2C/century.)
So, to show the comparison, I slapped stuff on the graph. I don’t have sophisticated graphics packages, and it’s difficult for me to draw decent slopes and make small enough lines.
Figure 10.5 is not their projection for the underlying climate. Those are traces of individual realizations of weather.
It is very important to compare like to like. That’s what I’m doing: like to like.
The projections are based on multi-model means and represent smoothed climate trends. My straight lines represent possible smoothed trends, which should correspond to smoothed climate trends.
If you don’t believe me, visit Tamino’s post “Garbage is Forever”. He’s explained this idea to those who doubted him using rather splenetic prose.
I quickly calculated the skewness since 1985. (I picked that date because my file went back that far. The 1990 average is computed using data from 1985-1995.) The skewness is such that we would expect excess false falsifications on the positive side. (That is to say: given longer-term characteristics of the data, we expect to decree that we’ve falsified “0C/century” too often.)
I’ll run the Chi-Squared test for a normal distribution of the residuals. But we already know OLS is a bad method for this data– even if Tamino uses it. I’ll do that for Cochrane-Orcutt too.
But as I said, the purpose of using Tamino’s analysis is this:
If one accepted this “Tamino test” as convincing with regard to demonstrating that there is statistically significant warming from 2000-July 2007, then one should find it equally convincing for the 2C/century. Tamino, and a number of persistent commenters at other blogs, are telling people they must believe results of the “Tamino method”.
I think the Tamino method is not-so-hot. But, regardless, whether good or bad, its strengths and flaws are identical when applied to testing 0C/century or 2C/century.
lucia (Comment#3253) June 6th, 2008 at 8:59 am
Nick–
On this:
Yes. The 2sd intervals should be about twice as wide. But I never said their lower 95% bound was excluded. I’ve said the central tendency is biased high, and the 1-sigma bounds look high using Cochrane-Orcutt.
For some reason, the IPCC felt it was better to only show 1sd. This has an impact on their graphics, and creates a general impression on readers. The knowledgeable ones may know enough to correct for it; those with less understanding of statistics won’t.
We don’t know why they chose to show the 1sd uncertainty intervals. But they did.
Also, regarding the uncertainty intervals now suggested at other blogs: Gavin is suggesting that some sort of meaningful 1sd uncertainty on 7-year trends is twice as large as shown in the IPCC graph.
Unfortunately for Gavin, he must never have compared his model 7-year trends to data. For his claim to be true, absolutely, positively all the deviation from a straight-line fit to data since 1880 would have to be weather noise, in the absence of volcanic eruptions and variations due to human factors. How one could possibly perform attribution studies under those circumstances is a mystery.
Arthur Smith (Comment#3254) June 6th, 2008 at 9:11 am
One of the questions has been what is the fundamental “weather noise” – all the discussion about past 7-year trends etc recently was trying to get at what the error bars should be. But you can get at least a little bit of a picture from Lucia’s “Trends and Uncertainties” graph up above, if you look at the size of the error bars as a function of the length of the time series.
For 1990, the 95% CL interval is 2.3 C wide, for 1995 it’s 3.6 C, for 1996 it’s 4.0, for 2000 it’s 5.3, but then somehow mysteriously for 2001 it drops to 4.3 C. Isn’t that at least a little suspect, that somehow the 2001 – present measurement of weather noise is anomalously low for the period? Logically the error bars should monotonically increase as you reduce the length of the time series – at least until you get to such a short time period that autocorrelations interfere. Perhaps that’s what’s happening here?
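Arthur’s expectation can be made concrete. For white noise with standard deviation σ at unit time steps, the OLS slope’s standard error shrinks like N^(-3/2). Holding σ and ρ fixed (an assumption), an interval for Jan 2001 to Apr 2008 (88 months) would be roughly four times as wide as one for Jan 1990 to Apr 2008 (220 months). The table’s widths (4.3 vs 2.3 C/century) differ by less than a factor of two, so under this method the estimated σ and/or ρ must be smaller for the shorter window.

```latex
\mathrm{SE}(\hat b) = \frac{\sigma}{\sqrt{\sum_{t=1}^{N}(t-\bar t)^2}}
                    = \sigma\sqrt{\frac{12}{N(N^2-1)}},
\qquad
\frac{\mathrm{SE}_{N=88}}{\mathrm{SE}_{N=220}}
  = \sqrt{\frac{220\,(220^2-1)}{88\,(88^2-1)}} \approx 3.95
```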
George Tobin (Comment#3255) June 6th, 2008 at 10:08 am
1) You do not appear to be using Tamino’s method because your analysis is devoid of self-satisfied snarky condescension. You must be using some other method.
2) Chris H’s point that there is already a built-in, long-term “get us the hell out of the Little Ice Age” warming process/trend in place is well taken. The very fact that you start with an assumption of zero seems problematic.
3) I think that the current modest non-warming trend permits a prediction about alarmist predictions: The less confidence there is in the models, the greater the amount of predicted warming and greater the damage from catastrophic AGW.
There is the notion coined by Judge Learned Hand (best jurist name ever!) that the amount one should be legally obligated to spend to prevent harm is equal to the risk of that harm occurring times the losses if it occurs (Burden=Probability x Loss, B=PL).
Applying this to AGW, if we assume that the ideological commitment to impose a rather costly economic and political order to achieve zero atmospheric carbon growth is constant (B), then if the probability of such warming appears to decrease, the scope of loss and damage (L) has to increase to keep B constant.
Thus according to this calculus, the shakier the prediction (P) then the higher the eventual warming and the greater the scope of planetary damage that must be predicted. In other words, The Constant B Model predicts that if the current non-warming goes out another 3-4 years, Hansen and the RealClimate Consensus Chorus will of necessity predict an eventual net 6-10 degrees instead of 3-5.
MarkR (Comment#3256) June 6th, 2008 at 11:39 am
Lucia
The SurfaceStations.org survey is 43% done.
http://www.surfacestations.org.....onlist.htm
There seem to be 23 category 1 stations surveyed so far.
It should be possible to get the data for those stations and do your comparison. Hope you can find the time and the inclination.
lucia (Comment#3257) June 6th, 2008 at 11:56 am
Arthur–
I agree the drop in the size of the uncertainty intervals is an oddity, but it’s what you get using the correction method Tamino uses.
What happens with this data is that during periods with large volcanic eruptions, the lag-1 autocorrelation ρ is very large, owing to the eruptions. As we move out of that period, ρ drops. So the factor (1-ρ)/(1+ρ) grows, and the “effective” number of data points increases faster than the real number of data points, N.
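The effective-sample-size argument can be written out as follows. This is the standard AR(1) approximation; the ρ values are illustrative only, not estimates from this data. A drop in ρ can therefore shrink the interval even when N is smaller.

```latex
\begin{aligned}
N_{\mathrm{eff}} &\approx N\,\frac{1-\rho}{1+\rho}, &
\mathrm{SE}_{\mathrm{AR(1)}}(\hat b) &\approx \mathrm{SE}_{\mathrm{white}}(\hat b)\sqrt{\frac{1+\rho}{1-\rho}},\\
\rho = 0.5 &\;\Rightarrow\; N_{\mathrm{eff}} \approx N/3, & \text{SE factor} &= \sqrt{3} \approx 1.73,\\
\rho = 0.8 &\;\Rightarrow\; N_{\mathrm{eff}} \approx N/9, & \text{SE factor} &= 3.
\end{aligned}
```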
It is also surprising that the estimate of ρ drops a lot when I start from 2001 instead of 2000, so much so that the uncertainty for 2001 ends up less than for 2000. I agree with you this is odd!
So, those numbers are what you get with Tamino’s method. As you can read above, Martin Ringo thinks there are better methods (and I agree.) But, of course, the trick is identifying one and applying it.
I’ll re-do all this using C-O next week. I’m pretty sure we’ll get a similarly odd decrease in the uncertainty intervals, because Tamino’s method and mine always give similarly sized uncertainty intervals. The differences in the hypothesis test are due to differences in the slope that arise from either ignoring the autocorrelation when calculating the slope (OLS) or accounting for it (C-O). Tamino’s method sets ρ = 0 when determining the slope but then uses a computed value of ρ when estimating the uncertainty intervals.
It takes just a tiny bit less work than using C-O. The extra step in C-O usually seems to bump up the estimate of ρ compared to the value after just OLS, but I don’t know if that always happens.
On the general size of the uncertainty bars: I do think this method gives undersized uncertainty intervals. That would apply to all cases shown, though. I haven’t been able to quantify exactly how undersized they are. Based on comparisons to historical data without volcanic eruptions, my upper-bound estimate is that, as far as I can tell, they are too small by a factor of 1.2 (on average).
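For comparison, here is a minimal sketch of iterative Cochrane-Orcutt as it is usually described in textbooks: estimate ρ from the current residuals, quasi-difference both series, refit, and repeat until ρ settles. It is a generic Python implementation written for illustration, not lucia’s spreadsheet; the function names, tolerance, iteration cap, and the synthetic-data helper are assumptions. The slope is per time step (multiply by 1200 for C/century with monthly data).

```python
import math
import random


def ols(x, y):
    """Least-squares fit y = a + b*x; returns (a, b, standard error of b)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    a = y_mean - b * x_mean
    s2 = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return a, b, math.sqrt(s2 / sxx)


def lag1_autocorr(r):
    """Lag-1 autocorrelation of a residual series (assumed roughly zero-mean)."""
    return sum(r[i] * r[i - 1] for i in range(1, len(r))) / sum(v * v for v in r)


def cochrane_orcutt(y, tol=1e-8, max_iter=50):
    """Iterative Cochrane-Orcutt trend fit for AR(1) noise; returns (b, se_b, rho)."""
    t = list(range(len(y)))
    a, b, se_b = ols(t, y)  # start from plain OLS
    rho = 0.0
    for _ in range(max_iter):
        resid = [yi - a - b * ti for ti, yi in zip(t, y)]
        new_rho = lag1_autocorr(resid)
        # Quasi-difference both series with the current rho; the first point drops out.
        y_star = [y[i] - new_rho * y[i - 1] for i in range(1, len(y))]
        t_star = [t[i] - new_rho * t[i - 1] for i in range(1, len(t))]
        a_star, b, se_b = ols(t_star, y_star)
        a = a_star / (1 - new_rho)  # intercept of the untransformed model
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return b, se_b, rho


def simulate_trend_with_ar1(n, slope, rho, sigma, seed=0):
    """Hypothetical test helper: linear trend plus AR(1) noise."""
    rng = random.Random(seed)
    noise, y = 0.0, []
    for i in range(n):
        noise = rho * noise + rng.gauss(0.0, sigma)
        y.append(slope * i + noise)
    return y
```

Unlike the OLS-then-inflate approach, the slope here is re-estimated from the quasi-differenced series, which is where the two methods can disagree on the trend itself.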
Alan S. Blue (Comment#3258) June 6th, 2008 at 3:04 pm
Yes, you move out of the realm of purely instrumental measurements and into temperature reconstructions as you go back to 1800. Wikipedia’s entry on “Temperature Records” has a graph “Reconstructed Temperature”. And there’s also the entry “Temperature_record_of_the_past_1000_years”. Both have links to a fair amount of the tabulated data on that period. I favor glacier measurements over tree proxies.
lucia (Comment#3260) June 6th, 2008 at 3:42 pm
Thanks Alan–
If we move into reconstructions, I’m going to stay away from that. Picking a temperature source right now causes enough controversy, but at least GISS, HadCrut, NOAA, UAH and RSS aren’t wildly different. Picking a paleo reconstruction from others? I’m not going there!
Nick Stokes (Comment#3261) June 6th, 2008 at 4:37 pm
Lucia,
As I understand it, the IPCC Figs 10.5 and 10.26 are intended to express the same information (“projection”) about temperature. In 10.26, they are trying to show what happens to a whole lot of variables, so they use an sd band as a summary statistic for the model spread shown in 10.5.
I think there’s an important distinction there. A standard deviation is a summary statistic for the spread of any set of numbers. People do tend to then apply normal distribution thinking, but there’s no such connection implied. And for this kind of model spread, it’s not even really probabilistic – ie there is no underlying distribution at all.
I think it is eminently reasonable for the IPCC to choose 1 sd in its graph. It is, after all, a standard deviation. Not everyone is looking at the graph with falsification in mind. When we give a value of 10±3, the ±3 represents a ±1 sd range.
I would test the residuals for kurtosis rather than skewness – it’s a more sensitive test for tail “fatness”.
Nick Stokes (Comment#3265) June 7th, 2008 at 3:23 am
Lucia,
I tested your residuals for normality. I used the values in your spreadsheet WattsFourMetrics, OLS residuals (E16:E100) and C-O (F16:F100). They included data up to Feb 2008. I’ll just give the C-O results. I used Excel skew() and kurt().
Skewness -0.48 kurtosis 1.33 The distribution is “fat-tailed” (kurtosis>0)
I used the Jarque-Bera test
The statistic JB=9.38 This is distributed chi-squared, two d.o.f.
So it is significantly different from zero with 99% confidence
It’s not normal, and deviates in the direction of higher probability of outliers.
OLS gave 11.38 – even more significant.
So your tests err on the side of falsification.
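Nick’s arithmetic can be checked with a small stdlib Python sketch. With two degrees of freedom the chi-squared tail has the closed form exp(-x/2), so the 99% critical value is -2 ln 0.01 ≈ 9.21, which the reported JB = 9.38 exceeds. The sketch uses plain population moments, whereas Excel’s SKEW() and KURT() apply small-sample corrections, so it will not exactly reproduce spreadsheet numbers; the function names are illustrative.

```python
import math


def jarque_bera(resid):
    """Jarque-Bera from population moments; returns (jb, p_value, skew, excess_kurtosis)."""
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((r - mean) ** 2 for r in resid) / n
    m3 = sum((r - mean) ** 3 for r in resid) / n
    m4 = sum((r - mean) ** 4 for r in resid) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = jb_from_moments(n, skew, excess_kurtosis)
    return jb, chi2_2dof_p_value(jb), skew, excess_kurtosis


def jb_from_moments(n, skew, excess_kurtosis):
    """JB = n/6 * (S^2 + K^2/4), with K the *excess* kurtosis (0 for a normal)."""
    return n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)


def chi2_2dof_p_value(x):
    """Upper-tail probability of a chi-squared variable with 2 d.o.f.: exp(-x/2)."""
    return math.exp(-x / 2.0)


def chi2_2dof_critical(level):
    """Critical value c with P(chi2_2 <= c) = level, i.e. c = -2 ln(1 - level)."""
    return -2.0 * math.log(1.0 - level)
```

For example, assuming the 85 cells E16:E100, jb_from_moments(85, -0.48, 1.33) gives about 9.53; the small gap from 9.38 presumably reflects the estimator differences just mentioned.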
lucia (Comment#3267) June 7th, 2008 at 6:35 am
Nick–
Before dinner last night, I ran the ‘Chi-square goodness of fit’ on the data from 1990-now. I got skewness 0.36, kurtosis 0.78, and the chi-square test indicates the distribution would arise 25% of the time by random chance.
However, since I ran the chi-square tests using the histogram Excel auto-generated, I was going to run tests with more things before reporting back. (Most statistics books say bins should contain at least 5 samples for the chi-square test, so I need to adjust the ends.)
That said: I have been reporting that, compared to historic data, this test seems to give uncertainty intervals that are about 20% too small for periods without volcanoes. So, as I’ve been saying, I’m looking into a way to get better estimates.
Since this is quick, I can report skewness and kurtosis for the various periods:

| Start year | 1990 | 1995 | 1996 | 2000 | 2001 |
|---|---|---|---|---|---|
| Kurtosis | 0.75 | 0.90 | 0.80 | 0.69 | 1.06 |
| Skewness | 0.36 | 0.47 | 0.48 | -0.49 | -0.35 |
As for this being “my” test… the one in this post is Tamino’s. The other one is mine.:)
I’ll let you know what Chi-squared says for the five after I bin properly. (But probably next week. I don’t do much of this on the weekend. I’m taking my mother-in-law off my father-in-law’s hands.)
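The binning concern (at least 5 samples per bin) can be sidestepped by choosing bins of equal probability under the fitted normal, so every bin has the same expected count n/k. Here is a stdlib Python sketch under that assumption; the bin rule and names are illustrative choices, not what Excel’s histogram tool does. The resulting statistic is compared against a chi-squared table with k - 3 degrees of freedom.

```python
from statistics import NormalDist, fmean, stdev


def chi_square_normality(resid, k=None):
    """Chi-square goodness-of-fit statistic against a fitted normal.

    Bins are equal-probability intervals of the fitted normal, so each bin's
    expected count is n / k. By default k = min(10, n // 5), floored at 4,
    which keeps the expected count at 5 or more whenever n >= 20.
    Returns (statistic, degrees_of_freedom, observed_counts).
    """
    n = len(resid)
    if k is None:
        k = max(4, min(10, n // 5))
    fitted = NormalDist(fmean(resid), stdev(resid))
    # Interior bin edges at the 1/k, 2/k, ... quantiles of the fitted normal.
    edges = [fitted.inv_cdf(j / k) for j in range(1, k)]
    observed = [0] * k
    for r in resid:
        observed[sum(r > e for e in edges)] += 1  # index of the bin containing r
    expected = n / k
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    dof = k - 1 - 2  # two parameters (mean and sd) were estimated from the data
    return statistic, dof, observed
```

Equal-probability bins trade the familiar histogram picture for a guarantee that no bin is starved, which is exactly the issue with auto-generated histogram ends.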
avfuktare vind. (Comment#3270) June 8th, 2008 at 3:50 pm
Nick Stokes,
” And you certainly haven’t rejected “the IPCC AR4 projection of 2C/century for the first three decades of the 21st century”. You can’t, on present data.”
Well, you can of course reject the IPCC projection based on its failure to predict no net heat accumulation in the climate system during recent years, or based on the miserable spatial fit, or based on the missing hot spot in the upper tropical troposphere, or based on the sheer improbability of sustained real world growth many times higher than during any previous period while the world goes back to coal as the primary energy source. You could also take into account the discovery by McKitrick, which stands unchallenged, that temperature trends are highly correlated to economic activity and that true trends thus are likely significantly smaller. At this point I can’t see a good reason to defend the IPCC.
We can spend our time digging for significance in numbers that are not measured accurately enough to capture the proposed trends, which is fun for those of us who like to learn a bit more statistics, but wouldn’t we really be better off if we gained the insight that the United Nations is just as brilliant at projecting climate as it is at handling oil-for-food programs? Let’s get past this and replace the IPCC’s favourite policy with one that doesn’t starve the poor, doesn’t feed the filthy in Europe and doesn’t promote filthy industries in developing countries.
Nick Stokes (Comment#3271) June 8th, 2008 at 5:40 pm
AV maybe I wasn’t clear enough – I meant that you can’t reject a projection three decades hence without 3 decades of data. But more generally, my request is that we shouldn’t extend projections and then falsify them. Let’s state exactly what they said, and deal with that.
lucia (Comment#3272) June 8th, 2008 at 6:35 pm
Nick–
The IPCC provided projections in graphical form. Those projections showed a linear trend for the “underlying climate trend”, and its uncertainty intervals.
So, I certainly believe they were projecting that the “underlying climate trend” would increase. They gave some probabilities: a central tendency described in words, and ±1 sigma uncertainties in graphical form.
As “underlying climate trends” with a central tendency and uncertainty intervals, those are supposed to apply right now.
The central tendency sure as shooting looks inconsistent with the weather data.
No one has ever suggested the IPCC claimed the weather would increase with a linear trend. That said, the weather we have had does not appear very consistent with an “underlying climate trend” of 2 C/century, with “weather noise” superimposed on it.
If you feel you cannot test this in less than 30 years, that’s fine. You need not. However, in that case, you should be aware that those who write these reports are perfectly happy to do comparisons in less than 30 years. They did “slide and eyeball” comparisons on TAR predictions when they published the AR4.
So, clearly, many people are going to develop the impression that the authors of the IPCC believe the final graphics in the document are “prediction/projections” of actual climate on earth, and that these can be tested in some way, by comparing to data.
Nick Stokes (Comment#3274) June 8th, 2008 at 10:30 pm
Lucia,
I haven’t yet twigged to your “slide and eyeball” reference. But I went back and looked at that post.
In your post you linked to FAQ 8.1. I couldn’t see any linear projection there at all. They’ve just run the models to 2000, then continued, marking the mean as they go. It’s true that the last part of the mean curve is straight, but I think that’s just a coincidence. There’s no reason to believe that it is not, as elsewhere, just the mean of the model results, which are shown.
But maybe you meant Fig TS26, which is inset. This is more apposite. The description of it is unsatisfactory there in the TS. But stuff in the TS comes from the chapters, so I tracked it down. The figure (the part you’re talking about) is 1.1, from the history chapter. A curious thing about it is that it has results for FAR, SAR, TAR but not AR4. The reason is that they are doing a comparison of the changes through the versions. And for that purpose they’ve used some simplified summary measures. I think that is one point I would make. There is a chapter on projections (Ch 8), and I think that is where info should be sought. Elsewhere they use reduced versions for comparing scenarios, history etc. You can’t expect them to carry the full error picture into those discussions, where it isn’t necessarily relevant.
But I was still puzzled about those Fig 1.1 plots, because they converge right back down to a point in 1990, and a mean of GCMs should have significant scatter at all stages. So I looked up the TAR (Sec 9.3.3; Figs 9.13 and 9.14). The caption to Fig 9.13 gives the clue:
“Historical anthropogenic global mean temperature change and future changes for the six illustrative SRES scenarios using a simple climate model tuned to seven AOGCMs. ” A simple climate model is more along the lines of Lumpy. It means you can quickly track scenario differences etc, but smooths over weather noise. It isn’t a GCM result.
I noted too that in the caption to Fig 1.1 they gave the trends for the FAR and SAR results, but said that this was no longer done in the TAR.
Now, there’s a lot that is lacking in the clarity of the IPCC’s presentation here. And it looks as if whoever wrote the description of TS26 probably didn’t understand it properly. But it still doesn’t amount to a projection that one should be testing.
Nick Stokes (Comment#3275) June 8th, 2008 at 11:50 pm
Lucia,
A follow-up – I’ve now read your original “slide and eyeball” post (Apr 18) which deals with these same TAR plots (9.13 and 9.14). I looked more into how they were computed and why. The main explanation is in Appendix 9.1 of the TAR. There is a bit of history regarding what was done in the SAR. They say “The justification for using the simple model for this purpose was the model’s ability to simulate AOGCM results in controlled comparisons spanning a wide range of forcing cases (for example SAR Figure 6.13). ” And about those models they say:
The variability shown seems to be just a result of parameter uncertainty and includes no weather noise. That’s why it starts out being zero. So these plots seem to be not a good choice for actually doing tests of projections. They were designed for scenario comparisons.
MarkR (Comment#3276) June 9th, 2008 at 3:31 am
So Nick, are you saying that the IPCC claim of 2C which has been falsified was never a “claim”, but it was in fact a “comparison projection”? And that the 2.0C per century isn’t to be taken seriously, and wasn’t based on anything of substance?
You’re not serious are you?
Boris (Comment#3277) June 9th, 2008 at 5:00 am
Wasn’t that the paper where he confused degrees and radians? I believe the correlation fell apart when it was corrected.
Lucia,
Comparing the earlier reports to 2C/century is slightly inaccurate, since warming from 1990-2000 was not expected to be 2C/century. I’m not convinced that the first 7 years of this century are projected to be 2C/century either, but the bigger problem is the lack of data.
And, wow, look at the difference between your 2000 and 2001 numbers. One year of weather can make a huge difference!
MarkR (Comment#3278) June 9th, 2008 at 5:32 am
Boris,
Stop throwing mud:
An honest mistake honestly corrected.
John M (Comment#3279) June 9th, 2008 at 5:36 am
Boris,
You have an interesting definition of “fell apart”.
http://www.uoguelph.ca/~rmckit.....ection.pdf
(Hmmm. Looks like MarkR beat me to the punch. Hey Lucia, great editing feature! Wish I could correct all my posts!)
Nick Stokes (Comment#3280) June 9th, 2008 at 6:58 am
MarkR,
My claim is that the IPCC did not say that there would be a .2C rise in this decade. That is what has been falsified. They did project a 2C rise for the next century. Or, more exactly, in Chapter 10, where the projections are set out, they said:
You’ll see that there are substantial uncertainty ranges. If they had stated a projection for the 7 years to 2008, the error range would probably have been relatively much larger. But we don’t know.
lucia (Comment#3281) June 9th, 2008 at 7:13 am
Nick:
I’ve always said what I am testing are AR4 projections for climate not weather. You are now telling me that, after careful consideration, you have concluded they are projections for climate not weather. So, it would appear that we now agree: These are predictions for climate, not weather. For that reason, they are stripped of “weather noise”.
This is why they are smooth, and why I must compare them to smoothed OLS trends. However, since some others (Rahmstorf, Tamino, etc.) prefer to show weather superimposed on the predicted trends, I’m doing that also.
The figures I call “IPCC projections” are labeled “Projections” in the IPCC document. The IPCC conveys them that way, and discusses them in this context.
lucia (Comment#3282) June 9th, 2008 at 7:33 am
Boris–
That is why I said this above:
See, we agree.
I think the First Assessment Report, written in 1990, projected 3C/century. But I didn’t want to clutter up the graph to show that. Would you like me to check and add the 3C/century? It would be easy enough. You’ll note that’s not falsified based on data since 1990!
One interesting thing: If you examine the uncertainty intervals, it’s pretty clear that if you buy into Tamino’s method of testing trends, we can’t distinguish better than ±1C/century even after 17 years. Had Rahmstorf et al. included any uncertainty in determining the trend on their graph, readers would have seen that discussing whether the TAR 1.5C/century was “on” or not would have been a bit odd. (Of course, many readers were mystified at the idea that the TAR projections and models were truly “frozen” before the FAR and SAR were written. I’m willing to believe models and methods used to make projections in report “N” were frozen instantly after report “N-1”, but I absolutely do not believe they were frozen three reports earlier. So, as far as I’m concerned, the first plausible year to consider as “predictions” in the TAR is 1996.)
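For readers who want to see where an interval like ±1C/century comes from, here is a minimal sketch of an OLS trend with an autocorrelation-inflated 95% interval. It assumes the common AR(1) “effective sample size” correction (the naive slope standard error is multiplied by sqrt((1 + ρ)/(1 − ρ)), with ρ the lag-1 autocorrelation of the residuals) and a 1.96 normal multiplier; it is not claimed to be the exact formula Tamino or Lee & Lund use. The function name `ols_trend_ar1` and the synthetic series are hypothetical.

```python
import math


def ols_trend_ar1(y, per_year=12):
    """OLS trend of an evenly spaced series, in units per century, with
    naive and AR(1)-inflated 95% half-widths (assumed correction, see text)."""
    n = len(y)
    t = [i / per_year for i in range(n)]  # time in years
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    slope = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / sxx
    intercept = y_bar - slope * t_bar
    resid = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
    ssr = sum(r * r for r in resid)

    # Naive OLS standard error of the slope: treats residuals as independent.
    se_naive = math.sqrt(ssr / (n - 2) / sxx)

    # Lag-1 autocorrelation of the residuals and the AR(1) inflation factor.
    rho = sum(a * b for a, b in zip(resid[:-1], resid[1:])) / ssr
    se_ar1 = se_naive * math.sqrt((1 + rho) / (1 - rho))

    z95 = 1.96  # normal approximation to the 95% two-sided multiplier
    return {
        "trend": 100 * slope,  # per year -> per century
        "half_width_naive": 100 * z95 * se_naive,
        "half_width_ar1": 100 * z95 * se_ar1,
        "rho": rho,
    }


if __name__ == "__main__":
    # Hypothetical monthly anomalies (C): a 2C/century trend plus a periodic wiggle.
    y = [0.02 * i / 12 + 0.1 * math.sin(i) for i in range(120)]
    res = ols_trend_ar1(y)
    print(res)
    for hypothesis in (0.0, 2.0):
        rejected = abs(res["trend"] - hypothesis) > res["half_width_ar1"]
        print(f"{hypothesis} C/century rejected: {rejected}")
```

A hypothesized trend such as 2C/century (or 0C/century) is then rejected at roughly 95% confidence when it falls outside trend ± half_width_ar1. With positively autocorrelated residuals, as in the synthetic series, the AR(1) interval comes out wider than the naive OLS one.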
On the one year: Yep. One year makes a difference.
As I said, I picked 2001 based on the document publication dates, and before I did the first test. But, yes, had various reports been issued at different times, I would have picked different start dates, and that would make a difference. Also, 2000 is a round year, so, habits being what they are, people will want to show that. So… I’m showing both.
lucia (Comment#3283) June 9th, 2008 at 7:48 am
Nick–
The IPCC was very clear. The 2C/century is their climate projection for the early portion of this decade. They repeat it over and over in various portions of the documents, and consistently include it in sections with titles containing “projections”.

Here is the graphical version:
These are the climate projections the IPCC made. These are the uncertainties the IPCC communicated. Other portions discuss the basis for their projections, but these are the projections the IPCC did make.
lucia (Comment#3285) June 9th, 2008 at 10:12 am
Nick–
I did two tests and applied them to the data in this post.
1) Pearson chi-squared. This is the test in my undergraduate statistics book. Wikipedia deems this “the best known” test for normality (which is likely why it’s explained in sophomore introductory texts). It’s also pre-programmed into EXCEL as the CHITEST function, but you need to create a histogram first. That first step is a pain in the neck, so I did this for 1990-now and for 2001-now only.
For 1990-now, I tested the distribution for normality using EXCEL’s default binning. This did not exclude the hypothesis that the data are normal. But that’s a silly test, because EXCEL’s histogram function created end bins whose expected counts are too small to properly apply the test. So I binned such that the end bins had an “expected” number of samples as close to 5 as possible, but using round numbers. I got this histogram and these results:
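| Bin upper edge | Frequency | Cumulative normal probability | Cumulative expected count | Expected count (Ej) |
|---|---|---|---|---|
| -27.5 | 5 | 0.02 | 5.2 | 5.2 |
| -21.4 | 7 | 0.06 | 13.6 | 8.4 |
| -15.3 | 9 | 0.14 | 29.9 | 16.3 |
| -9.2 | 35 | 0.25 | 56.1 | 26.2 |
| -3.1 | 42 | 0.41 | 90.9 | 34.9 |
| 3.1 | 35 | 0.59 | 129.3 | 38.3 |
| 9.2 | 39 | 0.75 | 164.1 | 34.8 |
| 15.3 | 24 | 0.86 | 190.2 | 26.1 |
| 21.4 | 10 | 0.94 | 206.5 | 16.2 |
| 27.5 | 5 | 0.98 | 214.8 | 8.3 |
| 10000.0 | 9 | 1.00 | 220.0 | 5.2 |
| More | 0 | | | |
| Total | 220 | | | 220.0 |
Chi-squared probability (EXCEL CHITEST): 12.10%
DIAGNOSIS: Could be normal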
So, according to this test, the hypothesis that the data from 1990-now are normally distributed cannot be excluded.
I repeated for data from 2001-now:
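| Bin upper edge | Frequency | Expected count (Eij) |
|---|---|---|
| -16.00 | 5 | 5.13 |
| -10.67 | 7 | 7.87 |
| -5.33 | 11 | 13.44 |
| 0.00 | 18 | 17.56 |
| 5.33 | 22 | 17.56 |
| 10.67 | 10 | 13.44 |
| 16.00 | 13 | 7.87 |
| 10000.00 | 2 | 5.13 |
| More | 0 | |
| Total | 88 | 88.00 |
Chi-squared probability (EXCEL CHITEST): 34.97%
DIAGNOSIS: Might be normal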
So, we can’t reject the possibility the recent data are normal.
So, for the two cases I did, Pearson Chi-squared says they might be normally distributed.
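The Pearson procedure described above (bin the residuals, compute expected counts from a normal with the sample mean and standard deviation, then compare) can be sketched in plain Python. This is an illustration, not the spreadsheet: the function names are hypothetical, `chi2_sf` is a hand-rolled stand-in for the p-value EXCEL’s CHITEST returns, and `fitted_params=0` mimics CHITEST’s degrees of freedom (bins minus one). Fed the rounded counts from the two tables, it should land close to the 12.10% and 34.97% reported; small differences come from rounding the expected counts.

```python
import math
from statistics import NormalDist, mean, stdev


def chi2_sf(x, df):
    """P(X > x) for a chi-squared variable with integer df (stdlib only)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Regularized upper incomplete gamma Q(df/2, y), stepped up from Q(1, y) or Q(1/2, y).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def observed_counts(residuals, upper_edges):
    """Each value goes in the first bin whose upper edge is >= the value;
    the last bin catches everything above (like the 10000 catch-all)."""
    counts = [0] * len(upper_edges)
    for r in residuals:
        k = next((i for i, e in enumerate(upper_edges) if r <= e), len(upper_edges) - 1)
        counts[k] += 1
    return counts


def expected_counts(residuals, upper_edges):
    """Expected counts per bin if the residuals were normal with their sample
    mean and SD. The last edge is treated as +infinity."""
    dist = NormalDist(mean(residuals), stdev(residuals))
    n = len(residuals)
    cum = [dist.cdf(e) for e in upper_edges[:-1]] + [1.0]
    return [n * (c - p) for c, p in zip(cum, [0.0] + cum[:-1])]


def pearson_test(observed, expected, fitted_params=0):
    """Pearson chi-squared statistic and p-value with df = bins - 1 - fitted_params."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - fitted_params
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    # Rounded counts copied from the 1990-now and 2001-now tables above.
    obs_1990 = [5, 7, 9, 35, 42, 35, 39, 24, 10, 5, 9]
    exp_1990 = [5.2, 8.4, 16.3, 26.2, 34.9, 38.3, 34.8, 26.1, 16.2, 8.3, 5.2]
    obs_2001 = [5, 7, 11, 18, 22, 10, 13, 2]
    exp_2001 = [5.13, 7.87, 13.44, 17.56, 17.56, 13.44, 7.87, 5.13]
    print(pearson_test(obs_1990, exp_1990))
    print(pearson_test(obs_2001, exp_2001))
    print(pearson_test(obs_1990, exp_1990, fitted_params=2))
```

One design note: CHITEST does not know the mean and standard deviation were estimated from the same data. A common stricter convention subtracts those two fitted parameters from the degrees of freedom (`fitted_params=2`), which gives a smaller p-value for the same counts.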
2) Jarque-Bera test: I applied this one since it’s the one you evidently picked for the data in a previous spreadsheet.
I used EXCEL “SKEW” to compute the skewness and EXCEL “KURT” to compute the excess kurtosis. (The EXCEL help file indicates KURT is already normalized by subtracting 3, so if you use it, you shouldn’t subtract 3 again when doing Jarque-Bera.) I found:
2001-now: skewness -0.35, excess kurtosis 1.06 (this already has the 3 subtracted), number of data: 88, JB statistic 5.950, probability this would occur by random chance 5.1%. Since we have been using 5% as our criterion, we cannot exclude the possibility this is normal!
2000-now: I get 5.1% again.
1990-now: 0.4%. We can exclude the possibility that this is normally distributed. (FWIW, normality can be excluded for every start year before 1996.)
So, JB does say the data that is still affected by the Pinatubo eruption is not normally distributed. If we only stick with recent data, we can’t exclude that possibility (though, when we get more data, we may be able to. If it’s not normally distributed, we might be able to tell in a few months. But for now, we can’t exclude that possibility.)
Of course, this is for the OLS data fit and the data I used in this post today. I can start adding the Jarque-Bera test to later fits, since it’s less time-consuming than the Pearson. But for now, with this data, we can’t exclude the possibility that the residuals during “non-volcanic” periods are normal.
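The Jarque-Bera arithmetic above fits in a few lines. This sketch assumes the inputs are skewness and excess kurtosis (3 already subtracted, as with EXCEL’s KURT) and uses the asymptotic chi-squared reference distribution with 2 degrees of freedom, for which the p-value is exactly exp(−JB/2). The function names are hypothetical. Note that EXCEL’s SKEW and KURT include small-sample bias corrections, while `sample_skew_kurt` below uses plain moment ratios, so the two can differ slightly. With the rounded 2001-now inputs (n = 88, skewness −0.35, excess kurtosis 1.06) it gives JB ≈ 5.92 and p ≈ 5.2%, close to the 5.950 and 5.1% reported.

```python
import math


def sample_skew_kurt(x):
    """Moment-ratio skewness and excess kurtosis (no small-sample correction)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def jarque_bera(n, skew, excess_kurt):
    """JB = n/6 * (S^2 + K^2/4), where K is the EXCESS kurtosis.
    Under normality JB is asymptotically chi-squared with 2 df,
    whose upper-tail probability is exp(-JB/2)."""
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)


if __name__ == "__main__":
    jb, p = jarque_bera(88, -0.35, 1.06)  # rounded 2001-now inputs
    print(f"JB = {jb:.3f}, p = {p:.1%}")
    for alpha in (0.05, 0.01):
        verdict = "reject normality" if p < alpha else "cannot reject normality"
        print(f"alpha = {alpha}: {verdict}")
```

Because the chi-squared reference is only asymptotic, the 5% cut-off for n = 88 is itself approximate, which matters when a result sits as close to the line as this one.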
That said, I do find that this method (used by Tamino) and my method seem to give slightly undersized uncertainty intervals. But strong kurtosis or skewness may not be the reason.
Nick Stokes (Comment#3291) June 9th, 2008 at 6:23 pm
Lucia,
I have been mainly looking at the AR4 chapters, particularly Chap 10. The statement in the SPM seems “simplified for policymakers”, and doesn’t quantify “about” or give error ranges, so it’s hard to test. However, I concede that it gives a basis for associating the decade trend of .2 C with the IPCC, so I’ll withdraw that objection.
On the normality tests, your kurtoses etc are a bit lower than mine, but I was using a version of your residuals that didn’t include corrections for ENSO, solar etc, so maybe that makes a difference. I suspect the issue with data from 1994 is not volcanic but the 1998 El Nino. Even with the correction, that will leave some big residuals.
I find that testing for normality unfortunately muddies the logic, unless it is a clear pass. Here we have the statements:
If the residuals are normally distributed (iid) then it’s 95% certain the slope is less than 2, and
There’s a 5% chance that the residuals are normally distributed.
That makes it sound worse than it is, because even if the residuals aren’t normal, there’s probably still only a small chance that the slope is 2 or more. But that can’t be quantified.
Boris (Comment#3292) June 9th, 2008 at 7:50 pm
Lucia,
So you knew it was irrelevant, but you still made the graph that way? I don’t get it, but I figured you noticed. I’m glad we agree that your chart is inaccurate.
Mark and John,
Yes, Michaels “honestly” corrected his mistake. I’m sure that he “honestly” reappraised the results. I know this because he “honestly” erased the scenarios from Hansen’s graphs in his testimony to congress. Sorry to sling mud, but that’s all there is to talk about with a guy like Michaels.
lucia (Comment#3295) June 9th, 2008 at 8:07 pm
Boris– The graph is not irrelevant. It’s perfectly accurate– it shows the IPCC projection of 2C/century as labeled.
Had the IPCC projected 2C/century during the 90s it would have fallen inside those uncertainty intervals. But they didn’t project those until much later. So, I guess if one wanted to say I am making the IPCC look better than they were, they could. But I think that issue is perfectly clear.
Still… if you like, next time I make this, I’ll show the 3C/ century projection.
RossH (Comment#3299) June 9th, 2008 at 9:22 pm
Boris – McKitrick and Michaels have openly archived their data and computations at http://www.uoguelph.ca/~rmckit.....ptemp.html (a practice to be commended). If you feel that there is a question as to the honesty of their reappraisal, you are free to check their work and re-perform their calculations. Until then, you should probably refrain from intemperate accusations.
lucia (Comment#3309) June 10th, 2008 at 6:57 am
Nick
The version I’m testing doesn’t correct for those things either. But I ran the test on the version for this article, and it’s a HadCrut/NOAA/GISS merge. Usually, I do a HadCrut/NOAA/GISS/UAH/RSS merge.
Also, depending on which sheet you got, the data change as I update each month. So, it may well be that some months the normality test is “reject normal”.
I agree there may be a problem with normality, or there may not. The cases I tested so far show we can’t reject the hypothesis of normality. But I’ll be testing.
On the “unless it is a clear pass” bit… What constitutes a clear pass? Like all statistics, things are a pass at one confidence level and a fail at another. Or, you report and state the p.
If the kurtosis were negative, we could be sure that the rejections were even stronger than we are seeing. It is positive, so maybe not. But the test for non-normality says “might be normal”. So...
Can you point to a test we can do when the data aren’t normal but we don’t have any particular theory about the distribution? If you can, I’ll be happy to do it. In the meantime, as I mentioned, I’ll run the tests for normality, and report should the data I’m using currently fail.
Larry Bolz (Comment#3336) June 11th, 2008 at 3:24 pm
I agree with George Tobin: it’s not really the Tamino Method, because the “analysis is devoid of self-satisfied snarky condescension”. hahaha!
Lucia is implementing the Tamino Method. If anyone has a problem with the method, then rather than complaining to Lucia about her trying to recreate it, they should be complaining about Tamino using it in the first place. That’s hardly her fault. It seems ad hominem, and unfairly so. Like dismissing a work out of hand because of a bug.
Let us see what Wikipedia has to say on this thing about cosines and all.
So a blogger found a bug because the data analyst made it reproducible and enabled the bug to be found.
Nick Stokes (Comment#3337) June 11th, 2008 at 6:48 pm
Lucia, no, I don’t think that is possible. These tests have to involve distribution tails. You’re asking where a postulated trend sits in a tail. And you actually have very little direct data about the tails. So the usual thing is to assume a distribution, identify its parameters from the central moments, and hope that the tails will behave according to the assumptions. Even the normality tests really only check for discrepancies in the central values. They don’t validate the tail behaviour. So I don’t think the distribution assumption can be avoided, if you want a quantified answer.
lucia (Comment#3340) June 12th, 2008 at 6:07 am
Nick–
Thanks. That’s what I thought!
I’m looking at a couple of other things. Obviously, this data is both a) messy and b) limited in quantity. Worse, I’m certain the magnitude of measurement error evolves over time. After all, the recent Nature article discusses the bucket/jet-inlet transition problem back in the 40s-50s. So, we can’t even necessarily infer the shape of the distribution from past data!
I’m going to try to resolve something to my satisfaction assuming the data are normal first– and then later move on to non-normality. *This* data is barely passing the JB test right now. I don’t know if the set using the 5 groups will. So, I know I do have to deal with that. But… in a little while.
Nick Stokes (Comment#3343) June 12th, 2008 at 3:20 pm
Lucia,
One thing about these normality tests is that, while they provide a statistic, they don’t really provide a way of evaluating it. But the conventional 95% or whatever doesn’t seem appropriate. It’s a 95% likelihood that the distribution is not normal. In other words, a 5% chance that what we want (normality) is true (not 95%).
Now I don’t know what an appropriate level should be. But 5% is low.
lucia (Comment#3344) June 12th, 2008 at 4:29 pm
Nick–
It’s pretty standard to use 95% to test null hypotheses like the assumption the distribution is normally distributed unless proven otherwise.
In fact, it’s rather odd to suggest we must throw out the null hypothesis at some very low confidence level, but use 95% for the IPCC projections, which don’t have a history of ever being proven correct. (Admittedly, the lack of proof may be because there have been only 4 projections, and there is not even enough time to distinguish most from the null of 0C/century.)
But still, if you are going to apply this sort of reasoning to rejecting the hypothesis that the distribution is normal, we should use the same confidence intervals for rejecting the IPCC projections!
NIck Stokes (Comment#3345) June 12th, 2008 at 8:04 pm
Lucia,
I think you’re missing the point here. The null hypothesis is the one you want to be able to reject. In your falsification, it’s the trend=2 C/century. Here the hypothesis that the test is rejecting is that the data is normal. That’s the one you don’t want to reject.
You can see the paradox; with your J-B test for 2001- you got 5.1%. You say that passes the 95% test. Suppose you had got 4%. Then you would have said it failed, as you did with 1994-. But if you apply a more “stringent” 99% test, they both pass!
lucia (Comment#3346) June 12th, 2008 at 8:20 pm
Nick–
When I ran the test, the hypothesis the data were normal was not rejected.
Of course, that’s the way the tests are run. Setting a high threshold makes it difficult to reject a null. When testing, I’ve treated the 2C/century and the normal hypothesis as nulls at 95% confidence. I then treat them the same.
I reject 2C/century if it fails at 95% confidence. I reject normal if it fails at 95% confidence.
This has nothing to do with what I want or don’t want. I set both to “null”. I test identically.
In both cases, the threshold is the confidence of 95% or α=0.05.
If your issue is that there is a bright line at α = 0.05, sure: it’s a bright-line test. If I get 4.999%, I reject the normal distribution. If I get 5.001%, I accept it.
But with respect to further analysis, applying t-tests etc., either I reject a normal distribution or I assume the distribution is normal and run the t-tests and do other things.
Absent a theory about an alternative hypothesis for the distribution, what I’m doing is pretty darn standard.
That said: No one says you are required to believe this or any test. You can convince yourself that the distribution is not known and that, consequently, no tests can ever, ever, ever be done. Etc. Or, if you want to propose an improved method, do so.
But until the normal distribution is rejected at 5%, the exact same level I have been using for everything, I’m sticking with it. It doesn’t make any sense to be stricter on this than on everything else.
Nick Stokes (Comment#3353) June 13th, 2008 at 8:35 pm
Lucia,
I don’t think your criterion is right, but I don’t know what is right. But I think if you’re going to keep testing for normality, you’ll have to deal with the fact that the logic is the other way around. Normally, as you go from 90% to 95% to 99%, you’re becoming increasingly confident. But here, you reject the J-B test for 1994-2008, which came in at 99.6%, but accept the test for 2001-8, which returned 94.9%. At 95.1% it would have been rejected. So it isn’t a conventional 95% confidence level. At 90% you would have rejected them all.
Richard (Comment#3354) June 14th, 2008 at 1:54 am
A simple GLM of the years between 2000 and 2008 for the UAH satellite dataset (global values) shows that 2000 and 2008 are similar, both being much lower than the other years. The years 2001-2007 are significantly higher than 2000 or 2008 but not significantly different from each other. Therefore, apart from the jump from 2000 to 2001, there has been no significant warming during the 2000s thus far. In fact, there has been a significant cooling in 2008 (bear in mind this is only a partial set for the months Jan-May). I analysed the mean monthly values for each of the years. I am a big believer in the concept that the data analysed should be manipulated as little as possible, as long as the assumption of homoscedasticity is not violated.
lucia (Comment#3356) June 14th, 2008 at 6:33 am
Nick–
Yep. As you go this way, you are becoming increasingly confident you should reject. That’s precisely how I apply this to both the IPCC projection and the test for normality. I set 95% for both.
If I thought the 1994-2008 was the same “population” as the recent period, I would be interpreting this the same way you do. The 1994-2008 was likely affected by volcanic eruptions. The stratosphere was clearing from Pinatubo. I’m not surprised it’s not normally distributed with the outliers! But it’s not drawn from the same population as the “no-volcanic activity” period we’ve had recently. (I suspect the dramatic sign switch in the very large skewness was also due to this change.)
I’m planning to test the “no volcano” period with JB. It’s longer, and I’m going to see what I get.
The 1994-2008 would have been rejected. But the recent periods would not have been rejected. That’s the way these work: you reject a “null” only if you are confident you should reject. Otherwise you accept.
The null hypothesis is favored; that’s the way it works.
But Nick– is your concern the pre-2000 stuff? Or the post? The 1990-2008 is the only one that “rejects” 0C/century. You are arguing its confidence limits may be too small, which is true. When I figure out what to do about that, I will adjust them, because it fails JB. That’s what I said I would do. But I haven’t figured out how to do that yet.
In contrast, the 2001-2008 rejects the 2C/century. For that set of data, normality passes the JB — meaning we can’t reject it yet. So, we are currently rejecting 2C/century, but not adjusting for normality, and that seems fine.
The “pass” on JB is admittedly borderline for the 2001-2008. Normally, if it weren’t for the volcano earlier on, I’d reject based on more data from the earlier periods. But we have a perfectly good physically based reason to expect the two periods to be different.
I’m planning to look at the 30s data– but it’s not going to happen instantly. As I said: I’ll reject when I get a clear indication that normality should be rejected. But I’m not lowering the criteria below 95% for normality. It doesn’t make sense to do this unless I also lower it for the IPCC 2C/century null or the 0C/century null. I’m using the same criteria for all and in the same way.
Nick Stokes (Comment#3358) June 14th, 2008 at 8:03 am
Lucia,
I’m not arguing about the particular cases. I think you’re right that 2001- was closer to normal than 1994-. I’m just pointing out that the logic runs backward – the “less” normal corresponds to the higher “confidence” (99.6%).
Well, that you can reject the null hypothesis (trend=2), because if it were true, there’s only a 5% chance that you would have got the observed values.
But with the normality test, we’d like to reject the possibility that it isn’t normal. But the chance of that, in the 1994- case, is 99.6%. There’s only a 0.4% chance that it is normal. So we can reject that with great confidence. But it’s the one you want.
lucia (Comment#3360) June 14th, 2008 at 8:35 am
Nick–
This isn’t backwards. It has to do with the convention. The confidence relates to how confident we are that the rejection is correct. So, 99.6% confidence means that using this rule, we would only incorrectly reject 0.4% of the time.
The value tells us nothing whatsoever about how often we would incorrectly accept a null hypothesis when it is false. To get that, you need to look at the β (type two, aka. false negative) error.
We can’t estimate the β level unless we have a theory for an alternate hypothesis. We don’t have one.
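The point about α and β can be illustrated with a deliberately simpler case than either the trend test or JB: a two-sided z-test on a mean with known σ. This is a hypothetical toy example with made-up parameters (n = 25, σ = 1, a single alternative mean of 0.3), not an analysis of the temperature data, and the function names are hypothetical. Under the null, the simulated rejection rate should come out near α = 0.05. Against the one specified alternative, the power is about 0.32, so β is about 0.68: most of the time the test would fail to reject a null that is actually false, which is why “not rejected” is not the same as “confirmed”.

```python
import random
from statistics import NormalDist

Z = NormalDist()  # standard normal


def z_test_rejects(sample, sigma, mu0=0.0, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0, with sigma treated as known."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / n ** 0.5)
    return abs(z) > Z.inv_cdf(1 - alpha / 2)


def rejection_rate(true_mean, n=25, sigma=1.0, trials=4000, seed=1):
    """Monte Carlo fraction of samples in which H0 (mean 0) is rejected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        hits += z_test_rejects(sample, sigma)
    return hits / trials


def analytic_power(true_mean, n=25, sigma=1.0, alpha=0.05):
    """Exact power of the same test against ONE specified alternative mean."""
    shift = true_mean * n ** 0.5 / sigma
    z = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(shift - z) + Z.cdf(-shift - z)


if __name__ == "__main__":
    print("type I error rate (null true):", rejection_rate(0.0))
    power = analytic_power(0.3)
    print("power:", round(power, 3), "beta:", round(1 - power, 3))
    print("simulated power:", rejection_rate(0.3))
```

Computing β at all required picking that alternative (mean 0.3); without one, there is nothing to compute it against.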
I agree that it often seems weird to people that we don’t reject a null hypothesis even though it might be wrong. But that’s the way it’s done.
The treatment is exactly the same for the IPCC projection. I don’t say “reject” unless the result of a test says it is wrong with a probability of 95%. If we applied the rule you want to apply to the normal distribution, we’d be rejecting it at 50% also. Then the error bars in the picture above would be about 1/3rd the size shown. But there is a reason we don’t do this.
Instead, we “don’t reject”. I’m using the same standard for rejecting the normal distribution as for rejecting the IPCC projections or the 0C/century null. I’m applying it the same way. The “strictness”, or otherwise, works the same.
Nick Stokes (Comment#3372) June 15th, 2008 at 3:58 am
Lucia,
I’ll try to rephrase with some formal logic.
If A is a proposition that you want to establish at 95% confidence, then you do this by showing that if notA (the null hypothesis) is true, the observed data would occur less than 5% of the time. You infer notA is false, so A is true (with 95% confidence).
This was the case for the IPCC projection: A = “trend = 2 C/century”. You correctly show that if notA were true, the data would occur less than 5% of the time. So you say A is true with confidence 95%, and reject the null hypothesis.
But for the normality test of the 2001-8 data, the J-B test says that if B = “the residuals are normally distributed” were true, the observed data would occur only 5.1% of the time. B is the null hypothesis (negation) of the proposition A = “the residuals are not normally distributed”, and that is what you have just not quite established at 95% confidence. The proposition that you want to prove (B) is actually the null hypothesis of the test, and all the test says is that you can’t quite reject B. It doesn’t say B is true.
lucia (Comment#3373) June 15th, 2008 at 6:17 am
Nick–
A null hypothesis is one that we assume is true until proven false. I am treating the normal distribution as null. That’s standard. I am also treating the IPCC projection as null. That’s standard.
Well, you got this bit right: A = “trend = 2 C/century” has been treated as the null hypothesis.
I’ve assumed A is true until it is shown untrue with 95% confidence. I have shown that if A were true, the data would occur less than 5% of the time. So I say A is untrue with 95% confidence and reject the null.
I do the same for the normal distribution.
Nick Stokes (Comment#3377) June 15th, 2008 at 3:27 pm
Lucia,
There’s the difference! In the trend test, you assumed (for the purposes of the test) a null hypothesis, found that it was rejected, and deduced that the opposite must be true. For normality, you assumed normality as the null hypothesis, found that you couldn’t quite reject it, and deduced that normality was affirmed.
For the trend test, when significance is not reached, you don’t say this is evidence for a trend of 2C/century (the null hypothesis). You can’t; it would be just as much evidence for 0C/century, or 1C.
I don’t have an answer here; I think it’s good to do normality testing, and it is reassuring when the test statistic comes out favorably. I don’t know what measure should be used, but I don’t think “narrowly escaping rejection” is right.
lucia (Comment#3378) June 15th, 2008 at 4:46 pm
Nick–
No. I don’t deduce it is affirmed. I deduce it is not rejected. There is a difference.
However, because it’s a null hypothesis, I continue to make the assumption it applies. This is standard.
It is confusing. But basically: I make an assumption. I would not proceed if the assumption can be clearly shown to be inapplicable. But otherwise, I make the assumption.
I would do the same for tests for serial autocorrelation, tests that another variable matters etc. This is a very, very standard practice in statistics Nick. You don’t add in complications or fixes for odd things unless there is clear evidence you need to. (There are actually reasons for this. Lots of tests get worse if you start diagnosing faults that might be due to chance.)
With regard to “narrowly escaping”, the problem is either a) you proceed on the assumption that normality applies or b) you assume it doesn’t and try to find a correction. It’s not possible to do anything “in between.” So, one sets a threshold, and uses that threshold.
Yes. This does mean that the analysis path diverges, and I proceed differently for 94.999999% vs 95.000000001%.
So, basically, I proceed with the assumption of the normal distribution (which is a very, very common assumption) until it’s proven not to apply. I do the same for serial autocorrelation, heteroskedasticity, blah, blah, blah!
For what it’s worth: If you had a theory suggesting some specific alternative distribution, I could repeat the analysis using that probability distribution function. I could also test whether the experimental distribution seems to fit better than the normal distribution. But since neither you nor I know of one, right now, I’m using the normal distribution.
As I said: I am going to be checking some other data sets and testing for normality. But…well… until I have a confirmed non-normal with no volcanic influence, I’m assuming the residuals are normally distributed. (Meanwhile, I’ll also be looking for things to do about non-normal because, let’s face it, it’s looking like I might need to figure out what to report!)