The 88 Month Trend (Whether you like it or not!)

Let’s face it, whether we agree on uncertainty intervals or not, we are all watching the temperature trends.

Of course you know April temperature fell relative to March. (The interesting thing is that, at some agencies, the reported March temperature fell relative to the March value reported a month earlier!)

But how much has that affected the trend? Here are the graphs showing the trend in the five-measurement-group merge (GISS Land/Ocean, HadCrut, NOAA/NCDC, UAH and RSS) over the 88 months beginning in January 2001. The solid green line, showing a trend of -0.5 C/century, is obtained by applying ordinary least squares (OLS) to the merged data. The orange line, showing a trend of -1.0 C/century, is obtained by applying Cochrane-Orcutt to the merged data.

Numerically, the current trends based on the merged data are as follows (a sketch of both fits appears after the list):

  1. Cochrane-Orcutt: -1.0 C/century and
  2. Ordinary Least Squares: -0.5 C/century.
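
For anyone who wants to reproduce the two fits, here's a minimal sketch of the method. The series below is a synthetic stand-in for the merged anomalies (not my spreadsheet), so only the procedure, not the numbers, matches the post:

```python
import numpy as np

def ols_fit(t, y):
    """Ordinary least squares: return (intercept, slope)."""
    X = np.column_stack([np.ones_like(t), t])
    return np.linalg.lstsq(X, y, rcond=None)[0]

def cochrane_orcutt(t, y, n_iter=25):
    """Iterative Cochrane-Orcutt: estimate the AR(1) coefficient rho from
    the OLS residuals, quasi-difference the data, and re-fit."""
    b0, b1 = ols_fit(t, y)
    for _ in range(n_iter):
        resid = y - (b0 + b1 * t)
        rho = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)
        a, b1 = ols_fit(t[1:] - rho * t[:-1], y[1:] - rho * y[:-1])
        b0 = a / (1.0 - rho)
    return b1

# Synthetic stand-in for the 88 months of merged data, Jan 2001 on.
rng = np.random.default_rng(0)
t = np.arange(88) / 1200.0              # time in centuries
noise = np.zeros(88)
for i in range(1, 88):                  # AR(1) "red" weather noise
    noise[i] = 0.6 * noise[i - 1] + 0.05 * rng.standard_normal()
y = -0.5 * t + noise                    # built-in trend: -0.5 C/century

print("OLS slope (C/century):            ", ols_fit(t, y)[1])
print("Cochrane-Orcutt slope (C/century):", cochrane_orcutt(t, y))
```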

Notice no uncertainty intervals? That's because I'm working on them.

Since last month, JohnV suggested we examine how the uncertainty estimates based on the regressions compare to data from long historic time periods with no stratospheric volcanic eruptions. If I understand correctly, the question he wants to ask is:

Based on historic data, if we determine a 30 year trend by regression, what size uncertainty intervals on embedded 7 year trends result in falsification 5% of the time?

This is a rather non-standard question, and we haven't even refined it sufficiently well to give a truly defensible answer. It's not clear I'll get a truly defensible answer even if I refine the question; there just isn't much data in the thermometer record. There aren't many independent 30 year periods, and there aren't many sustained periods with no stratospheric volcanic eruptions.

Nevertheless, it would seem that looking at a historic period for this might give us a rough estimate of the uncertainty intervals– at least to the extent that we might distinguish between the sorts of suggestions we are reading at blogs.

As it happens, during the thermometer record, I was only able to find one 33 1/2 year long period that is clearly unaffected by volcanic eruptions. I reported earlier that I found that if I used the uncertainty interval determined using Cochrane-Orcutt, the mean trend during that period falsified 20% of the time, with the uncertainty intervals stated to be 95% intervals. This suggests the uncertainty intervals were a bit too small, and corresponded to the 80% confidence level.

Looking at that same interval, it appeared a standard error of 1.5 C/century might correspond to the 84 month trends calculated using NOAA/NCDC, GISS and Hadcrut only. This is somewhat larger than the 1.1 C/century standard errors we obtained from the recent regressions based on the five-measurement-group merge.

However, we now have 88 months of data– a bit more than 84 months. So, I repeated the analysis for 88 month trends and found the standard deviation in the 88 month trends calculated using Cochrane-Orcutt was 1.28 C/century. Because these were overlapping trends, it appears we have the equivalent of (roughly) 5 1/4 independent calculations of the trend, so I scaled this up using the square root of N/(N-1), where N is the number of equivalent independent calculations of the trend. This results in a standard error of ±1.4 C/century for the NOAA/NCDC, GISS, Hadcrut merge.
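
The scale-up is just arithmetic. A two-line check, using the rough numbers from this post:

```python
from math import sqrt

sd_trend = 1.28  # C/century: std. dev. of the overlapping 88-month CO trends
n_indep = 5.25   # rough equivalent number of independent trend estimates
print(round(sd_trend * sqrt(n_indep / (n_indep - 1)), 2))  # -> 1.42, i.e. ~1.4
```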

I then applied these uncertainty bounds to the 7 year trends inside the "volcano free" period and simply counted the number of times those trends would result in a rejection of the mean trend for that period. I obtained a 2% false rejection rate. This might suggest that 1.4 C/century is generous as an estimate of the 95% confidence intervals for this particular question. (95% confidence intervals result in 5% false rejection rates.)
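
In case the counting procedure isn't clear, here's the idea in a few lines; the argument names are placeholders for the trends computed above, not variables from my spreadsheet:

```python
import numpy as np

def rejection_rate(window_trends, long_trend, se, z=1.96):
    """Fraction of sub-period trends whose z*se band excludes the
    long-period mean trend: the false rejection rate under the null."""
    return np.mean(np.abs(np.asarray(window_trends) - long_trend) > z * se)

# e.g. rejection_rate(seven_year_trends, mean_33yr_trend, se=1.4) -> ~0.02
```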

Of course, all the above is just a SWAG. (Scientific Wild Ass Guess). However, it’s not based on any assumptions about the color of the “noise”. The 2% false rejection applied equally whether I used OLS or CO — even though each gives slightly different estimates of the trend at any particular time.

So, while these uncertainty bars could be wrong, the results do suggest that– absent stratospheric volcanic eruptions– the standard deviation between 7 1/4 year trends and the 30 year trend in which they lie is not some humongo-normously large imponderable number. Rather, it's about ±1.4 C/century. So, even if 7 years is a short period of time, we can likely exclude some theories about underlying trends. In particular, if the 7 year trend is sufficiently cold, we can suggest the IPCC prediction of a central tendency of 2C/century seems highly improbable.

Application to this month's trends.
Even though this is a SWAG, if we apply the 1.4 C/century to the NOAA/NCDC-GISS-Hadcrut-RSS-UAH merge, we might estimate that there is a 95% chance that, 30 years from now, the mean trend will fall within the following bounds (the arithmetic is sketched after the list):

  1. -3.8 C/century < m < 1.8 C/century using Cochrane-Orcutt to account for red noise.
  2. -3.2 C/century < m < 2.3 C/century using OLS.
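
Those bounds are nothing fancier than trend ± 1.96 standard errors. A sketch (the inputs here are rounded to one decimal, so the last digit differs slightly from the list above):

```python
z, se = 1.96, 1.4  # SWAG standard error in C/century
for label, trend in [("Cochrane-Orcutt", -1.0), ("OLS", -0.5)]:
    print(f"{label}: {trend - z*se:.1f} < m < {trend + z*se:.1f} C/century")
```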

With these larger 95% uncertainty bars, estimated from a simple calculation of the standard deviation of trends and, in one case, based on Cochrane-Orcutt (to deal with the substantial AR(1) lags), it appears that the IPCC's 2C/century is highly improbable. In particular, it is excluded at a confidence of 95%, based on real weather noise. But OLS says 2C/century is still in the running.

In any case, until such time as I can look into some issues, these are pseudo error bars.

So, who knows? Maybe after 30 years, the IPCC projection of a central tendency of 2C/century will pan out. We shall see. (Since the various bloggers seem to be all for demonstrating their confidence in various predictions by offering bets, maybe if the modelers are truly confident the central tendency over 30 years is 2C/century, they'll take an even money bet. If the OLS trend over the first 30 years of this millennium is greater than 2C/century, they win. If it's less than 2C/century, they lose. If 2C/century is really the central tendency, and 7 year trends far off the mark are just insignificant, commonly occurring blips, that even money bet ought to be attractive, right?)

Can I get better uncertainty intervals?
Who knows? I'm going to try. I'm currently assessing whether the estimated standard error of ±1.4 C/century for 88 month intervals is too large, too small or about right. I am trying, as much as possible, to base the estimate on observations of the actual earth with minimal input from models.

It’s clear I’m not going to get a better estimate of the uncertainty bands quickly. So, in the meantime, at least you can see the trends and compare to the variability during the past (and only) sustained period without stratospheric volcanic eruptions.

Now, let’s have fun with numbers!
As long as I have regressions, here is what I can do: I'll suggest numbers you might want to watch during June when the May temperatures get published (the break-even calculation is sketched after the list). Assuming the temperatures from Jan 2001-April 2008 don't change:

  1. If the average temperature for May rises more than 0.16C over April, the calculated OLS trend will rise next month. Otherwise it will fall.
  2. If the average temperature rises more than 0.04C, the CO trend will rise next month. Otherwise, the calculated trend will drop.
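
Finding the break-even numbers is easy because the extended trend is linear in the new monthly value, so two probe fits pin it down exactly. A sketch for the OLS case; merged_series is a hypothetical placeholder for the Jan 2001-Apr 2008 merged anomalies:

```python
import numpy as np

def slope(y):
    """OLS trend of y against its month index."""
    return np.polyfit(np.arange(len(y)), y, 1)[0]

def breakeven(y):
    """The May value at which appending one month leaves the OLS trend
    unchanged: interpolate between two probe values."""
    s0, s_lo, s_hi = slope(y), slope(np.append(y, 0.0)), slope(np.append(y, 1.0))
    return (s0 - s_lo) / (s_hi - s_lo)

# breakeven(merged_series) - merged_series[-1] gives the required rise
# over April (the 0.16C figure quoted above for OLS).
```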

So, be watching the data as it comes out. Any bets on whether the two trends will rise or fall next month? La Niña has to end sometime!

48 thoughts on “The 88 Month Trend (Whether you like it or not!)”

  1. While I am fully aware that weather /= climate, I suspect the May data may not be more than 0.16C, but close, based on the chilliness in the eastern US and the warmth in the western US. That's my bet. Not sure how the rest of the globe fared last month though.

    It’ll also be interesting to see if we get any effect from Chaiten but I don’t think that’ll come, if at all, until next year.

  2. It’s seemed cool around here. But then…. I always think it seems cool during May. I love summer and I’m perpetually worrying about my vegetable starts freezing. I think we are free and clear now though.

  3. There's been a persistent trough over the eastern US. I think it's on its way out now and the warmth will start.

    I would have to check the weather service next week when they release some data but the mid-Atlantic/Northeast has been wet and cool, mainly an extension of the wet winter from heck.

    If the trough in the east persists through the summer however it’ll have an influence on hurricane tracks.

    I keep hearing La Nina will end but it very well may persist through the fall, plus the new influence of the negative PDO now.

    But back to climate: Based on the 7 year trend I'm having a hard time believing the 2.0C/century central tendency. However if you placed your begin point in 1998, I think you'd get that 2.0C. I think…

  4. Lucia,

    You state:
    “I reported earlier that I found that if I used the uncertainty interval determined using Cochrane-Orcutt, the mean trend during that period falsified 20% of the time, with the uncertainty intervals stated to be 95% intervals. This suggests the uncertainty intervals were a bit too large, and corresponded to the 80% confidence level.”

    Don’t you mean a bit too small. If the mean trend over the whole period is considered the true trend, then you would want to reject that trend 5% of the time based on the sub-sample trend being outside the 95% confidence interval.

    [Because the subsample is actually from the larger sample and you are testing the equality of two statistics, not a statistic and a parameter, the test is a bit messier than that. The easiest way to do this is to create a dummy variable for the subsample and run the regression
    Y(t) = B0 + B1* time + B2*dummy*time,
    then if the coefficient estimate of B2 is significant, you have a rejection. This is just a version of the Chow test of the stability of the coefficient estimates over the entire sample period.]
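
    A minimal sketch of that regression, using synthetic data (the series, window length, and cutoff here are illustrative, not Lucia's actual setup):

    ```python
    import numpy as np
    import statsmodels.api as sm

    # Synthetic 33-year monthly record; flag the final 84 months (7 years).
    rng = np.random.default_rng(0)
    time = np.arange(396).astype(float)
    y = 0.0005 * time + 0.1 * rng.standard_normal(396)
    dummy = (time >= 396 - 84).astype(float)

    # Y(t) = B0 + B1*time + B2*dummy*time
    X = sm.add_constant(np.column_stack([time, dummy * time]))
    fit = sm.OLS(y, X).fit()
    print(fit.pvalues[2])  # a small p-value for B2 is a "rejection"
    ```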

  5. Lucia,

    Second comment and suggestion:

    Why don't you go back to the beginning of your data set and take non-overlapping 7 or 7.33 or whatever year intervals. Calculate the trend in each of these and then calculate the 30 year trend from the same start point. This will give you a distribution of 30 year trend deviations from 7 (or so) year trends with the same start point. These deviations are exactly the kind of variable you are talking about with regard to a comparison of a 7 (or so) year trend with a projected 30 year trend, only the projection is replaced with the actual.

    Yes, this will have heterogeneous forcing (and who knows what else), but the average deviation will give you a feel. You can then adjust it to the extent that you estimate the heterogeneities in the forcings bias your estimate.
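
    In code, the suggestion might look like this sketch (monthly_anomalies is a hypothetical placeholder for the full record; slopes are per month unless rescaled):

    ```python
    import numpy as np

    def slope(y):
        return np.polyfit(np.arange(len(y)), y, 1)[0]

    def same_start_deviations(series, short=88, long=360):
        """Step through the record in non-overlapping short windows; for
        each start point, compare the short trend to the long trend that
        begins at the same point."""
        return np.array([slope(series[s:s + short]) - slope(series[s:s + long])
                         for s in range(0, len(series) - long + 1, short)])

    # np.std(same_start_deviations(monthly_anomalies), ddof=1) * 1200
    # expresses the spread in C/century.
    ```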

  6. Marty–
    Thanks for noting the "too large" that should have been "too small".

    I know the test of the 30 year against 7 year periods is messy. That's why all of this is entirely "pseudo"! But, I'll try the second suggestion. I know I can divide the 30 year period into 4 non-overlapping chunks and test a few things. That's a good idea.

    (I wish we had more data though. This is such a tiny snippet no matter how I look at it!)

  7. From the data here:
    http://discover.itsc.uah.edu/amsutemps/ :
    create 400 or 600mb graphs and compare to ALL previous years: the 400mb shows all of 2008 coldest on record for ALL data since measurements began (except one week in Feb). The May 08 temps are looking very similar to January 08, so May 08 much colder, me thinks. Guys/gals, the world is cooling not warming and it ain't due to La Nina anymore, so please explain, modelers…. what are you going to do now? I personally believe that both skeptics and AGW are dead wrong to even try to speculate on climate change within a period of <500-1000 years, so in our lifetimes there will be no climate change unless there is one due as per a Dalton or similar, but that does take time too doesn't it? LOL ie leave weather forecasting to meteorologists

  8. Lucia “maybe if the modelers are truly confident the central tendency over 30 years is 2C/century, they’ll take an even money bet. [etc]”

    C’mon Lucia, that’s a sucker bet, and it doesn’t represent your view, nor the views of your opponents.

    You assert “falsification” of the IPCC “central tendency” of 2C/century. Your opponents say a.) it’s a forecast in the range 1.5-2.5, not a central tendency, and b) by asking them to take only the high side you’re taking half of their bet for yourself.

    Why don’t you recast it to represent the reality of your position. You win if the slope is less than 2 – 1.4 = 0.6C/century, they win otherwise. ie. you win if the IPCC is “falsified”, they win otherwise.

    Oh, and monetize it. Put your stake and their stake in escrow.

  9. JM,
    What opponents?
    Suggesting someone bets "central or above" and the other one takes "central or below" should be a 50%-50% bet. The only thing that makes it a sucker's bet is that we have 7 years of data already in. That's the sort of bet being suggested by others elsewhere.

    My position has never been the slope is anywhere near 0.6 C/century. It’s that 2C/century falls outside the range.

    Clearly no one needs to take this bet. But if bets are going to be proposed, why not this one? FWIW: I don't think betting on these things tells us anything about confidence in the predictions. It tells us some people a) like to bet and/or b) like to propose ridiculously biased bets where they win even if they are totally wrong.

  10. “What opponents?”
    The people you’ve thrown out the challenge to. If you feel ‘opponent’ is an unsuitable term, and you’d prefer something like ‘counterparty’ or ‘player’ I’m perfectly happy with that – but most people would understand that the person taking the other side of a bet is an opponent.

    
    'Suggesting someone bets "central or above" and the other one takes "central or below" should be a 50%-50% bet.'

    Perhaps, but it doesn’t reflect the views you’ve expressed on this site nor the objections that your ‘opponents’ (sorry) have expressed.

    You say the IPCC ‘projection’ is falsified, they (you know who they are) say you are wrong.

    You’re challenging them to back in their views (on the basis you’ll back yours in). The bet should therefore reflect the opposing views – namely your view that IPCC is falsified, their view that it is not.

    The 50/50 part is only your proposed stake.

    You’ve identified the IPCC position here as 2C +/- 1.4C and stated you believe the actual result for 2000-2030 will be lower than that. Why not bet your actual belief? Why do you propose a bet that the result will be 2C or less? That’s not a fair bet, you would win it in many cases even though the IPCC was vindicated and you were not.
    On the other hand if you don’t like your chances at 0.6C or less, don’t propose even money, try a different stake. How about 19-1? ie. They stake $19, you stake $1. That way your stakes match your odds.

    “It tells us some people … like to proposed ridiculously biased bets where they win even if they are totally wrong.”

    Pot, kettle, black.

  11. JM–

    I have never, ever suggested the IPCC projection as 2C/century ± 1.4 C/century. I also have never suggested my view is the tendency is less than 0.6C/century.

    I don't know what the trend really is– but if I need to pull a number out of my hindquarters for the purpose of a bet, I'd bet it's currently 1.3 C/century. If it's higher than that, then in 2030, I'd be happy to bake you a dozen chocolate chip cookies. (If you actually want them, you pay shipping.) If it's lower, you can bake me the cookies. (I'm not sure I trust your baking skills, so I'll happily let you distribute them to your neighbors.)

    I have said 2C/century is inconsistent with the data since Jan 2001. Gavin's post at RC notwithstanding, it appears other climate scientists are at least conceding there is something puzzling about the "stall" in measured warming. You will note that yesterday's article in Nature suggests the measured trend could be low due to bias associated with the transition from engine intake measurements to buoy measurements.

    This at least concedes the measurements show puzzling non-warming. Presumably, if the authors of that article believed that 0C/century for 7 years was entirely consistent with IPCC projections, there would be no reason to even seek any explanation. They'd just say, "Hey! That's within weather noise for 7 year trends!"

    If they are correct that the apparent non-warming is due to slowly creeping measurement bias during a transition from one measurement system to another, this would mean my hypothesis test correctly detected that the measurements say “it’s not 2C/century”.

    The test I did can’t tell us whether the difficulty is a bias introduced by those taking measurements or due to a truly flat temperature trend.

    So, given the outcome, it would be wise for those in charge of measurements to look toward data quality issues. If this is due to biased data at Hadley/GISS/NOAA/RSS/UAH, it behooves them to look into data quality sooner rather than later.

    If everyone keeps telling themselves they should not check data after 7 years, measurement biases will never be detected and corrected. In that case, we could end up with 30 years of data polluted by a bias arising from transition between one type of instrument and another.

    Even if any bias correction is defensible, fixing 10-30 year old published data in the direction that supports your theory can never allay the doubts of those who don't believe your theories.

    So you, and others, may think what you wish about the idea of testing hypotheses as we go along. I think it's necessary– even if only to prod those taking data to check their measurements for bias.

  12. Lucia, I’m gonna ping you on this:

    "You will note that yesterday's article in Nature suggests the measured trend could be low due to bias associated with the transition from engine intake measurements to buoy measurements."

    That’s disingenuous. They’re referring to a difference in technique from measuring sea surface temperature immediately after WWII. Basically a step change around 1945 that’s been previously hard to reproduce in models. That’s why an explanation was sought.

    Your preferred period is 2001-2008 is it not? What’s the relevance?

    “I have never, ever suggested the IPCC projection as 2C/century ± 1.4 C/century”

    Then what does your entire section under the heading “Application to this months trends.” mean? You know, the section where you propose the bet?

    Anyrate, back to the bet.

    So here’s the deal. 19 to 1. Your model (spreadsheet) applied to a 30 year period of observations.

    You win if the observed temperature change using an OLS trend in your model is outside 2C +/- 1.4C, I win otherwise.

    You make no changes in your spreadsheet, I withdraw all my previous objections – except for one (which I haven’t previously raised, sorry). You apply the diurnal correction to RSS and UAH. I think that’s fairly non-controversial.

    You get to do the analysis and send me the spreadsheet. Disagreements resolved by a committee of two – you nominate one, I nominate the other, each being active and informed on the debate, ie. minimum condition is they have a blog focussed on climate. OK?

    One last thing:

    “fixing 10 -30 year old published data in the direction that supports your theory,”

    I don’t have a theory. Neither do you – all you’re doing is number crunching.

    So the bet’s on? (Sorry, gotta be money. I’m a better cook than my spouse who used to be the head chef in a pretty chic restaurant but unless you want to pay your own airfare this will have to be cash settled).

  13. JM–
    The ±1.4 C/century is my estimate of the uncertainty in 7 year trends, during periods with no volcanic activity, associated with the weather noise on the real planet earth. This is an entirely different number from the IPCC's estimate of the uncertainty in their projections.

    Good luck finding people to bet with. Even better luck finding people to make a diurnal correction to RSS and UAH data. 🙂

  14. Hi Lucia,

    JM does have a point – if you have really eliminated a trend of 2.0 C/century at the 95% confidence level, then from your perspective, you should be willing to accept a 1:19 payoff on a 2.0+/2.0- bet (i.e. you put down $19, you win an extra $1 if you are right, you lose all $19 if you are wrong). On the other hand, somebody from the IPCC side would equivalently be expected to accept (as you suggest) even odds on a 2.0+/2.0- bet. Given the likelihood of measurement problems and short time intervals etc. both sides could reasonably compromise somewhere in the middle – either a lower cutoff point, or intermediate odds – say 1:8 instead of 1:19. Are you up to it? It really is a question of confidence in your analysis (well, and whether you’re a betting person or not).
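
    The risk-neutral arithmetic behind that 1:19 payoff is a one-liner (stakes in dollars or cookies, as you like):

    ```python
    p = 0.95                     # confidence that 2.0 C/century is excluded
    print(p * 1 - (1 - p) * 19)  # win 1 with prob p, else lose the 19 staked
    # -> 0.0: at 95% confidence, 1:19 is exactly a fair bet
    ```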

  15. Arthur:
    I'm not sure what you are suggesting. What does the notation "2.0+/2.0- bet" mean in words? (I'm not a gambler. Or did you just lose the – sign?)

    JM’s proposal to me is:

    You win if the observed temperature change using an OLS trend in your model is outside 2C +/- 1.4C, I win otherwise.

    I haven't run the numbers, but I'm not seeing how a bet I win only if the temperature is outside that window makes sense– that window includes loads of trends I say are possible– even likely.

    I haven't eliminated 0.6C/century. I haven't eliminated 1.0 C/century to the 95% intervals. I haven't eliminated 1.1 C/century using any method, any uncertainty intervals, ever. I could show the cumulative distributions, but based on the numbers in this post, if I were the betting type, I might take roughly 19:1 odds if, using OLS, Arthur proposed this:

    I put 19 chocolate chip cookies in the pot, he/she puts in 1 cookie.

    If the temperature trend at 30 years is either above 2.3 C/century or below -3.2 C/century, JM wins the 20 cookies. (Net profit to JM of 19 cookies. Lucia is saying– using OLS– this has a 5% chance of occurring.)

    Otherwise, Lucia wins the 20 cookies. (Net profit to Lucia 1 cookie.)

    Those are the 95% uncertainty bands I just posted in the post above.

    Why would I bet against the temperature range of 2C/century ± 1.4 C/century, which overlaps the range I say in this post is possible based on the new uncertainty bands? Sure, the bit from 2.3 C/century to 3.4 C/century is outside the range I say is possibly consistent with the data– but that still leaves the 0.6C to 2.3C/century range right smack dab in there.

    I'll admit that, as soon as the Nature article came out, I suddenly stopped being sure the 1.4C/century uncertainties are warranted– but now I have no idea whether it might be decided that the current data are off.

    The Nature interview hints at possible revision of current SST temperatures upward. So, even if I bet what I outlined above, I'd still insist on a caveat: the bet is off if the temperature measuring authorities 'revise' the current temperature upward for the reasons just mentioned in that Nature article!

  16. Lucia: “Why would I bet against the temperature range of 2C/century ± 1.4 C/century,”

    Because you’re claiming falsification. If you were confident of your claim, you would take the bet.

    If not, you should retract your claim.

  17. JM–
    With regard to the specific bet you propose:
    a) Did you even read the current post? With OLS (which seems to be the method you prefer) 2C/century doesn’t falsify.
    b) If I were a betting gal, I would always pick the intervals I currently post. Bettors who bet mid-game always include up to the minute data when they bet.
    c) I never, ever, ever posted the intervals you pulled out of a hat. So, I would have never accepted that bet at 19:1 odds!
    d) I just posted the range my analysis says contains the mean with confidence 95%.

    The specific intervals you propose are entirely unrelated to anything I have ever said. Obviously, I've said 1.3 C/century is inside the possible bands, as is 1.0 C/century. Given what I've said, it would make no sense for me to take the range you suggested — at 19:1 odds!

    Reread the cookie bet in my most recent comment. If you insist on OLS, that comment describes what would make sense at 19:1 odds. It’s based on the current data, and I posted the 95% intervals publicly.

  18. Lucia

    I think you’re getting too hung up on 19:1 odds (although as Arthur points out, it’s a risk neutral bet that you should have no problem taking).

    Why don't we do it this way: even money, 30 year data. You take "IPCC falsifies", I take "it doesn't".

    In other words you back your position on a bet you should win (if you are right) 19 times out of 20.

  19. JM–
    I'm not 'hung up' on the 19:1. I'm just responding to the specific bet you proposed, which connected a specific set of odds to a specific interval.

    Yes, if I were the betting sort, I’d bet even money that the trend will be less than 2C/century during the first 30 years of this millennium. I said that many, many comments ago. . . It appears that, using your crack rhetorical skills, you’ve forced me to agree with myself! 🙂

    FWIW: I also said that my analysis does assume the data from 2001-2008 are moderately stable. If you read the earlier posts, you will note the caveat about the hints in the recent Nature article mentioning buckets/ engine inlet/ buoy and SSTs.

    On betting in general, do recall, I already said I don't bet money on things like this. I should add: I specifically don't bet with individuals and never have. I don't bet on cards, I don't bet on sports. I admit I did buy a few lottery tickets about 20 years ago. 🙂 I was once persuaded by a friend to drop 4 quarters in a slot machine at a meeting in Tahoe, and I found the experience utterly irritating! So, other than in specific circumstances, I find betting an irritating experience, and for the most part plan to leave it to those who find it enjoyable.

    I've said publicly– using my real name, rather than nothing but untraceable initials or a pseudonym– that I think 2C/century has less than 1 in 20 odds. I think that constitutes more of a bet than what's thrown in by many.

  20. Hi Lucia

    Please excuse me if I have misunderstood, but it appears to me that your general approach is to assume that the predicted temperature trend has no variance and then to look at data from 1 January 2001 onwards to see whether it agrees with the predicted trend. At each point you wish to test a hypothesis that the observed trend is different from the (constant) predicted trend, and you will do this by constructing 95% CIs for the observed trend – if these exclude the predicted trend (a constant), you reject the null hypothesis.

    If this is your approach, it seems to me that you should have structured the spending of your alpha. It looks to me like you spent all the alpha on your first look at the ~7 year data, by using 95% CIs, and can't look at these data again for testing the hypothesis.

    Sorry if this has already been raised and dealt with. I mentioned it to Ian Castles at another web site discussion, and he referred me here.

  21. Garnolda–

    If your concern is whether or not they are independent tests, the answer is that the test with 1 month of new data is not independent of the previous test. But, provided we don't deceive ourselves and believe they are independent, there is no problem with "spending" alpha. We're just seeing what the outcome happens to be now.

    Yep, that's fine if you are just looking, and putting 95% CIs on the data is just looking. But you acknowledge that you can never test the hypothesis again on those data?

  23. Garnolda:

    I acknowledge the hypothesis tests aren't independent. Treating two failures in a row as independent would mean the probability that something is falsified goes as (0.05)(0.05) = 0.0025. That would imply something that really has a probability of 5% magically has a probability of 0.25%.

    So, repeating doesn’t cause the probability to drop this way.

    Of course I can keep looking at data and describing what the test would be based on today’s data. Anyone else in the world can do the same. Data doesn’t get thrown out the window, becoming unavailable to the entire universe, just because I ran a test and said “Hey. Look at the outcome of this test!”

    That would give me way too much power over the destiny of the universe.

  24. Thanks Lucia

    Data doesn’t get thrown out of the window, no.

    But multiple testing of accumulating data is a very common problem, and there are rules for how it should be dealt with. One cannot keep repeatedly testing accumulating data to evaluate the same hypothesis.

    This problem arises frequently in clinical trials, where interim analyses are specified in case a new treatment is substantially better or worse. When interim analyses are specified, one allocates the alpha error between the (one or more) interim and final analyses to preserve a final alpha of 0.05.

    Here’s a reference to a recent summary of issues relating to multiple testing which covers interim analyses:

    K. Schulz, D. Grimes. Multiplicity in randomised trials II: subgroup and interim analyses. The Lancet, Volume 365, Issue 9471, pages 1657–1661.

    This is to prevent repeated testing of accumulating data (e.g., each new patient, each new month) with the same hypothesis in the hope of finding a statistically significant result (i.e., a form of cherry picking).

    If you had stuck to your original implied plan of testing at 5, 8 and 10 years this would have meant something like testing at 0.001 at 5 years, 0.01 at 8 years, and whatever is left of your 0.05 at 10 years. Looks like you spent all 0.05 on your first look.

    If you want to test the hypothesis again, you need to do it prospectively from the end of the last data series which you tested.

  25. Garnolda–
    I don’t think I ever implied I’d test at 5, 8 and 10 year intervals.

    I’m also not submitting repeated trials as separate instances for the FDA.

    I understand your concern about drug makers testing a hypothesis as each patient comes in, vis-a-vis FDA trials. Permitting that would make it possible for a drug company to test, and stop collecting data when they get the result they like.

    That is not the case here. Whether I like or dislike a result, new data will come in next month. Unlike a drug company trying to game the approval process, I can't stop the world on its axis and prevent new data from coming in.

    Moreover, anybody on the entire planet can look at the data at any time. They will. There is no reason I am required to stop. Your analogy with the FDA and drug testing is poor in this instance.

  26. Thanks Lucia,

    It doesn't matter if you were planning 5, 8 and 10 year analyses (though it is in fact essential that you pre-specify when you will look, to prevent cherry-picking the analysis when the observed data look ripe). It doesn't matter whether or not you are resubmitting to the FDA or any other organisation.

    All that matters is that you are re-testing the same hypothesis on accumulating data.

    An example provided in the paper I referenced in my earlier post notes that “.. if an investigator looks at the accumulating data at alpha=0.05 at every interim, then the actual overall alpha level rises with the number of challenges – e.g alpha=0.08 after two challenges, alpha=0.11 after three challenges and alpha=0.19 after ten.”

    The point is simple. Every time the accumulating data are challenged on the same hypothesis it costs alpha.
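
    A quick simulation under the null makes the quoted inflation concrete; the sample sizes and number of looks below are illustrative:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n_trials, n_looks, per_look = 20_000, 10, 50
    ever_rejected = np.zeros(n_looks)

    for _ in range(n_trials):
        data = rng.standard_normal(n_looks * per_look)  # null: true mean 0
        for k in range(1, n_looks + 1):
            seg = data[: k * per_look]
            z = seg.mean() * np.sqrt(len(seg)) / seg.std(ddof=1)
            if abs(z) > 1.96:               # nominal alpha = 0.05 per look
                ever_rejected[k - 1:] += 1  # mark looks k..n as rejected by now
                break

    print(np.round(ever_rejected / n_trials, 2))
    # cumulative false-rejection rate: ~0.05, 0.08, 0.11, ... 0.19 by look 10
    ```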

    It doesn't matter whether you look at it this week, someone else next month, and yet someone else the month after, ad nauseam. If it is the same hypothesis being tested, the alpha for that hypothesis will gradually be exhausted. You could pretend that it wasn't, because the tests were being done by different people, but that wouldn't change the reality, would it? If you want more alpha, choose a new start date without reusing the same data.

    The only way to guard against these endless problems of cherry-picking is to publish a protocol pre-specifying when you will analyze (prospectively-collected) data, how many challenges you will make, and what alpha you are willing to consume at each challenge.

  27. Lucia

    I suspect you don't gamble for the same reason I don't – the house has the odds. But here a.) you have the odds and b.) you're offering to back in your analysis (which is a good thing)

    "I think 2C/century has less than 1 in 20 odds. I think that constitutes more of a bet than what's thrown in by many."

    Then we’re agreed, even money on 30 years, you take falsification, I take confirmed.

    20 cookies – let’s monetize them at $1 each shall we.

    So are we done?

  28. JM–
    What's with the "Then we're agreed" rhetoric? Is this a telemarketing script where you simply constantly tell the customer they've just agreed to your proposal?

    Your suspicion about the house is off the mark: I don't gamble even when the house doesn't have the odds.

    However, if I were to bet on anything to do with climate, I would only bet home-made chocolate chip cookies. I would only bet with someone else who posts predictions at a public, currently maintained blog under their real name. That blog must have an Alexa rank no greater than twice the rank mine has on the day of the bet, and the Alexa rank must apply to the blog individually. (Mine has a rank of 201,938 right now. That is: if you are on blogger, and just opened it, you need to wait until Alexa gives you an individual rank.)

    I would only do even-cookie bets to ensure both people need to invest the time to bake cookies. Then we could mail the cookies to a third party — maybe Arthur would agree, and he could store them for 30 years and at the end of that time, whoever wins gets 30 year old cookies. 🙂

    I would have other conditions should someone step forward. But, basically, I'm not in the habit of betting tangible things, and I am certainly not going to start entering real bets with commenters. There are bloggers willing to do that– contact them.

  29. Lucia “There are bloggers willing to do that– contact them”

    It’s your own bet. You made the offer in this posting, I just took you up on it and now you’re not just quibbling about the details, you’re backing away as fast as you can.

    “he could store them for 30 years and at the end of time, who ever wins gets 30 year old cookies.”

    Well that’s a fair concern. Tell you what, why don’t we do it with existing data – say the 30 years between 1978-2008?

    That way you’d get fresh cookies.

    And you’d be backing your analysis against my doubts. So unless you lack confidence in your own analysis, I can’t see where the problem is.

  30. Garnolda–
    So, you are saying the Jan 01-Jan 08 data can no longer be used by anyone to test the 2C/century hypothesis? Inform the IPCC their tests must begin with data after Jan 08. Bloggers challenging the data are going to cause them big problems.

    FWIW, Roger Pielke Jr. has repeatedly suggested that the IPCC should publish a protocol pre-specifying their test methodology to prevent just the problems you suggest. I agree with him.

    I think they should have explained how they would determine the "true" temperature for 2000 or 2001 when they published in 2007, instead of leaving that in the air. (Different methods of determining past "true" temperature exist in the literature.) I think they should have explained what specific test they would use. Test the trend? Classic slide and eyeball? What error bars they would consider, etc.

    Other bloggers seem to scoff at the idea the IPCC should be bound by any such pre-published protocol.

    If there were such a pre-published protocol, I would show how the data are tracking according to that metric. In the meantime, there is no published protocol, so I will be following the tradition in climate science: compare to the latest data. As it happens, people publish analyses at conferences using data from TimeA-TimeB. Later, when they publish their paper, they are not barred from extending to TimeC. In fact, in the case of climate science, the reviewers insist they show the latest data.

    In climate science, insisting people violate the FDA rule is the method of avoiding cherry picking. The alternative is to let people sit there, watching data until they get a result they like, publish that result, and– if your FDA rule applied– take that data out of the running for testing hypotheses. The reason the rules are different is that the ways drug companies can game data are entirely different from the ways climate scientists can game data.

    But, should you get the IPCC to adopt this rule and insist on pre-published test protocols, I’ll be happy to go along! 🙂

  31. JM-
    First: It's not "my" bet. You are the one proposing we actually bet. My only point in the post is that if bloggers are going to propose bets, and claim the proposal proves their confidence, they shouldn't be proposing such obviously lopsided bets. Most of the money bets I see proposed at blogs go like this:

    "I pick the person I want to bet. I word the bet. I structure it so I win 90% of the time if I'm right, and I win 70% of the time if you are right. Oh… but the bet will be at even odds. And if you don't take it, I'll suggest that means you have no confidence!"

    Second: I don’t know what motivates others or you to bet in general. I don’t even care. The only reasons I would bet are to amuse myself, or for bragging rights. I don’t bet real money. So, if something came along to induce me to bet, it would be cookies or nothing. It would be bets involving future data, not past data.

    This is like winning ribbons at the fair. You pay the entry fee– at most you win a valueless ribbon. So, the prize is bragging rights. Amusement, fun, whimsy or bragging rights are the only things that would motivate me to bet. Also, home baked cookies from the blogger are essential to my participation. I only bet people who are willing to invest their personal time. So, money bets would be out of the question for me.

    Third: I think I've made clear the sorts of conditions that might tempt me to bet chocolate chip cookies. As far as I can tell, you don't have a real name, you don't have a blog with traffic, and you aren't willing to post your own predictions at your blog, explain how you came by them, link me, show photos of you at the oven baking the cookies, yada, yada, yada.

    This means, currently, you don’t fall in the category of people with whom it might amuse me to bet. Should you begin blogging and posting your own opinions, I might consider betting chocolate chip cookie with you. But… probably not. The bet has to be fun and funny.

  32. We are not betting here, but the Kyoto nutcases are betting they can take your money if they scare you into thinking the Earth's temps will rise 2C/century. If the rise is only 0.6C/century– 0.06C/decade– I doubt they can get the world all worked up that the sky is falling.

  33. Dear Lucia

    Your last post appears to me to be saying “I know what I am doing is wrong, but everyone else is doing it, so I will too”. I can only agree with you that the IPCC and anyone else should be pre-specifying their analyses. If the IPCC is doing the same as you, then I would argue that they too are wrong.

    You and every person in the world is welcome to test and re-test the data as much as you like – I don't really care. You (and the IPCC and whoever else is using these methods) should however recognise that a testable hypothesis (which you claim to have specified) comes with a limited quantum of alpha – you spent yours on the first analysis. Any subsequent analysis you perform on the accumulating data is spending alpha in excess of the 0.05 you used up on your first look, so you are no longer hypothesis testing – you are just playing with the numbers.

    You keep implying that I am saying that no-one else can look at the data. That is not what I am saying. Anyone can challenge the same data to test different pre-specified hypotheses. If you generate 100 such challenges, however, there will be 5 that are statistically significantly different by chance, and there will probably be far more than that if you don’t pre-specify.

    What I am raising is a general principle of hypothesis testing – it may make you feel better to pretend that it is an ‘FDA rule’ that doesn’t apply to climate science, or that you don’t have to do it because someone associated with the IPCC doesn’t, but that is merely avoiding the issue. If you disagree with the principle that multiple testing of accumulating data must be done within an alpha budget, then please show me why this principle does not apply to your analysis.

  34. No, Garnolda:

    I don't think what I am doing is wrong, nor do I do it merely because everyone does it. I am saying everyone does it in climate science because, if you apply your FDA rules, the results are absurd. It would permit anyone to quickly post an analysis using data as it comes in and "freeze" the data in a way that prevented hypothesis testing by others.

    I believe different fields select different conventions according to what statistical ploys are possible. Climate scientists can’t use the ploy of “stopping” trials when results are good– drug companies can. The FDA sets up certain rules based on the sorts of things drug companies can do in drug trials. Climate science picks different rules.

    You keep implying that I am saying that no-one else can look at the data. That is not what I am saying. Anyone can challenge the same data to test different pre-specified hypotheses. If you generate 100 such challenges, however, there will be 5 that are statistically significantly different by chance, and there will probably be far more than that if you don’t pre-specify.

    I agree you are not suggesting they can’t test a different hypothesis. I never suggested you were implying that.

    I am suggesting you are suggesting my actions prevent them from testing the same pre-specified hypothesis I tested. You said this repeatedly, and elaborated on it.

    As I (and many others) are constantly applying tests to the IPCC's published hypothesis, your restriction, if it were generally valid, would give me (or countless researchers, bloggers, grad students etc.) the power to prevent the IPCC from testing their own hypothesis. The moment we test their hypothesis, according to you, the alpha is all used up for that hypothesis.

    As only a few hypotheses are of interest, that's absurd. The consequence is that, applying logic and statistical reasoning to the possible behaviors in their particular field, climate scientists insist that one use the most recent data. In the event of delay during publication, you will see that papers are updated with the most recent data.

    This is the convention that prevents cherry picking under the circumstances that apply in climate science.

    However, if you believe the FDA rules should apply to climate science, it might be wise for you to contact the IPCC and explain your reasoning to them. I suspect you will find people explaining that the types of gaming possible by drug companies are exactly the opposite that is possible in climate science. Or who knows? Maybe you will convince them.

  35. Arthur-
    I mostly did what you suggest, but restricted to periods with no volcanic eruptions. Of course, I looked at the time periods with eruptions too. It is very clear most of the big excursions in 7 year trends are due to the eruptions. It's amazingly distinctive.

    Still, initially, it appeared my method resulted in undersized error bands. So, I increased my uncertainty bands to 1.4 C/century. That's what we have here — these are larger than the 1.1 C/century I would get otherwise. (Though some of the inflation was known to be due to using the 3 data services back in the 20s-40s, where there are 5 now.)

    But then, immediately after I admitted my uncertainty intervals looked too small based on this comparison to historic data, the Nature article came out alerting us to the problem with the data during the early-mid 40s.

    Guess which period of time made my uncertainty bands look too small? Yep. If we eliminate the sudden downward plunge in temperature during the now-infamous bucket to jet inlet transition, my initial 1.1C/century uncertainty bands look just fine compared to previous periods without volcanic eruptions!

    I’m a bit at a standstill on this because
    a) I am convinced I can’t use periods with loads of major stratospheric volcano eruptions (others might not be- but I am!)
    and

    b) the rather short period without volcanic activity has just been shortened until the data-authorities figure out what they think the temperatures in the early-mid 40s should be corrected to!

    So… in principle, I agree with part of your idea. But in practice, what little data I have supports the smallish uncertainty intervals.

    In the meantime, we can watch the weather. I'm also creating other graphics to see what ideas they give us about trends etc.

  36. You’d trust me to keep chocolate chip cookies in escrow? Yum 🙂

    Sure. Of course, if you ate the cookies, you’d have to fess up and replace them. I suspect this is the true solution to “stale cookies”. (It’s also probably the solution to these bets getting out of hand. I’m sure you don’t want to get yourself in a position where you need to back thousands of cookies thirty years from now.)

  37. Lucia

    Let me understand this.

    You’re offering a bet with a stake of 20 chocolate chip cookies to the proposition that your model (spreadsheet) falsifies the IPCC for the period 1978-2008 (because you fairly reasonably insist on fresh cookies, and I suppose expect resolution within a reasonable timescale)

    But, you’ll only accept takers who have blogs with a ranking no more than twice yours.

    Is that right?

    I’m sorry I can’t meet that last condition. I don’t really know why you restrict the takers like that – it doesn’t seem relevant to me. Especially since you put up the challenge to demonstrate confidence in your own analysis in the first place.

    Can I conclude from that somewhat bizarre last minute condition, that you aren’t confident of your analysis?

    I hope someone else steps in one day. I’d like to see the results.

  38. JM–
    Alexa ranks no more than twice mine.

    If someone who is expressing their own views publicly is willing to step up, I am willing to negotiate bets. My chocolate chip cookies are very delicious, but they take time to bake. If I start making bets with every anonymous blog visitor who wants to challenge, I will be spending my life in the kitchen. I need to create some criteria for even considering bets.

    The reason I limit to bloggers stating their position on the probable warming in some quantitative sense is that I want to be able to exchange paired bets. If I am to back my position, they state and back their view. If they think it's 4C/century, fine! If they think it's 0C/century, fine. We can negotiate to come to some bet we agree represents both our positions.

    But, otherwise, it’s one sided. And that I am afraid is the problem with bloggers flinging out these challenges. They won’t risk their own positions.

    In any case, I simply don't have time to bake cookies for everyone in the blog-o-verse who might decide to propose an endless series of ill-defined, meaningless bets in the hopes of winning the cookies.

    With regard to you: If I got to bet live grass in my backyard is green right now, this instant, and you bet it wasn’t, I still wouldn’t place the bet. Yet, rest assured, I’m very confident the grass in my yard is green right now. (My desk faces the window.)

  39. Lucia: "I will be spending my life in the kitchen."

    Only if you lose.

    I’ll take that as a concession that you aren’t confident of your analysis.

  40. JM–
    In normal bets the goods are placed in escrow. So, the deal is, when the bet is placed, I bake my 20 cookies. The other person bakes 20 cookies. We mail them to Arthur. So, yes, if I were to accept many bets, I would be baking lots of cookies.

    Arthur said, “Yumm”. If I took too many bets, I think Arthur would become very, very fat. Otherwise his residence would fill to overflowing with stale cookies.

    Clearly, some sort of legal requirement describing Arthur’s precise responsibilities would be required. He might be permitted to act as bankers do, and sell the cookies in a bake sale for his favorite charity, and then buy back replacement cookies from some suitable bake sale.

    Obviously, Arthur would need to be consulted about his willingness to act as escrow agent under these circumstances. The responsibilities are clearly potentially burdensome.

  41. “In normal bets the goods are placed in escrow.”

    Sorry, I thought we’d resolved that when you refused to monetize it, and when you agreed to 1978-2008.

    There’s no need for escrow.

  42. JM–
    You proposing something and my refusing your proposal does not constitute me accepting your proposal. 🙂

  43. “You proposing something and my refusing your proposal does not constitute me accepting your proposal.”

    Oh, I agree. But my acceptance of your offer of cookies is offer and acceptance. A contract in other words.

    You put up the bet, you put up the structure, I accepted it.

    You then placed what I consider to be an unreasonable and inexplicable condition on it – that I have a blog with an Alexa rating no more than twice yours.

  44. JM–
    I have no idea how you developed the notion that I suggested I would bet on anything associated with the 1978-2008 time span. You suggested that span. I specifically rejected any bets on past time. See comment 3155: "So, if something came along to induce me to bet, it would be cookies or nothing. It would be bets involving future data, not past data."

    I’ve specifically explained that, I would not bet you and explained why. You are not required to accept my reasons and can speculate what my refusal might mean. The hypothetical discussion has been fun. But given your general tendency to become utterly confused over who said what, I’m no longer discussing this betting with you.

  45. “the notion that I suggested I would bet on anything associated with the 1978-2008 time span”

    Well because you didn’t object to it until now.

    Secondly because you proposed a 30 year time span in your bet.

    Thirdly because you complained about waiting 30 years for a resolution because your cookies wouldn’t be fresh. You also noted that the data series would probably not be stable over that time. A reasonable objection IMHO.

    Fourthly, you *do* use past data rather than restricting yourself to currently non-existent future data – your analysis starts at 2001 (which overlaps with 2008 btw)

    What’s wrong with starting at 1978? If you were really confident of your analysis I mean.

  46. Good morning Lucia

    1) The implications of multiple testing of accumulating data are statistical facts, not just ‘FDA rules’ – though I would hope that FDA rules also take these facts into account.

    2) Multiple testing of the same hypothesis on accumulating data

    In the first post where I raised this issue, I had in mind collusion: Author A testing 4 years and agreeing with Author B to test at 8 in the attempt to both to get their own quota of alpha. This is no different to an individual/research group testing twice at 4 and 8 years. The implication in both cases is that the alpha after the second test is 0.08. This can and should be prohibited, such that individual authors are restricted to an alpha budget of 0.05. Authors privately colluding to circumvent this restriction are unprincipled and cannot effectively be defended against.

    You raise valid problems of multiple authors testing publicly available data on identical hypotheses. If these authors have independently pre-specified analyses, they are entitled to seek to publish them. The only defence against extreme numbers of articles testing the same hypothesis is the journal review structure. If author A planned an analysis at 8 years and Author B got in at 4 years with the identical hypothesis on data commencing on the same date, Author A should be aware of and discuss the implication – alpha for the hypothesis after the second test is 0.08 – but cannot be held responsible for the decision of Author B to publish at 4 years. In the end, editors and reviewers have to do their jobs: not publish poor studies, not publish studies where beta is too low etc. – the relevance test should limit the frequency of multiple reports of the same hypothesis on the same accumulating data.

    The blogosphere will need to be treated as if it doesn't exist, as some people will do foolish things like test a hypothesis every month until they find a nominally statistically significant result. This is wise, in any case, because the blogosphere is unaccountable (it publishes what the blog owner thinks is valuable, without independent peer review).

    3) Updating data between abstract and conference

    If climate researchers are updating their hypothesis tests on accumulating data between abstract submission and conference dates, they should plan their alpha budget accordingly. Failing that, everyone should recognise and acknowledge that they have exceeded their alpha budget.

    You seem to be arguing that presenting the most up to date data is the only defense against cherry-picking. Pre-specifying analyses (including start and end dates) on prospectively-collected data is far better. Adding a month or two of data to a data series lasting many years, where you can see all of it (except the last one or two months), is irrelevant if the start date is chosen in a biased fashion.

    4) So where do YOU stand?

    Having tested the hypothesis at 7 years using 95% CIs you used up your alpha of 0.05 for testing this hypothesis. Each time you test again, you are increasing the alpha above 0.05, incrementally with each test. Simple question: do you agree with this statement? If not, why not?

  47. Garnolda:
    I have not argued that using up to the minute data is the only defense to cherry picking. I am arguing that in the context of climate science, it is a necessary defense.

    If author A planned an analysis at 8 years and Author B got in at 4 years with the identical hypothesis on data commencing on the same date, Author A should be aware of and discuss the implication – alpha for the hypothesis after the second test is 0.08 – but cannot be held responsible for the decision of Author B to publish at 4 years.

    In climate science, authors do not file test plans to do such and so statistical test after N years. That’s done in drug trials.

    I think I already said where I stand. I am following the convention in climate science: tests of hypotheses involving things like "official weather data", when based on the most current data, are considered more definitive and, in a large sense, replace the old ones.

    I believe this idea makes sense given the way these data are disseminated in climate science. There is no way an individual scientist can, by "freezing" a trial, prevent others from becoming aware of future data. As such, any result of any hypothesis test can be revisited by anyone at any time based on newer data.

    If you wish to impose rules that make sense for drug companies on climate science, you are going to have to convince all the people testing theories on the effects of SST on hurricanes, AGW on SSTs, etc.

    They will almost certainly disagree with you. Some may suggest that, to prove your point, you adapt the analysis in the paper you discuss to consider the actual current practices in climate science with respect to GMST:

    1) No scientist can stop the ongoing accumulation of data, ever. (This is a reality with regard to GMST. It is not so for a drug company.)
    2) All are required to re-do analysis including the most recent data at the time of publication.
    3) If two analyses use otherwise identical techniques, differing only in the end date, the one using up-to-date data is considered currently definitive.

    Then, see how that affects the probability of people making a false positive (or false negative) error at a particular time.

    Should you show this practice results in excess false positives, you will likely convert climate scientists to your point of view. Meanwhile, I suspect you will discover that, since 1-3 actually apply, they work perfectly fine.

Comments are closed.