Is 2007 Highest Melt Index? Statistical Significance.

Update: This post has been revised because I realized I missed an issue in the analysis. I’m adding a figure and revising the numbers. I also changed the wording of question Q1(a), because there is a very slightly different version of the question, “What is the probability 2007 was a record?”, and that question is slightly different from the question of rejecting a null hypothesis.
— — —
On a previous thread, Steven Mosher asked:

“The greatest melt extent over the last 2 1/4 centuries occurred in 2007; however, this value is not statistically significantly different from the reconstructed melt extent during 20 other melt seasons, primarily during 1923–1961.”

How high would 2010 have to be to be statistically different from the reconstructed melt?

The answer is: Let me reword your question so it can be answered. 🙂 I know this might sound like “reframing”, but unfortunately, sometimes questions need to be tweaked a little for me to figure out an answer. So, the question I’m going to try to answer is this:

Q1a: How high would the reported ice melt index for 2010 have to be in order for us to reject the null hypothesis that the ice melt index for 2010 is less than or equal to the highest ice melt value that would have been recorded by a satellite prior to 1979?

I think this testable question is the one most closely connected to the notion of “statistically different from the reconstructed melt”, but people could mean other things.

To simultaneously explain how I get the answer, I’m going to sneak up on it by beginning with easier problems.

Compare ice melt index for 2007 to ice melt index for 1928 only.

It is conceded by all that 2007 has the highest recorded ice melt index in FKM’s time series for ice melt, computed by combining values from 3 research groups. Chip sent me data for FKM figure 2, and the specific ice melt index for 2007 happens to be $latex \displaystyle MI_{2007}= 2.01521789 $. (From now on I’ll round– but the spreadsheet hadn’t rounded from prior computations, and oddly enough the precise numerical results at the end of the full computation are affected by rounding.)

So this is the maximum value for any melt index based on satellite measurements.

Meanwhile, FKM also estimated the value of the Melt Index that the satellite would have reported if it had been operating prior to 1979. The highest estimate is from 1928. For that year, FKM’s reconstruction gives an estimate of $latex \displaystyle MI_{1928,est}= 1.64 $. This value of 1.64 is FKM’s best estimate of the Melt Index a satellite would have reported had it been operating in 1928.

Noticing that the value for the melt from 2007 is higher than the maximum value prior to operation of the satellite (i.e. $latex \displaystyle 1.64 < 2.015 $ ), one is tempted to conclude the melt index in 2007 must be higher than the value that would have been recorded by the satellite if it had been operating in 1928. Unfortunately, we can’t be certain of that, because 1.64 is an estimate of what one would have measured by satellite and includes considerable uncertainty, while 2.02 is a direct measurement by satellite. (For purposes of further discussion I’ll assume the satellite measures the melt index perfectly.)

Now we will turn to hypotheticals. Let’s define a “true” melt index for 1928, $latex \displaystyle MI_{1928,true} $, which corresponds to the value a satellite would have reported if it had been operating in 1928. The magnitude of $latex \displaystyle MI_{1928,true} $ was never observed, but this unknown magnitude is the value 2007 needs to beat to be called a record. Because this true value was never observed, we can only call a record based on some probability that we reject the null hypothesis that the 2007 value was less than or equal to this true value for 1928. We’ll now test this null hypothesis $latex \displaystyle H_{0} $ against an alternate hypothesis $latex \displaystyle H_{1} $:

$latex \displaystyle H_{0}: MI_{1928,true} \geq MI_{2007} $
$latex \displaystyle H_{1}: MI_{1928,true} < MI_{2007} $

If we reject the null hypothesis we will accept that the melt index for 2007 is greater than the true value for 1928.

Now, let’s estimate the probability that the reconstructed value $latex \displaystyle MI_{1928,est} $ could be 1.64 or lower if the true value was equal to the value observed in 2007. That is, we are computing the probability of obtaining $latex \displaystyle MI_{1928,est} =1.64 $ assuming $latex \displaystyle MI_{1928,true} =2.015 $. When doing this I’ll assume the error $latex \displaystyle e = MI_{1928,true} -MI_{1928,est} $ is normally distributed with mean zero and a standard deviation, $latex \displaystyle \sigma_{e} $, that I can obtain from figure 2 in FKM.

Update: This is the heart of the major revision. The main issue addressed is the assumption that the error would have a mean of zero conditioned on $latex \displaystyle MI_{1928,true} =2.015 $. That assumption is inconsistent with the reconstruction itself. ———-

To determine the appropriate magnitude for the mean value of the error conditioned on the assumption $latex \displaystyle MI_{1928,true} =2.015 $, we must think about the conditional probability and what happens when we regress the true (i.e. satellite) Melt Index against temperature. Under this view, the “true” satellite reading can be high either because the “explained” part of the melt index is higher or because the “unexplained” part is higher than average. The explained part is the magnitude of the reconstructed value; in this case that is $latex \displaystyle MI_{1928,est} $. The unexplained part is the error. When a reconstruction is fit, the errors are uncorrelated with the explained (i.e. reconstructed) values. However, they are correlated with the true values.

The fit between the errors and the true (i.e. satellite) values in the reconstruction is shown below:

Note that the errors are higher for higher magnitudes of the true value! (This would not be the case if we examined the fit between the errors and the reconstructed value of the melt index. Those are uncorrelated as required of a reconstruction.)

The consequence of the correlation between errors and true values of the MI is that, under the present null hypothesis, if the true value in 1928 had been 2.015, the mean error would be anticipated to be positive, not zero. As a result, I need to adjust my previous assumption to:

I’ll assume the error $latex \displaystyle e = MI_{1928,true} -MI_{1928,est} $ is normally distributed with mean 0.702 and a standard deviation, $latex \displaystyle \sigma_{e} $, that I can obtain from figure 2 in FKM.

The mean value of 0.702 corresponds to the expected value of the residual if $latex \displaystyle MI_{1928,true} =2.015 $, which is what we are assuming for this particular test.
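For concreteness, here is a minimal R sketch of how a conditional mean error like the 0.702 can be obtained: regress the reconstruction errors on the true (satellite) melt indices and evaluate the fit at 2.015. The vectors mi_true and mi_est below are hypothetical placeholders, not FKM’s data; with the actual values behind FKM figure 2 the prediction would come out near 0.702.

    # Hypothetical placeholder data; substitute the satellite-era melt indices
    # and the corresponding reconstructed values from FKM figure 2.
    mi_true <- c(2.015, 1.10, 0.85, 1.40, 0.60)   # satellite ("true") melt indices
    mi_est  <- c(1.64,  1.05, 0.95, 1.25, 0.80)   # reconstructed melt indices

    err <- mi_true - mi_est                       # reconstruction errors
    fit <- lm(err ~ mi_true)                      # errors regressed on true values
    mu_e_cond <- predict(fit, newdata = data.frame(mi_true = 2.015))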

—————–
All further revisions where recognition of this detail affected the result are indicated in the text, with the superseded values noted alongside the revised ones.
End explanation for major revision.
———-

FKM indicate that the upper end of the $latex \displaystyle 2\sigma_{e} $ uncertainty interval for $latex \displaystyle MI_{1928,true} $ is 2.89. (I know those are really $latex \displaystyle 2\sigma_{e} $ uncertainty intervals because I asked.) So I compute $latex \displaystyle \sigma_{e} = 0.629 $. This is an estimate of the standard error based on the residuals from the reconstruction. It is known that the sample variance is an unbiased estimator, but the sample standard deviation is biased low. I will not try to correct for this bias; I’ll just assume the standard deviation of the errors is equal to the standard deviation of the residuals from the reconstruction.

Based on this standard error, I turned to the normal distribution and found that if $latex \displaystyle MI_{1928,true}=2.015 $, the probability that the reconstructed value for 1928 (i.e. $latex \displaystyle MI_{1928,est} $ ) would be greater than or equal to the value actually obtained in the reconstruction is 30.5% (revised from 72.7%). This probability is less than 95%, so we fail to reject the null hypothesis, and we do not call 2007 a record: the reconstructed (i.e. estimated) value of 1.64 is within the uncertainty range for the observed value in 2007.
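For readers who want to reproduce the arithmetic, here is a minimal R sketch of this single-year calculation using the rounded values quoted above; the post’s exact numbers come from the unrounded spreadsheet values, so the result differs slightly.

    mi_2007     <- 2.015   # satellite melt index for 2007
    mi_1928_est <- 1.64    # FKM reconstructed melt index for 1928
    sigma_e     <- 0.629   # standard deviation of the reconstruction errors
    mu_e        <- 0.702   # conditional mean error given MI_1928,true = 2.015

    # P(reconstructed 1928 value >= 1.64 | true 1928 value = 2.015):
    # est = true - e with e ~ N(mu_e, sigma_e), so est >= 1.64 means e <= 2.015 - 1.64
    p_1928 <- pnorm(mi_2007 - mi_1928_est, mean = mu_e, sd = sigma_e)   # roughly 0.3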

Getting closer to answering Q1 above: if we do a traditional hypothesis test and call a record based on consideration of 1928 only, if we agree to reject at p=0.95 and do the test one sided (as I set up with my null, which seems fair to me), and if we think the only thing required is to beat the maximum reconstructed value, then we would need $latex \displaystyle 2.67+0.7< MI_{2007} $. (If someone insists on two sided, we need $latex \displaystyle 2.87+0.7< MI_{2007} $.)
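Under the same assumptions, those thresholds amount to the reconstructed 1928 value plus the conditional mean error plus the appropriate normal quantile times $latex \displaystyle \sigma_{e} $. A short sketch:

    mi_1928_est <- 1.64
    sigma_e     <- 0.629
    mu_e        <- 0.702

    one_sided <- mi_1928_est + mu_e + qnorm(0.95)  * sigma_e   # roughly 2.67 + 0.7
    two_sided <- mi_1928_est + mu_e + qnorm(0.975) * sigma_e   # roughly 2.87 + 0.7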

But there are other years!

It may seem odd to some, but we can’t call a record based only on besting the maximum value in the reconstruction. Let’s consider two years.

Repeating the previous calculation using values from 1931, I get p=0.34 (revised from 0.73). This is less than 0.95, which is the cutoff we use for a one-tailed test. So we don’t reject the hypothesis that the true MI for 2007 is less than or equal to the true value for 1931.

Now consider this: for 2007 to be a record, $latex \displaystyle MI_{2007} $ must be greater than both $latex \displaystyle MI_{1928,true} $ and $latex \displaystyle MI_{1931,true} $. If we assume the errors in the reconstruction are i.i.d.,1 then assuming the null hypothesis is true, the joint probability of the observed event is p=(0.305)(0.34)≈0.10. This is well below the 0.95 traditionally required to reject a null hypothesis. So we are even further from rejecting the hypothesis that the MI in 2007 is less than or equal to that for both these years.

Someone is going to notice there are many years in the reconstruction, and to call a record, $latex \displaystyle MI_{2007} $ must be greater than the true value for all of them. So we can progressively consider each additional year. If we include only the 20 years that happen to contain the MI for 2007 inside their ±95% uncertainty intervals for the reconstruction, we find the probability that $latex \displaystyle MI_{2007} $ exceeds $latex \displaystyle MI_{year,true} $ for all 20 of those years is p=0.02%.
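Here is a rough R sketch of the multi-year extension, assuming i.i.d. reconstruction errors. The vector mi_est_years is a placeholder: in the actual calculation it would hold FKM’s reconstructed melt indices for the years being tested (two years, the 20 highest years, or all years).

    mi_2007      <- 2.015
    sigma_e      <- 0.629
    mu_e         <- 0.702
    mi_est_years <- c(1.64, 1.60, 1.55)   # placeholder values, not FKM data

    # Per-year probability that the reconstruction would come in at or above its
    # observed value if that year's true MI equaled the 2007 value, then the
    # joint probability under the null (independence assumed).
    p_each  <- pnorm(mi_2007 - mi_est_years, mean = mu_e, sd = sigma_e)
    p_joint <- prod(p_each)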

This indicates we can’t reject the possibility that 2007 falls within the range of the reconstruction. In fact, even if we only had data from the 20 highest years in the reconstruction, given the uncertainty in the reconstruction, 2007 may well fall comfortably within that range.

But those rooting for 2007 not to be a record are going to want me to include all years in the reconstruction. After all, even though for a year like 1927 we would reject the hypothesis that its melt index was as high as 2007’s (p=0.98), there is still some possibility that it was. If we account for all those years, under the null hypothesis, we get p=4e-04% (revised from 12.6%), which is well below the threshold of 95% we would use to reject the hypothesis that $latex \displaystyle MI_{2007} \leq MI_{all,true} $.

Notice: I haven’t actually answered Mosher’s question. To do that, I would have to create a loop and iterate over my R script to find the actual value of the MI that would result in p=95% when tested against all the values in the record. I’m too lazy to figure that out yet… So, for now the answer is “Let’s not worry about that until the satellite records a Melt Index of at least 2.67. Which it hasn’t. Yet.”
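For what it’s worth, the iteration could be done with something like the sketch below: define the joint probability as a function of a trial satellite MI and solve for the value that reaches 0.95. This holds the conditional mean error fixed at 0.702, which is a simplification; in a full calculation that mean would be re-estimated for each trial value from the regression of errors on true values, and mi_est_years would hold all of FKM’s reconstructed values rather than placeholders.

    mi_est_years <- c(1.64, 1.60, 1.55)   # placeholder reconstructed melt indices
    sigma_e      <- 0.629
    mu_e         <- 0.702

    # Joint probability that a satellite MI of x beats every year under the null
    joint_p <- function(x) prod(pnorm(x - mi_est_years, mean = mu_e, sd = sigma_e))

    # Smallest satellite MI for which the joint probability reaches 0.95
    mi_needed <- uniroot(function(x) joint_p(x) - 0.95, interval = c(2.015, 6))$root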

Are there any caveats?

Of course there are caveats. The list of caveats I can think of immediately:

  1. Errors might not be normally distributed.
  2. I used the sample standard deviation. This is biased low and includes uncertainty. Correcting for the low bias should reduce the p values.
  3. I neglected any error in the satellite MI. I suspect including them will reduce the p values by a small amount for this analysis. (Note: it should not be neglected if you wish to compare satellite MIs to each other.)
  4. The whole thing is based on a particular reconstruction for the past MI. It’s possible a record might be called if the results of the reconstruction were different. The best estimates would be obtained if someone were able to develop a more accurate reconstruction with lower noise. When someone publishes one, we can re-do this.
  5. As everyone knows, there are those who think the MI for 2010 will change everything. The MI for 2010 is not the MI for 2007, so we can repeat the test when data for 2010 become available.

I hope this answers Mosher’s question. If Mosher really had a different question, I’ll be happy to try to answer that (rewording if necessary.) But for now:

  1. If someone who is rooting for a record wants to insist on doing the comparison to only the maximum year in the reconstruction: wait for a melt index of 2.67 averaged over the three groups used to create the reconstruction. Or wait for a new reconstruction to be published.
  2. For those rooting against a record and insisting on comparing to all the data– which is fair: if we get an MI over 2.67, we can repeat the calculation and see what p value we get. In the meantime, no one can call a new record based on the FKM2011 reconstruction.
  3. If someone comes up with a new reconstruction, we can repeat a similar calculation and also debate the merits of the various reconstructions.

Updates: Because this is a revision, it’s worth emphasizing that this extremely low value (4e-04%) does not correspond to the probability that 2007 is a record. This is the answer to a “fail to reject” type question.

To estimate the probability that 2007 was a record, we will instead look at the probability that at least one of the previous years has a true melt index above 2.015. In contrast, the analysis in this post asks whether any of those years could have reached the true value seen in 2007 while showing a lower reconstructed value owing to errors in the reconstruction. I’ll be posting the probability that 2007 was a record: for that analysis, the assumption that the errors have a mean value of 0 conditioned on the known values of the reconstruction is fine. The probability that 2007 was a record is 12.6%. (It’s also a more reasonable question to ask of a noisy reconstruction.)

—-

1. Note: The question of whether errors have temporal auto-correlation came up in comments. The correlogram for the errors is shown to the right and suggests the temporal autocorrelation is small.
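(A check along these lines can be done in R with acf(); err below is a placeholder for the series of reconstruction residuals in year order.)

    err <- rnorm(30)   # placeholder; substitute FKM's residual series
    acf(err, main = "Correlogram of reconstruction residuals")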

2. In comments at Skeptical Science, Tom commented on the assumption of normality.

21 thoughts on “Is 2007 Highest Melt Index? Statistical Significance.”

  1. Out of curiosity, about how high would the 2010 Melt Index have to be to confidently say it was higher than 2007?

  2. I’ll stipulate that your approach to comparing 2007 with reconstructed 1928 makes sense. But I don’t agree with your extension of this approach to calculating the odds that 2007 was a record when considering multiple years of the reconstruction (e.g. adding 1931).

    Your analysis assumes independence of the reconstructed values for 1928 and for 1931. This seems likely to be invalid to me. If the FKM proxy method gives a high (or low) value for 1928, that same method probably also gives a high (or low) value for 1931.

    If this is so: then the probability that 2007 is a record year when compared to all years in the reconstruction may be only slightly less than the odds that 2007 is a record year when compared to the single highest year in the reconstruction (1928).

  3. AMac, the issue you raise should be considered, but unless one knows errors are likely to be biased, not random, you can’t assume any sort of correlation. You’d need to at least look at how the original certainty levels were calculated to have any idea if you could make the calculations more precise.

  4. Amac-

    Your analysis assumes independence of the reconstructed values for 1928 and for 1931. This seems likely to be invalid to me. If the FKM proxy method gives a high (or low) value for 1928, that same method probably also gives a high (or low) value for 1931.

    That’s not the assumption I made. I only assume independence in the residuals for the reconstruction (i.e. errors.) That could be checked– but I didn’t. Also, I didn’t mention this assumption and I ought to both mention and check.

    I’m pretty sure the fact that the temperature or melt indices themselves are certainly temporally autocorrelated doesn’t matter for this particular question of whether or not we can call a record. It’s true warm periods contain many warm years and it’s during those periods that records are broken. But the question of whether or not a record was broken is related to the error in the reconstruction. If those aren’t correlated, the fact that the MIs themselves are correlated doesn’t matter.

    BTW: I’ll probably eventually be doing some Monte Carlo, and I am certain of my claim that only temporal autocorrelation in the errors in the reconstruction matters. That’s best checked by looking at autocorrelation of the residuals– which I did not do. (But should have.)

  5. Well… that was painless. This is the correlogram for the residuals of FKM’s reconstruction:

    So… it looks like the assumption that errors are uncorrelated is defensible.

  6. A note for readers who don’t have a maths/stats background. (Do we have such readers?).

    Failing to prove that 2007 is a record is not the same as proving that 2007 is not a record. It just means that we don’t know.

  7. Michael J–
    Yes. I’ll be addressing the second question of whether we can also show that 2007 is probably not a record. Guess what the answer is? 🙂

  8. Probabilities and rankings are interesting partners. They raise the possibility of all sorts of interesting questions and calculations, like “which year is most likely to have had the highest melt index?” or “what does the uncertainty range on the 10-year running average look like?”

  9. Re: Michael J (May 7 12:35),

    > Do we have such readers?

    … Yes!…

    > Failing to prove that 2007 is a record is not the same as proving that 2007 is not a record.

    From the body of the post,

    Based on this standard error, I turned to the normal distribution, and found the probability that MI-1928.true > MI-2007.true is 72.7%. This is less than 95% so we fail to reject the null hypothesis and we do not call 2007 a record based on the fact that it can’t be distinguished from 1928.

    I interpret this as “Based on the FKM proxy reconstruction, there’s a circa three-in-four chance that the melt in 2007 was higher than the melt in 1928. And there’s a circa one-in-four chance that it was lower, or the same.”

    Similarly for comparisons of 2007 to multiple years in the pre-satellite past.

    Does that stand up?

  10. Nebuchadnezzar

    They raise the possibility of all sorts of interesting questions and calculations, like “which year is most likely to have had the highest melt index?” or “what does the uncertainty range on the 10-year running average look like?”

    Some of these are easy: if we examine years prior to 1979, the year most likely to have the highest true MI is 1928, given the assumptions above (which are rather mild and rather usual, like Gaussian errors). It’s the most likely to be the warmest because it’s the warmest in the reconstruction.

    Based on the computation above: Of all years, the year most likely to have the highest true MI is 2007. That is: given what we know, it’s more likely 2007 was the record year rather than 1928. But that doesn’t mean it’s actually likely 2007 was the record year.

    Sometime early next week, I’m going to discuss the probability 2007 actually has the highest Melt Index. Unless I’ve screwed up, the likelihood that 2007 was the record year is… get this p=12.6%. Very low. The assumptions used to demonstrate that will be exactly the ones used in the post above.

    The discussion needs to be deferred because I need to explain something some people will find paradoxical. I’ll need to explain the paradox and why there is no real contradiction. Showing this will require several figures and words.

    Oh– the uncertainty in the 10 year running average can also be computed. That’s actually very easy.

  11. Lucia, “Based on this standard error, I turned to the normal distribution, and found the probability that MI1928true>MI2007 is 72.7%. ”

    You are saying that if MI1928(est) = 1.64 and MI 2007 = 2.02, there is a 72.7% chance that MI1928true>MI2007? I don’t think that is possible. No matter what the uncertainty is in MI1928(est), only 50% of the confidence interval about MI1928(est) is greater than 1.64. Thus the probability that MI1928true>MI2007 must be less than 50%.

  12. Here’s a naive question.

    Does it really matter if the melt run is higher in 2010 or 2007 or 1928?

    We can measure things a lot more accurately these days but our measurements still have problems and are based on relatively short timescales.

    Who knows what the extent of the Greenland melt run was during the MWP and how it varied over those centuries?

  13. Re: Owen (May 7 14:51),

    As you note, Lucia wrote,

    Based on this standard error, I turned to the normal distribution, and found the probability that MI-1928.true > MI-2007.true is 72.7%. This is less than 95% so we fail to reject the null hypothesis and we do not call 2007 a record based on the fact that it can’t be distinguished from 1928.

    It seems to me that the use of > (greater than) is inadvertent. The logic of the post holds, if Lucia meant < (less than).

    I misread, and was taking it that way until you pointed this out.

  14. Shouldn’t the null be that 2007 is indistinguishable from 1928(est)?

  15. Notice: I haven’t actually answered Mosher’s question. To do that, I would have to create a loop and iterate over my R script to find the actual value of the MI that would result in p=95% when tested against all the values in the record. I’m too lazy to figure that out yet… So, for now the answer is “Let’s not worry about that until the satellite records a Melt Index of at least 2.67. Which it hasn’t. Yet.”

    #######
    or you could calculate it for 2.0, 3, and 4 and then we could bet on the exact number

  16. AMac (Comment #75680)
    May 7th, 2011 at 4:07 pm
    It seems to me that the use of > (greater than) is inadvertent. The logic of the post holds, if Lucia meant < (less than).
    I misread, and was taking it that way until you pointed this out.
    ———
    That would make sense.

    I don't have access to the original paper, but I wonder how good the estimate of uncertainty in MI1928(est) might be. I can't imagine it would be any good at all (which I guess the sigma of 0.629 means). I read this whole business to mean that we cannot demonstrate that 2007 is statistically greater than 1928 (or other pre-satellite years) because the error of MI estimation of those years is so high.

  17. Owen–
    Oops… I’m going to have to go back and tweak some words. I wrote language that’s going to make more sense for Monday’s post, when I talk about the probability of a record rather than the issue of whether or not we can reject. Sorry about that– I wrote it quickly while taking a break from watching a movie. I’ll tweak the language a bit later.

    On the question of whether or not the reconstruction is “any good”: all reconstructions have uncertainties. You can’t assess whether it is “any good” based on knowledge of the standard error (i.e. size of residuals) only. If you are going to think about the 0.63, you need to put it in the context of the full spread of the true Melt Indices, and looking at the correlation coefficient “R” is a better metric than the magnitude of the standard errors. (I’ll report the “R” later– not right now.)

    But more than that, there are more important things than correlation coefficient. For example: You want to think about whether a reconstruction might be biased in some unknown way — that’s worse than having an R in the low range.

  18. Layman

    Shouldn’t the null be that 2007 is indistinguishable from 1928(est)?

    No. For Mosher’s question, there are 2 plausible nulls.

    One is: The satellite measurement for 2007 is equal to the “true” value for 1928. We can’t know the “true” value for 1928, we only know the ‘est’ and I model the error as white noise. So, I can test this null.

    The other is the satellite measurement is less than or equal to the “true” value for 1928.

    I’m testing the latter because it’s the only one we can test when we start considering a record relative to melt indices in more than 1 year.

  19. Owen–
    I’m getting an “R” value of 0.8. (This is the standard deviation of the reconstruction / sd(melt index) ). So this reconstruction explains about 65% of the variation in the melt index leaving the remaining 35% unexplained by the proxy.

    Whether this is “good” or “bad” depends on whether there are other better proxies possible and what you are hoping to learn from the proxy. Certainly this is useful relative to nothing at all– and we’ll see Monday or Tuesday that it is useful for detecting whether or not 2007 is probably a record.

    Getting back to “good” or “bad”: really, a more important factor is whether or not there is any strong bias over time introduced by the method, and that can’t be detected by “R”.
