C Randles posted his forecast of the NSIDC September Ice Extent at Neven’s blog and is interested in comments. He dropped by here to ask:
Your 4.47 is very close to my 4.45 posted at
http://neven1.typepad.com/blog…..ution.html
I got my RMSE down to .134 but I could be in the land of overfitting with far too many parameters I can fiddle.
How do our RMSE compare or any other comments?
To focus on Crandles’ questions, I’m going to begin by assuming that readers are already familiar with the general features of my weighted method for predicting the minimum ice extent, and know the general outlines of my submission to SEARCH. Those wanting more details should read the previous posts (or ask in comments).
For today’s post, I’ll merely note that for the SEARCH submission, the weighted model was based on
- NSIDC extent vs. (current 7-day ave JAXA Extent, current 7-day ave CT Area), w = 58.4%
- (NSIDC extent - current 7-day ave JAXA Extent) vs. July 31 PIOMAS Volume, w = 6.4%
- (NSIDC extent - current 7-day ave JAXA Extent) vs. current CT Area, w = 35.2%
The more detailed information to address Crandles’ question is below, with the standard errors of the residuals for the individual fits shown in bold:
| model | AICc | prediction | σR | σtot | bias | conditional standard error | weight |
|---|---|---|---|---|---|---|---|
| NSIDC extent vs. Extent & Area | 863.1 | 4.47 | **0.159** | 0.177 | -0.006 | 0.177 | 51.0% |
| Loss vs. Volume | 866.8 | 4.45 | **0.172** | 0.196 | -0.022 | 0.197 | 7.8% |
| Loss vs. Area | 863.5 | 4.49 | **0.164** | 0.181 | 0.012 | 0.182 | 41.2% |
| Weighted | NA | 4.48 | NA | 0.180 | NA | NA | NA |

(Predictions, σ values, biases, and conditional standard errors are in millions of square kilometers.)
I think, with respect to Crandles’ question, the answer is as follows:
For each candidate model, I ran a linear regression between the regressand and the regressors. The standard error of the residuals (i.e. their standard deviation) is listed as “σR” and shown in bold. This is an estimate of the scatter around the “true” fit between the regressand and those regressors; it is what EXCEL would report as the standard deviation of the residuals. Of the candidate models tested, the one with the lowest standard error of the residuals was the regression of NSIDC Sept. extent on the current 7-day average JAXA extent and 7-day average CT area. For this case, the standard error of the residuals was 0.159. This is a little larger than the value you got, Crandles. I might have been able to find a fit with a lower standard error of the residuals, but I didn’t attempt regressions with 2-day, 3-day or any other choice of smoothing days.
Having said that, I don’t report these errors. R permits me to easily report confidence intervals for a prediction of a value “Y” at current values of “X”. These confidence intervals include the standard error due to the residuals (i.e. the ‘noise’ around the fit) and the additional error due to our uncertainty in the fit parameters, assuming the fit is “true”. That is: suppose we can, for some reason, be certain the functional relation between “x” and “y” is of the form y = mx + b, but don’t know the values of the trend and intercept (m and b). We then collect N (x,y) pairs of data and estimate the magnitudes of “m” and “b” from the data. If we assume the residuals are independent of each other, we can estimate the standard errors of our estimates of the parameters “m” and “b”. Once we have the estimates for (m,b) and their standard errors, we can predict the future value of “Y” given some known value of “X” using Y = ms X + bs, where “s” indicates values estimated from our sample. But we can also estimate the uncertainty arising from the uncertainty in those estimates. (Doing this in EXCEL is conceptually easy, but also something of a pain in the rear. But it can be done.)
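To make the R route concrete, here is a minimal sketch (not my actual script; the data frame, column names, and the “current” values are placeholders) of how `lm` and `predict` produce an interval that folds in both sources of uncertainty:

```r
# Hypothetical example: regress September extent on the current 7-day
# average JAXA extent and CT area over the historical record.
fit <- lm(sept_extent ~ jaxa_7day + ct_area, data = hist)

# interval = "prediction" includes both the residual scatter and the
# uncertainty in the fitted coefficients; interval = "confidence" would
# include only the latter.
predict(fit,
        newdata  = data.frame(jaxa_7day = 4.9, ct_area = 3.1),
        interval = "prediction", level = 0.95)
```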
For each of the three candidate models, the standard error for my prediction of the NSIDC September value is provided in the column labeled “σtot“. Note these are larger than the standard error of the residuals for the given candidate model. These are what I would report if I selected one candidate model from the collection and based my forecast on it. My forecast would then give confidence intervals under the assumption that my best candidate model is somehow “true”, with the only uncertainty arising from the residuals to the fit and the uncertainty in my knowledge of the fit parameters.
However, I don’t do that. Instead, I find the candidate model with the lowest AICc and consider it only ‘the most probable’. The other regressions are considered less probable. Under this view, the probability distribution for the full model can be taken as the sum, over candidate models, of each model’s probability distribution multiplied by its weight.
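For readers wondering where the weights come from: they can be reproduced, at least approximately, from the AICc column using the standard Akaike-weight formula. A sketch (the small differences from the table’s percentages presumably reflect rounding of the AICc values shown):

```r
# Akaike weights: w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2),
# where delta_i = AICc_i - min(AICc). Values copied from the table above.
aicc  <- c(ext_area = 863.1, loss_vs_volume = 866.8, loss_vs_area = 863.5)
delta <- aicc - min(aicc)
w     <- exp(-delta / 2) / sum(exp(-delta / 2))
round(100 * w, 1)   # ~50.6, 8.0, 41.4 -- close to the table's 51.0, 7.8, 41.2
```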
Some of these features can be seen by visualizing the distributions. In the figure below, the product of the weight of each candidate model and its probability distribution (assumed normal) is shown in red, green and blue. The ‘red’ trace is 0.509 times the normal distribution centered on the forecast computed from the 7-day JAXA extent and 7-day CT area. The vertical red line is the forecast I would report if I believed this model was “true”. Similarly, the blue and green traces are the probability distributions for the other models, scaled by their weights. The vertical lines indicate the forecasts I would report if I believed those models were “true”.
For my forecast, I did not report the red, blue or green lines, but rather the value corresponding to the black vertical trace.
Now, scan up to the table above. Notice the columns labeled “bias” and “conditional standard error”. The “bias” column indicates the amount by which my forecast based on the black trace will be biased if a given candidate model is “true”. That is: suppose the “red” model really is “true”. In that case, my forecasts would certainly be better if I just used the red model and forecast the ice extent based on the vertical red trace. But I didn’t; I forecast according to the black trace. So, if the “red” model is correct but I use the “black” weighted model, over time we’ll see the observed extents come in about 0.006 million square kilometers below my forecasts (the -0.006 bias). Over time, we’d also observe that the rms error in my forecasts is the square root of the sum of the squares of σtot and the bias. This value is listed under “conditional standard error”.
Because my prediction for NSIDC September NH extent was based on 3 candidate models, my final total error is obtained by taking the square root of the weighted sum of the squares of the conditional standard errors. This value worked out to 0.180 million square kilometers, which exceeds the 0.159 million square kilometers for my “best” candidate model.
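The rest of the table’s arithmetic can be checked the same way. A small sketch using the values quoted above (a check on the bookkeeping, not my actual script):

```r
# Per-model values copied from the table (weights as fractions).
w        <- c(0.510, 0.078, 0.412)
forecast <- c(4.47, 4.45, 4.49)
sig_tot  <- c(0.177, 0.196, 0.181)
bias     <- c(-0.006, -0.022, 0.012)

# Conditional standard error: rms of sigma_tot and the bias for each model.
cond_se <- sqrt(sig_tot^2 + bias^2)   # ~0.177, 0.197, 0.182, as in the table

sum(w * forecast)                     # weighted forecast, ~4.48
sqrt(sum(w * cond_se^2))              # total error, ~0.180
```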
Although the stated uncertainty intervals are higher than I might claim by predicting based on the “best” candidate model, I think the weighted method captures the uncertainty I believe exists in our knowledge of the “true” model for predicting ice extent. Although the prediction based on extent and area appears ‘best’, I’m not as confident the model is “true” as I might be if we were dealing with known physics like “F=ma”. So, this method expands the uncertainty intervals in a way that is consistent with my level of doubt, but also permits me to take credit for regressions that really do seem to have some ability to forecast.
I hope this answers Crandles’ question without having gone too far afield! I’m happy to entertain other questions about this method, including criticism. I have my own criticisms of the method, but I’m not going to dwell on them in the main post.
Note: If you are puzzling over why the weighted prediction submitted to SEARCH and the one discussed here differ slightly, it’s because I’m retroactively recalculating, and today I know a few more days’ worth of Cryosphere Today data than I did when I computed the SEARCH submission. The addition of a few days of Cryosphere Today data shifted the prediction and weights slightly.

Uhm, what is a “verticle?” 🙂
A verticle is at right angles to a horizonticle.
Thank you for the post. I suspected I would have a narrower error range. I do have a worryingly high number of tunable parameters:
- 6.6 for weighted average of extent and area,
- 10 for area,
- 92 days for averaging,
- regression slope, and
- regression intercept.
With that many parameters the general shape is going to be very good. However, I don’t see how I could have, or have, tuned them to get the year-to-year variability predicted reasonably well, given other sources of noise that I haven’t succeeded in modelling, e.g.:
1. Winds can be dispersive or compacting.
2. Cloud cover can vary changing actual energy captured vs potential energy captured.
3. Ocean heat transport might vary.
4. Heat captured might melt more area if the relevant ice is thin.
5. Isolated areas may melt out almost regardless of energy available.
6. Other aspects of weather like temperature.
7. Other distribution issues like energy near ice being more likely to melt more ice.
8. Remaining ice salinity.
and probably a few more.
That a fair bit of the year-to-year variability is predicted seems to suggest to me that potential energy capture is at least one of the major sources of year-to-year variability. Is this a sensible conclusion to reach?
If that is reasonably sensible, does this rule out the albedo effect being relatively minor compared to ocean transport?
crandles–
On the physics type part:
I’m not trying to model so much– just see what falls out of the fit.
Given the quantity of data available on the web, and the relatively short time span accessible to the two of us (and others commenting at Neven’s), I’m not sure we can really tease out which mechanistic explanations make the most sense based on the data. I suspect we might be able to rule out some notions, but many plausible sounding explanations will be fairly consistent with the data we have. (Ice modeling groups may have more detailed daily data– but I don’t!)
So, all I’ve been trying to do since about July/Aug is try to include regressors that could, hypothetically, be related to some physical process I could imagine affecting melt. Of course, I avoid including “ridiculous” regressors, like, say, extent regressed on the Dow Jones Average. With respect to physics, I can usually think of some physical explanation for why any of the regressors I pick ought to be “best” in some sense, but I honestly can’t say I favor one regressor over another based on what I know about the phenomenology of ice extent (which is not a lot.)
You noted the relationship between remaining area and albedo. But area could also reflect the size of the region with temperatures that are lower for other reasons. For example, the reason a particular region might be ice free is that warm water may have entered through ocean transport, so that region is a little warmer; if ice does get blown there, it is liable to melt. In contrast, the albedo effect suggests the water got warmer because it’s not covered with ice. (In fact, both could act together. But I don’t have any idea how much warm water might flow into the Arctic because I haven’t looked. So I don’t know the orders of magnitude that would help me exclude one theory relative to another. Others may know.)
Like yours, my fit doesn’t include regressors that are likely to anticipate things like winds being dispersive or compacting. I’ve thought of adding the Arctic Oscillation, but I haven’t done that yet. For next year’s prediction, I need to hunt down whether any direct measures of water temperature are easily available; so far, I haven’t found any.
I don’t really know. It seems plausible to me that variability in how much energy is captured matters. Cloudiness has to matter at least a little, and neither of our fits includes anything to try to capture that. But also, having watched snow and ice melt when it rains in March, I know violent storms (which are often accompanied by clouds) can melt snow and ice in a variety of ways. So, weather is going to matter a lot, especially for 2-week forecasts where the specific weather doesn’t average out.
When predicting extent– especially changes in the next two weeks– compacting and ocean transport must matter relative to other factors. There is nothing in either of our fits that tries to anticipate how much material might suddenly be blown out the Fram– and that could matter.
I think even if we both do this through next summer, we are going to have quite a bit of residual unpredictability.
Crandles–
On these:
You should be able to read off the uncertainty in the regression slope in EXCEL and use it pretty easily. Call it “se_m”. For a simple linear regression:
y = mx + B
the additional uncertainty due to your uncertainty in the slope is:
dY_m = se_m * (X - mean(x)), where “X” is the value of the regressor you are using to predict the future value of “Y”, and mean(x) is the mean of the x values used in the fit.
Add the square of this dY_m to the sum of the squares of your other error terms before taking the root. (Notice that if the “X” value you are using to predict “Y” is far outside your data, the error in your estimate of the trend matters a lot; if the X value sits nice and tight inside your data, not so much.)
EXCEL is a PITA for getting the appropriate uncertainty in the constant, because you don’t really give a hoot about the uncertainty in B as shown above. But you can fold in the correct value by taking the standard error of the residuals (which you already have) and multiplying it by sqrt(1 + 1/N), where N is the number of data points used in the fit. (Do this before adding in the uncertainty from lack of knowledge of the trend, m.)
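Put together, the recipe amounts to something like the little function below (a sketch of the description above, not something copied out of a spreadsheet or my R script):

```r
# Approximate prediction standard error for a simple regression y = m*x + B.
#   s    : standard error of the residuals
#   se_m : standard error of the fitted slope
#   N    : number of points used in the fit
#   xbar : mean of the x values used in the fit
pred_se <- function(X, s, se_m, N, xbar) {
  dY_m <- se_m * (X - xbar)          # extra uncertainty from not knowing the slope
  sqrt(s^2 * (1 + 1/N) + dY_m^2)     # combine the pieces in quadrature
}
```

For X near the middle of the data this stays close to s; far outside the data the slope term takes over, which is the point of the parenthetical remark above.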
Because I used a multiple regression for extent and area, ‘the math’ recognizes that I include the equivalent of your tunable factor of 6.6. After that, my AICc for that fit accounts for the fact that I’ve found the best fit given both things.
What is your 10 for area? It’s interesting that 92 days gave the best period for averaging. I was tempted to hunt around for the best averaging window, but I figured I’d just stick with 7 days because that corresponds to the averaging window for the quatloo bet. (Nothing to do with physics!) But in reality, my fit ‘hunts’ around a little too. It now includes:
July NSIDC areas & extents, August NSIDC areas & extents, 7-day extents & areas, and 1-day extents.
On Aug. 30, I didn’t have the August NSIDC values, so obviously I didn’t use those to predict September. It happened that the method didn’t “pick” the July NSIDC values, because their AICc values corresponded to a probability less than 1/20th of what the “best” predictor had; but the method did consider them. If I hadn’t tossed those fits based on AICc, I’d get a slightly different result and slightly wider confidence intervals.
The fact is: I don’t really know how to account for the fact that I “hunted” a little for the averaging time that gave the fit with the smallest residuals, and you “hunt” even more. I hope my weighting by AICc avoids the dangers of overfitting, but I also wonder how including all sorts of semi-redundant areas might skew things!
The 10 for area is 10 million km^2. I used the function max(0,10-CT area).
I don’t think that 10 matters much, as max(0, 14 - area) was better than regressing with the 31 Aug area. I probably could have just used CT area times a sine wave function to avoid having an extra tunable parameter.
I was mainly doing X - area to concentrate on the area inside the Bering Strait. Also to make it a measure of open ocean that I could multiply by (1 + AO) to see if that could usefully approximate cloudlessness.
Having chosen max(0, 10 - CT area), the 92 days doesn’t matter much, because area is usually over 10 million on 1 June, so adding or subtracting a few days makes little difference for most years. I really didn’t bother tuning it – it was simply my first guess.
I suspect there is probably a better function than max(0,10-CT area) that gives more weight to potential energy capture for areas close to ice than for areas further away from ice.
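A rough sketch of the kind of regressor described above (one reading of it; the daily series ct_area and ao and the way the 92-day window is applied are assumptions, not the actual spreadsheet):

```r
# One reading of the regressor: sum max(0, 10 - CT area) over the previous
# 92 days, optionally scaled by (1 + AO) as a crude stand-in for
# cloudlessness. ct_area (millions of km^2) and ao are assumed to be daily
# vectors ending at the current date.
open_ocean <- pmax(0, 10 - ct_area)
regressor  <- sum(tail(open_ocean * (1 + ao), 92))
```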
.
Re “You noted the relationship between remaining area and albedo. But area could also reflect the size of the region with temperatures that are lower for other reasons– for example, the reason a particular region might be ice free is that warm water may have entered through ocean transport”
Well yes, but that wouldn’t explain why area is a better predictor than extent. I think it is through understanding this difference that I found a better predictor.
.
Re 92 day average. Tamino has indicated that using July average extent is better than using Aug average extent to predict falls. This would be expected if the sensible period to use is 3 or more months.
You seem to use 7-day averages or hunt a little. I use a 1-day average of extent and area to get the most up-to-date extent information to calculate the change in extent and thereby minimise the period of change that needs to be predicted. That is a very short average, while for the regressor I use a very long period – 92 days, because the albedo effect might build up over that time and the numbers seem to support that.
Maybe simply adding June July and Aug average areas would work reasonably well without needing to sum multiplications for each day.
.
“I think even if we both do this through next summer, we are going to have quite a bit of residual unpredictability.”
Absolutely, yes, we would.
I get the same thing if I predict NSIDC September. On the other hand, I get an even better prediction of the ‘remaining loss’ if I use the most recent 7-day average JAXA extent. That’s the “loss v area” that survives above. So I’m not entirely sure we can be confident about teasing the physics out of the fact that we get a better fit predicting September using July rather than August. That’s not to say we shouldn’t try; I’m just not sure that it’s possible to exclude alternate hypotheses.
One of the difficulties is that, with both area and extent, I can imagine multiple reasons why area is a “better” predictor.
For example: we know that in the short term, extent has to become a good predictor. Tomorrow’s extent will be more closely correlated with today’s extent than with today’s area.
But by the same token, the spread-out ice that makes the difference between extent and area can compact and expand. So, based on geometry, as opposed to the physics of actual melting, the area represents a sort of ‘base’ of ice that might spread out or compact. It’s possible that if we just created a toy model of ping-pong balls that could compact or spread out, over some long time range a metric like “area” would better predict the future extent than the extent itself would.
Mind you, with ice, the melt clearly matters. But there are geometric issues too. Also, there may be reasons I haven’t thunk up. So, while I wouldn’t suggest that your notion of the albedo effect is wrong (in fact, I think it sounds plausible), I’d be cautious about considering it confirmed. It might be supported, or there may be other reasons for what we see in the data. (There might be reasons neither of us has thought of.)
That said: mostly, I’m just looking at correlations and, to some extent, suspending judgement on the mechanisms that cause the correlation. I’m not utterly suspending judgement– I ponder them a bit silently– but I’m cautious because unless some factor really pops out in the AICc comparison, I’m not sure we can exclude other things that might also explain the data.
Thank you for the post and replies.
Yes, correlation does not prove causation. It is only a “might be supported”.