Dealing with Volcanic Eruptions when Testing Models.

The problem:
Testing whether IPCC simulations faithfully reproduce the climate trends exhibited by the earth’s temperatures is complicated by stochastic variations (i.e. noise) overlying the climate (i.e. deterministic) response. These stochastic variations appear in both the simulations and the observations, and arise from the earth’s own internal variability and from stochastic external forcings. Because these variations exist, evaluation of the fidelity of simulations must be statistical in nature.

Strong eruptions of stratovolcanoes number among the strongest of stochastic forcings and elevate the amount of noise contained in temperature time series. However, unlike stochastic variations arising from the non-linear nature of the earth’s climate system itself, it is possible to observe when volcanic eruptions occur, estimate the anomalous radiative forcing due to each eruption, and account for the deterministic portion of the climate’s response to these eruptions. One method climate modelers use is to run repeat cases of AOGCMs with differing initial conditions and average over the runs.

A graph showing the average of 55 runs from AOGCMs, forced using modelers’ estimates of 20th century forcings and projected into the 21st century, is shown below:

Figure 1: Multi-run average over 55 simulations.

(Note, for blog purposes, the multi-run average includes some runs that did not simulate volcanic eruptions.)

The estimated effect of the eruption of Mount Pinatubo stands out clearly in this graph.

So far, this is of course not a problem: models attempt to simulate the effect of volcanic aerosols on the response of the climate. They appear to have some success.

However, we now turn to the issue of testing whether the model simulations correctly reproduce the long term trend in temperature. Suppose we take the approach of first determining the linear trend in the observed surface temperatures and estimating its uncertainty, including the deviation due to the volcanic eruptions. We then determine the linear trend in the simulated surface temperatures and estimate its uncertainty, again including the deviation due to the volcanic eruptions. Then, using these two trends, we pool the uncertainties and test whether the two trends are similar.

In principle, this method sounds fine.

In this case, we have a problem, because the assumptions underlying any trend analysis of this sort include this one: the deviations in repeat samples should be uncorrelated with each other. The fact that all or most model runs share the deviation due to Pinatubo when it occurred, and that all hypothetical repeat samples of the earth’s surface trend would include the Pinatubo eruption, violates this assumption.

I discussed this issue way back in October 2008, and explained that when the deviations from linear behavior in repeat realizations of “weather” are correlated across samples treated as independent, the effect is to make the statistical test insensitive. That is: it will too often fail to reject models that are wrong.

As analysts, scientists or simply rational people, our goal should be to make as few mistakes as possible. This means we wish both to reject models that are wrong and to accept those that are right. Both errors have negative consequences. In frequentist statistics, standard practice is to select the rate at which we reject models that are correct. Typically one might choose a type I error of α=5%. Once this is chosen, we should seek the method that rejects models that are wrong at the highest possible rate. Unfortunately, if we use the method of comparing trends outlined above, we can trick ourselves into thinking our type I error is α=5% when it is actually something lower. In the process, we also increase the type II error.
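To make the insensitivity concrete, here is a minimal Monte Carlo sketch (in Python rather than my spreadsheet; the dip shape, noise level and series length are all made up for illustration). Two series share an identical ‘Pinatubo-like’ dip. Treating that shared dip as if it were independent noise inflates the pooled trend uncertainty, so the nominal 5% test rejects a true null far less often than 5%, while the test on the difference series behaves as advertised:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 120, 5000
t = np.arange(n, dtype=float)

# A shared "Pinatubo-like" dip, identical in observations and simulation.
dip = np.zeros(n)
dip[40:70] = -0.5 * np.exp(-np.arange(30) / 8.0)

def trend_and_se(y):
    """OLS slope and its naive standard error (white-noise residuals assumed)."""
    X = np.column_stack([np.ones(n), t])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    s2 = resid @ resid / (n - 2)
    return beta[1], np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

rej_pooled = rej_diff = 0
for _ in range(trials):
    obs = dip + rng.normal(0.0, 0.1, n)   # same true trend (zero) in both
    sim = dip + rng.normal(0.0, 0.1, n)
    b1, se1 = trend_and_se(obs)
    b2, se2 = trend_and_se(sim)
    rej_pooled += abs(b1 - b2) / np.hypot(se1, se2) > 1.96  # pooled test
    bd, sed = trend_and_se(obs - sim)                       # dip cancels here
    rej_diff += abs(bd) / sed > 1.96

print("pooled-trend rejection rate:      %.3f" % (rej_pooled / trials))  # well below 0.05
print("difference-series rejection rate: %.3f" % (rej_diff / trials))    # near 0.05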

This is sub-optimal.

So, what to do? Well, the answer is actually easy: analyze the difference between the observed surface temperatures and the simulations. This series of differences is not expected to exhibit any shared response to Pinatubo, because the shared responses in the two series cancel each other. (This is, by the way, a very standard way to deal with this issue.)

However, as we have seen, a climate blog warrior elsewhere decided to do something more complicated: apply a “correction” to the earth’s surface temperature based on a lagged linear regression on volcanic aerosols. To understand why that method is inferior to the more straightforward, and mathematically trivial, method of analyzing the difference between observations and simulations, we need to discuss the sort of relation we expect between volcanic aerosols, forcing and the earth’s temperature. Let’s dive into that.

What sort of relationship do we expect between volcanic aerosol loadings and forcing?
Stratospheric aerosols shade the surface, and we expect aerosols to cool the earth. However, owing to the heat capacity of the earth’s climate system, the effect is not instantaneous. This notion is reflected somewhat by multiple linear regressions that incorporate a time lag. But how well would this work?

The answer to that is, “It depends.” Specifically, it will depend on the way the earth’s climate system responds to externally applied forcing and on how the forcing is applied.

Let’s look at two cartoon examples of ways forcing can be applied; both will use a single lumped heat capacity for the earth with a specified time constant. The choice of an earth with one time constant is for simplicity, not realism. What is important is that the earth’s climate does exhibit a time constant, and that time constant is not extremely short.
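For readers who want to play along without Excel, here is a minimal sketch of the lumped heat capacity cartoon in Python; the time constant and sensitivity are illustrative numbers, not climate estimates. The model is just τ·dT/dt = −T + s·F(t), integrated with forward Euler. The snippets for the two examples below reuse this function:

```python
import numpy as np

def one_box_response(forcing, tau=20.0, sens=1.0, dt=1.0):
    """Single lumped heat capacity: tau * dT/dt = -T + sens * F(t),
    integrated with forward Euler from T(0) = 0."""
    temp = np.zeros(len(forcing))
    for i in range(1, len(forcing)):
        temp[i] = temp[i - 1] + dt * (sens * forcing[i - 1] - temp[i - 1]) / tau
    return temp
```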

Example 1: If we apply a perfectly sinusoidal forcing (and nothing else), we can create time varying temperature v. aerosol loading curves that resemble the yellow (forcing) and green (temperature) tracings below:

Figure 2: Response to Sinusoidal Forcing

Note the response of the earth’s temperature anomaly adopts the sinusoidal shape of the forcing, but exhibits a lag. If we identify that lag, we find an almost linear relationship between the earth’s temperature and the applied forcing:

Figure 3: Correlation at optimum detectable lag.

Note the correlation between temperature and forcing in this cartoon example is almost perfect. There is a detectable hysteresis, which arises from the discretization into 20 bins in my Excel spreadsheet. (If there is little noise from other processes, any decent analyst could probably tease this ‘noise’ out.)

This is the sort of qualitative behavior we expect for the annual cycle of the earth. In this case, using a lagged linear regression to correct for the forcing would work almost perfectly: the best correction I can obtain using my Excel spreadsheet for the cartoon problem is shown in blue. The only residuals in the correction would arise from slight uncertainties due to discretization. (In my cartoon, this happens because I use 20 bins for the sine wave and can only get lags in whole-bin increments.)
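Here is a sketch of Example 1 using the one-box function above; the period and time constant are arbitrary choices. For a pure sinusoid, the correlation between forcing and suitably lagged temperature is essentially perfect, and a lagged linear ‘correction’ removes nearly all of the response, leaving only discretization-scale residue:

```python
steps = np.arange(2000)
F = np.sin(2 * np.pi * steps / 20.0)      # perfectly periodic forcing, 20 bins/cycle
T = one_box_response(F, tau=5.0)
F, T = F[200:], T[200:]                   # discard the start-up transient

# Search integer lags for the best forcing-vs-temperature correlation.
corrs = [np.corrcoef(F[:len(F) - k], T[k:])[0, 1] for k in range(10)]
best = int(np.argmax(corrs))
print(best, corrs[best])                  # correlation essentially 1 at the best lag

# Lagged linear regression "correction": subtract the fitted response.
slope, intercept = np.polyfit(F[:len(F) - best], T[best:], 1)
resid = T[best:] - (slope * F[:len(F) - best] + intercept)
print(np.std(resid) / np.std(T))          # tiny residual: the correction works
```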

Example 2: Now, let’s apply an irregular forcing. In this cartoon example, I will make a pseudo-volcano erupt for 2 time periods out of every 17, with the amount of ‘aerosol’ injected selected by the “rand()” function in Excel. The aerosols present at time “t” will drop out with some exponential rate of decay. The forcing will be proportional to this aerosol loading. Fiddling with parameters, I can rather easily concoct ‘temperature’ and ‘forcing’ v. ‘time’ graphs that look like this:

Figure 4: Temperature excursions due to irregular forcing.

Note that even though the temperature in this cartoon example is absolutely, totally and completely the result of the forcing function, and some sort of strong causal relationship would be evident to anyone looking at the graph, the correlation between forcing and lagged temperature is poor for any and all lags. At the optimum temporal lag, the correlation between temperature and forcing looks like this:

Figure 5: Best lagged correlation for Temperature v Forcing.

In this case, a correction based on a linear regression at an optimum lag will do a poor job accounting for the instantaneous effect of volcanic aerosols on the temperature anomaly. Compare the blue “corrected” temperature to the yellow original temperature in the graph above; you will see the correction does almost nothing.
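And here is a sketch of Example 2, again with made-up parameters: an eruption lasting 2 steps out of every 17, a random injection amount, exponential aerosol decay, and a slowly responding ‘earth’. With these choices the best single-lag correlation comes out well below the sinusoidal case, and the lagged ‘correction’ removes very little of the response:

```python
rng = np.random.default_rng(1)
n = 400
inj = np.zeros(n)
for start in range(0, n, 17):             # pseudo-volcano: 2 periods of every 17
    inj[start:start + 2] = rng.random()   # injected amount, a la Excel's rand()

aod = np.zeros(n)
for i in range(1, n):
    aod[i] = 0.5 * aod[i - 1] + inj[i]    # exponential aerosol decay

F = -aod                                  # forcing proportional to loading (cooling)
T = one_box_response(F, tau=20.0)

corrs = [np.corrcoef(F[:len(F) - k], T[k:])[0, 1] for k in range(1, 40)]
k = 1 + int(np.argmax(corrs))
print(max(corrs))                         # well short of 1 at every lag

slope, intercept = np.polyfit(F[:len(F) - k], T[k:], 1)
resid = T[k:] - (slope * F[:len(F) - k] + intercept)
print(np.std(resid) / np.std(T[k:]))      # large residual: the correction does little
```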

Could we do better?
Yes. If we had a phenomenologically based correction that faithfully represented the response of the earth’s climate, we could correct for the effect of “volcanoes” just as successfully as for periodic forcings like the annual cycle. What is even more interesting is that a phenomenologically based correction applied to the perfectly sinusoidal forcing would also result in a “corrected temperature” that is zero at all times.

So, when available, phenomenologically based corrections are always preferable to corrections based on lagged linear regression.

We could try lots of cartoon examples. The general rule will be: to the extent that the forcing is regular and periodic, you will be able to find a fairly decent correlation between forcing and lagged temperature; in this case both physics based fits and simple lagged fits will ‘correct’ for the effect of a parameter fairly well. In contrast, to the extent that the forcing is irregular in magnitude, a-periodic or episodic, you will find the correlation between forcing and lagged temperature will tend to be poor. I suspect many readers will notice that volcanic eruptions occur at irregular intervals and at different levels of forcing. This suggests that correcting for volcanic forcing by applying a lagged correlation will be a poor choice.

Does this mean that an analyst can’t come up with a good way to deal with episodic forcings?
Of course an analyst who knows what they are doing can deal with episodic forcing– at least sometimes they can. In the case of comparing results of AOGCM runs to observations, they can do so easily!

Let’s consider what we claim to be trying to do. Suppose an analyst wants to compare an observation of surface temperatures affected by a volcanic forcing and a simulation of the exact same time period. In particular, she would like to avoid treating the known climate signal due to Pinatubo in both series as uncorrelated across the two series. The reason is that if she treats the Pinatubo dip in the observation and the exact same dip in the simulation as uncorrelated, the correlation across the two series will distort the analysis.

The classic solution to this problem is to recognize that the models have been driven by volcanic forcings, and that the way to eliminate the common signal in the two series is to subtract one from the other and analyze the difference between the two series. Moreover, what is interesting is that the method of subtraction works both when the applied forcing is sinusoidal, and might have been corrected successfully using a linear correlation to lagged temperature, and when the applied forcing is irregular, as volcanoes are.

So, if one wishes to devise a test that is sensitive and minimizes the rate of both false positives and false negatives in outcomes of hypothesis testing, there is no reason not to analyze the difference between models and observations. I can think of only a few reasons one would not rely on analyzing differences. These are: a) they didn’t think of it, or b) they want to devise a test that is insensitive, but appears to compensate for effects like volcanic eruptions. In the second case, you might see someone applying a linear regression to lagged temperatures as a function of volcanic aerosols and using this to “correct” for volcanoes.

This gets us back to where we started: if we are trying to create a sensitive test to detect differences between the climate response in models and observations, we might just as well start out by, well, subtracting and comparing the differences! Once this is done, no further correction for the response to volcanic aerosols is required. The “correction” for the response of the earth’s climate to volcanoes is implicit in the analysis method.

Is analyzing differences a fancy novel thing? Nope. It’s done all the time. Whenever we are interested in whether a population of paired items is similar or different, taking differences is the most sensitive way to test.

What do the differences between multi-model runs and observations look like?

Below, I show the difference between observations and a multi-model average of 20th century runs extended into the 21st century using the A1B scenario:

Figure 6: Difference between observations and multi-run mean.

Note that the excursions due to volcanic eruptions vanish. After subtraction, what remains is the sum of 1) the difference between the multi-model mean climate response in the model runs and the earth’s climate response, 2) the stochastic variability of the earth’s climate system, and 3) the residual stochastic variability in the models remaining after averaging over the runs.
If the model simulations correctly capture the earth’s response to applied forcings, then the first item will be identically zero, and the difference should exhibit no trend. Note also: the best fit trend for the difference between the Hadley observations and the multi-run mean is negative. That is, we could formulate the hypothesis that the trend in the graph shown above is zero, and test whether the trend shown in the graph really is zero.
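For concreteness, this is the kind of test I mean, as a minimal sketch: obs and model_mean are placeholder names for two anomaly series on a common baseline, and the standard error here naively assumes white-noise residuals (the autocorrelation correction comes in the upcoming posts):

```python
import numpy as np
from scipy import stats

def zero_trend_test(obs, model_mean, dt=1.0):
    """OLS trend of the obs-minus-model difference series and a naive
    test of H0: trend = 0 (white-noise residuals assumed)."""
    diff = np.asarray(obs) - np.asarray(model_mean)
    time = np.arange(len(diff)) * dt
    slope, intercept, r, p, stderr = stats.linregress(time, diff)
    return slope, stderr, p

# e.g.: slope, se, p = zero_trend_test(hadley_anoms, multirun_mean)
# (hadley_anoms and multirun_mean are hypothetical monthly anomaly arrays.)
```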

On Friday, I revealed the results of my test, but I did not explain my analytical choices. I think I have now explained why I test the difference. In upcoming posts, I will explain how I deal with temporal autocorrelation in the residuals.

26 thoughts on “Dealing with Volcanic Eruptions when Testing Models.”

  1. One reason to avoid using the models is if you want to have a GCM-independent result. For example, you’d want to do that if you are arguing with people who have a strong distrust of GCMs.

    Lagged regression is the simplest, but as you show in Figure 5 there are problems. (Although Figure 5 probably exaggerates the problem because the eruptions are so closely spaced.) Yet another alternative would be to run the volcanic forcing through a very simple model (like the one you used to generate Figure 4’s temperatures).

  2. Aslak–

    If I wanted a model independent estimate of the trend– yes, I would not use the models. But if all I want is a test of the difference between projections and observations, I would use the difference in the first step. There is no reason to defer taking the difference to the final step of a t-test.

    Figure 5 is a cartoon to show one of the two extremes of whether or not a lagged regression works.

    Yes– we could run the forcing through a simple model like the one used to generate the cartoons. That would be better than a lagged linear regression. The only issue is: why do it if you are testing GCMs, which supposedly already capture the physics?

    To the extent that any method fails to capture the effects of volcanoes, you are left with excess noise (which will be autocorrelated). Subtracting the GCM result means you pretty much use one method instead of having a problem arising from the mismatch between what method 1 (i.e. the GCM) says the volcanoes do and method 2 (i.e. the multiple regression and/or the simple model) says they do.

    But a simple model would be better than lagged regression. One could use one of the tuned simple models discussed in the IPCC documents too.

  3. Hi Lucia

    I am a lurker and occasional poster. Sorry for the totally OT comment, I have a stat question for you.

    Can you let me know if there is a formula to compute the standard error (and hence the t-stat) of a variable which is a fraction of two normally distributed variables.

    Eg: if m = x/y, and x and y are normally distributed variables (coefficients of control variables from a regression) is there a way to calculate the t-statistic of m using the standard errors, covariances and coefficients of x and y.

    Thanks
    Guy

  4. gdfernan–
    Offhand, I don’t know the best way to do that. Can you look at the ratio to check its distribution? It may be that x and y aren’t really normal, and we don’t know what the distribution of the ratio is.

    Also, what hypothesis do you really want to test? It may be possible to do a non-parametric test.

  5. A while back I also attempted to remove the volcanic signal from the data. I think it would work if you isolated a single event, but I never managed to get that far. At any rate, it is eerie how much your graph of simulated volcanoes looks like any real graph of AOD versus temperature, except backwards, since positive AOD is negative forcing.

    The other problem with volcanoes is the “wall” at 0. If anything other than the volcano is affecting temps when there is no aerosol loading, you will get a huge spread of values that relate in no way to the actual signal.

  6. I posted the following message (after some earlier posts) on http://tamino.wordpress.com/2009/12/31/exogenous-factors/#comment-38367 where the discussion was about “exogenous factors” like volcanic eruptions.

    “In her latest post, Lucia explains in detail why it is more accurate to treat stochastic forcings, like volcanic eruptions, by analyzing the difference between the surface temperatures in simulations and model runs.

    See http://rankexploits.com/musings/2010/dealing-with-volcanic-eruptions-when-testing-models/ “

    After an hour, Tamino posted this:
    “parallel // January 11, 2010 at 7:15 pm | Reply
    [edit]
    [Response: No more advertising for Lucia’s garbage. Take it elsewhere.]”

    This confirms that no dissent is allowed. Apparently the Team has not learned that such behavior does little to win hearts & minds.

  7. It seems to me that in the figure “Temperature excursions due to irregular forcing”, the green line (temperature) plummets before the yellow one (forcing). Is that right?

    Also, there appears to be some confusion with figure numbering.

  8. [Response: No more advertising for Lucia’s garbage. Take it elsewhere.]”
    This confirms that no dissent is allowed. Apparently the Team has not learned that such behavior does little to win hearts & minds.

    Grant isn’t on the team – maybe an occasional water boy, but that’s about it. He’s not all that good at math either, IMO.

  9. Lucia – feel free to snip the above if that was overly rude/blunt. My apologies if that’s the case.

  10. Lucia,
    “Analyze the difference between the surface temperatures in simulations and model runs.”

    Do you mean “Analyze the difference between the surface temperatures and the model runs”?

  11. The ratio that gdfernan asks about is, if I am reading the post correctly, a Cauchy. Assuming the x and y are unit normal, then the ratio is Cauchy with location 0 and scale 1. Its mean, variance and higher moments are not defined. Wikipedia has a very good write up on both the Cauchy and Ratio distributions.

  12. parallel–
    The irony of Tamino’s insisting you can’t discuss what I posted is that the main post claims to be a refutation of a very brief comment of mine.

  13. “The irony of Tamino’s insisting you can’t discuss what I posted is that the main post claims to be a refutation of a very brief comment of mine.”

    Touche!

    Of course it’s not that you can’t discuss anyone else’s work on “Tamino”, it’s just that you can’t say anything critical of Grant Foster’s views. Ludicrous but typical of all the AGW websites afaik. Are there any exceptions?

  14. I no longer frequent blogs of ill-repute, so I can only comment on the comments made here about what was said on the execrable Tamino. The reported reaction could be straight out of the CRU-mails…dismiss any dissent, disrespect the dissenter with insults and ad homs, and most importantly never, ever, ever engage in a constructive critical discussion of the dissenter’s thesis or argument. Maybe Michael Mann will pat him on the head and throw him a scrap from the big table.

  15. Dear Lucia:

    Thank you for hosting your stimulating and useful blog and for encouraging comments on your articles in it. I prize your efforts toward scientific truth on the issue of man-made global warming.

    For the future, I advise caution on use of the term “forcing.” Though usage of this term is common in the literature of climatology, this usage implies the dubious philosophical position called “mechanistic reductionism.” Under this position, the “forcing” is the cause of some effect.

    When scientific research is properly constituted, the cause of an effect or whether, in fact, there is a cause are both doubtful. To imply the existence of a cause through use of the word “forcing” is to presume information not possessed by the writer.

    Terry Oldberg

  16. gdfernan (Comment#29768) January 11th, 2010 at 1:22 pm
    Hi Lucia
    I am a lurker and occasional poster. Sorry for the totally OT comment, I have a stat question for you.
    Can you let me know if there is a formula to compute the standard error (and hence the t-stat) of a variable which is a fraction of two normally distributed variables.
    Eg: if m = x/y, and x and y are normally distributed variables (coefficients of control variables from a regression) is there a way to calculate the t-statistic of m using the standard errors, covariances and coefficients of x and y.
    Thanks
    Guy

    Yes: This is one of my favorite problems

    DOH. Now I forgot how to do it. RomanM should be able to help.

    Arrg. He and I talked about it on CA.. crap, can’t remember where.

    It involves a Taylor series expansion. You also need to estimate the correlation between x and y.

    For X/Y, if X and Y are negatively correlated, your error starts to blow up, and if positively correlated it shrinks.
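    If I remember the first-order result right (no guarantees; check the book below), it is something like:

    Var(x/y) ≈ (mx/my)^2 * [ Var(x)/mx^2 + Var(y)/my^2 − 2*Cov(x,y)/(mx*my) ]

    where mx and my are the means of x and y; take the square root for the standard error.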

    So think: if X gets big when Y gets small, you’ve got problems.

    The solution is in this book (at least it was in the earlier edition):

    http://www.bizrate.com/mathematics-books/mathematical-statistics-and-data-analysis-by-john-a.-rice-%28package-duxbury-pr%29–pid346255564/

    or just go to RomanM’s site and ask him.

  17. bender, Tammy is in the “ignore them” mode of climate science denialism.

    Expect some intermediaries to come in and argue his case poorly.

    What is it with these guys and showing up to defend their work?

    Lucia is a tiger

  18. Dear Lucia:

    You argue that the difference should be analyzed “…between the surface temperatures in simulations and model runs.” As this difference is nil, I’m led to believe you meant something different. My guess is that you meant we should analyze the difference between the simulated temperatures and the observed temperatures.

    Analysis of this difference is the norm in IPCC climatology. However, this norm has at least one weakness. The weakness is that the normative behavior has not the potential for falsification of the model. Falsifiability is, however, the mark of a model that is “scientific” in character.

    Terry Oldberg

  19. Terry–
    When driving climate models, volcanic aerosols are treated as “forcing”. They are causal in that sense.

    I happen to believe volcanic aerosols are also causal in climate.

    But even if you think they aren’t and do caution against the use of forcing, what generic term would you suggest for those features which are thought to cause changes in the mean climate by affecting the level of radiation absorbed by the earth? No one can stop using one word unless another is suggested. The new word has to have some advantage over the one you wish to replace.

  20. Terry,
    “The weakness is that the normative behavior has not the potential for falsification of the model.”

    I really do not understand. If the predicted trend in temperatures from a model is different from the measured trend, why can’t this difference falsify a model at a stated level of confidence?

  21. Jonathon, yes, I enjoy both the Pielkes’ blogs. But I wouldn’t count either of them as AGW blogs, any more or less than Lucia’s – all of which are rational, scientific, open and intelligent.

    I think most of us acknowledge the obvious ways in which humans are affecting the climate. We simply reject simplistic, emotive global moral panics about the matter when the science is obviously still very far from settled.

Comments are closed.