Multi-Model Mean Trend for Models Forced with Volcanic Eruptions: Mega Reject at 95%.
Friday’s post compared the multi-model mean trend based on AOGCM runs to the observed temperatures. In comments, I mentioned that I had also filtered the models based on whether they included volcanic eruptions. The graph below shows what happens if we take the multi-model mean trend based on models driven by the more realistic 20th century forcings (i.e. the models that included volcanic aerosols), compare it to the observed trend, and pretend to have selected the start year at random, but then do it for every year between 1960 and 1995:
What does this graph say?
I know these d* graphs are cryptic. The variable d* is the difference between the multi-model mean and the observation (aka ‘error’), normalized by the ‘standard error’. We would expect d* to fall between -1 and 1 in roughly 67% of repeated trials.
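For readers who prefer code to prose, here is a minimal sketch of the d* calculation described above. The trend and standard error values are hypothetical placeholders, not numbers from the graph, and the coverage function assumes d* is normally distributed, which is only an approximation for short records.

```python
import math

def d_star(model_mean_trend, observed_trend, standard_error):
    """Normalized difference between the multi-model mean trend and the observed trend."""
    return (model_mean_trend - observed_trend) / standard_error

def normal_coverage(k):
    """Probability that a standard normal variable falls between -k and +k."""
    return math.erf(k / math.sqrt(2))

# Hypothetical trends in C/decade, chosen only for illustration.
example = d_star(model_mean_trend=0.2, observed_trend=0.1, standard_error=0.05)
print(f"d* = {example:.2f}")

# Coverage for |d*| below 1, and below the usual 90% and 95% two-sided thresholds.
for k in (1.0, 1.645, 1.96):
    print(f"P(|d*| < {k}) = {normal_coverage(k):.3f}")
```

Under the normal assumption, a |d*| above about 1.645 corresponds to rejection at the 90% level and above about 1.96 to rejection at the 95% level.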
I’ve placed traces to indicate the magnitude of d* that corresponds to the 95% and 90% confidence intervals. If you examine the graph, you’ll see the multi-model mean based on models driven by volcanic forcing is rejected at the 90% confidence level for every start year between 1960 and 1998. We get some “fail to rejects” for shorter trends, but it’s well known that type II error (i.e. failing to reject models that are wrong) is common when one uses short trends. Needless to say, this multi-model mean trend is rejected for 2001.
What if we pick 95% as our confidence level?
Well… then we don’t reject this multi-model mean in 1974, or 1996 and for a few years after. So, if you feel bound and determined to save the reputation of the models, you should think up reasons why 1974 or 1996 are the “correct” years for testing models over the “longer term”, while simultaneously claiming you picked these entirely at random.
At the 95% confidence level, the multi-model mean using volcanic cases only is rejected if we happen to use 2001 as the start year for the trend. What if 2000 is the right start year? We reject the multi-model mean based on cases with volcanic forcing.
So, to those who think these “rejections” are due to selecting a short period for analysis: Nope! These rejections are due to the observed earth temperature veering away from the projected values.
Written by lucia.


Comments
Boris (Comment#11897) March 16th, 2009 at 6:59 pm
Didn’t Santer note that the d* test is “too liberal” and
Bill Illis (Comment#11898) March 16th, 2009 at 7:19 pm
It’s not surprising to me that the models containing volcanic forcing perform worse than the models that exclude them.
Including a large negative forcing from volcanoes (which can extend out over up to 15 years as in Krakatoa) allows the models to build in larger GHG forcing and still keep the hindcasts reasonably close.
When the negative volcanic forcing starts to wear off and gets closer to zero again, the models are way off the actual temperature trends of the time. Given there are long time periods between the big volcanoes, they can be off for long periods of time.
This is the problem GISS Model E now faces since it has been nearly 18 years since Pinatubo and their large GHG forcing cannot be offset by ever increasing negative forcings anymore. Hence a small downturn in temps from a few La Ninas and a declining AMO has put the Model far off track now.
And the estimated volcanic impact built into Model E (and the other models, I assume) is far higher than the actual impact volcanoes have on surface temperatures, so this, again, allows them to build in even bigger GHG impacts and still stay on track for periods of time.
Since the volcano impacts are based on actual measurements of optical density, I’m assuming the optical density data is contaminated by volcanoes’ impact on the stratosphere where it is clear volcanoes have a very large impact but they just don’t on surface temperatures.
lucia (Comment#11902) March 16th, 2009 at 9:08 pm
Boris–
a) Yes. Slightly.
b) His analysis assumes the “underlying trend” is linear, all residuals from linear are “noise” and are AR1.
c) If the “underlying trend” (i.e. the trend you get after averaging over many samples) is sufficiently non-linear (as can occur when volcanos erupt) but all residuals from linear are treated as “noise” (as in Santer’s method), the result is a test that is “too strict”, and
d) It’s possible to show that for surface temperatures since 1979, the non-linearities associated with volcanic eruptions overcome the issue Santer was discussing.
e) The test becomes “too strict”.
It is impossible to quantify the exact magnitude of the “strictness” or “liberality” without knowing the true shape of the “underlying trend”. However, it can be estimated.
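To make points (b) and (c) concrete, here is a rough sketch of a linear trend fit whose standard error is inflated for AR1 residuals. It uses the common effective-sample-size adjustment n_eff = n(1 - r1)/(1 + r1); treating that as the adjustment under discussion is an assumption, and the function name and test series are hypothetical. Note that everything left over after the straight-line fit goes into the "noise" estimate, which is exactly why a non-linear underlying trend widens the uncertainty.

```python
import math

def trend_with_ar1_adjustment(y):
    """Fit y against t = 0..n-1 by least squares.

    Returns (slope, naive_se, adjusted_se, r1, n_eff), where adjusted_se
    uses an effective sample size based on the lag-1 autocorrelation r1
    of the residuals from the straight-line fit.
    """
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (yi - y_mean) for t, yi in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean

    # Everything not captured by the straight line is treated as noise.
    resid = [yi - (intercept + slope * t) for t, yi in enumerate(y)]
    sse = sum(e * e for e in resid)
    r1 = sum(resid[i] * resid[i + 1] for i in range(n - 1)) / sse

    n_eff = n * (1 - r1) / (1 + r1)
    if n_eff <= 2:
        raise ValueError("effective sample size too small for a trend test")

    naive_se = math.sqrt(sse / (n - 2) / sxx)
    adjusted_se = math.sqrt(sse / (n_eff - 2) / sxx)
    return slope, naive_se, adjusted_se, r1, n_eff

# Hypothetical series: a 0.02 per step trend plus persistent "noise".
pattern = [1, 1, 1, -1, -1, -1]
series = [0.02 * t + 0.1 * pattern[t % 6] for t in range(36)]
slope, naive_se, adjusted_se, r1, n_eff = trend_with_ar1_adjustment(series)
print(slope, naive_se, adjusted_se, r1, n_eff)
```

With positively autocorrelated residuals the effective sample size drops below the number of points, so the adjusted standard error is larger than the naive one and d* shrinks accordingly.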
bender (Comment#11904) March 16th, 2009 at 9:12 pm
Boris, climate is behaving in a way we don’t understand. Our model for the internal “noise” is incorrect. Chaos cascades across all time scales. Time to take off your blinders. You’ve been approximated to.
Scooter (Comment#11905) March 16th, 2009 at 9:20 pm
We have to proofread? OK…
First paragraph: “modesl”, and ends with a partial sentence “every year between 1960:”
Second paragraph: “We would d* to fall” is “expect” missing, or is “d*” a word?
Third paragraph: “that d* is falls outside ±1 constantly.”
vg (Comment#11908) March 17th, 2009 at 2:44 am
Has anybody noticed (because I have… for the PAST TWO YEARS, every day). This picture has not changed nearly for every day!
http://wxmaps.org/pix/temp8.html.
The vast part of South America has been below anomaly nearly every day for the past two years check it yourselves…

also I herein defend my previous statement if this ain’t cooling I don’t know what it is!
http://www.osdpd.noaa.gov/PSB/.....6.2009.gif
BTW I ain’t biased one way or the other just guided by data as I is just an ol’ boring scientist LOL…
vg (Comment#11909) March 17th, 2009 at 2:49 am
Lucia: BTW just a comment on above graph. Previous were too small (on click) now they are way too big! (on firefox anyway)
Scooter (Comment#11922) March 17th, 2009 at 11:53 am
What does this graph say?
It says that this model is consistently running at temperatures higher than most of the error range of this temperature record. The model is centered around the 95% confidence interval of the record rather than around this record’s actual values.
It also shows that the model lost track of reality around 1998.
SteveF (Comment#11924) March 17th, 2009 at 11:58 am
Lucia:
So what does this type of analysis say about the probability that the models are truly flawed? Is it the number of years outside the 95% confidence intervals divided by the total number of years tested (e.g. 43/48 ≈ 90%)? The deviations are all to the high side, rather than normally distributed; does this suggest that the models are almost certainly not correct?
lucia (Comment#11927) March 17th, 2009 at 12:06 pm
SteveF– This isn’t an analysis yet. It’s just a graph. I always like to look at the data first. I’ll be adding this to some bar-and-whiskers plots later. Then I’ll say what it means!
Alexander Harvey (Comment#11950) March 17th, 2009 at 12:52 pm
The phenomenon of the model mean being a better model always worries me.
It would be difficult for me to give a meaning to a mean of the runs from different models, being as it were an average of oranges, satsumas, tangerines, etc.
Here is a silly example: if model (a) for a new aeroplane predicted that it would tend to perform a nose-up stall and model (b) predicted a tuck-under dive, would you build it because the average was straight and level flight?
I think that maybe we are just magnifying the “restraint” of the hindcast by averaging the runs. That is, we are getting the 20th century record because it is a restraint common to every model. Eliminate the differences between the models by averaging and the 20th century is almost bound to emerge.
This would not be so for multiple runs of a single model. The weather noise would be reduced, but unless parameters were changed in ways that were still well constrained to the 20th century, the tendency to converge towards the temperature record would not be there in the same way. Of course, if the parameters are changed, each run essentially comes from a different model.
I may be talking rubbish, but it all seems very odd to me that the average of oranges with other fruit is more meaningful when common sense says it should be almost meaningless.
Alex
Michael Hauber (Comment#12023) March 17th, 2009 at 11:02 pm
I’m not sure I’m interpreting the claim here correctly; I think you are saying that there appears to be a statistically significant difference between modelled results and actual results.
If this is the correct interpretation then I think the next question is ‘why did we get this difference’.
The answer cannot be anything that would be classified as random noise according to whatever assumptions you are using to calculate your confidence intervals.
So one possible answer is ‘Co2 sensitivity is overestimated’.
Wouldn’t another possible answer be that ‘Aerosol cooling has been underestimated’. IPCC quote quite large error bars in their estimates of Aerosol forcings, and I would assume that pollution would have been increasing aerosols in a close enough to linear fashion that this factor would not be excluded under your noise assumptions?
lucia (Comment#12036) March 18th, 2009 at 6:00 am
Michael– I don’t know if the aerosol estimates in the SRES A1B were met, exceeded or anything. But if they weren’t, that could be the cause.
There are other candidates: The models get the variability wrong. (That’s still a problem with the models.) The mixing in the model oceans is wrong. Taking out the small variability due to the solar cycle made the difference. (This still may not save the models from criticism because their controls don’t show much solar variability either.)
Alexander Harvey (Comment#12174) March 19th, 2009 at 2:07 pm
Michael,
You ask:
“Wouldn’t another possible answer be that ‘Aerosol cooling has been underestimated’. IPCC quote quite large error bars in their estimates of Aerosol forcings, and I would assume that pollution would have been increasing aerosols in a close enough to linear fashion that this factor would not be excluded under your noise assumptions?”
Good question, I am not sure that anyone has a firm grip on the amount of reflective aerosols in the historic record. So I am puzzled by what assumptions can have been made by the various modelling teams.
According to Stern (an economist, I believe) SO2 emissions have trended down since 1990, and as it is a short-lived pollutant the masking effect should have started to turn down almost simultaneously. That leaves a problem, as it seems to imply that global warming should have moved into a higher gear. This seems to be true if you look at 1990-2000 but false if you look at 2000-2009.
If anyone knows of a more accurate estimate of SO2 than Stern’s please let us know. FWIW the GISS forcings seem a bit vague and unconvincing to me.
Alex
lucia (Comment#12176) March 19th, 2009 at 2:23 pm
Alex– Who knows what’s happening with earosols in China recently? To some degree, people can only synthesize data sometime after it’s all collected and compiled. Certainly, in 2001, the IPCC had to predict aerosols, and they could have guessed wrong about development in Asia.
Simon Evans (Comment#12182) March 19th, 2009 at 3:20 pm
Alex,
I posted this on another thread:
“Science Daily reports here on a new data base of aerosol measurements (published in ‘Science’)
…the team notes, that their finding of a steady increase in aerosols in recent decades, also suggests an increase in sulfate aerosols. This differs from studies recently cited by the Intergovernmental Panel Climate Change showing global emissions of sulfate aerosol decreased between 1980 and 2000. ”
http://www.sciencedaily.com/re.....140850.htm
Ryan O (Comment#12184) March 19th, 2009 at 3:51 pm
Simon, you have to read that one carefully. The IPCC showed a decline of sulfate aerosols. The article says nothing about how close the overall aerosols are when compared to IPCC – it simply notes that one category of aerosols – sulfates – appears to be higher.
.
With that being said, I do think the total number is actually noticeably higher than the IPCC predictions . . . but the source may not be anthropogenic. The solar wind and magnetosphere have been behaving quite oddly as cycle 23 wound down, resulting in the solar wind not sweeping out space dust as efficiently. Space dust is actually a considerable chunk of total atmospheric aerosols (~ 100 tons annually). I’ll try to dig up a few references because they’re interesting.
Simon Evans (Comment#12188) March 19th, 2009 at 4:31 pm
Ryan, I’ve not read the paper yet, but the abstract is here -
http://www.sciencemag.org/cgi/...../5920/1468
That seems to be indicating an assessment of total impact in clear sky conditions – am I missing something? Aerosols affect cloud formation as well, of course, so that’s not the whole story anyway.
I think we’d agree that there’s considerable uncertainty remaining. If the models have aerosols significantly wrong then it probably follows that ocean response is wrong (otherwise they wouldn’t ‘fit’) and all that follows……
Ryan O (Comment#12198) March 19th, 2009 at 5:53 pm
No, I don’t think you’re necessarily missing anything . . . the paper very well might show that aerosol trends don’t match the studies cited by the IPCC (I haven’t read the paper either). But the press release about it doesn’t say that. All the press release says is that the amount of sulfate aerosols is higher than the study cited by the IPCC and does not mention any other type of aerosol – meaning that the total aerosol content in the atmosphere may not be significantly different.
.
By the way, I looked all through WG1 Ch. 2 and 3 for a detailed graph of aerosols but couldn’t find anything. Do you know if the IPCC actually put together a specific projection?
Simon Evans (Comment#12202) March 19th, 2009 at 6:20 pm
In terms of modeled projections, there are a couple of paragraphs in Chapter 8 (section 8.2.5), which don’t say too much (some papers cited which I’ve not looked at). More to do with modelling the efficacy of a given forcing rather than projecting trends, I think. The GISS model seems to assume tropospheric aerosols as constant from 1990, as here -
http://data.giss.nasa.gov/modelforce/
- does that project into the future, against rising GHGs? Hmm, not sure.
Alexander Harvey (Comment#12213) March 19th, 2009 at 7:15 pm
Ryan O,
You wrote:
“Space dust is actually a considerable chunk of total atmospheric aerosols (~ 100 tons annually).”
I think there is something wrong here. As I understand it the release of sulphur is around 50 Tgs = 50,000,000 tons annually.
Alex
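A quick sketch of the unit conversion behind that comparison, assuming "tons" means metric tonnes (the comments do not say which ton is meant). The function name is hypothetical.

```python
# Assumption: 1 ton is taken as 1 metric tonne = 1e6 g; 1 Tg = 1e12 g.
GRAMS_PER_TERAGRAM = 1e12
GRAMS_PER_TONNE = 1e6

def teragrams_to_tonnes(teragrams):
    """Convert a mass in teragrams to metric tonnes."""
    return teragrams * GRAMS_PER_TERAGRAM / GRAMS_PER_TONNE

sulphur_tonnes_per_year = teragrams_to_tonnes(50)  # sulphur figure quoted above
space_dust_tonnes_per_year = 100                   # space dust figure as quoted above

ratio = sulphur_tonnes_per_year / space_dust_tonnes_per_year
print(f"sulphur: {sulphur_tonnes_per_year:,.0f} tonnes/yr")
print(f"sulphur is {ratio:,.0f} times the quoted space dust figure")
```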
Alexander Harvey (Comment#12217) March 19th, 2009 at 7:30 pm
Simon,
From the article,
“According to the researchers, the visibility data were compared to available satellite data (2000-2007), and found to be comparable as an indicator of aerosol concentration in the air. Thus, they conclude, the visibility data provide a valid source from which scientists can study correlations between air pollution and climate change.”
I can only wonder how good a proxy this is?
Around here, the UK, visibility is a factor of moisture, (fog, mist, rain, snow, etc.). It used to be sulphur, but that was a long time ago. I am not saying that they are wrong but I do wonder what “found to be comparable as an indicator of aerosol concentration in the air” means if anything.
Any idea where we can find their data?
Alex
Alexander Harvey (Comment#12219) March 19th, 2009 at 7:49 pm
According to the GISS forcings, reflective aerosols have cancelled out at least 50% of WM-GHG forcings for the majority of years since 1945. Only since 1978 have they been below 50% (in GISS world).
Where did these data come from? As I understand it the models need gridded data to feed their data appetite. Where does that come from? As far as I am aware these sort of data simply do not exist. If someone knows better please supply a link.
Alex
Alexander Harvey (Comment#12221) March 19th, 2009 at 8:00 pm
Lucia,
What is happening in China recently is not the problem, unless you mean since 1990 as recently. I have been looking for gas concentration data for some time now, and with the exception of CO2 at a handful of locations worldwide, the Stern SO2 data, and some data on CH4, the cupboard is pretty bare. If the people that are running the models have this data I wish they would tell; if not, what are they feeding into their models?
If you look at the GISS forcings 1880-2003 there are a lot of straight line segments indicating a lack of detailed knowledge. These data are crucial and if the models get the 20th century right without any clear idea of the values it is simply a miracle and “heavens be praised”.
Alex
Simon Evans (Comment#12234) March 20th, 2009 at 4:52 am
Ryan O (Comment#12198)
I was being a bit dense last night (UK time). The relevant graph and discussion is in the Third Assessment Report (the AR4 used the same scenarios, of course, but didn’t detail them again). Here’s the graph -
http://www.grida.no/climate/ip.....ig5-13.gif
and discussion is in Chapter 5 (projections in 5.5) -
http://www.grida.no/publicatio...../index.htm
Simon Evans (Comment#12235) March 20th, 2009 at 5:07 am
Alexander Harvey (Comment#12219)
Alex,
Not sure on the data source for the paper I linked to. I gather we should get better tropospheric aerosol measurements going forward from this year with the launch of the Glory satellite.
If you’re interested in digging further, this 2007 Hansen paper gives a description of GISS modelE which may partly answer -
http://pubs.giss.nasa.gov/abst.....tal_1.html
Page 2290 describes aerosol inputs, and comments
Wow!
Alexander Harvey (Comment#12239) March 20th, 2009 at 6:22 am
Simon, Wow indeed. I expect that paper is pretty authoritative.
For the benefit of others, it continues:
“Therefore the smallest and largest forcings within the range of uncertainty differ by more than a factor of three, primarily because of the absence of accurate measurements of aerosol direct and indirect forcings.”
The modellers have to put real figures into their models and I simply wonder what figures they use. It would seem to be something that ought to be written on the bottle so we know what we are being asked to swallow.
It is well known that the GISS figures have an interesting property in that if you form orthogonal sum and difference forcings between (reflective aerosols + aerosol indirect effects) and the sum of all other forcings you get an increasing forcing (the sum) and a forcing that dips after 1950 and recovers around 1990 (the difference). These allow for the post WWII temperature drop to be accounted for to some degree. Unfortunately performing regressions using such difference data is notoriously risky.
Alex
Alexander Harvey (Comment#12246) March 20th, 2009 at 8:11 am
Lucia,
I just googled for: models reflective aerosols record
Looking for an authoritative source and the first link in the list was “This Thread”.
It is a funny old world.
Alex
lucia (Comment#12247) March 20th, 2009 at 8:16 am
Alex–
Wow! I currently beat NASA on this! That’s google for you.
Simon Evans (Comment#12249) March 20th, 2009 at 8:35 am
Lol! I guess we just have to write ‘aerosols’ a lot [aerosols aerosols aerosols aerosols...] and we become authoritative. Actually, Lucia wrote ‘earosols’ the other day, so we should be attracting the keyboard-challenged and possibly some people with wax problems as well.
lucia (Comment#12251) March 20th, 2009 at 8:40 am
Simon,
Oddly enough, typing errors bring traffic. Google mostly goes by inbound links to the individual page, but it also sees how often the word appears and, to some extent, considers the site as a whole.
Alex used a 4 term search. There’s a good chance my page has more links from pages with all four words than the NASA page.
My next goal: Dominate the 3 term searches. . .
Simon Evans (Comment#12252) March 20th, 2009 at 8:49 am
Lucia,
If we mistype aerosols enough times we’ll end up with something rude, which could bring a lot more traffic
bender (Comment#12254) March 20th, 2009 at 9:08 am
You’re acting like a pair of arseoles.
Ryan O (Comment#12267) March 20th, 2009 at 10:40 am
Alex,
.
I misspoke. The actual number is about 100 tons per day – or ~ 35,000 tons annually. Still peanuts when compared to sulphur releases, but it’s the point of entry that matters. Most anthropogenic emissions start at ground level, where rain, mist, and other precipitation can settle them out of the atmosphere – limiting their effectiveness at cooling. Only a percentage ever reach the stratosphere. Stratospheric aerosols will have a much greater effect than lower tropospheric aerosols. Since the space dust starts in the stratosphere, it has a greater effect per unit mass.
.
Here’s a couple of references. The Farley & Muller ones actually go so far as to attribute past ice ages to the aerosol effect from dust. I personally don’t have enough knowledge to evaluate the plausibility, though I know the idea of space dust greatly affecting the climate was (and maybe still is) considered a bit nutty. Regardless, it does have some effect; albeit perhaps much less than what’s claimed by Farley and Muller.
.
http://www.sciencemag.org/cgi/.....0/5365/874
.
http://www.nature.com/nature/j.....6153a0.pdf
.
http://www.nature.com/nature/j.....8600a0.pdf
.
http://www.nature.com/nature/j.....7107b0.pdf
.
http://www.sciencemag.org/cgi/.....7/5323/215
.
http://www.sciencedirect.com/s.....7c6142039d
.
http://md1.csa.com/partners/vi.....cookie=yes
Alexander Harvey (Comment#12283) March 20th, 2009 at 1:26 pm
Ryan O,
That is a much better figure and there is the question of persistence. Reflective Aerosols are normally considered to be washed out with a time constant no longer than a few days (a week or less) so the amount in the atmosphere from SO2 is likely to be of the order of == 1 yr so they are in the ballpark as far as concentrations go. Thanks for the information.
Alex
Ryan O (Comment#12285) March 20th, 2009 at 1:44 pm
Alex, sorry about the confusion. I tend to make many such errors of fact when I blurt out something without first verifying that my memory is correct.
Alexander Harvey (Comment#12297) March 20th, 2009 at 6:08 pm
Ryan O,
It is not a problem, I have found out things I would possibly never have uncovered without your post.
Alex
Bill Illis (Comment#12302) March 20th, 2009 at 7:01 pm
Aerosols should be reducing temperatures according to the theory (and common-sense) and according to the optical depth measurements used to calibrate it …
… but temperatures haven’t decreased where the theory and the data indicate it should be occurring.
So, nice theory.
Natural white clouds outweigh any human-produced brown clouds by a factor of (I don’t know but it is a big number).
Alexander Harvey (Comment#12388) March 21st, 2009 at 9:46 pm
Does anyone know where I can find the SRES A1B emission profiles in tabular form?
I have found a chart here:
http://www.cccma.ec.gc.ca/data.....cing.shtml
That shows SO2 trending sharply upward post 2000. This would of course tend to lower the temperatures. If we are not on that trajectory, which I doubt we are, then the SRES A1B scenario is not a good model for 2000-2009. If anything the “model” world ought to be getting warmer faster than the A1B model runs as we know that nothing significant has happened to the atmospheric CO2 trend.
Alex
Bill Illis (Comment#12403) March 22nd, 2009 at 12:01 pm
Alexander Harvey,
Here are the concentration levels (as opposed to emissions).
http://data.giss.nasa.gov/mode.....CC.A1B.txt
You can get a little more detail on each individual GHG and different scenarios here.
http://data.giss.nasa.gov/modelforce/ghgases/
CO2 is actually slightly behind the A1B profile, as is N2O, and methane is well behind.
The different aerosol levels including Sulfates can be obtained here.
http://data.giss.nasa.gov/modelforce/trop.aer/
http://data.giss.nasa.gov/modelforce/strataer/
Alexander Harvey (Comment#12407) March 22nd, 2009 at 2:41 pm
Bill,
Thanks for the information. I am looking at the links now.
Alex
Simulated and Observed 8 Year Trends. | The Blackboard (Pingback#12460) March 24th, 2009 at 10:42 am
[...] After I posted the analysis comparing simulated trends with varying start years all ending now, someone suggested I perform the same analysis, but examine all 8 year trends. That is: Look at the [...]
With All Due Respect: Ain’t Gavin a Hoot? | The Blackboard (Pingback#12502) March 25th, 2009 at 1:41 pm
[...] if Gavin wants to criticize the result of either the Michaels/Knappenbeger method of analysis or my less prominent one, that’s fine. However, it’s obvious neither result was obtained by cherry picking the [...]
Why ‘failure to reject’ doesn’t mean much: Type 2 error. | The Blackboard (Pingback#12612) March 28th, 2009 at 2:26 pm
[...] In case you are wondering, this does have implications vis-a-vis Gavin’s critique Pat Michael’s testimony to Congress. But I will not be discussing that in detail. Instead, I will limit myself to discussing the statistical power of the tests I showed in a previous blog post. [...]