
Screening fallacy:
If you sieve for hockeysticks
that’s just what you’ll get.
As many know, Climate Audit discussions identified a problem with the way Gergis et al sifted their data for a hockey stick. Once again, application of mathemagical screening that can make hockey sticks out of trendless red noise results in a report that a hockey stick emerges from proxies. I described the screening fallacy to Josh, sent him a link to an image of a shaker table and Josh did the rest. Voila!
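The screening fallacy can be sketched in a few lines of code. This is a hypothetical toy simulation, not the method of any actual paper: the proxy count, the AR(1) coefficient, and the rising "instrumental" trend are all made-up illustrative parameters.

```python
# Toy demonstration of the screening fallacy: generate pure red noise
# "proxies", keep only those that best track a rising "instrumental"
# record over the calibration period, and average the survivors.
# All parameters are illustrative, not from any real reconstruction.
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def red_noise(n, rho, rng):
    """AR(1) 'red' noise with lag-1 autocorrelation rho."""
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0, 1)
        out.append(x)
    return out

rng = random.Random(0)
n_years, calib = 500, 100                    # last 100 "years" = instrumental era
target = [0.02 * t for t in range(calib)]    # rising instrumental record

proxies = [red_noise(n_years, 0.9, rng) for _ in range(200)]
# "Screen": keep the 20 series that best track the instrumental record
ranked = sorted(proxies, key=lambda p: pearson_r(p[-calib:], target), reverse=True)
kept = ranked[:20]
recon = [statistics.mean(vals) for vals in zip(*kept)]

# By construction, the average of the survivors rises during the
# calibration period: a blade manufactured from trendless noise.
blade = statistics.mean(recon[-calib // 2:]) - statistics.mean(recon[:-calib])
print(round(blade, 2))
```

Averaging the unselected 180 series instead would give no blade at all, which is the whole point: the uptick comes from the sieve, not the data.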
Does anyone want a mug? Josh sent me a high resolution version of the image!
It’s almost like hockey stick-shaped drawerings are all Climate Science produces.
Ahem. 😉
Andrew
The interesting (well, at least to me!) things about this episode are:
1. Nobody ‘close’ to the paper could see that there were problems.
2. No reviewer saw the problems with the paper’s methods.
.
IMO, it is a classic case of confirmation bias at work, and sadly characteristic of mainstream climate science. May the ghost of Richard (‘bend over backward to show why you might be wrong’) Feynman haunt them for eternity.
.
So what will it be? Retraction? Modification? Arm wave? My jaded experience says… a violent (and endless) arm wave is by far most likely. Expect Real Climate to enter the breach, guns ablaze, with a studied side commentary on big energy funding of ‘deniers’.
Lucia –
Any idea of the cost of shipping a mug to Piccadilly Circus, London W1?
Anteros– I think the thing to do is get someone in the UK to offer from a UK site and I offer from the US site. That saves on shipping– but we can see!
Always hockey sticks!
At least the clever folks who faked crop circles in the UK moved up from simple shapes to breathtaking representations of fractal graphics, including the Mandelbrot set!
And they were nice and honest enough to eventually admit to what they’d done.
Some climate scientists are soooo unimaginative….
Yes, you can make hockey sticks out of trendless red noise. They look like ___/. The selection makes the / and the trendless noise makes the ___.
But scientists make HS’s that look like ~~~/. Selection makes the /. That’s the part we knew about from thermometers, and the proxies are selected (and scaled) to make sure they can get it right. But the information is in the ~~~. That wasn’t selected (you have no reference information).
Yes, you can make hockey sticks out of trendless red noise. They look like ~~~/. The selection makes the / and the trendless noise makes the ~~~.
This comment by Jim Boulden at RealClimate under their (Fresh Hockey Sticks) post underpins that cartoon nicely.
“[Response:At the risk of having this statement completely misunderstood and mangled by the usual suspects…if you only had one tree out of 10K that responded well to temperature, and you found and cored that one tree, you would have legitimate evidence of a temperature signal. Fortunately, the situation is nowhere remotely so extreme as that, and that’s because, in fact, that many trees respond in this way, and therefore you only need some couple dozen or similar to get a signal that emerges strongly from the noise at any given location. And why do many trees respond this way? Because, lo and behold….temperature is a fundamental determinant of tree radial growth in general, i.e. a fundamental tenet of tree biology.–Jim]”
“Yes, you can make hockey sticks out of trendless red noise. They look like ~~~/. “
No, Lucia did it. It’s the blue curve. Dead flat, pre-training, as red noise should return. That’s the info in that curve. And OK, the / is wavy, because Hadcrut isn’t really a /.
It’s dead flat pre-training in the limit N -> infinity. And 27 < infinity.
Pre-GISS: ~~~~/\/
/
Post-GISS: ~~~/
Well, that feeble attempt at humour failed miserably. Blinking left-justified comment editor. 🙂
27 is indeed less than infinity. And the number of proxies used in the early period was far smaller than 27.
Lucia’s red noise was also <∞. She selected about 12 out of 154 and averaged.
Here’s Lucia’s summary of the blue curve:
“Because the “proxies” really are not sensitive to temperature, you will notice there is no correspondence between the blue “proxy reconstruction” and the yellow Hadley data prior to 1960. I could do this exercise a bajillion times and I’ll always get the same result.
…
Also notice that when I do this, the “blue proxie reconstruction” prior to 1960 is quite smooth. In fact, because the proxies are not sensitive, the past history prior to the “calibration” period looks unchanging.”
@Nick Stokes (Comment #97513)
June 13th, 2012 at 12:19 am
I think you must be wrong, Nick, because Josh drew a cartoon.
Nick>
Do go read about fractals, there’s a good kid. This thread has become the best demonstration so far that you simply don’t understand the simplest parts of statistics, although you may be au fait with cargo-culting some more advanced techniques.
Let’s see if this helps you: if I flip an unbiased coin a hundred times, and it comes up heads each time, what are the odds it’ll come up heads again? Hint: the odds are independent of previous outcomes.
Anteros, I can put on a mug here in the UK via http://www.cafepress.co.uk/cartoonsbyjosh
NS has repeatedly pointed out at CA that there is no option but to preselect, because until you have a sample with a good correlation to temps in your training period, you cannot scale the tree rings against temperature.
So this part of the process cannot be omitted (as per NS).
Has anyone tried taking tree ring data, selecting as per team rules to get the proxy set that gives a nice temp response, and calibrating with this set (all as per team rules) but then applying the rules so derived against the entire proxy set?
Lucia,
Based on reading comments over at CA, I think some people fail to understand the statistical error caused by selecting on the dependent variable (aka the screening fallacy) because they incorrectly think that the growth ring signal in trees is the independent variable. They cannot seem to grasp that the tree ring signal is the dependent variable.
John
Nick Stokes (Comment #97513) “…and the proxies are selected (and scaled) to make sure they can get it right”.
So, now to reverse the pedantic thinking. Yes, Nick, one can do things to make sure they CAN get it right, but there is no known way to show that they DID get it right.
See, you have taught me more about the careful choice of words. Ta!
Lucia:
Have you written your own summary of the Gergis debate beyond your excellent 2009 Tricking yourself into cherry picking piece? The posts at CA are now too confusing. It seems to me that a piece that spells out the different positions might be very helpful.
How much is that wood stick in the window?
The one with the bodgity blade?
Whatever, it’s much too high a stick;
It doesn’t look very well made.
====================
Geoff Sherrington (Comment #97530)
” but there is no known way to show that they DID get it right.”
No, you can check. I’m talking about whether they correlate in the instrumental period. They won’t exactly correlate, but you can check how well.
I think you’re querying whether the correlation continues back in time. That’s different. It’s the uniformity principle, and sure, you can’t prove beyond doubt that it did. And you have to look for reasons why it might not have. But that’s true for most paleo inference. And a lot of quantitative geology too.
Nick Stokes (Comment #97513)
Read how here:
Nonsense about the shape limitations. It’s true one method made lines that look like the blue line. But I also made hockey sticks that look like the red line from red noise
http://rankexploits.com/musings/2009/tricking-yourself-into-cherry-picking/
Lucia,
Yes, for the red curve you extended the training period from 1960- to 1900-. And then the selected proxies followed Hadcrut over the longer period, as they should.
You now don’t see the red curve flattening, because you’ve only got 50 years before 1900. And things are more ragged, because you must have had only a few proxies left by that stage.
Your post is right in that if you select by correlation over a period, you’ll get what you asked for. That’s what your post shows.
There is actually a much better way to demonstrate that Nick is right, or wrong.
We know, because Real_Climate_Scientists tell us, that the heating from 1920-1990 is unprecedented in the last 1,500 years.
Thus, take the calibration line-shape and the various proxies that Nick believes to be Thermometers.
Take the first series, say from 1500-1990
Calculate the 70-year correlation for starting dates 1500/1920, 1501/1920, 1502/1920, etc.
Interrogate every possible fit of the unprecedented 1920-1990 temperature line-shape across the whole time span of each proxy.
Plot correlation statistic, like r2, vs start year.
If Nick is right we should have a Mega-Hockey Stick of the form
____________________________________________/
There should only be a good correlation right at the end.
For myself, I think we shall find something else.
I must thank Nick; his position that they are not selecting a population and then examining a population gave me one of my biggest laughs in a decade.
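The sliding-window test proposed above could be sketched like this, with synthetic stand-ins for the proxy and the instrumental line-shape (the series lengths and noise levels are illustrative placeholders, not real data):

```python
# Minimal sketch of the sliding-window test: correlate a fixed 70-point
# "instrumental" shape against every 70-point window of a proxy series
# and record r^2 versus window start. Synthetic placeholder data only.
import random
import statistics

def pearson_r(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

random.seed(1)
window = 70
# Stand-in for the 1920-1990 instrumental line-shape: trend plus noise
instrumental = [0.01 * t + random.gauss(0, 0.1) for t in range(window)]
# Stand-in for a proxy spanning 1500-1990 (491 values)
proxy = [random.gauss(0, 1) for _ in range(491)]

r2_by_start = [
    pearson_r(proxy[s:s + window], instrumental) ** 2
    for s in range(len(proxy) - window + 1)
]
# If the proxy really tracked the unprecedented modern warming, only the
# final window should score high; for pure noise no window stands out.
best = max(range(len(r2_by_start)), key=r2_by_start.__getitem__)
print(best, round(r2_by_start[best], 3))
```

Plotting `r2_by_start` against start year is exactly the "fit vs. start year" curve described above; a genuine temperature proxy should produce the long flat line with a spike only at the end.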
Re: Nick “No, you can check. I’m talking about whether they correlate in the instrumental period. They won’t exactly correlate, but you can check how well.”
Are you actually arguing that after screening for proxies that correlate to the calibration period (of the instrumental record), and discarding the proxies that don’t, you can check to see if you got it ‘right’ by verifying your end result against the instrumental record?
Or did I lose your point in the thread?
Nick:
Depends on the characteristic of the noise. If you add more low-frequency noise to your artificial series, the handle isn’t as flat. Lucia was trying to demonstrate a point, not make a literal copy of the hockey stick. For that she didn’t need exact duplicates of the proxies in terms of statistical properties.
To establish the claim that they produce different variances outside of the calibration region, you’d have to show what you get if you applied the CPS to Monte Carlo noise with the same statistical properties. Then you’d have to compare how many were retained and compare that against a real-proxy hockey stick. I suspect this wouldn’t be hard to replicate. It’s just noise, IMO, and I’ll show why I think that below.
Anyway the big problem is with the CPS algorithm itself, which preselects proxies based on correlation and for which there is a documented real variance loss (read: descaling outside of the calibration region), and is a method even Mann now seems to think should be abandoned. If GodMann himself has abandoned the method, why defend it? EIV seems to have fewer problems, though you still have to be sensible about which proxies to include (and not use series in an unphysical manner). And when you’re finished, unless you’ve got really good geographic coverage (or exactly the same geographic coverage in the different series being compared), your reconstructed temperatures will have different scalings just due to latitudinal effect on proxies even if there is no variance loss in the method. (I talk about it here a bit more.)
Comparing the series and not worrying about the scaling factors (that is to say use Pearson’s correlation to compare the series against each other), Loehle, Moberg and Ljungqvist agree pretty well, but the original MBH series is complete crap (it has zero correlation with the other series outside of the calibration region), and Mann 08 CPS isn’t much better.
Mann 08 EIV does surprisingly well considering they included non-temperature proxy series like Tiljander and Sheep Mountain. I view this as evidence that the method is producing something approaching a real temperature series in spite of significant errors in proxy selection (it’s a sensitivity test, if you want, of adding non-temperature proxies that exhibit hockey-stick-like behavior.)
The proxy ensemble looks decent, though we’d have to control for common proxies. The three sets use completely different methodologies: if I understand Ljungqvist correctly, he doesn’t preselect proxies, and neither of course does Loehle; Moberg uses the tree-ring proxies for the high-frequency information and a wavelet-based analysis to combine them with non-tree-ring proxies that have lower-frequency information available. Three different methods, three largely non-overlapping sets of proxies, similar results. I’d say that’s decent evidence that there’s a signal there.
MBH is shown for comparison. Draw your own conclusions.
Here are links to the various papers.
Ljungqvist
Moberg
Loehle
Mann 2008
If you go back far enough and you have enough proxies, if the only thing you have is noise, you will get flat in the past.
But it’s not true that you can’t get wiggles from red noise – it depends on other factors, and you seem to want to make a categorical statement.
Moreover, even if sticks made from pure noise affected by nothing get flat in the past, that doesn’t save the published hockey sticks from the criticism that the method used imposes a hockey stick on the end. It’s entirely possible that the trees respond to something other than temperature, which would result in wiggles – but, to the extent the proxies contain some noise, the screening method forces an uptick on the end by picking out those proxies whose end period does correlate with the instrumental period.
It is simply the case that this screening method is dubious. If you use it and get a hockey stick at the end, you can’t know if that was caused by screening or if that was really in the data.
Carrick
Lucia had a monthly series with r=0.995. That’s a “half-life” of about ten years. That seems to correspond pretty well with the merge between the tracking behaviour in the training period and the proper representation in the preceding period.
As to CPS etc, for the moment I’m just trying to deal with the simpler claim that selection produces hockey sticks. It produces blades, which should therefore be discounted. But it doesn’t select for the shaft, except for that merge period determined by autocorrelation. And for proxies, it’s the shaft that you’re seeking.
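For what it’s worth, the “half-life” figure quoted above is easy to check: for a monthly AR(1) series with lag-1 autocorrelation r = 0.995, the lag at which the autocorrelation function r^n decays to one half is ln(0.5)/ln(0.995).

```python
# Quick check of the "half-life" of a monthly AR(1) series with
# lag-1 autocorrelation r = 0.995: the lag n where r**n = 0.5.
import math

r = 0.995
half_life_months = math.log(0.5) / math.log(r)
print(round(half_life_months / 12, 1))  # prints 11.5
```

So the decay scale is roughly 11.5 years, consistent with the "about ten years" and with a merge region of a decade or two before the training period.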
Yes. But the fact that you can sift through proxies to match the temperature during training period puts in doubt the idea that shape of the shaft based on those proxies tells us much of anything about temperature variations in the past.
Lucia,
“If you use it and get a hockey stick at the end, you can’t know if that was caused by screening or if that was really in the data.”
You should assume that it was caused by screening, and discount it. That is little loss, because you have the better instrumental data over that time. My point is that producing the blade of a hockey stick doesn’t mean it is getting the shaft wrong, which is what matters. And your simulation is showing exactly that – selection producing a spurious blade with a correct shaft.
Lucia:
If the goal is to select proxies that include a valid temperature signal, why wouldn’t you correlate the proxies against a temperature record that has (a) significant variance, but (b) no significant trend? In other words, why not calibrate against some period like 1910 to 1960? You do not completely overcome the selection fallacy but at least you are not selecting for a hockey stick. This seems to me to be the equivalent of detrending. Is that correct?
Nick
You can’t assume the blade is wrong and you can’t assume it’s right. That is to say – in words I previously used
So: basically, we can’t assume it tells us anything.
There is no point in reporting a result that – as far as we know – tells us nothing. Sure, it might be right. A result we pull out of a hat might be right. Results taken by averaging the opinions of psychics might be right. You can’t prove them wrong. But so? The fact is these publications end up promulgating uninformative “shafts” that one shouldn’t consider to be communicating something that has a reasonable probability of being right about the past.
bernie–
They might try that. The difficulty with any screening is you have to think carefully about it. Sometimes, it takes a while to identify whether there is a fallacy hidden in there.
But yes– if they screened based on the early 20th century and then the hockey stick blade came up, that blade would not be the result of screening.
Nick:
“Seems”? You have a statistical test for “seems” now?
To reiterate: You are arguing that Lucia’s handle is too flat (the variance is too small).
Until you match the low-frequency spectral properties of the proxies, you can’t conclude whether or not reconstructions based on randomly generated series are consistent, in the handle of the hockey stick, with the variance of the low-frequency noise component of reconstructions using real proxies.
This follows directly from Parseval’s Theorem of course.
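In the simplest AR(1) case, "matching spectral properties" might look like the sketch below: estimate the lag-1 autocorrelation and variance of a series, then generate surrogates with the same parameters. Real proxies would need the full low-frequency spectrum matched, not just lag 1, and everything here is a synthetic stand-in.

```python
# Sketch of matching a surrogate's spectral properties to a target
# series in the simplest (AR(1)) sense: same lag-1 autocorrelation,
# same marginal variance. Synthetic data only.
import random
import statistics

def lag1_autocorr(x):
    """Lag-1 autocorrelation estimate of a sequence."""
    m = statistics.mean(x)
    num = sum((x[i] - m) * (x[i + 1] - m) for i in range(len(x) - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den

def ar1_surrogate(rho, sigma, n, rng):
    """AR(1) series with lag-1 autocorrelation rho and marginal sd sigma."""
    innov_sd = sigma * (1 - rho * rho) ** 0.5  # keeps marginal variance sigma^2
    x, out = rng.gauss(0, sigma), []
    for _ in range(n):
        x = rho * x + rng.gauss(0, innov_sd)
        out.append(x)
    return out

rng = random.Random(2)
target = ar1_surrogate(0.8, 1.0, 2000, rng)   # stand-in for a real proxy
rho_hat = lag1_autocorr(target)
surrogate = ar1_surrogate(rho_hat, statistics.stdev(target), 2000, rng)
print(round(rho_hat, 2), round(lag1_autocorr(surrogate), 2))
```

By Parseval's theorem, matching the variance in each frequency band is what licenses comparing the handle variance of noise reconstructions against real-proxy reconstructions; lag-1 matching is only the crudest first step.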
Lucia
“But the fact that you can sift through proxies to match the temperature during training period puts in doubt the idea that shape of the shaft based on those proxies tells us much of anything about temperature variations in the past.”
Why? That’s exactly how the logic of calibration works. You match the performance of some unknowns with a reference in a particular environment, in the expectation that they will work in others.
Suppose you had a chancy manufacturing process for real thermometers. You get a whole lot of blanks, some of which will be OK. How do you tell? In the lab you set up an oven (to vary T), and match them against a reference over a range of T. You scale the ones that can be made to match and discard the rest. Your clones won’t help you improve on measurements made with the reference. But you can send them out to be useful elsewhere.
Proxies work in the same way – you use them in other times rather than other places. The C20 provides the oven for the temp range.
Nick:
And of course, you’ll ignore the CPS reconstructions, because largely they fail. And the fact they fail completely undermines your argument that there is a methodological flaw in purely using correlation to screen.
Carrick
“You are arguing that Lucia’s handle is too flat”
No, I’m arguing that it’s right. Well, I haven’t looked much into variance. But the signal is zero, and the output is flat. Seems pretty good.
Seems again? Yes, I looked at the merge by eye. It’s flat but seems to get perturbed a bit before 1960. I think that’s the autocorrelation kicking in on a scale of a decade or two.
“lucia
It’s entirely possible that the trees respond to something other than temperature”
Oh let’s be serious here Lucia, what else other than temperature could possibly influence the growth of tree rings? It’s not like they respond to changes in soil chemistry, changes in eco-systems caused by the introduction of the European earthworm or other introduced species, parasites and other infections, rainfall, wind speed, atmospheric nitrogen and sulphur oxides, or niche competition.
Doc–
I know the idea that tree growth rates might be affected by anything other than temperature is sort of “out there”. Just tryin’ to brainstorm here. 😉
lucia (Comment #97554)
June 13th, 2012 at 7:40 am
That’s a given. The idea, as I understand it, is to be able to find the temperature component, which is what they try to do. Your assertion is just a variation of your hated rhetorical questions.
Lucia,
“You can’t assume the blade is wrong and you can’t assume it’s right.”
Yes, that’s pretty much my view. The blade corresponds to instrumental, so you can’t complain that the selected proxies agree with it. They’re right. But they aren’t providing confirmation.
Look again at the outcome of your earlier post, with the blue curve, say. The premise was that you knew what post 1960 temps were (HADCRUT) and you knew what the proxies should say (flat). So you produced the blue curve that matches the temp where you have instrumental evidence (post-1960) and matches the prescribed proxy behaviour pre-1960. What’s not to like?
In fact, the selection didn’t get “better” proxies – by construction they were all equally good. Any selection would have done pre-1960. But selection didn’t make it worse there, and it got it right post-1960.
If you screen to keep all proxies that match post-1960 and throw away those that don’t match post-1960, your result will match post-1960. Of course. I’m not sure what your point is.
You already know post-1960 – so you learn nothing about post-1960. Because this method was used, the match post-1960 tells us nothing about pre-1960. The mathemagic was involved, but we learn nothing about the temperature. So… uhmmm… That’s the point everyone is making: the shaft is uninformative.
Josh –
Thanks for reminding me that we’ve evolved beyond the solely physical transit of information!
Glad we’ve moved on a bit. One mug on order 🙂
Doc,
Yes, of course trees respond to all sorts of other things. Generally only a subset of factors will be limiting. Sometimes, rarely, that will be temperature alone. And the whole point of the screening process with correlation is to find those rare cases. In the temp correlation, the unconnected processes appear as noise. If the result is significant, it means the noise is down.
Nick– The fact that the part not explained by temperature might be explained by something else means wiggles in the past could be due to something other than temperature.
I could perfectly well show this principle with synthetic noise. I could just have
growth= aTemp + bSomething + WhiteNoise
The Hockey-stick-omatic imposes the blade. If ‘b’ is zero, the past is flat. But if b≠0, you can get wiggles in the past, and they’re due to variations in “Something” – and that something is not temperature.
So… so what if shafts have wiggles?
Do you need me to show this? Seriously?
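A toy version of that model, growth = a*Temp + b*Something + WhiteNoise, shows the effect directly (all coefficients and the shape of "Something" are invented for illustration):

```python
# Toy model: growth = a*Temp + b*Something + noise. Temperature is flat
# in the past and rises only in the calibration era, so any wiggles in
# the reconstructed shaft must come from "Something", not temperature.
import math
import random
import statistics

random.seed(3)
n, calib = 400, 100
# Temperature: flat, then rising only during the calibration period
temp = [0.0] * (n - calib) + [0.02 * t for t in range(calib)]
# "Something": a non-temperature driver (rainfall, soil, whatever)
something = [math.sin(t / 20.0) for t in range(n)]

def make_proxy(a, b):
    return [a * temp[t] + b * something[t] + random.gauss(0, 0.5) for t in range(n)]

proxies = [make_proxy(1.0, 1.0) for _ in range(50)]
recon = [statistics.mean(v) for v in zip(*proxies)]

def pearson_r(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((p - mx) * (q - my) for p, q in zip(x, y))
    den = (sum((p - mx) ** 2 for p in x) * sum((q - my) ** 2 for q in y)) ** 0.5
    return num / den

# Pre-calibration, temperature is flat by construction, so the shaft's
# wiggles track "Something" plus noise, not temperature.
shaft = recon[: n - calib]
print(round(pearson_r(shaft, something[: n - calib]), 2))
```

The shaft correlates almost perfectly with "Something" and not at all with temperature, which is flat there: wiggly shafts are not, by themselves, evidence of a temperature signal.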
“That’s the point everyone is making: the shaft is uninformative.”
I am concentrating for the moment on your simple proposition that noise produces hockey sticks. And I’m pointing out that your example disproves nothing – there is no contradiction. The shaft behaves as expected.
But it does not represent real proxies. They rely on the principle of uniformity. That is, that if you find significant correlation, there is underlying mechanism, and the correlation can be expected to continue into the past. Your assumption of randomness excludes that.
Anyway, it’s late here now, and I’ll have to ask to be excused until tomorrow.
“So… so what if shafts have wiggles?
Do you need me to show this? Seriously?”
I would presume they would all agree on some known past events. If they do, then we could well be looking at a temperature proxy. Science is often like that, looking at things that aren’t well understood; it’s kind of its nature. Otherwise how do we progress from things that are not understood to things that are? What I don’t get is the utter contempt for scientists who are pushing the boundaries of knowledge. It’s as if they are insulting everyone’s intelligence by doing so.
Nick:
Comparing the variance is the only way you can make any determination. And you have to compare apples-to-apples, so they need to have similar spectral properties in the frequency band you’re interested in – i.e., the band where the artificial hockey sticks lack wiggles.
The metric you suggested (r=0.995) is completely meaningless for determining whether the Monte Carlo’d sequences are meaningful simulations of the real proxies.
The fact you keep using “seems” tells me this is your opinion rather than something you’ve made any analytic tests on.
OK. It’s your opinion, we understand it’s your opinion and what your opinion is. Can we move on now?
Does anybody have a link to the original Gergis paper? I’d like to see if, like Mann, they actually use a proper test/validation set (not used in the training / calibration /screening).
.
Edit: Nick, I think you’re barking up the wrong tree. Wiggliness or lack thereof is not a sign of skill. Testing against unseen data (which Mann does) is.
bugs:
I don’t think there’s any contempt for “scientists who are pushing the boundaries on knowledge.” I think the contempt is reserved for political operators posing as scientists who are pushing their agendas at the expense of the science.
bugs
Could you clarify. Who are you referring to using the word “they” and what known past events would you presume they agree on?
You write this in response to my comment. Who do you think is expressing utter contempt for pushing any boundary of science? And what boundary do you think these unspecified scientists are pushing?
I have no idea what points you are trying to make. If you could be even a tiny bit more specific that might help me understand whether there is any signal hidden in your prose.
Nick:
In some of the reconstructions, they fail this test. MBH is an example.
toto:
By Mann, I hope you aren’t lumping MBH in there, or continuing with the fallacious argument that you can’t use Pearson’s correlation to verify.
Gergis is withdrawn. There is no Gergis paper. It doesn’t even have preprint status now.
toto:
No. I have to agree with Nick here. Variance is a measure of the randomness of a sequence. If the reconstructed artificial series can’t reproduce the variance seen in the real signals, that would be evidence that there is a coherent signal present. This statement is provable mathematically.
I agree. But with respect to the claim these are temperature reconstructions, we cannot be certain (or even remotely confident) the signal arises because of temperature.
Lucia:
Yep, Nick’s test will reveal whether or not a signal is present, and how significant that signal is relative to the noise floor.
But it won’t tell you the origin of the signal.
In the case of tree proxies, you may for example be reconstructing precipitation or amount of sunlight outside of the calibration region, which happens by coincidence to correlate well with temperature over the interval you calibrated over.
It doesn’t mean it is wrong – obviously true. But it DOES mean that there is absolutely no reason to think that the shaft contains any useful information whatsoever.
Really.
No reason whatsoever to assume that the shaft shape means anything – whatever that shape might be.
You really can’t seem to accept this, but that is the message – pre-selection really really really does mean that the shaft contains no provably valid information whatsoever.
bushy (Comment #97515)
June 13th, 2012 at 1:46 am
This comment by Jim Boulden at RealClimate under their (Fresh Hockey Sticks) post underpins that cartoon nicely.
“[Response:At the risk of having this statement completely misunderstood and mangled by the usual suspects…if you only had one tree out of 10K that responded well to temperature, and you found and cored that one tree, you would have legitimate evidence of a temperature signal.
===============
Jim Boulden at RealClimate is speaking nonsense. The other 9999 trees are telling you loudly that this is simply chance at work, or there would be a lot more trees that correlate with temperature.
However, if you leave the 9999 that don’t match out of your paper, then this will make the one that does match look like a high quality proxy rather than random chance.
Try and post this very obvious statistical blunder on the RC web-site.
“Nick Stokes
Yes, that’s pretty much my view. The blade corresponds to instrumental”
I have stated that one can perform a very simple and robust test of this assertion.
Is the fit of the proxy line-shape in the 1920-1990 period to the 1920-1990 temperature line-shape statistically significantly greater than for all other periods in the proxy record?
Should, say, 1820-1890 give as good a fit as the 1920-1990 period, then this proxy is not reading temperature, as we know that the change in modern temperature is far greater than at any time in recorded history.
Do the damned distribution of correlations for all 70-year periods in your proxy record and plot fit vs. starting year.
If you get:-
______________________________________________________/
then you may have a point. However, if the 1920-1990 period is not statistically different from a previous 70-year period, you are full of crap.
steveta_uk (Comment #97574)
June 13th, 2012 at 8:44 am
My point is that producing the blade of a hockey stick doesn’t mean it is getting the shaft wrong, which is what matters.
========
On the contrary, the selection process is what causes the straight shaft. By selecting on the dependent variable (a statistical no-no) the effect is to make random noise appear statistically significant.
What you are seeing in the shaft is random noise made significant through the selection process. Without the full data you cannot tell if the shaft is statistically significant or not.
It is the 9999 trees that don’t match that tell you how good a proxy the 1 tree that did match is. How well the 1 tree that did match fits with temperature – this is not a measure of how good a proxy it is – because with enough trees in your sample eventually you will find one that matches simply by chance.
Climate science commits a logical fallacy in equating goodness of fit with proxy quality, because this ignores that some trees will match by chance. They hide this fallacy by publishing only the trees that did fit, while refusing to publish the trees that didn’t. Thus there is no way to judge proxy quality.
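The "matching by chance" arithmetic is easy to illustrate. The false-positive rate p below is an assumed number for illustration, not an estimate for real trees:

```python
# Even if each noise series has only a small chance p of "passing" a
# correlation screen by luck, the chance that at least one of N series
# passes approaches certainty as N grows. p is assumed, not measured.
p = 0.01          # assumed per-series chance of a spurious match
for n in (10, 100, 1000, 10000):
    at_least_one = 1 - (1 - p) ** n
    print(n, round(at_least_one, 3))  # 100 -> 0.634, 10000 -> 1.0
```

With 10,000 candidate trees, finding one that "responds well to temperature" is nearly guaranteed even if none of them do, which is why the 9,999 discards are essential information.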
DocMartyn (Comment #97576)
June 13th, 2012 at 9:02 am
then you may have a point. However, if the 1920-1990 period is not statistically different from a previous 70-year period, you are full of crap.
=========
Also, if the significance changes with the choice of end points, then statistically it is worthless. It is simply another example of selection bias. Choose a 60 year time period instead and there is nothing unusual happening in climate.
Actually, even within just the published pre-selected data you should be able to detect a problem.
Clearly all the selected proxies correlate well with the blade, and together produce a pretty flat shaft. But you should be able to take any 50-year period on the 600-year proxy record, and find that the correlation between proxies remains constant and pretty close to as good as the “training” period from the 20th C.
If 15th C correlation between trees isn’t as good, then you aren’t looking at temperature.
lucia (Comment #97572)
June 13th, 2012 at 8:35 am
evidence that there is a coherent signal present.
I agree. But with respect to the claim these are temperature reconstructions, we cannot be certain (or even remotely confident) the signal arises because of temperature.
=============
Exactly.
1) Trees do not respond linearly to temperature. Both high and low temperatures inhibit growth. What you think is a signal for low temperatures could in fact be the opposite.
2) Trees respond non-linearly to water, sunlight, crowding, disease, pests, fertilization due to run-off, animals, etc., etc.
The technology to perform multivariate non-linear regression on noisy data and reliably separate out the component signals does not exist in theory or in practice. It is beyond our skill.
O.K. Nick, here is what causes tree ring thickness to vary: atmospheric DMS/DMSO alters tree ring widths
http://www.sciencemag.org/content/302/5648/1203/F3.large.jpg
growth= aTemp + bSomething + WhiteNoise
b = DMSO/DMS or MSA
Carrick:
IIUC (probably didn’t) you’re saying, “if your proxy-based reconstruction wiggles more than your red noise-based reconstruction, it means the proxies have some coherent signal which allows the wiggles to survive averaging”.
.
1- You need a couple of assumptions for that, e.g. what if one proxy has much stronger variance (at whatever timescale you’re looking at) than the others?
.
2- The argument doesn’t work in reverse – absence of wiggling isn’t proof that the proxies have no signal, it might just mean they’re really low-freq (or that the real signal happens to be flat!).
.
3- More generally, as I said, absence or presence of wiggles don’t tell you anything about *skill* at reconstructing your quantity of interest. As Lucia pointed out, the “coherent signal” can be anything. Only a separate test set can allow you to make any inference about skill.
.
Hence my interest in looking at what Gergis et al. actually did.
toto:
If you’ve scaled and weighted them properly and they are all tracking temperature, that’s corrected for in the reconstruction.
You’re right: if the variance changes across proxies and you fail to properly correct for it (there are probably a host of other errors you could make that invalidate it too), then this statement won’t hold.
The Monte Carlo will give you an estimate of the noise floor. You can never prove that a signal is not present at some level, but if you know the noise floor, you can set a limit on detectability.
You’ll need to define precisely what you mean by skill.
Detectability depends on what the noise floor is, and if you know the noise floor that’s told you something about detectability. The lower the noise floor, generally, the more skilled a reconstruction is. That’s how I use the term at least.
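A Monte Carlo noise-floor estimate along these lines is short enough to sketch; the AR(1) coefficient, ensemble size, and percentile below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def red_noise(n, phi=0.7):
    """AR(1) series x[t] = phi*x[t-1] + e[t], a common red-noise null."""
    e = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

# Noise floor: build many fake "reconstructions" by averaging 20
# red-noise proxies, and look at the spread of the resulting means.
trials = [np.mean([red_noise(500) for _ in range(20)], axis=0).std()
          for _ in range(200)]
floor = np.percentile(trials, 95)   # 95th-percentile noise floor
print(round(floor, 2))
```

A reconstruction whose variability sits inside this band is, at that confidence level, indistinguishable from averaged noise; only variability above the floor is detectable.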
steveta_uk:
That is part of the problem: Many of the series fail to verify against each other (e.g., MBH comes to mind). In some cases, the authors withheld this adverse information as “uninformative”.
It is true that you can get spurious correlations using Pearson (or Spearman, if you have a nonlinear relationship) r. That’s one of the arguments Mann used for not publishing the results of his correlation. The argument is specious, though. Simply because you can get a false positive doesn’t mean that you don’t use it as a “fails to verify” metric. Verifying doesn’t prove the relationship holds, and the proxies may be internally inconsistent for some other reason. But failing to verify using correlational analysis is a big red flag.
http://rankexploits.com/musings/2012/gergis-by-josh-haiku-by-lucia/#comment-97564
Nick Stokes’ statement of what he understands the “principle of uniformity”:
The principle of uniformity as generally understood is only that the same laws of nature and processes applied in the past. Not that you can extrapolate mere statistical correlations into the past – particularly when there is no forward process.
The scary part is not the butchered stats. It is the incredible overreach of the conclusions by Gergis, Karoly, et al. Even if they were to perform a little PCA properly on some proxies, there is simply no justification for claiming that they proved much of anything. At best, they would have added some bit of evidence to the argument. But we get this hubristic overkill in claiming ironclad proof that goes so far beyond the limits of their evidence as to make one wonder if they ever had any worthwhile training in science. They sound like propaganda ministers in full spin mode.
Steve McIntyre:
Agreed. Principle of uniformity applies to physical laws holding up.
The discovery of a correlation between temperature and number of San Francisco firemen isn’t going to teach us anything useful about past climate.
I don’t know any physicist worth his salt who hasn’t had this principle drummed into his head “correlation does not imply causation”. Why there’s even a Wikipedia entry on it.
Gee there’s even a latin phrase for this logical fallacy
cum hoc ergo propter hoc.
In paleoclimate, it is known by the rubric “Mannian statistics”. 😈
Things That Make You Go Hmmmm
I downloaded Steve McI’s Excel spreadsheet with all the used proxy data.
I took the average and SD of each of the proxies.
I transformed using
(reading - mean)/SD
or, if negatively correlated,
-(reading - mean)/SD
So each proxy now has a mean of zero and an SD of 1 from 1920-1990.
Now these have all been screened against the temperature from 1920-1990, haven’t they?
So if you average all the proxies, which are reporters for temperature, you should get the line shape of the temperature from 1920-1990, with a mean of zero and an SD of about 1, not an SD of 0.45.
Very odd indeed. No warm 1940s and no drop from the 50s to the 70s.
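For what it’s worth, an SD well below 1 is what simple statistics predicts when the proxies share only a weak common signal: the mean of N standardized series has SD sqrt((1 + (N-1)r)/N), where r is the average pairwise correlation. A minimal sketch (27 proxies and r ≈ 0.17 are illustrative assumptions chosen to land near 0.45, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

# Proxy count, window length, and pairwise correlation: all assumptions.
n_proxies, n_years, r = 27, 71, 0.17

# Each "proxy" = shared signal + independent noise, giving pairwise corr ~r.
signal = rng.standard_normal(n_years)
proxies = [np.sqrt(r) * signal + np.sqrt(1 - r) * rng.standard_normal(n_years)
           for _ in range(n_proxies)]
# Standardize each proxy to mean 0, SD 1 over the window, as in the comment.
proxies = [(p - p.mean()) / p.std() for p in proxies]

mean_series = np.mean(proxies, axis=0)
# SD of the mean of N unit-SD series with average pairwise correlation r:
expected_sd = np.sqrt((1 + (n_proxies - 1) * r) / n_proxies)
print(round(mean_series.std(), 2), round(expected_sd, 2))
```

In other words, an averaged SD of 0.45 rather than 1 is itself evidence that the proxies agree with each other only weakly.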
DocM, in fairness to Gergis, they did not orient coral O18 proxies which have a negative correlation to temperature.
Other O18 proxies (classically ice cores) have a positive relationship and the negative relationship of coral O18 causes multiproxy jockeys to flip them.
But there remains an interesting physical question as to why coral O18 have a negative relationship to temperature whereas the classical Rayleigh O18 isotope relationship is positive.
Does this blog allow us to consider physical evidence in addition to statistical analysis? Because the reconstruction in Gergis quite clearly shows a dip in proxy temperatures around 1450. And lo and behold, a large volcano erupted at Kuwae in 1452, which released a large amount of sulfates into the atmosphere (based on Antarctic ice cores).
http://oro.open.ac.uk/5106/
So, we have a reconstruction based mostly on corals and tree rings that nails down the cooling effect of a volcanic eruption 600 years ago. That’s pretty impressive, since the cooling impacts of volcanic eruptions only last a few years or so. Somehow the Gergis reconstruction (and the Northern Hemisphere hockey sticks) were able to pick out the cooling dip caused by the eruption.
How does statistical processing of red noise pick out the cooling dip associated with the Kuwae eruption?
The other peculiarity of the Stokesian principle of uniformity is the way it is supposed to apply in time but not in space.
Nick would have us believe that because certain larches growing on one Siberian riverbank happen to have ring widths that correlate with temperature during the 20th century, larches growing on that riverbank (most of which lived and died hundreds of years ago) have always been well correlated with temperature. At the same time we are asked to believe that otherwise identical larches living on a similar riverbank a few miles away are not and never have been correlated with temperature.
It’s a very odd point of view.
Paul K2: The old Briffa reconstructions also show clear dips around historical volcanic eruptions, both in the high-freq and low-freq data – including one around 1450.
.
Now as we all know the Briffa reconstructions are evil (because “divergence”, “Yamal”, something about Mann and stuff). So I doubt this argument will convince the doubters.
.
http://ecosystems.wcp.muohio.edu/studentresearch/climatechange03/elnino/Holocene%20trees.pdf (Figure 5)
toto: I find the overwhelming opinion on this blog post and comments, that all the hockey sticks are sifted from red noise (as illustrated clearly in the diagram at the top of this post), a bit shaky. Why don’t the people who push this point of view publish their criticisms?
They claim they are just science critics, or auditors, or inspectors, or some such mythical beings, and thus don’t have to publish. But if they tried to publish this “hockey sticks are just sifted from red noise” theory, they would have experts reviewing ALL the scientific evidence, and their darling pet theory would end up in the trash can.
No, it’s much safer to preach this theory using internet posts.
“So I doubt this argument will convince the doubters.”
If you mean I won’t be convinced that a squiggly-line drawing describes something meaningfully just because you say it might, you are right.
Andrew
toto: I looked through the Briffa paper you linked to, and Figure 5 clearly shows the detectable impacts of other volcanoes in the reconstruction, notably including the 1641 and 1816 events.
Proponents of the “hockey sticks are sifted from red noise” hypothesis need to explain these unusual coincidences.
Toto, Paul K2,
Your Briffa link is interesting. Fig. 5 is convincing. But, actually, as you say, there is the divergence. And that changes everything. Figure 5 concerns the MXD, which follow temperatures very well, including the twentieth century, until proven otherwise. Station thermometers represent a very poor proxy of continental temperatures; it is perhaps time to admit it.
And if anyone thinks the Briffa MXD don’t follow twentieth-century temperatures well because we have satellite data etc., just see the data in detail, Briffa 1998 on the twentieth century:
http://img221.imageshack.us/img221/3179/briffa1998p.png
And connection with the TLT:
http://img708.imageshack.us/img708/1363/anomthn.png
Paul K2:
“Clearly visible” being a statistical criterion now too?
Jeebus, yes it’s there, but so is a lot of unexplained variability.
We can pick out the climate equivalent of a sledgehammer. Yeah!
(And of course that doesn’t imply that the signal has anything to do with temperature, it could be reduced sunlight, change in precipitation, etc. Major volcanoes affect weather, but it’s a consequence of the other main effects they have.)
Carrick: So what is your explanation of how the proxy temperature data shows temperature drops at the times of these major eruptions? Remember the hypothesis we are discussing, as stated by Lucia above:
Once again, application of mathemagical screening that can make hockey sticks out of trendless red noise results in a report that a hockey stick emerges from proxies.
Show us your statistical analysis of trendless red noise sifted into a hockey stick, and point out where the resulting proxy temperature plot created from red noise shows cooling periods associated with the 1452, 1641, and 1816 volcanic events.
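As it happens, the sifting step itself is short enough to sketch directly. The AR(1) coefficient, correlation threshold, and window lengths below are illustrative assumptions, not anything taken from Gergis et al.:

```python
import numpy as np

rng = np.random.default_rng(3)

def red_noise(n, phi=0.7):
    """Trendless AR(1) red noise: x[t] = phi*x[t-1] + e[t]."""
    e = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

n_years, n_cal = 600, 70               # proxy length; calibration window
target = np.linspace(0.0, 1.0, n_cal)  # rising "instrumental" target

# Screen: keep only the noise series whose final 70 "years" correlate
# with the rising target -- the screening step under discussion.
kept = [p for p in (red_noise(n_years) for _ in range(1000))
        if np.corrcoef(p[-n_cal:], target)[0, 1] > 0.3]

recon = np.mean(kept, axis=0)
shaft = recon[:-n_cal].mean()   # pre-calibration mean: flat handle
blade = recon[-20:].mean()      # recent mean: an uptick appears
print(len(kept), round(blade - shaft, 2))
```

The survivors’ average acquires a blade even though every input series is trendless; it does not, of course, acquire dips at particular eruption dates, which is the separate point being argued here.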
Paul K2:
Who said that “all of the hockey sticks are sifted from red noise”? Can you give a direct quote please and a link to that individual saying it? Otherwise this is just a strawman you pose.
While you’re at it, you can observe for us the irony in your comment that people writing posts critical of Mann have to run a gauntlet to get anything published in any journal other than E&E. Gatekeeping is there and in full force.
In any case, if a thing is true, it’s true regardless of the source. You’ve simply given us a nice example of the appeal-to-authority fallacy. There’s a nice Latin phrase for this:
Argumentum ad verecundiam.
Is today strawman/logical fallacy double-prizes day and I wasn’t told? 🙁
Paul:
That was never my argument. What I said was you can see a sledge hammer effect on climate show up in tree-ring proxies.
So how do you know it is temperature related? As opposed to e.g. precipitation or number of hours of sunlight?
“Paul K2
Carrick: So what is your explanation of how the proxy temperature data shows temperature drops at the times of these major eruptions?”
Is that meant to be a rhetorical question ?
Top of my head:
aerosols scatter light and reduce photosynthesis; particulates cause nucleation and alter rainfall patterns from land to sea;
acid rain caused by sulphates;
ozone depletion caused by high-altitude NOx; change in local flora/fauna following a cold winter spike;
alteration in seed/trunk resource allocation;
attenuated light at dawn/dusk expands the time window for destructive daytime insect activity.
Paul K2 (Comment #97609)
Should we be able to clearly see the volcanic eruptions in the record, though?
Can you see Krakatoa in the observed temperature record? I can’t.
And can you see Krakatoa in the proxy record?
Carrick: I am happy that you agree with me, that the hockey sticks clearly have not emerged from trendless red noise by applying improper screening. They are real, and match real world observations.
Perhaps you can convince Lucia to revise this post, correcting her erroneous statement above.
Paul_K2, If you want my view on hockey sticks, see this comment. MBH 1998 is red-noise generated and worse when one looks at the level of scholarship involved (making it extremely humorous to watch you lament about the lack of peer-reviewed papers by critics). On the other hand, Moberg, Ljungqvist and Loehle I view as having produced valid reconstructions. That probably puts me at odds with a number of people on this thread.
In any case, Lucia is quite able to fend for herself on her views, but they are her views not mine. AFAIK, we are not locked into supporting or criticizing her views here, we’re allowed to comment on our own views without being forced into a “contrast and compare”.
Paul_K2:
I do partially agree with you. I think some of the reconstructions are noisy but valid temperature reconstructions, with the “noise” being influences from other climate-related variables.
I even mostly agree with Nick. See comments above too.
Just as a WAG, I’d say about 80% of the reconstructions are just noise. The ones that aren’t, generally avoided or were otherwise insensitive to the screening fallacy that Lucia alluded to above (and is the point of her post, in case you missed that).
“Carrick
On the other hand, Moberg, Ljungqvist and Loehle, I view as having produced valid reconstructions.”
I think they are reasonable goes at it.
I think it would be much better to match all 62 southern hemisphere proxies with the MWP/LIA rather than temperature.
Do the temperature calibration last.
Andrew FL: Krakatoa definitively shows up in the temperature record; e.g. read this wikipedia summary:
In the year following the eruption, average global temperatures fell by as much as 1.2 °C (2.2 °F). Weather patterns continued to be chaotic for years, and temperatures did not return to normal until 1888. The eruption injected an unusually large amount of sulfur dioxide (SO2) gas high into the stratosphere which was subsequently transported by high-level winds all over the planet. This led to a global increase in sulfurous acid (H2SO3) concentration in high-level cirrus clouds. The resulting increase in cloud reflectivity (or albedo) would reflect more incoming light from the sun than usual, and cool the entire planet until the suspended sulfur fell to the ground as acid precipitation.[11]
There was a paper published in 2006 regarding the long term impact of Krakatoa, arguing that the sulfates influenced global temperatures for decades.
The Briffa paper reference above shows the Krakatoa impact as the 1884 spike down in the proxy temps, and the Gergis reconstruction also shows the impact of Krakatoa.
Andrew_FL:
Yes you can see it, especially in the sea temperature record link. (The eruption occurred in 1883 btw.)
That said, it’s my understanding there’s a latitudinal dependence, and possibly seasonal effect, on how much influence a particular eruption has on global climate. There’s really no question that the 1816 eruption in particular was highly disruptive.
DocMartyn:
It would be where I would start.
Steve M
“Nick Stokes’ statement of what he understands the “principle of uniformity”:
‘They rely on the principle of uniformity. That is, that if you find significant correlation, there is underlying mechanism, and the correlation can be expected to continue into the past.’
The principle of uniformity as generally understood is only that the same laws of nature and processes applied in the past. Not that you can extrapolate mere statistical correlations into the past – particularly when there is no forward process.”
I was paraphrasing the NRC North report version, which I also quoted. It says:
“All paleoclimatic reconstructions rely on the “uniformity principle” (Camardi 1999), which assumes that modern natural processes have acted similarly in the past, and is also discussed as the “stationarity” assumption in Chapter 9.”
And in Chapter 9:
“Stationarity: The statistical relationship between the proxies and the climate variable is the same throughout the calibration period, validation period, and reconstruction period.”
It’s not my invention.
PaulK2 (#97600)
“[T]he reconstruction in Gergis quite clearly shows a dip in proxy temperatures around 1450” which you attribute to the Kuwae eruption of 1452. According to Wikipedia, “late 1452 or early 1453”.
Let’s look at the reconstructed anomaly around then:
Year Anomaly(K)
1448 0.012
1449 -0.275
1450 -0.21
1451 -0.41
1452 -0.467
1453 -0.333
1454 -0.368
It’s true that 1452 is slightly (0.057 K) lower than the previous year. But what of the drops of 0.2 K or more in 2 of the 3 previous years? Further, the Kuwae event should show up in the 1453 entry, which covers Sept 1452 – Feb 1453, and is (according to Gergis) warmer than the preceding summer.
I don’t think there’s a clear signal there. I think you’re seeing what you want to see. Plot the reconstruction for, say, 1430 to 1470, including the stated uncertainty of +/-0.4 K.
[And by the way, I think the 2SE uncertainty estimate of +/-0.4 K is too low for a 4-proxy reconstruction. YMMV.]
“Nick Stokes
I was paraphrasing the NRC North report version”
“But I think we may with much probability say that the consolidation [of the earth] cannot have taken place less than 20,000,000 years ago, or we should have more underground heat than we actually have, nor more than 400,000,000 years ago, or we should not have so much as the least observable underground increment of temperature.”
“That some form of the meteoric theory is certainly the true and complete explanation of solar heat can scarcely be doubted, when the following reasons are considered: (1) No other natural explanation, except by chemical action, can be conceived. (2) The chemical theory is quite insufficient, because the most energetic chemical action we know, taking place between substances amounting to the whole sun’s mass, would only generate about 3,000 years’ heat. (3) There is no difficulty in accounting for 20,000,000 years’ heat by the meteoric theory.”
“Modern biologists are coming, I believe, once more to a firm acceptance of something beyond mere gravitational, chemical, and physical forces; and that unknown thing is a vital principle.”
Lord Kelvin
Paul K2 (Comment #97618)-Rather than just taking Wiki at their word, why don’t we look at the temperature record?
http://www.woodfortrees.org/plot/hadcrut4gl/from:1883/to:1888
I sure as heck can’t see a 1.2 (!) degree change anywhere in the temperature record. Wiki’s claim of such a large downward spike is absurd.
Carrick (Comment #97619)-It’s rather bizarre to make reference to a difference between two sea surface temperature records as showing the impacts of a volcanic eruption on actual Sea Surface Temps.
well.
there is a record of instrumented temperatures going back before 1850. ahem.
and carrick.. yes 1816..
It will put an interesting test to everything.
HaroldW: There was initially some confusion over the dating of the Kuwae eruption. Apparently sulfates begin showing up in Antarctic ice around 1450. After further analyses, the date of the explosion clearly seems to be in 1452. But we can’t be sure that sulfates weren’t already being generated in large quantities prior to the explosion.
The modeling work presented in Figure S4.2 of the Gergis paper shows the expected cumulative impacts of the different forcings, including sulfates.
Here’s the above chart with the climatology re-centered to the thirty one years around Krakatoa, in case long term differential warming of winter over summer adds seasonal noise:
http://i23.photobucket.com/albums/b370/gatemaster99/whereskrakatoacentered.png
Still can’t see it. Sure as heck can’t see a 1.2 degree decrease.
Here is a 2005 GRL paper on proxies and volcanoes since 1800. He’s generally trying to figure out why they don’t show up more. However, Tambora is big.
Andrew_FL: Krakatoa started blowing ash in the spring of 1883 and blew up in August 1883, the year you started your temperature plot.
Run it from 1880 to 1890, and you can see that the annual average global temperature anomaly dropped several tenths of a degree. Please note that another of the 12 biggest or so volcanic eruptions in the last 250 years occurred in 1886 at Tarawera on the North Island of New Zealand. The entire 1883 to 1888 period had depressed temperatures.
The wikipedia entry appears to be referring to a shorter averaging period (perhaps monthly?), and could be referring to land only regions.
Andrew_FL, different geographical coverages.
Make sense now?
Paul K2 (Comment #97630)-“Run it from 1880 to 1890, and you can see that the annual average global temperature anomaly dropped several tenths of a degree.”
Why start counting volcanic cooling before the volcano goes off? Either way, the claim is being made that the temperature proxies are valid because they show obvious volcanic cooling. But then the temperature record, the observations, should show just as striking temperature decreases, and they don’t. If the temp decreases are obvious in the reconstructions, why aren’t they obvious in the observations?
Nick, it’s not North’s invention either. He’s just misapplying it.
If two quantities are linear-causally related, you should get a correlation (it’s a sufficiency condition). However, getting a correlation between two quantities says nothing about whether they are causally related.
Cool reference, glad it wasn’t just me that didn’t think it was that obvious.
Andrew_FL asked “Why start counting volcanic cooling before the volcano goes off?”
Because you need a baseline to compare the cooling to.
Paul K2 (Comment #97634) -“Because you need a baseline to compare the cooling to.”
Which is why I re-centered the climatology to the thirty-one years (fifteen on either side) around the eruption. 1884 and 1885 are indeed below average, but they don’t form an obvious down spike beginning in 1883. If you smooth the data with a 13-month centered average, it does look like a cooling spike begins a little after the eruption; too bad it’s only about 0.1 degrees. Try spotting that in a noisy proxy recon.
And that is the point: we shouldn’t be able to easily see the volcanoes in the proxy record.
Andrew_FL:
I don’t see that either. It references an article, perhaps they’re referring to a particular region, I need to be writing a report not playing. >.>
One of the problems is the residual noise is much larger because of the limited number of stations. 1983 and 1991 aren’t all that apparent either, in the global surface mean. See e.g. the MSU data.
TLS is particularly striking.
(By the way, the weakness of the signal is why I don’t think the climate response to volcanism in the tree-ring proxies is temperature related.)
“Mann 08 EIV does surprisingly well considering they included non-temperature proxy series like Tiljander and Sheep Mountain.”
From what I’ve seen of EIV and his reconstruction, it is not as successful as CPS. I can elaborate perhaps a bit further in the future, but we intend to submit a study which evaluates different reconstructions, and we find that there is some overestimation of low-frequency changes in Ljungqvist’s stuff. I actually chatted with him a bit about it. Loehle comes up as the worst reconstruction we looked at out of any of the 15 different ones, because it fails on both high and low frequency…
Regarding volcanic eruptions and their impact on climate and trees – remember Mann et al (2012?) have a paper on that, where they find significant volcanic eruptions can sometimes produce no tree growth whatsoever, resulting in no ring or a very faint one. I can tell you 1883-1884-1885 was cold in the North Atlantic – that’s where the impacts of many volcanic eruptions are felt most strongly.
Andrew_FL and Carrick: You should read the 2005 paper referenced by Nick Stokes above.
The paper discusses the impacts of the volcanoes on the proxy records, and answers some of your questions.
The key with all this proxy talk is that by rights you should be producing a unique reconstruction for each location, based on the best locally available climate series.
[1] Take TempLS and BEST and use them to create a regional temperature composite
[2] Set a series of criteria (statistical and physically based) to test each proxy’s sensitivity to climate. Things like reproducing high amplitude events should be important
[3] Provide secondary screening based on correlation on a period such as 1880-1960 or perhaps use two thresholds (1880-1910 and 1930-1960).
[4] Calibrate the proxies that pass the initial climatic sensitivity screening to the instrumental record
[5] Produce temperature reconstructions for each location
[6] Assess local reconstructions with comparisons to red and white noise
[7] Combine all the local temperature reconstructions into one using TempLS method
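Steps 3 and 4 of the outline above can be sketched in a few lines; the correlation threshold, screening windows, and toy data below are all illustrative assumptions, not a prescription:

```python
import numpy as np

rng = np.random.default_rng(5)

def passes_two_window_screen(proxy, temp, years, r_min=0.3):
    """Step [3]: require correlation in BOTH 1880-1910 and 1930-1960,
    so a single lucky stretch of noise cannot pass on its own."""
    for lo, hi in [(1880, 1910), (1930, 1960)]:
        m = (years >= lo) & (years <= hi)
        if np.corrcoef(proxy[m], temp[m])[0, 1] < r_min:
            return False
    return True

def calibrate(proxy, temp, years, cal=(1880, 1960)):
    """Step [4]: ordinary least-squares calibration to the local record."""
    m = (years >= cal[0]) & (years <= cal[1])
    slope, intercept = np.polyfit(proxy[m], temp[m], 1)
    return slope * proxy + intercept

# Toy data: a proxy that genuinely tracks a local temperature series.
years = np.arange(1850, 1991)
temp = 0.005 * (years - 1850) + 0.2 * rng.standard_normal(len(years))
proxy = 2.0 * temp + 0.3 * rng.standard_normal(len(years))

if passes_two_window_screen(proxy, temp, years):
    local_recon = calibrate(proxy, temp, years)
    print(round(np.corrcoef(local_recon, temp)[0, 1], 2))
```

The two-window screen is the design choice worth noting: demanding the relationship hold in two separated periods is a crude out-of-sample check that pure red noise passes far less often than a single-window screen.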
As I understand it volcanoes, like rice/wheat prices, viking settlements, medieval mines under glaciers or even the Chinese pollen record, are only counted as anecdotal evidence. If it is not in the tree rings, then it didn’t happen.
More fun for everyone? Check out the New Zealand tree ring reconstruction discussed in McIntyre’s post here:
http://climateaudit.org/2012/06/03/gergis-two-medieval-proxies/
Readers should be aware that Joelle Gergis said in her paper that they couldn’t get a clear cooling signal for the huge 1258 eruption; a volcanic eruption still steeped in mystery (we don’t know where it occurred). I have been discussing the Kuwae eruption in 1452, and later eruptions, and their relatively clear signals in the hockey stick reconstructions. If you check out McIntyre’s post and look at the NZ tree rings, you can see the 1452 spike down, and an even bigger spike down in the timeframe consistent with 1258. This signal somehow got ‘diluted out’, but the big 1258 eruption appears to be in the NZ tree ring reconstruction.
On the lighter side – “mathemagical” that is priceless.
“If it is not in the tree rings, then it didn’t happen.”
What about entire ice caps melting away that were existing at the MWP such as in Anderson et al (2008)? In the sensitive location of the Canadian Arctic these are important indicators – not to mention the fact that pillow ice caps are amongst the most strongly dependent on temperature of any glacier.
“Observational records show that the area of ice caps on northern Baffin Island, Arctic Canada has diminished by more than 50% since 1958. Fifty 14C dates on dead vegetation emerging beneath receding ice margins document the persistence of some of these ice caps since at least 350 AD.”
But how about we look at the melt rates on Arctic ice caps?
“Arctic ice core melt series (latitude range of 67 to 81 N) show the last quarter century has seen the highest melt in two millennia and The Holocene-long Agassiz melt record shows the last 25 years has the highest melt in 4200 years. The Agassiz melt rates since the middle 1990s resemble those of the early Holocene thermal maximum over 9000 years ago.”
Fisher, D., Zheng, J., Burgess, D., Zdanowicz, C., Kinnard, C., Sharp, M. and Bourgeois, J. 2012. Recent melt rates of Canadian Arctic ice caps are the highest in many millennia. Global and Planetary Change 84-85, 3-7.
Anderson, R.K., Miller, G.H., Briner, J.P., Lifton, A., DeVogel, S.B. A millennial perspective on Arctic warming from 14C in quartz and plants emerging from beneath ice caps.
A quick, relevant read from 1950: http://edvul.com/extrapdf/Cureton_Baloney.pdf
Nick, please do your best to turn baloney into prime rib.
Robert:
Robert, thanks for the comments, but I’m not entirely sure I buy into this. I hope you’re OK with that and are open minded to different ways of looking at things.
I think what you say follows if you start out by assuming that all of the reconstructions were created equal, but of course in science, some are better than others.
The three series I graded based on my own prior criteria were Moberg, Ljungqvist and Loehle, and they actually agree incredibly well given the general weaknesses of the reconstruction algorithms and the very noisy proxy data.
If you don’t assume a common baseline and you don’t assume that they are strictly calibrated to temperature, they covary quite well, as does Mann EIV (Mann also states that he thinks EIV does a better job, btw, than CPS in his 2008 paper, see the text at the top left on page 13255). Again here’s the link to the correlational study I performed on these reconstructions.
In my opinion, these four series are probably getting the low-frequency portion “about right” and the other (mostly CPS-like) methods are systematically underestimating it. I gave the rationale for why I preferred Moberg, Ljungqvist and Loehle reconstructions over others in above comments. But briefly Loehle in particular preselects proxies based on prior physics-based correlation, then simply applies an unweighted sum. IMO none of the series reconstructs a quantity that can be directly related to global mean temperature, because long-term trend in global mean temperature has a latitudinal bias associated with it due to polar amplification. I think you need to weight each series by the appropriate teleconnections function to get something that is close to global mean temperature.
Loehle, by the way, would be predicted to fail for high frequency because it doesn’t include tree-ring proxies (Moberg makes a point of using these proxies to recover short-period climate fluctuations). So we agree on that point.
Finally, I think that most of the other reconstructions suffer from a loss-of-variance and a loss of low-frequency coherence due to the method by which they are summed (I think it’s the result of contamination of the data by “red” noise). This is a hypothesis, but I think it could be tested using a correctly performed Monte Carlo study.
What would be interesting would be to do an internal consistency test on the other 11 or so proxies that you looked at (e.g. Pearson’s r on a sliding 300-year window). Given what the spaghetti curve looks like (no particular pattern), I would be surprised if you saw much correlation between them.
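The sliding-window consistency check described above is straightforward to sketch; the window length, step, and white-noise test series are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

def sliding_corr(a, b, window=300, step=10):
    """Pearson r between two series over a sliding window."""
    return np.array([np.corrcoef(a[i:i + window], b[i:i + window])[0, 1]
                     for i in range(0, len(a) - window + 1, step)])

# Two hypothetical 600-year "proxies" with no shared signal:
a, b = rng.standard_normal(600), rng.standard_normal(600)
r = sliding_corr(a, b)
print(round(np.abs(r).max(), 2))  # stays near zero for pure noise
```

Proxies that genuinely record a common variable should hold a stable positive r across all windows, not just in the calibration era; windows where r collapses toward the noise level are the red flag.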
I suspect the word was coined by JeffId of the air vent. See https://www.google.com/search?q=the+air+vent&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a#hl=en&client=firefox-a&hs=81z&rls=org.mozilla:en-US%3Aofficial&sclient=psy-ab&q=mathemagic+site:noconsensus.wordpress.com&oq=mathemagic+site:noconsensus.wordpress.com&aq=f&aqi=&aql=&gs_l=serp.3…6259.16771.1.17000.18.18.0.0.0.10.253.3610.0j13j5.18.0…0.0.-7LaxTqwqqI&psj=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.,cf.osb&fp=1c4c7c40b873c9a6&biw=1048&bih=874
PaulK2 (#97627)
Apparently sulfates begin showing up in Antarctic ice around 1450. After further analyses, the date of the explosion clearly seems to be in 1452. But we can’t be sure that sulfates weren’t already being generated in large quantities prior to the explosion.
The Witter&Self article which you cited is paywalled, but I ran across this:
.
So the historical record shows noticeable effects in 1453 and following (meaning austral summer 1454 by Gergis’ numbering). Now you’re attributing the temperature drop (in Gergis’ reconstruction) between 1448 and 1452 to pre-explosive sulfate releases. Yet you would then have to concede that this gigantic explosion, which released >100 Tg of H2SO4 into the atmosphere*, and resulted in recorded world-wide effects, didn’t result in a drop in temperatures (in the reconstruction).
.
It’s a Rorschach blot. Nothing more.
.
* The “>100 Tg” value is from Witter & Self. I have no idea how much this really is, but the figure sounds very impressive. I’ve seen other folks (not you) throw large-sounding numbers around without context in order to create an impression; just wanted to see how it works.
Oh… the word may have been coined by ryan O in a guest post here!
Ryan O posted on this at Lucia’s: http://rankexploits.com/musings/2009/steigs-antarctica-part-three-creative-mathemagic/
I think it’s fair to be open to different ways of looking at the data. My analysis of EIV is of course hampered by the amount of filtering he does on the final recons in his 2008 paper. I find some very interesting dynamics because of this. Nevertheless, although Loehle’s attempt, a very simple unweighted mean, is likely moderately related to temperature, I think there are too many methodological issues with it for me to be able to take it seriously, even if it agrees okay with the others.
As an example, Loehle linearly interpolates a pollen reconstruction to annual scale when it is originally comparable to 100-year means. I think that although technically you can do it, it is dubious. Notwithstanding that, the particular pollen data show strongly different regional signatures and have a strong dependency on the source data used for the calibration. This issue is one that paleo has not done enough to recognize – that there are significant differences between using reanalysis versus CRU versus GISS versus NOAA versus BEST for calibration.
Nevertheless I’ll leave out the rest of my comments on Loehle, pollen, other proxies, etc…
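Robert’s interpolation point can be illustrated with a toy example. Everything here is invented (a synthetic red-noise series and an assumed 100-year proxy resolution); the point is only that linearly interpolating century means back to annual scale cannot restore sub-centennial variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" annual climate: red noise (random walk) over 1000 years
true_annual = 0.1 * np.cumsum(rng.normal(size=1000))

# What a coarse proxy preserves: ten non-overlapping 100-year means
century_means = true_annual.reshape(10, 100).mean(axis=1)

# Linearly interpolate the century means back to annual scale,
# placing each mean at the midpoint of its century
centers = np.arange(50, 1000, 100)
interp_annual = np.interp(np.arange(1000), centers, century_means)

# The interpolated series is much smoother than the truth: the
# sub-centennial variability is not recovered, it is simply absent
print(round(true_annual.std(), 2), round(interp_annual.std(), 2))
```

The interpolated series has annual resolution on paper but carries no information below the 100-year scale, which is why short periods of rapid warming cannot show up in it.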
“If you don’t assume a common baseline”
Do you not have Loehle baselined to the others in your comparison?
I went back and looked at the paper we’re preparing to submit, and we find that one study in particular best models centennial climate variability, and it is not a study one might expect. Unfortunately I can’t reveal more details until it is accepted.
What I will say though is that the work by Christiansen is important to this entire discussion about proxies and their recent publications (CL2011 for one) show greater centennial scale variability than former studies. I think that the three studies you have shown there (Moberg, Loehle and Ljungqvist) together with your ensemble mean probably miss the so called 2nd MWP to some degree. However your ensemble does predict the maximum medieval warmth where I would expect to see it (800-950) and where other studies corroborate (D’Arrigo et al 2006).
I think that the final conclusion is that climate will be more variable than some studies have shown (even Moberg), but also that the early 20th century warming will be comparable to the MWP. Take the early 20th century warming and leave it ongoing for 100 years and you have the MWP.
Carrick, I have a question. You say:
Wouldn’t we expect to be able to find that sort of similarity between reconstructions regardless? If you have a large number of reconstructions, some of them could be similar largely by chance.
That sort of thing is why I wouldn’t take any similarity between Mann 2008’s EIV (CPS isn’t even worth discussing) and other reconstructions seriously. The fact a reconstruction dependent upon unacceptable choices is similar to another reconstruction doesn’t indicate the poor reconstruction is still right.
Robert:
What I meant by that is, when you compute Pearson’s correlation, it of course subtracts the mean of each series on a window-by-window basis.
So in effect you’re ignoring the baseline choice of the two series.
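Carrick’s point is easy to verify on synthetic data (the series below are arbitrary, and 3.7 stands in for any baseline choice):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)

# Pearson's r subtracts each series' own mean before forming the
# cross-products, so shifting either series by a constant changes nothing
r_plain = np.corrcoef(x, y)[0, 1]
r_shifted = np.corrcoef(x, y + 3.7)[0, 1]  # arbitrary baseline offset

print(r_plain, r_shifted)
```

The two correlations are identical to floating-point precision, which is why comparing reconstructions by correlation implicitly ignores how each was baselined.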
If they are missing part of the MWP, wouldn’t that suggest there still is a problem with the low-frequency components being too small?
Brandon Shollenberger (Comment #97651) – Carrick’s other criterion, “largely non-overlapping” data on which they are based, is violated among the Hockey Team recons.
Brandon
I didn’t randomly select those three series, nor did I look at how well they lined up before selecting them. So it would be a very interesting coincidence for the three I pre-selected to match up so well (and the ones I would have predicted wouldn’t agree don’t).
The similarities of Mann EIV I presently take with a grain of salt (due to corruption of the series by the double-counted Tiljander series and questionable Sheep Mountain proxy).
“If they are missing part of the MWP, wouldn’t that suggest there still is a problem with the low-frequency components being too small?”
Not necessarily – I’m referring to the warming during the 15th century that is picked up in many other reconstructions. Either way I think it comes down to a data quality issue, particularly for Loehle, where the resolution of some series (pollen) is not sufficient to capture short periods of rapid warming.
That being said most are not able to reproduce multidecadal to centennial scale climate variability well because of the data ultimately. Some of the low resolution proxies are very slow acting and essentially lag temperature because of the inertia in the system. Take pollen for one – the way they extract temperature is wholly dependent on ecosystem changes which are very slow moving and are reacting to changes which began far before. They also cannot capture periods of rapid warming very well because vegetation growth has that period of inertia.
– I think there is of course much use in incorporating the low frequency climate proxies in with the annual but I’ve yet to be convinced of a particularly pertinent method of doing so which does not overstate or understate variability.
Regarding the first part – yeah, I missed that part about the Pearson correlation being used in that manner. I should read more carefully.
Robert, thanks I agree… if it’s a short-duration peak in warming, Loehle’s series would miss it.
In terms of interpolating the pollen series, probably there are better methods than linear interpolation. 😉
Andrew_FL may have given an interesting answer to my question to Carrick:
There are a couple dozen reconstructions one could pick from for comparisons. However, most of them are not, in any meaningful way, independent. This means the sample size for independent reconstructions is much smaller than the total population, making it harder to find spurious correlations.
In other words, because the scientists involved have done such a bad job, we can’t consider as many reconstructions, and thus Carrick’s comparison is more meaningful. Strengthening a conclusion by screwing up…
Carrick:
I guess that would be more meaningful for you than it is for me. You (I assume) are actually aware of the processes which went into your choosing those three for comparison. Me, I’ve seen them compared in so many places, including ones with heavy bias, that selection criteria can’t simply be trusted.
That said, it is a bit worrying to hear you say the ones you didn’t expect to agree, don’t. The implication is you considered more reconstructions than you showed, and you chose to not include ones you expected to match poorly. That would diminish the significance of what you display by a large margin.
Of course, it may be those you didn’t expect to match were excluded for other, good reasons, and thus that isn’t the case (I assume that’s the case). It’s just not indicated in your comment, so it’s hard to know how to interpret your claims.
Brandon – I think it’s unfair to say that “scientists have done such a bad job”. There are many scientists in the paleo field who do impeccable work, who are very clever and skeptical (in the true sense of the word), and who try to adhere to the utmost integrity. This is the problem with climate change: everyone refers to the few prominent scientists who make statements and receive criticism, yet there is a lot of literature out there, and a lot of names and papers that people just seemingly ignore – papers which I feel are important, particularly from a methodological standpoint. Balance is always important – the thing I’ve always accused some in the paleo community of is not balancing between statistics and physical properties. A good example is using proxies counter to their intuitive direction. Call it what you like, “cherry picking” or making “cherry pie”, but I agree with the approach by D’Arrigo and others where you make regional composites that make sense physically and then combine. On a related note, one of their regional chronologies made available in Kinnard et al (2012) seems to be a very good proxy for temperature for the subregion I have worked in. Too bad about that whole divergence thing though…
Nick Stokes
Here’s my understanding. Where am I going wrong?
1) Tree rings are never going to be a perfect match to temperature, they are noisy.
2) The blade is forced to be the best match possible by picking series that best fit the instrument record.
3) Having done this you take a look at the pre-instrumental period to get an extended record, the shaft.
4) Claims such as “warmest in 2000 years” are dependent on comparing the shaft and the blade.
5) If there is more noise in the shaft than the blade any lumps and bumps from the signal will be muted there.
The question seems to be: by screening out the noise in the blade, have you screened out the same amount of noise in the shaft? Do series remain consistently good proxies for temperature over 2000 years? Does the divergence issue suggest no?
That’s assuming the screening method is screening for real signal in all cases.
Being generous to the method, it seems the safest conclusion would be that the variation seen in the shaft represents a minimum, with potential for greater variability in the real signal, due to the bias toward dampened variability introduced by the methodology?
Brandon:
No, that’s not what I meant at all.
What I’m saying is the ones that I expected a priori not to agree (e.g., MBH), due to what I view as methodological flaws, did not agree with those that I thought had fewer methodological flaws when applying the same verification criteria I used on the other series.
Again I picked three series based on a relatively clean methodology, no use of CPS or CPS like algorithm, and that all three series have largely non-overlapping proxy sets. The selection occurred before the verification process, in other words.
First pick the best methods, then analyze them, then obtain the results. No going back and tweaking the results if they didn’t turn out the way you expected.
I excluded the other series due to worries of highly overlapping data sets and because I think many of them are, bluntly, red noise and no signal. It would however be interesting to do pairs of three, and see how many of them get similar high correlation levels.
Is this all testable?
For example, you could do a local reconstruction in a region with an extended instrument record. Central England seems like a good start. Are there any proxies in the UK? You could have many more time periods over which to screen a database of proxies, then look at the impact fitting different time periods has on the shape of the rest of the reconstruction. I guess even the CET doesn’t go back to the MWP, which would be the ultimate test.
Alternatively generate a database of synthetic time series that contains a signal, let’s say a series of peaks and trough, but also a degree of noise.
The series contains 4 peaks. You know the signal for the first peak (i.e. the instrument record). So you screen the database for the best-fit time series for that peak. You generate the reconstruction based on this subset. Does the first peak now come out highest because the noise in the other peaks is dampening the signal?
If you repeat the experiment, but this time you know the signal for the second peak and base the reconstruction on the series that best fit this signal, does the 2nd peak in the reconstruction now look the tallest, with the rest muted by the noise?
And so on. Is this how the methodology is potentially biasing the data?
Robert:
I think it’s strange to criticize a person for painting a group with a broad brush then paint the group with a similarly broad brush. I’d be happy to hear recommendations for papers I should read, or authors I should pay attention to. However, I’ve followed the paleoclimatology discussions on the major blogs, as well as in the IPCC cited literature. Unless you’re going to argue those sources have simply failed to consider a large amount of work (which I accept is possible), I can’t agree with you.
My comment is based upon an examination of the work promoted by the “consensus.” If it’s unfairly broad, that’s because the “consensus” is screwed up.
And not that it changes anything, but if you’re going to remove a word when quoting someone, you should use an ellipsis.
You may have some point to make, but I’m afraid I can’t even begin to consider it as long as you put it in verbiage like this. D’Arrigo chose to not archive data which didn’t give desired results. There is no meaningful difference between doing that and deleting data because you don’t like it. If you think that sort of thing not only deserves to be defended, but should be promoted…
To put it as kindly as I can, I won’t take your opinions on who adheres “to the utmost integrity” as meaning a damn thing.
Carrick:
That’s what I expected you meant, but it didn’t come through in your words. Or at least, it didn’t for me. It’s possible there were details/subtleties in your comment I missed.
This may be a memory problem on my part, but I could have sworn the Ljungqvist reconstruction used CPS. My recollection is it uses CPS, but unlike Mann’s version, uses it over the entire period, not just the modern period (where instrumental data is available). This leads it to have some variance deflation but not as much as other CPS implementations. It also means there isn’t too much difference between Ljungqvist’s and Loehle’s methodology.
Am I just thinking of some other paper, perhaps? I’d check for myself, but my internet connection is so spotty tonight I seem to be able to load about one page per ten minutes.
I think I’m in agreement with you on all this. I’m less sure the reconstructions you picked have the best methods.
HR:
Yup. Because the variance in your series gets deflated outside whatever period you match against, you’ll see the effect you described. To make it even worse, you could match against a period covering two peaks, and both of those peaks would wind up being taller than the other two. In fact, you can do the same sort of thing with any shape or pattern.
The strength of the bias introduced depends on all sorts of details, like the level (and type) of noise in the proxies, the exact methodology used, the number of series, etc., but you’ve got the idea down.
Brandon, I wasn’t being entirely clear on Ljungqvist. Robert or anybody else can correct me if I didn’t understand his paper correctly, but it’s my understanding he preselected the proxies. What I am worried about is the use of screening to cull out a subset of the proxies (screening fallacy), and the influence of red-noise on that. I usually think of that as part of CPS, but maybe that’s just my terminology.
No worries on the other… I’m in the middle of analyzing data and trying to finish a report tonight, so as usual I’m distracted while trying to write, and probably am as clear as mud.
What I would recommend if you are interested in pursuing this is to review the method sections of the papers without regards to results, and try and select out the ones you think have the most robust methodology.
I don’t include Mann EIV with the other series (adding it doesn’t substantially affect the ensemble mean, however), even though it passed verification, because of problems with proxy selection. It’s interesting that it came as close as it did; either that’s a fluke, or it’s evidence the method is insensitive to the inclusion of bad data (robustness to bad data is absolutely a requirement for a decent reconstruction).
I can explain in each case the reasons why I selected the reconstructions that I did, though any detailed explanation is beyond my limits of energy at the moment and would require mentally reconstructing my reasoning. I don’t claim they are optimal; they are based on somewhat subjective criteria on my part, but I liked the fact that they were using different methodologies as well as different proxies (so you might expect the systematics to tend to cancel across the methods).
Carrick:
Not a problem. You come back and respond to clarify points of confusion, so even if you’re not clear about something, people can find out what you mean.
It’d be a good idea for me to review the methodology sections of papers I haven’t looked at recently (and find copies of them for storage), but it could lead to wrong-footing on its own. It’s usually hard to tell what overlap there is between the data used in papers just by reading the papers themselves.
Another issue is it’s hard to tell what choices were made in data selection. Even if the methodology used within a paper is completely unbiased, the results can be biased if the authors (consciously or subconsciously) picked data series with a particular pattern. Two reconstructions could have different methodologies and data sets, yet because of subconscious data selection choices, come up with the same, wrong pattern.
For what it’s worth, I believe I’ve read every paper with a (hemispheric or wider) temperature reconstruction going back to 1400 or earlier. Some have seemed better than others, but none have inspired much confidence.*
I think it would be useful to try the same thing with the EIV reconstruction, but having excluded the tree rings/Tiljander series. If it significantly changes your results (which I think it will), it will indicate one of two things. One, including bad data caused better correlation by chance. Two, some sort of bias existed which led to the increased correlation. The latter would be interesting as it could suggest the correlation between the other series is less meaningful.
*I’ve discussed an approach I believe would be a massive improvement for reconstructions a number of times. It would be more work-intensive than the current approaches, but it’d produce results free of the biases one can posit for current reconstructions. It’d also allow for easy sensitivity testing. Given it’s an idea I came up with while still in high school, I’m really unimpressed by the work I see published.
HR (Comment #97662)
Some responses:
1) No perfect match – both data sets are noisy
2) True
3) Yes
4) Yes
5) I don’t think that follows. Firstly, the selection doesn’t reduce high frequency noise. It may not reduce low frequency either – it depends on the instrumental data. There is a variance reduction effect, but I don’t think it’s as simple as that.
Your later questions relate to the uniformity principle, which is not statistical, or changed by screening. Divergence is a non-uniformity, and is a negative unless explained. And on variability, it’s easier to see possible upside, though hard to quantify.
Brandon:
Yes I agree this would be a good idea. Whether Mann’s code is at the point one can rerun it without the proxies remains to be seen.
I’ve looked at the proxies from Mann09 by the way, and contrary to some people’s notions of how things work, most of the tree-ring proxies aren’t divergent from 1968-1998 (the latter year is when Mann cut off the series).
Histogram of trends.
Marble plot showing distribution of positive and negative trends 1968-1998.
Just another highlight that you can’t make assumptions about data. You go where the data lead, you don’t lead the data to your pre-conceived ideas of where it should be going. Applying the uniformity principle to observed correlations is silly and misguided.
I decided to review the Ljungqvist paper again, and it turns out my memory had held up right. It does use CPS (over 1000-1900, not the entire period like I thought), but it doesn’t screen proxies based on that (or anything else).
I’m not convinced of the merits of the paper though. Of the 30 series in it, several aren’t publicly available. This wouldn’t be too bad except there is no collected archive of the other series. If a person wants to get a copy of them, they have to track down each series individually. That’s silly. The only redeeming factor is one can find some of the proxies in an archive made for an earlier paper. I was able to find eleven of the series in it, as well as two more which I couldn’t uniquely identify.
In the process of looking for the data, I noticed several series I’m familiar with. One series is the Polar Urals series from Esper 2002, a series with an MWP far exceeding recent temperatures (this Polar Urals version was never used again after that paper). Another series is a “documentary series” from Yang 2002, a series which is actually a reconstruction of winter temperatures based upon five proxies in a paper (available only in Chinese) for which the underlying data isn’t archived. Oddly, Ljungqvist lists it as an annual, not winter, record. Perhaps more strangely, the series also has a far warmer MWP.
A third series I recognized, the Indigirka series, is especially noteworthy. It is taken from Moberg 2005. Moberg used it as one of their seven high frequency proxies, but the series does have a notably warmer MWP. Moberg used this series only for its high frequency signals (he filtered out any signal >80 years). Ljungqvist used the same series, but included the low-frequency data Moberg filtered out. Why do the two papers use the data differently? No explanation is given. Maybe it’s just me, but I think it’s odd for a paper to take data from another paper, use that data differently, compare itself to that paper, and never comment on the difference. That aside, it raises the question of how the difference in the data usage affects the reconstructions. If Moberg’s MWP had been forced higher, it would have been closer to Loehle and further from Ljungqvist, meaning the two reconstructions have some spuriously created correlation due to different handling of the same data.
Oh! Having to think about Moberg again reminded me of something. One of the eleven low frequency proxies used by Moberg was the Yang reconstruction. One of the nine proxies in the Yang reconstruction is the documentary series used as a proxy in Ljungqvist’s reconstruction, a series which is really a reconstruction of five proxies… This means a proxy used by Ljungqvist is a proxy reconstruction used as a proxy in a reconstruction used as a proxy in Moberg. My head hurts.
By the way, if you think Moberg is a decent reconstruction Carrick, you should take a look at the low frequency proxies it uses. There are only eleven, and one of the ones with the greatest modern warmth isn’t any sort of real temperature proxy. And the one with the third greatest modern warmth was the Yang reconstruction, a reconstruction described by Ray Bradley as “crap.” There’s a reason I don’t hold Moberg in any esteem.
Sorry about the last post being lengthy and rambling. I started off with a specific focus, and partway through, I got sidetracked.
Anyway, Carrick you say “most of the tree-ring proxies aren’t divergent.” My question for you is this. What do most of those tree ring proxies show? Whether or not many of them show a divergence isn’t very important. What matters far more is which ones show a divergence. The divergence problem is known not to affect all tree ring proxies, but the key to it has always been it affected key proxies.
And now I’ll go back to trying to track down data for the Ljungqvist reconstruction. I don’t think one-third of the data is enough to test it, especially when there’s a chance the data I have isn’t really random (because of which series I recognized).
Oh, I need to correct something I said earlier. I made a dismissive remark to Robert earlier, and it was (somewhat) unjustified. I had said D’Arrigo refused to archive certain data with the “making cherry pie” approach, but that was really something done by another person (Jacoby). I mixed up the two because the approaches are so similar, but D’Arrigo’s is notably less problematic (at least the data still exists).
I really have no idea how anyone tries to justify cherry picking. By definition, it’s wrong.
Volcanoes add about 24 Tg of sulfur to the atmosphere each year. Man’s activities add about 79 Tg sulfur to the atmosphere each year.
Adding >100 Tg of H2SO4 into the atmosphere is an overall increase of a factor of four.
However, big eruptions fling this into the upper atmosphere where things get interesting. The residence time of sulfate aerosol in the troposphere is about a week, while the residence half-life of stratospheric sulfur aerosols is about one year.
Of the 24 Tg of volcanic sulfur that goes into the atmosphere each year, only about 4 Tg ends up in the stratosphere.
A big eruption could give up to a 25 fold increase in stratospheric sulfur aerosols.
The Kuwae event stands out in the Law Dome record as a huge, single sulphate spike, whereas a group of smaller eruptions from 1810 to 1840 gave a broad increase in aerosols.
Thus, you should see a nasty spike about 1460 and a long-lived dip from 1810-1840.
If you have a proxy with no event in 1460 and no 25 year cooling in the 1810-1840 region, then something is odd.
http://digitalcommons.library.umaine.edu/cgi/viewcontent.cgi?article=1125&context=ers_facpub
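The ratios above can be reproduced in a few lines. One hedge: the “>100 Tg” figure is quoted as H2SO4 while the annual budgets are quoted as elemental sulfur, so mixing the units makes this a rough comparison at best:

```python
# Back-of-envelope ratios from the figures quoted above (taken at face
# value; note the ">100 Tg" is H2SO4 while the budgets are sulfur)
volcanic_annual_s = 24.0      # Tg S per year, all volcanoes
stratospheric_annual_s = 4.0  # Tg S per year reaching the stratosphere
eruption = 100.0              # Tg H2SO4 from a Kuwae-scale event

total_ratio = eruption / volcanic_annual_s        # vs the full annual input
strat_ratio = eruption / stratospheric_annual_s   # vs the stratospheric input

print(total_ratio, strat_ratio)
```

That is where the “factor of four” and the “25 fold” figures come from: the same eruption looks modest against the whole atmosphere but enormous against the long-lived stratospheric reservoir.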
.
I think there’s a misunderstanding here. The “uniformity principle” in paleoclimate is an assumption that *all* reconstructions must somehow make. If you don’t assume that your proxies behaved roughly similarly in the past as they do in the present, then you can’t reconstruct anything!
Jacoby and D’Arrigo are long-time co-authors. The work “done by Jacoby” was done by Jacoby and D’Arrigo.
Much of the HS-ness in Yang comes from using Dunde in a 50-year smoothed form, which is then interpolated to decadal and scaled.
One of the main HS proxies in Moberg is the G. bulloides series from the Arabian Sea. G. bulloides is a proxy for polar/cold water and its proportion increases in the 20th century. Moberg teleconnected this increase to NH temperature and thereby flipped the series. An increase in cold water in the Arabian Sea is one of the key pieces of evidence of global warming.
DocM –
Thanks for the context on the Kuwae eruption. It’s interesting that your reference gives an “ice date” of 1459.5, whereas historical sources indicate 1453. I wonder if this indicates a general inaccuracy in dating, and if this is unique to ice cores, or whether the coral or tree ring series encounter this difficulty as well. It might account for a certain amount of incoherence between proxies.
.
I’ve posted the Gergis reconstruction for 1430 to 1480 and my attempt at calibrated proxies for that period (calibration over the same interval chosen by Gergis, viz. 1921-1990).
.
Nothing leaps out at me.
Re: toto (Jun 14 08:07),
Exactly! The problem with tree rings,though, is that there is no good reason to believe the assumption is valid and a lot of good reasons to believe it isn’t.
Instead of screening proxies it might make more sense to sort them, and not with respect to their correlations with temps but with each other. That way, what you are finding is which ones have common signals. Of course, proxies that give inverse correlations to those expected should be marked as dubious as to whether they belong to the same “group” their shape would suggest.
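A minimal sketch of that sorting idea, on invented data: score each series by its mean absolute correlation with all the others, with no reference to any instrumental record. The proxy counts, signal strength, and noise levels are all arbitrary:

```python
import numpy as np

rng = np.random.default_rng(7)
n_years = 500

# Hypothetical proxy set: six series share a common signal (two of them
# inverted), four are pure noise. No instrumental record is used below.
common = 0.1 * np.cumsum(rng.normal(size=n_years))
proxies = np.vstack(
    [sign * 0.5 * common + rng.normal(size=n_years)
     for sign in (1, 1, 1, 1, -1, -1)] +
    [rng.normal(size=n_years) for _ in range(4)]
)

corr = np.corrcoef(proxies)
np.fill_diagonal(corr, 0.0)

# Score each proxy by its mean |correlation| with all the others;
# proxies carrying the shared signal score high, pure noise scores low.
# |r| alone cannot distinguish a genuine inverse responder from a
# wrongly flipped series, so sign disagreements within a high-scoring
# group would still need to be flagged as dubious, per the comment above.
score = np.abs(corr).mean(axis=1)
order = np.argsort(score)[::-1]
print(order)
```

This finds common signals without ever touching temperature, so it avoids the screening fallacy; what it cannot do by itself is say what the common signal *is*.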
Carrick (Comment #97673)
June 14th, 2012 at 1:57 am
Carrick, I read with interest your histogram of trends from Mann (09) and Marble plot showing the locations of proxies with negative and positive trends. A couple of questions:
I am most familiar with Mann(08). Are you sure these proxies are from Mann(09) and if so how do they differ from those in Mann(08)?
Were these proxies all used in Mann (09) or were they part of a larger population from which the final candidates were selected?
My simple-minded response to what you have shown is that, looking at a large number of proxies, trend matching with the instrumental temperature series would not be a difficult task for those prone to selection based on that matching. Also, the fact that one can find proxies in proximate localities with negative and positive trends tends to rule out the negative/positive aspect being due to location.
The bias of the histogram towards the positive trend would be understandable if this population of proxies was selected based on trend matching with the instrumental record, or some aspect related to that trend.
If the lower frequency response of proxies could be more or less reproduced with an ARIMA model, one would expect to see series whose ending trends were positive, negative, and not significant.
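That expectation is easy to check with a toy red-noise model. The AR(1) persistence of 0.8, the series length, and the 31-point ending window (mimicking 1968-1998) are all arbitrary choices, not fits to real proxies:

```python
import numpy as np

rng = np.random.default_rng(3)

def ar1(n, phi, rng):
    """Generate one AR(1) ("red noise") series of length n."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

# 1000 red-noise "proxies"; least-squares slope of each over its final
# 31 points (mimicking a 1968-1998 window)
t = np.arange(31)
trends = np.array([np.polyfit(t, ar1(300, 0.8, rng)[-31:], 1)[0]
                   for _ in range(1000)])

# With no selection step, positive and negative ending trends are
# roughly equally common; a strong skew in a proxy network's ending
# trends is what would call for an explanation.
n_pos = int((trends > 0).sum())
print(n_pos, 1000 - n_pos)
```

So a pronounced positive skew in the Mann (09) histogram is consistent either with a real common temperature signal or with trend-based selection, which is exactly the ambiguity Kenneth raises.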
toto:
That’s a misunderstanding too.
If you have a basis built on physical law for why the proxy behaves a certain way (e.g., dO18 in ice cores), you can assume the same physical laws applied in 1000 AD that apply now. That doesn’t mean you assume that, e.g., the frequency content is the same now as it was then. The proxy doesn’t have to behave the same; the physical laws that govern its behavior do.
So if you have a proxy that is known in principle to respond to multiple stimuli – tree rings respond approximately in this order to changes in stimuli: precipitation, sunlight, fertilization, then temperature – you need both a model for why it’s “temperature limited” and a meta-data based argument for why it should remain temperature limited, and you select the proxies based on that argument. This isn’t just uniformity that you’ve invoked here, though; you’ve also had to assume that your arguments are valid over the time period. That has nothing to do with uniformity; it has to do with whether you understand the conditions under which the tree grew over the period of the tree-ring measurement.
See the difference?
If you simply find the existence of a correlation between temperature and tree-rings, and don’t even have a basis for arguing that temperature is the controlling physical variable, then there is no basis for claiming that this will continue to hold for periods before the instrumental temperature record.
The uniformity principle has nothing to do with this, since it doesn’t protect you against selecting a tree-ring series with long-period serial correlation that just happens to look like it’s responding to temperature over the calibration period, when you’re really just seeing noise that is unassociated with temperature.
Which gets back to the point of the hockey-stick-o-matic and “correlation does not imply causation”.
Brandon:
He can’t publicly release them, but if you write him, he will send you the series on condition that you don’t disseminate them further.
I think it’s very possible to be so critical one never accomplishes anything. You have to distinguish the possibility of problems from proof of problems. The Yang series is an example of that. Bradley calling it “crap” doesn’t make it crap; it’s just the opinion of somebody who was a coauthor of Mann’s and may have reasons that have nothing to do with science for being dismissive of that series. Even including one series that isn’t a temperature proxy (you’ve added noise) doesn’t mean you have to throw away the entire reconstruction. It could be this series gets such a low weight that removing it has no effect, in which case the criticism itself is overblown.
So what you do eventually is work from plausibility; then you have to use objective methods for testing validity (as I did with the three series). The fact that these three series pass a proposed validity test doesn’t “prove” validity; it merely shows they are consistent with the assumption that they are temperature proxies. I think that’s the way science progresses.
Everything in science has warts. If you get stuck on the warts you never make any progress.
Kenneth Fritsch,
I’m sure they are from Mann 09 (you can get them in the directory /pub/data/paleo/contributions_by_author/mann2009b/ on http://ftp.ncdc.noaa.gov or via Mann’s website at http://www.meteo.psu.edu/~mann/supplements/MultiproxySpatial09/).
I didn’t look to see how they were different from Mann08, sorry.
I believe these are all proxies. I don’t know which ones he selected from for his series.
Except I’m looking at the proxies over the period for which a divergence problem supposedly exists. That is, the proxies are “supposed to” anti-correlate with temperature.
But it should have an equal number of positive and negative trends. And there shouldn’t be a strong geographic correlation between trend sign and location. (The marble plot tells us this isn’t just noise.)
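One way to make the "equal number of positive and negative trends" expectation concrete is an exact sign test: under the no-divergence null, each site is equally likely to trend either way, so the split should look binomial(n, 1/2). A minimal sketch, plugging in the 664-positive / 263-negative marble-plot counts quoted elsewhere in this thread:

```python
from math import comb

def sign_test_p(n_pos, n_neg):
    """Exact two-sided binomial test of H0: P(positive trend) = 1/2."""
    n = n_pos + n_neg
    k = max(n_pos, n_neg)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2.0**n
    return min(1.0, 2.0 * tail)

# Marble-plot counts: 664 positive vs 263 negative tree-ring trend signs.
# The imbalance is about 13 standard deviations from a 50/50 split.
p = sign_test_p(664, 263)
print(p)
```

This only tests the count, of course; it says nothing about the geographic clustering of the signs, which needs a spatial test.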
Steve McIntyre:
Sorry it took me a minute to understand you were being sarcastic. You’re saying the proxy shows cooling based on the assumed physical mechanism for how that proxy responds to temperature, right?
So what happens to Moberg without that proxy? Glitches and warts in data happen. That doesn’t make everything come to a screeching halt. If your plane never took off until everything was perfect, you’d never get in the air. Even systems that have zero tolerance for failure can be glitch tolerant (one could argue they have to be).
“Glitches and warts in data happen”
True. Don’t know how helpful this is though. It’s like saying politicians will be bad or something like that. Kinda dismissive of the problem.
Andrew
Steve McIntyre:
Thanks for that correction. I thought there was a reason I associated D’Arrigo with that, but when I did a quick check, I only found comments about Jacoby and it. I figured my memory was just mistaken.
That reminds me, almost half of Ljungqvist’s proxies are interpolated, then later smoothed. It strikes me as odd.
Carrick:
It’s also very possible for a field to consistently claim to be able to get answers when no answers are actually possible. Or, for the field to consistently do a bad job, and thus not find the answers which are out there.
When someone on the Team thinks a series which shows a decent hockey stick is “crap,” it’s a pretty good sign the series has issues (and boy, does it have issues). As for getting “a low weight,” that’s a defense the Team has actually used for this very series (in a different paper). It’s a bogus one if you actually take the time to examine it.
That’s something worth emphasizing. I don’t simply look, see a potential problem, and write off a paper. I do put effort into examining things. Moberg’s paper uses only 11 series for long-term signals. There isn’t enough there to be robust to problems, especially when those problems are in multiple proxies.
You may think “it sounds like nonsense,” but it’s fairly easy to verify what McIntyre said. Look at the proxies listed in Moberg. The eleventh is Globigerina bulloides from the Arabian Sea. A little research into those will show they’re as McIntyre says.
By the way Carrick, you’ve made a lot of comments suggesting the issues raised with Moberg’s reconstruction may not matter much. The most disturbing one is you suggested the Yang series might not have been given much weight. There are only eleven proxies in the reconstruction. I don’t know how you expect to say removing one won’t matter much.
Moreover, Moberg’s Table 1 lists the % contribution of each of his eleven low-frequency proxies to the total variance of his reconstruction. The Yang series I mentioned contributes the most, and the series McIntyre refers to contributes the third most. They’re obviously important.
Brandon, I was just asking how much you looked at Yang and whether you had worked out what weight it carries. If that’s disturbing to you, then well…
Carrick:
It’s disturbing to me not because you’re questioning what I know, but because you’re demonstrating what you don’t know. You place (some measure of) faith in the Moberg reconstruction, and to me, that means you should have a decent working knowledge of it. That means I’ll be disturbed when you say something obviously wrong, especially when it’s directly contradicted by the only table in the paper.
I have no problem with people asking me for more information/detail. It’s a little annoying when they ask through assertions, rather than questions, but that’s it.
Carrick, your point about tree rings and a multiplicity of factors is well argued.
However, this doesn’t give the corals a clear run at being the ‘best’ biotic thermometers.
Corals are VERY pH sensitive, and volcano size/distance is going to play merry hell with deconvoluting them.
Not only will a nearby source of SO2 alter their growth, it will alter the aquatic Ca/Sr levels, as SrSO4 is far less soluble than CaSO4.
I am pretty skeptical about biotic proxies in general. Life is complex, and unlikely to follow a single variable in time.
This may be a pretty good reason to cross correlate proxies: if a biotic proxy agrees very well with a non-biotic one that’s reasonably reliable, then it may be reasonable to think they share a common signal.
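The idea of cross-correlating proxies to detect a shared signal can be sketched in a few lines. Everything below is synthetic and purely illustrative: a common low-frequency "climate" signal buried in two independent noise realizations correlates strongly across the pair, while a noise-only series does not.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
signal = 0.2 * np.cumsum(rng.standard_normal(n))   # shared low-frequency "climate"

biotic    = signal + rng.standard_normal(n)        # hypothetical biotic proxy
nonbiotic = signal + rng.standard_normal(n)        # hypothetical non-biotic proxy
unrelated = rng.standard_normal(n)                 # no shared signal at all

r_shared    = np.corrcoef(biotic, nonbiotic)[0, 1]
r_unrelated = np.corrcoef(biotic, unrelated)[0, 1]
print(r_shared, r_unrelated)
```

One caveat, tying back to the serial-correlation point above: both real proxies are autocorrelated, so the naive significance of such a correlation is overstated and the effective degrees of freedom need correcting before calling the agreement meaningful.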
Carrick (Comment #97694)
June 14th, 2012 at 9:25 am
“But it should have an equal number of positive and negative trends. And there shouldn’t be a strong geographic correlation between trend sign and location. (The marble plot tells us this isn’t just noise.)”
Carrick perhaps I am not seeing what you are seeing in the Marble plot but I do not see a strong geographical correlation with trend sign.
I asked about where this population originates to determine whether we might be looking at a pre-pre-selection that is already biasing the trend towards positive.
I have always been under the impression that the divergent proxies were a lesser part of the overall proxies available and that divergence could simply be a matter of the chance proxy low frequency response trending downward at some point.
I would suppose we should also be looking at negative and positive trends that are statistically significant and then call the remainder neutral. It will not change the bias you show.
Carrick, is the Marble plot of the same data as the histogram? The reason I ask is that the Marble Plot is labeled Mann(08).
Brandon:
Look, I just asked a question… I told you I was working on other projects and hadn’t looked at Moberg in months, or longer. I didn’t remember, and I didn’t have time to look. You’re the one who raised the issue; it shouldn’t be my responsibility to go back and fact-check for you, should it?
Which question did I ask via assertion? I think you’re reading too much into what I meant as a reasonable line of inquiry. I get the impression from interacting with you that you actually don’t like being asked questions. Perhaps now it’s my turn to read too much into things.
Kenneth, they are the same data. Sounds like I mislabeled. Thanks, I’ll fix the label when I get a chance.
The second question that one should raise is what does correlation versus temperature look like, and which temperature should you use?
I am thinking of correlating against the Hann-weighted average temperature in a 1000-km circle, but only over the “growing months” for the proxy (e.g. JJA).
I ask this without having tried anything. I haven’t tried any correlations against temperature on this data yet, so this is a purely methodological question. (But we would all expect global mean temperature and zonally averaged temperature to correlate well. The question is what is the right geographical and temporal averaging to use.)
Using single stations is a mistake, because short-period noise wouldn’t necessarily correlate that well with tree-ring growth.
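For what it's worth, the Hann-weighted local target is straightforward to sketch. The function names and station layout below are hypothetical (this is just the weighting described above, not anyone's published method): weight each station by a Hann taper of its great-circle distance from the proxy site, zero beyond the cutoff radius.

```python
import numpy as np

EARTH_R = 6371.0  # mean Earth radius, km

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (lat2/lon2 may be arrays)."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dlat = p2 - p1
    dlon = np.radians(lon2 - lon1)
    a = np.sin(dlat / 2)**2 + np.cos(p1) * np.cos(p2) * np.sin(dlon / 2)**2
    return 2 * EARTH_R * np.arcsin(np.sqrt(a))

def hann_weighted_temp(site_lat, site_lon, st_lat, st_lon, st_temp, radius=1000.0):
    """Hann-tapered average of station temperatures within `radius` km.

    st_temp: array of shape (n_stations, n_years). Returns the (n_years,)
    local temperature target for the proxy site.
    """
    d = haversine(site_lat, site_lon, st_lat, st_lon)
    w = np.where(d < radius, 0.5 * (1 + np.cos(np.pi * d / radius)), 0.0)
    if w.sum() == 0:
        raise ValueError("no stations within radius")
    return (w[:, None] * st_temp).sum(axis=0) / w.sum()
```

A station at the site itself gets weight 1, one at the cutoff gets weight 0, so the target degrades gracefully as coverage thins. Restricting to growing-season months would just mean passing in JJA-only station series.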
Carrick:
This is the second time you’ve said this. I was trying to be generous about it, but… where exactly do you think you asked a question? None of your responses to me contained a question mark. Everything you said was said with declaratory statements. I’m not convinced you actually asked any question, but if you did, it was asked through assertion.
I like my questions to be worded in the form of a question, and perhaps even have question marks. Failing that, I like the statements they’re in to at least show some effort to consider a possibility beyond just what the statements say. That tells me what I’m supposed to discuss when I respond.
But I’ve reread this comment a few times, and I can’t see anything indicating you wanted more information from me. There were no questions, no leading statements, no anything. As far as I can see, it was just a laundry list of ways in which I might be wrong.
That’s a far cry from just asking a question.
.
Actually a little research would show that they’re a highly-respected proxy for monsoon strength. Although based on local surface cooling (and nutrient flow), they are thought to measure upwelling and south-west wind strength pretty robustly. E.g. they actually increased after the last glaciation, suggesting that they aren’t a dumb “coldness” proxy.
.
They (and monsoon strength in general) have been associated with higher temperatures and lower snow cover in Europe and Asia. Including the North Atlantic, but also East Asia and the Tibetan Plateau (whose coldness is supposed to be a major driver of monsoon).
.
You can get all this info from the papers cited by Moberg, including the one quoted by SteveMc at CA.
.
Remember, it’s always a good idea to trust Steve McIntyre, after you’ve filled in the blanks.
http://climateaudit.org/2012/06/10/more-on-screening-in-gergis-et-al-2012/#comment-337767
Uhhmm… there are respectable mathematical arguments why you should not ‘calibrate’ using the long term trend.
Carrick (Comment #97724)
June 14th, 2012 at 2:27 pm
Carrick, I reviewed the Mann (09) paper and SI, and it uses all the data from Mann (08) except the 71 instrumental Luterbacher proxies. Mann (08) had 900-plus TRW proxies, and the selection process (correlation p ≤ 0.13) cut that down to around 300 as I recall. I think it is important, for the data you used in the Marble plot, that we know whether it comes from the main population or the screened population. It would appear that the number of proxies in your Marble plot is closer to 300 than 900.
Brandon:
It’s helpful, when you are criticizing somebody, to point to exactly what you were talking about, so thanks, that is helpful in focusing the issue.
Having said that, you’ve totally lost me now as to what horrible transgression I must have committed.
Which of these comments are you complaining about:
1) Ljungqvist is responsive to requests for his data even if he can’t publicly release it.
2) “I think it’s very possible to be so critical one never accomplishes anything. You have to distinguish the possibility of problems with proof of problems. ” There’s a problem with this statement? What?
3) Bradley is not a reliable source for determining whether Yang is “crap”.
4) ” Even including one series that isn’t a temperature proxy (you’ve added noise) doesn’t mean you have to throw away the entire reconstruction.” That statement is true, it’s called robustness. I deal with that on a daily basis, where you have some samples that don’t belong to the distribution you’re trying to measure.
5) “It could be this series gets such a low weight that removing it has no effect, in which case the criticism itself is overblown.” Because it starts with “It could be…” this was clearly speculation on my part. You responded to that with a somewhat bizarre comment about that speculation being “disturbing” because, I suppose, I didn’t have that table memorized. WTF!? I’m supposed to remember the details of a paper I read months ago in the middle of preparing a Q2 report?
Then I go on to say:
Are there problems with that? This addresses the issue of “external validity.”
That’s true isn’t it?
I use the analogy of the person crossing a meadow who encounters a solitary tree in the middle of the meadow, and having encountered the solitary tree, sits down, as clearly the way forward is blocked. “The perfect is the enemy of the good.”
Just complaining because somebody didn’t do X, Y, or Z doesn’t invalidate what they did unless you can show (and yes, it is your responsibility, if you want to bring up the issue, to show it) that not doing X, Y, or Z invalidates their study.
McIntyre complains about his criticisms not always being taken seriously, and this is the fundamental flaw in many of them. If you’re going to raise an objection you have to show it matters.
Carrick, Mann (08) used a one-sided r test, so his selection process of p ≤ 0.13 for TRW proxies would not have returned any proxies with a negative r. The question is whether a proxy with a positive r with temperature could return a negative trend. The screening test results are summarized below, giving the average correlation (r) over the indicated time periods and the number of TRW proxies that passed the test:
Pass-screening over 1850-1995: r = 0.1965 (258 proxies)
Pass-screening over 1896-1995: r = 0.2311 (239 proxies)
Pass-screening over 1850-1949: r = 0.2283 (218 proxies)
The number of TRW proxies on which the screening was performed was 927.
I am a little more confused now about what your data represents in the histogram and Marble plot.
I have the data from Mann (08), so I can do my own histogram and geographic plots when time permits.
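On the question of whether a proxy with a positive r against temperature could still return a negative recent trend: yes, easily. A toy example (all numbers invented, nothing here is from Mann's data): a series can track warming over most of 1850-1995, pass a one-sided screen on the full interval, and still slope downward after 1968.

```python
import numpy as np

years = np.arange(1850, 1996)                # the 1850-1995 screening interval
temp = 0.005 * (years - 1850)                # idealized warming record

proxy = temp.copy()                          # proxy tracks temperature...
late = years >= 1968
proxy[late] = proxy[late][0] - 0.01 * (years[late] - 1968)   # ...then diverges

r = np.corrcoef(proxy, temp)[0, 1]                         # screening statistic
recent_slope = np.polyfit(years[late], proxy[late], 1)[0]  # 1968-1995 trend
print(r, recent_slope)
```

The full-interval correlation is dominated by the 118 co-rising years, so the series survives a positive-r screen while contributing a negative trend to any post-1968 histogram.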
Kenneth, there are 927 tree-ring proxies in that set, 664 red dots and 263 blue dots and 1207 proxies total. Does that make it 08 or 09???
I carelessly didn’t save the url when I was playing around with these. Now I’m not sure whether it’s 08 or 09. My mind is going.
If you want to play around with my processed results, they’re located here in csv format
Kenneth:
They are the trends of the proxy index with time over the interval 1968-1998 inclusive. The marble plot just shows tree-ring sites with a positive (red) or negative (blue) slope.
Carrick, I’m confused. You say:
What are you talking about? What in any of my comments indicates I think you’re guilty of a “horrible transgression”? All I said was it seemed you posted a laundry list of ways in which I might be wrong, not a question like you claimed. That’s not accusing you of a horrible transgression.
As for the laundry list I referred to, I never said it was bad. I never said it was wrong. I just said it was a laundry list because it seemed like a tedious list of concerns (though I agree every one of them is worth considering). That’s not accusing you of a horrible transgression.
As for my “somewhat bizarre comment,” I found it disturbing to hear you say things which indicated you knew/remembered far less about the Moberg reconstruction than I thought you did. You may think it’s wrong for me to be disturbed by that, but it wasn’t accusing you of a horrible transgression.
In my last comment, I specifically asked you, “[W]here exactly do you think you asked a question?” I didn’t get a response, but I’d like to try a different question now. Could you tell me what makes you think I think you committed a “horrible transgression”?
Carrick:
I wonder if you may have misunderstood my intent. What I had said was this:
My intention here wasn’t to give some exhaustive review of the Moberg reconstruction. I wasn’t trying to prove it wrong. I was just giving a couple examples to give you a reason to examine the proxies. I figured if I raised some flags it would help spark your curiosity.
In retrospect, my last sentence does seem to imply the examples meant more than just that. That wasn’t intended. I was just intending to indicate I’ve studied the reconstruction, and I found good reason to doubt it (possibly beyond what I mentioned). Since you use the Moberg reconstruction in your comparison, I thought it’d be something you ought to hear.
I think there were some mutual misunderstandings here, and I apologize for my contribution to that. I was interested in your comments; sorry if that didn’t come out clearer.
Carrick:
Not a problem. I probably contributed plenty to the confusion.
Anyway, my approach to studying reconstructions (and most other things) is fairly simple. The first thing I do is read the paper and try to understand, in a general sense, the approach used. The second thing I do is look at the data used. The third thing I do is examine how the data is handled. For Ljungqvist, I’m still on the second step (the first passed without any red flags), so I’ll just discuss Moberg.
For Moberg, the initial reading tells me low-frequency data is explicitly separated from the high-frequency data, meaning we can (for this particular examination) discard the seven high-frequency proxies. This leaves us with eleven proxies. Given how small a data set this is, each individual proxy can be examined, and doing so gives us a far better understanding of the reconstruction than just studying yet another methodology. No matter the merits or flaws of the methodology, we can know what is reasonable just by looking at the data.
To make things simple, we can look at a visual display of the eleven proxies. When we do, we see two clear hockey stick shaped proxies (#1 and #11). A third series (#10) has the sharp uptick at the end but is less flat in the earlier portions. Without a sharp uptick at the end of some proxies, Moberg’s results obviously cannot hold up. Since these three are the proxies closest to giving anomalous warmth in modern times (no others have a sharp uptick), they become the three most important proxies to look at.
When we do, we see problems. As discussed by Steve McIntyre, #11 isn’t any sort of direct temperature proxy. At best, it’s a wishy-washy indirect proxy (I can give a detailed explanation if one is needed). For #10, we have Ray Bradley, a coauthor on Michael Mann’s original hockey stick, calling the proxy “crap” when it was used by Mann himself. His word obviously isn’t enough, but his explanation is one anyone can verify. Like Bradley, I am especially amused by the authors of the series saying:
The authors of #10 basically say four of the nine (almost half!) proxies they use have “undoubtedly been verified” as temperature proxies even though they can’t give any sort of source for such. That should be enough to immediately make one discard the proxy. This means the use of both #10 and #11 as temperature proxies is unjustified. When we look at Moberg’s Table 1, we see these two are said to be responsible for 27.2% of the total variance (14.5 and 12.7). That, combined with the shape of the problematic series, is enough to indicate a substantial issue with the reconstruction.
It is worth pointing out Moberg tests the effect of removing any one proxy from his reconstruction, but he doesn’t do a similar test for removing two. It is nearly impossible to imagine removing two out of the three proxies with a sharp uptick would have a negligible impact, especially given the variance calculations Moberg provides. Even if the effect was negligible with Moberg’s method, it’d mean the entirety of the uptick would depend upon a single proxy. That would be no better than if the effect was non-negligible (it would arguably be worse).
Short of replicating Moberg’s approach, I don’t think I can prove his reconstruction is “wrong.” However, I don’t think anyone can offer much reason to believe it is right either. If one accepts my statements on the two series I discussed, I don’t think it is reasonable to have any notable confidence in the Moberg reconstruction. And if not, I don’t think it means anything for another reconstruction to be similar to it.
Oh, I forgot to mention something extremely important earlier, and I failed to realize the oversight until now. Comparing the Loehle and Moberg reconstructions is a very iffy thing to do. As mentioned in one of the links in my last comment, eight out of the eleven low-frequency proxies used by Moberg were also used by Loehle. That’s part of why I haven’t discussed Loehle’s reconstruction. It could be used as a weak sensitivity test for Moberg, but it cannot be considered any sort of independent confirmation.
Things like that are why I dislike saying similar results indicate anything. There are so many non-meaningful ways for series to agree with each other it is easy to miss some.
In control system theory, or in fact any signal processing, one instrument is fundamental: the frequency/transfer response analyser. With this instrument, white noise is injected into a circuit and the analyser calculates the response of the circuit or electro-mechanical system to that input. Since white noise consists of all frequencies with equal weight, the output is simply the input shaped by the transfer function of the system under test.
Now if one knows that a carrier signal is present, immersed within a lot of white noise, then a band-pass filter will enhance the signal-to-noise ratio and enable clock recovery, and hence data recovery, to be undertaken.
Either way, climate science “believes” a signal to be present, hence applies a band/high/low-pass filter, et voila, a signal is found.
One further point from control system theory, or in this case sampling theory, concerns Nyquist: the essential requirement of sampling at twice the highest frequency component before applying any processing. Since no one knows the highest spatial frequency over all time for weather/temperature variations, any prognosis drawn from the data sets is entirely spurious.
Here endeth the lesson.
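The filter point is easy to show numerically. A minimal sketch (sample rate and passband are arbitrary choices): band-pass flat-spectrum white noise and the output oscillates at the passband frequency, even though no such signal was ever present in the input.

```python
import numpy as np

rng = np.random.default_rng(3)
fs, n = 100.0, 4096                    # sample rate (Hz) and record length
white = rng.standard_normal(n)         # flat spectrum: no "signal" anywhere

# Naive FFT band-pass: zero every component outside 0.9-1.1 Hz
spec = np.fft.rfft(white)
freqs = np.fft.rfftfreq(n, d=1 / fs)
spec[(freqs < 0.9) | (freqs > 1.1)] = 0.0
filtered = np.fft.irfft(spec, n)

# 'filtered' now oscillates near 1 Hz: the filter, not the data, put it there.
peak = freqs[np.argmax(np.abs(np.fft.rfft(filtered)))]
print(peak)
```

Which is the transfer-function point in miniature: the output is the input shaped by the filter, so if you choose the filter to match the signal you "believe" is there, you will find it regardless.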
Re: Steven Mosher (Jun 13 17:01),
Oh yes, Phil Jones was able to determine the entire Southern Hemisphere temperature with just one thermometer in 1850. Isn’t climate science great?
Carrick (Comment #97751)
Carrick, I think your counts clear this issue up. There must be a number of proxy locations that are nearly on top of one another, making my estimate incorrect. What you have shown in both the histogram and Marble plot are the 927 TRW proxies used in both Mann (08) and (09).
Mann (08) used hemispheric temperatures and Mann (09) used grid temperatures in the reconstructions.
I need to determine how many of the proxy series were filled in at the end, as I do not think all the proxy data were complete through 1998.
Carrick, I took another look at the TRW proxy data from Mann (08) and found that most series ended before 1998, and many well before. The in-filling method is given in the Mann (08) SI and is excerpted below. In using the 1968-1998 time interval for calculating TRW trends, you are not using individual TRW proxy data over the entire time period but rather in-filled data in many cases. I think that point is important to note with your histogram and Marble plots. Much of the in-filling would be in the period that sees much of the divergence problem. With Mann the devil is in the details, and those details are sometimes difficult to find.
“The RegEM algorithm of Schneider (9) was used to estimate missing values for proxy series Mann et al. http://www.pnas.org/cgi/content/short/0805721105 terminating before the 1995 calibration interval endpoint, based on their mutual covariance with the other available proxy data over the full 1850–1995 calibration interval. No instrumental or historical (i.e., Luterbacher et al.) data were used in this procedure.”
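This is not RegEM itself (Schneider's algorithm iterates a regularized EM estimate of the covariance matrix), but the flavor of covariance-based infilling can be sketched: regress the early-ending series on proxies that run to the calibration endpoint over their overlap, then predict the missing tail from that fitted relationship. Everything below is synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 146                                            # 1850-1995, one value per year
common = 0.1 * np.cumsum(rng.standard_normal(n))   # shared "climate" signal

full_a = common + 0.3 * rng.standard_normal(n)     # proxies complete to 1995
full_b = common + 0.3 * rng.standard_normal(n)
short  = common + 0.3 * rng.standard_normal(n)
short[120:] = np.nan                               # this series ends early

# Regress the short series on the complete ones over the overlap,
# then predict the missing tail from the fitted relationship.
X = np.column_stack([np.ones(n), full_a, full_b])
obs = ~np.isnan(short)
beta, *_ = np.linalg.lstsq(X[obs], short[obs], rcond=None)
filled = short.copy()
filled[~obs] = X[~obs] @ beta
```

The relevance to the trend histogram is direct: the in-filled tail inherits the behavior of whatever complete proxies dominate the covariance, so post-1968 "trends" for early-ending series partly reflect other series, not their own measurements.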
I have wanted to do an analysis of the proxy trends from Mann (08) and this might just be the excuse I need to do it.