Screening Fallacy: More ways to mislead.

I’m going to post yet another synthetic example to explain the sorts of problems that can arise if you screen. In this example I will create a temperature series and a set of proxies that do exhibit a correlation with temperature. The correlation coefficient between the proxies and the temperature will be roughly R=0.14 (originally stated as 0.25). I will then show an example of what I think is the general case: screening can lead to misleading results, while not screening will give more or less the correct results. Without further ado, let me begin.

First, I will assume that over 1000 ‘years’ temperature is the sum of a sinusoidal oscillation and a piecewise linear function that is 0 for the first 900 years and then declines linearly with time. That is: for some reason a cooling trend has begun, and that is superimposed on an oscillation. The synthetic “real” temperature is shown in black below, the oscillation is blue and the piecewise linear function is red:
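
A minimal R sketch of this sort of construction (the oscillation period, amplitude and cooling rate here are illustrative choices, not necessarily the values used for the figures):

    n_years <- 1000
    years <- 1:n_years
    oscillation <- sin(2 * pi * years / 250)                    # sinusoidal component (blue)
    cooling <- ifelse(years <= 900, 0, -0.01 * (years - 900))   # zero, then a linear decline (red)
    temperature <- oscillation + cooling                         # synthetic "real" temperature (black)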

I then create a “shitwad” (i.e. 1000) of 1000-‘year’-long proxies that are the sum of the real temperature and white noise. I will select parameters so the correlation between the proxy ‘ring width’ and the temperature is R=0.15 (originally stated as 0.25). This choice is entirely arbitrary. But it parallels the idea that one might seek out ‘microclimates’ where tree ring width (or density or whatever is used) is more correlated to temperature than in other ‘microclimates’. So a phenomenologically based screening has already been done, and I am now going to apply the mathematical correlation-based screening– which is the bit that introduces “the problem”.
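
Continuing the sketch above, one simple way to generate such proxies in R (the noise recipe is an assumption for illustration; the code used for the figures gives each proxy the same variance as the temperature, as discussed in comments, so the details differ):

    R_target <- 0.15
    n_proxy <- 1000
    sd_noise <- sd(temperature) * sqrt(1 / R_target^2 - 1)   # noise level giving cor ~ R_target
    proxies <- replicate(n_proxy, temperature + rnorm(n_years, sd = sd_noise))
    mean(apply(proxies, 2, cor, y = temperature))             # check: should be near R_target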

I will then create two ‘reconstructions’ of the temperature based on the proxies, using two different methods:

  1. Just average over all the proxies: This will be shown in purple.
  2. Assume that we only know the temperature during the final 50 (originally stated as 100) out of 1000 ‘years’, compute the correlation between temperature and ring width during those final 50 years only, and then select the 44% of proxies (i.e. 27/62 would match Gergis et al.) that match “best” during that period. (I will do this one sided– no ‘upside down’ will be used.) After selecting these, I will average to create a ‘reconstruction’. (A rough R sketch of both methods appears just below.)
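
A minimal sketch of both methods, continuing the variables from the sketches above (the 50-‘year’ window and 44% cutoff follow the description; the rest is a guess at one possible implementation):

    calib <- 951:1000                               # the final 50 'years' of "known" temperature
    recon_unscreened <- rowMeans(proxies)           # method 1: average everything (purple)
    r_calib <- apply(proxies[calib, ], 2, cor, y = temperature[calib])
    keep <- r_calib >= quantile(r_calib, 1 - 0.44)  # method 2: keep the 44% best correlated (one sided)
    recon_screened <- rowMeans(proxies[, keep])     # screened reconstruction (green)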

The two “reconstructions” are superimposed on the “known” historic temperature below:

Using the “eyeball” method, you can see two important features:

  1. Both reconstructions have lower amplitudes than the true oscillations. This is simply what happens when I average noise into the signal. Other techniques would be required to try to correct for this amplitude attenuation. This effect is equally bad for both the screened and unscreened methods.
  2. The “screened” reconstruction (green) incorrectly indicates that the final temperature is higher than any historic temperature. In reality, the final temperature is lower than previous peaks (and in fact, when the current oscillation peaks it will be cooler than all other peaks in the entire historic record).

There is a third feature that cannot be detected using the “eyeball” method– but even if we were to re-inflate to capture the amplitude of the signal, the “screened” method tends to be noisier. This is because the screened method is based on only 44% of the data. By screening we threw away 56% of the data– all of which was actually just as good as the data we kept– it just happened that the specific features of the noise during that period resulted in a lower than usual sample correlation between temperature and “ring width” during the calibration period.

Now, to see just how misleading this is, let’s look at how the “reconstructions” would look if we remove the ‘known’ temperature data outside the calibration period:

How might someone who thinks the mathemagical screening is the way to improve the signal-to-noise ratio interpret this graph? Well, they might conclude that

  1. The screened trees are “tree-nometers” and follow the signal better than the purple ones.
  2. The green trees say recent temperatures are higher than they have ever been in the past!
  3. The current uptick– which is all the available data– has now been shown to be the hottest in all history.

As we saw, the third conclusion happened to be wrong for this particular set of synthetic data. Even though the green mathemagically-screened series is higher than ever, if you want to compare current temperature to past temperatures, you need to stick to the purple unscreened series.

Mind you: someone might point out that we get the same past with both methods. You can learn that temperature oscillated in the past. But if we chose these trees based on some selection criterion like “good microclimate” and we actually believe that microclimate makes the trees in the batch exhibit a good correlation, there is little or no advantage to screening.

In the case shown, it happens that the method of mathematically screening to select “treenometers” will be noisier, and so it is a poorer method for creating a reconstruction of past temperature. If you limit your interpretation to describing what might have happened before the thermometer record, and provide a suitable buffer period prior to the calibration period to account for any temporal autocorrelation that might exist in the “noise”, the screening doesn’t necessarily bias the results in the far past. However, one thing you absolutely cannot do is conclude that current temperatures are higher than in the remote past based on the observation that the final years in the mathemagically screened “green” reconstruction show a peak that exceeds previous years.

Any such conclusion would be an example of “the screening fallacy.”

Update: To highlight the “noise” issue, I used a stronger screen for the ‘green’ reconstruction. In this case, I picked the 5% of trees with the strongest correlation:

130 thoughts on “Screening Fallacy: More ways to mislead.”

  1. Nice write up. I just hope you didn’t do this for Nick’s benefit, as he’s being willfully obtuse on the matter.

    All you have to see is this gem:
    It’s much simpler. You just measure correlation, and find a linear relation. You don’t have to assume a cause or physical model.

    http://climateaudit.org/2012/06/10/more-on-screening-in-gergis-et-al-2012/#comment-338164

    …To know that he’s completely lost the plot on this. We don’t need a physical basis for proxies, just correlation! Inverse number of pirates to prove global warming is back in play!!

  2. lucia, would it be possible for you to show what would happen if you forced the variation in the reconstructions to match up with the variation in the “real” temperature record? It’s basically an expansion on your first point about variance deflation, and I think it’d be interesting to see. I’d do it myself, but you already have most of the code necessary, so it seems silly to replicate it.

    (You could also just provide the code you used, and I’d add to it.)

  3. It also might be interesting to ask these folks who defend screening: which proxy is better, one that accurately portrays temperature through the past but not in the 20th century, or one that does not accurately portray temperature in the past but is selected to do so in the 20th century? And worse yet, through the screening methods we’ve seen, -how would you even know the difference-?

    If this were an episode of Mythbusters, I think we’d be able to say from Lucia’s conclusive work that this “screening is valid” myth is totally busted.

  4. lucia, would it be possible for you to show what would happen if you forced the variation in the reconstructions to match up with the variation in the “real” temperature record

    Do you mean scale so the rms in the reconstructions matches the “real” temperature? It’s easy to do if that’s what you are asking.

    I’m going to tweak to make the noise red too. You’ll notice in the graph above, the shape is pretty good outside the calibration period. But if we make the noise red, we can screw up the shape in the reconstruction for a period prior to the calibration period. (How long will depend on the autocorrelation in the noise.)

    hmm…. I have a typo. My trees have a lower correlation coefficient than stated. It looks like R~0.15, not 0.25. It doesn’t affect the principle– I get similar results for a broad range of choices. But… I’m going to fix the function. (I wrote it more generally so I can move on to other flawed arguments Nick seems to be advancing in comments.)

  5. Lucia,
    What happens if you push the correlations to 0.8? This is the order of magnitude for high-frequency densities.

  6. Phi–
    For all |R| < 1, the qualitative effect of screening is as shown.

    But the quantitative effect is lesser when the ‘treenometers’ are more responsive to temperature. So if R=0.8, the purple and green traces will look more similar to each other, and we get a smaller tendency to have an excess blade at the end. Also, the amplitude of the reconstruction will begin to approach reality as R approaches 1.

    But there is still no particular advantage to screening based on correlation. (Or, at least if you are going to pick based on some other method– like micro-climate– you should assume a tree “is good” as a null hypothesis and only screen out a tree if the correlation positively rejects the null. This type of screening should result in fewer than 5% of trees from the ‘good microclimate’ excluded– otherwise your claim that microclimate results in temperature sensitive trees is probably a bad claim!)

  7. Phi

    The quote from Gergis at climate audit reads

    Only records that were significantly (p<0.05) correlated with the detrended instrumental target over the 1921–1990 period were selected for analysis.

    If I’m interpreting what they said correctly, and they used a two-sided p, the cut off would be about
    2/sqrt(1990-1921) = 0.2407717. They only kept about 1/2 of the samples. I can’t imagine that the average correlations were 0.8. Their correlations are probably closer to 0.20-0.25. But my R=0.17, so mine may be an exaggeration. (The reason I only say ‘may’ is mine only contains one problem– not several! Anyway, this is ‘toy’ math.)
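
    For what it’s worth, that back-of-the-envelope arithmetic in R:

    N_overlap <- 1990 - 1921   # about 69 years of overlap
    2 / sqrt(N_overlap)        # rough two-sided p < 0.05 cutoff, ~0.24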

  8. Yeah lucia, that’s what I mean. I think it’d be interesting to see what happens when you try to correct for the deflated variance by matching the variance in the series to the variance in the “real” temperature record.

    It might also be worth showing what happens if you do that using just the period of overlap as opposed to using the variance of the entire series.

    Oh, and to be clear, I’m meaning this to be done with the “known” part of the temperature record.

  9. Brandon– You can kind of see that by eye. I’ll send you the code as it currently exists. That way you can do what you like. It’s skanky…

  10. Lucia,

    I asked you this question because I don’t think screening is sufficient to carve hockey sticks. Sticks are primarily the result of affixing instrumental data to proxies. Serious proxies don’t have blades. If you calculate the correlations based on detrended data (as proposed by Gergis), and set a high limit for acceptance (e.g. r >= 0.6), you will not get a good stick without adding instrumental data. It is true that, unless I am mistaken, with such a limit, Gergis would not have material to publish. But locally, there are many proxies that exceed this limit (MXD, glacier melting anomalies, snow).

    If screening produces hockey sticks, it is because the calibration data are bad and because this process eliminates the good proxies to keep only bad ones which go in all directions.

  11. “If screening produces hockey sticks, it is because the calibration data are bad and because this process eliminates the good proxies to keep only bad ones which go in all directions.”

    Exactly.

    “It is true that, unless I am mistaken, with such a limit, Gergis would not have material to publish.”

    Also exactly. Hence why it was found that, when that method is actually applied, many of Gergis et al.’s proxies were not statistically significant and should have been rejected by the methods. Hence why the paper is “on hold”; there isn’t enough to publish, and the stated methods were not even followed in what was.

  12. Lucia,
    You said I wouldn’t like it, but I do. In fact, I was planning to do the same myself.

    Your second plot shows exactly what I (and toto) have been saying. Selection alters the behaviour in the training period. It must, and that means the proxy estimate is not an independent estimate of temperature there. But it does not alter the “shaft”; the part before 900 (once autocorrelation has worn off).

    Your point 2 is that the difference in the training period seems to go up and not down. This is unexpected to me at the moment (it’s 6am here), but a side issue. I see some odd things there – your red HS in Fig 1 does not seem to turn at 900 but around 960. But anyway, my contention again is that you do not rely on the proxy temp in this region. It’s the shaft that counts.

    Your third point was that it is noisier, because you are using less data. Yes, of course. But that’s an unfair comparison because you have required that the other proxies are as good so you can use them without penalty and get less noise. In Gergis’ case you can’t use them at all, because you can’t calibrate them.

    But overall, where’s the fallacy? You got essentially the same result in the training period. A predictable variation in noise. And an oddity at the end, which I don’t entirely believe, but is in the period that I have said from the start is not the result you should be looking for from proxies.

  13. Lucia (#97861) –
    My take on Gergis is that they calculated the Spearman correlation coefficient (r) between the proxy and the regional instrumental temperature. Converted that to a t value by
    t=r*sqrt( (N-2)/(1-r*r) )
    where N=70. Screened proxies satisfied |t| >=2 (2-sided). Working backward, that’s |r| >= 0.2357. Very close to your values. [Reasoning is here.]
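
    A quick R check of that back-calculation (N and the |t| threshold as stated above):

    N <- 70
    t_cut <- 2
    t_cut / sqrt(N - 2 + t_cut^2)   # inverts t = r*sqrt((N-2)/(1-r^2)); gives ~0.2357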

    The average |r| — here I’m using the “regular” Pearson correlation coefficient — for the screened proxies (vs. regional instrumental temperature) is 0.40. Max 0.61, Min 0.22.

  14. lucia, thanks for sending the code. It probably would be easier for me to just test whatever I’m interested in myself. I’ve just glanced over it so far, but I noticed a problem. Your post says your “known” period covers 100 years, but your code sets it to 50. The line:

    N_start=floor(N-Period_temp/2)

    Shouldn’t have /2 in it if you want to cover 100 years. Personally, I’d just leave it as is and change the numbers in your post.

  15. I like it too, for many of the same reasons Nick said. Nick, the reason it goes up during the training period is that you’re correlating on the actual sum of the two signals, so it should follow that sum more closely (and it does).

    From what I can see, the procedure recovered the oscillating signal, and the non-oscillating signal is entirely in the training period and therefore is not a target. What wasn’t recovered was the full amplitude of the oscillating signal, but I’ll bet my buttons that if Lucia had included error bars, the full signal would have been included easily.

    So like Nick, I don’t see an issue here.

  16. Phi
    I don’t know what point you are trying to make by explaining why you asked your previous question. Could you try to communicate whatever point you are trying to say directly?

    If you calculate the correlations based on detrended data (as proposed by Gergis), and set a high limit for acceptance (e.g. r >= 0.6),

    If you are suggesting that a different method of screening would not have this issue– sure. But in that case, we would have to think a while to figure out whether the alternate method has its own unique issues. Right now what we know is the method Gergis proposed to use was not implemented in Gergis. So, the results in Gergis did not arise as any feature of the proposed method. They arose from the method they used.

    Nick

    Your second plot shows exactly what I (and toto) have been saying. Selection alters the behaviour in the training period. It must, and that means the proxy estimate is not an independent estimate of temperature there. But it does not alter the “shaft”; the part before 900 (once autocorrelation has worn off).

    First: I have never disagreed with you that screening does not change the shape of the shaft far away from the calibration period. As far as I can tell, no one has.

    Second: In this case, the distorted period is limited to after 900, but if I make the noise red, I can propagate the distorted period back. So, even in this case, if I use red noise some of the shaft will be distorted. One will be able to estimate the period that needs to be excluded.

    Finally: I know you keep repeating this point, but it’s a stupid one. These papers all make pronouncements on how the temperature in the past compares to current temperatures. This is precisely what you cannot do. And trying to counter that by saying that the general shape in the far past is ok– provided you make no comparison to anything in the training period (or the excluded training period)– strikes me as either a) disingenuous or b) intentionally trying to inject a red herring.

    It’s the shaft that counts.

    Oh? Counts in what sense? Obviously, “what counts” depends on what conclusion you are claiming to make. If the only conclusion in the proxy reconstruction papers is to say “The medieval warm period was colder than the little ice age”, then ok. But if the conclusions say the current temperatures are unprecedented— then, FAIL!

    You got essentially the same result in the training period. A predictable variation in noise.

    Uhhhmmmm. You are aware that people claim these show current temperatures are unprecedented. Right? Or are you going to pretend that those graphs only show the shaft?

    If the papers make absolutely no comparison between the shaft and the temperature in the calibration period, either by implication (i.e. showing them on the same graphs) or in text, and explicitly communicate that the results tell us nothing at all about how past temperatures compare to current ones, there is no fallacy.

    But the opposite is the case: they do make that comparison, and they fall into the fallacy.

    I have said from the start is not the result you should be looking for from proxies.

    Then you should be criticizing claims like this

    In the Northern Hemisphere, the late 20th / early 21st century has been the hottest time period in the last 400 years at very high confidence, and likely in the last 1000 – 2000 years (or more).

    When they are based on screening. Because that sort of claim is the fallacy. And if you are trying to pretend you don’t ‘get’ that that’s what we are identifying as the fallacy that runs rampant in hockey stick papers, you are just being disingenuous.

  17. Kap

    In the Northern Hemisphere, the late 20th / early 21st century has been the hottest time period in the last 400 years at very high confidence, and likely in the last 1000 – 2000 years (or more).

    Do you see an issue with making a claim like the above which involves comparing the temperatures during the training period to the shaft? Or are you imagining this conclusion is only made by looking at the shaft and without comparing temperatures in the late 20th century and the early 21st to the period prior to the 20th century? Because you are really going to have to tell me how they make that claim without falling into the fallacy.

  18. Lucia,
    “First: I have never disagreed with you that screening does not change the shape of the shaft far away from the calibration period. As far as I can tell, no one has.

    Second: In this case, the distorted period is limited to after 900, but if I make the noise red, I can propagate the distorted period back. So, even in this case, if I use red noise some of the shaft will be distorted. One will be able to estimate the period that needs to be excluded.”

    Well, what are we arguing about then? The whole point of proxies is to find out about temperatures before the training period. And yes, redness takes the selection effect back a few years. But that only reflects the limited time resolution of the proxies.

    “These papers all make pronouncements on how the temperature in the past compare to current temperatures.”
    Yes, they do. And I’ve said they should be careful. But it’s basically correct if you compare past proxy temps with recent instrumental temps. So
    “In the Northern Hemisphere, the late 20th / early 21st century has been the hottest time period in the last 400 years at very high confidence, and likely in the last 1000 – 2000 years (or more).”
    is correct, based on instrumental vs proxy. And we do know the instrumental temps.

    But this “screening fallacy” seems to be very hard to pin down.

  19. Nick

    But it’s basically correct if you compare past proxy temps with recent instrumental temps

    No. This is nearly always basically incorrect if your past proxy was developed using correlation screening. That’s what we are arguing about. I’ve been pretty well aware of this. I have no idea why you don’t understand that’s what the argument is about.

    But this “screening fallacy” seems to be very hard to pin down.

    The screening fallacy is the claim you make here:

    But it’s basically correct if you compare past proxy temps with recent instrumental temps. So
    “In the Northern Hemisphere, the late 20th / early 21st century has been the hottest time period in the last 400 years at very high confidence, and likely in the last 1000 – 2000 years (or more).”
    is correct, based on instrumental vs proxy

    When you make that claim and base it on a reconstruction built by screening, you are almost certainly committing a screening fallacy. There may be a way to avoid it– but that would involve someone recognizing that they need to correct for the artifact created by the screening and then correcting for it.

  20. Lucia:

    Right now what we know is the method Gergis proposed to use was not implemented in Gergis

    Just for the record (I suspect Lucia knows this), the method of detrending before correlating was first introduced by von Storch (2004), and because it’s critical of Michael Mann’s approach, the RC guys naturally say it’s flawed.

    Mann used a detrended series to reconstruct with at one point, got nonsense, von Storch went back and showed if you can program correctly (*cough* /2) you get a consistent reconstruction.

    That paper’s located here.

    Actually the whole back and forth is fun to read (bring popcorn).

  21. B.Kindseth

    This suggests a multitude of possible tests of the screening methodology.

    Yes. There are many “features” that can be introduced by screening. To show each requires thinking of a ‘toy’ synthetic case and showing what happens.

  22. Lucia #97887,
    I asked for a definition of “Selection Fallacy” and never got one. It was supposed to be the way in which “they” made hockey sticks out of thin air. And now, as I suspected, it’s vanishing into almost nothing. Maybe some people quoted proxy temps when they should have quoted the (very similar) instrumental temps.

    But I guess it will go on for a while as a shorthand for all and any of the alleged sins of Mannkind. And we’ll never get a definition.

  23. lucia, glad to. The code you sent was quite helpful for what I was curious about.

    When I rescaled the series so their variances matched the temperature’s for the period they overlap, I got the result I expected. The averaged series looked the same as the underlying data. The screened series, on the other hand, had deflated variance. This shows the screening process doesn’t just cause the sharper uptick at the end, but changes the interpretation of the data throughout the entire series.

    More interesting to me was what happened when I scaled the series by their entire length, not just the period they overlapped the “known” portion. In this case, the screened series matched the actual temperatures well for 900 years, but where the two series overlapped, the screened series exaggerated the swings in the data.

    This may be intuitively obvious to many people, but I wanted to work through some things like this since I started looking at the Ljungqvist reconstruction. It uses this type of rescaling by comparing one series from 1000-1900 to another series from 1850-1989. That struck me as strange, and I wanted to get a feel for how selecting different periods could affect results.
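
    For anyone who wants to repeat the exercise, here is a rough R sketch of the rescaling described above, reusing the variable names from the sketches in the post (the exact scaling I ran may differ in detail):

    # scale x so its mean and sd over idx match those of target over idx
    rescale_to <- function(x, target, idx = seq_along(x)) {
      (x - mean(x[idx])) * sd(target[idx]) / sd(x[idx]) + mean(target[idx])
    }
    screened_overlap <- rescale_to(recon_screened, temperature, calib)   # match over the overlap only
    screened_full    <- rescale_to(recon_screened, temperature)          # match over the full length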

  24. Nick #97901

    I’m trying to follow what you’re saying, but this exercise is getting circular.

    You’re trying to say that the statements about the instrumental record being unprecedented in recent millennia are basically correct given the proxy data.

    You’re also saying that proxies should not be used to define/understand temperatures in the instrumental record period, and instead should just rely on proxies.

    You’re also saying that there is an ‘assumption’ built into the system that proxies that correlate well to the instrument record (even though trees shouldn’t be used to represent temperature in the instrument record era)– are assumed to be ‘good’ going backwards in time, however far you need.

    How can you simultaneously hold that trees shouldn’t be used to indicate present temperature, and yet screen them against present temperature to identify the ONLY reputable candidates for paleo-reconstruction?

  25. Nick, how can it be that you “never got one” but can tell us what it was ” supposed to be” in the very next sentence (and then get that wrong too)?

  26. Nick Stokes:

    “It is particularly important to get the last word where you are in some doubts as to the merits of your case. The last word will serve as a clinching argument that will make up for any deficiencies in your logic. Achieving the last word now also brings the advantage that you may subsequently point to your success in this debate as the clinching argument in future debates. However, if you did not win the last discussion, we still recommend claiming incessantly that you did.”

    Andrew

  27. Nick–

    I asked for a definition of “Selection Fallacy” and never got one.

    You’ve been told many times. You just want to pretend you aren’t getting an answer by saying the shaft is right (which isn’t entirely true.)

    But I guess it will go on for a while as a shorthand for all and any of the alleged sins of Mannkind. And we’ll never get a definition.

    No. See above.

    I’ll be posting another problem with screening — likely Monday.

  28. Nick,

    I don’t see why comparing the shaft with the blade of the reconstruction or the instrumental temp record (ITR) makes any difference.

    Surely the blade (and therefore the shaft, which remains firmly attached to the blade) have been scaled to match the ITR. Lucia has shown here that the methodology can introduce a bias between the two parts of the reconstruction. Your point seems to suggest or assume that the shaft of the reconstruction and the ITR are independent, but they are not; their relative magnitudes are imposed by the blade. Simply disappearing the blade from the reconstruction doesn’t take away this fact.

    The only interest for policy in these reconstructions is the fact that they show the late 20th century is the warmest in the past 2000 years. You have to see that; the IPCC certainly do. After you’ve decided the late 20th C is the warmest, then all the wobbles and bumps in the shaft are of interest only to the scientists and science nerds.

  29. HR

    The only interest for policy in these reconstructions is the fact that they show the late 20th century is the warmest in the past 2000 years. You have to see that. All the wobbles in the shaft after you’ve decided the late 20th C is the biggest are of interest only to the scientists and science nerds.

    Moreover, the papers that publish these reconstructions frequently proclaim, as their major conclusion, that the late 20th century is warmer than anything in the past. The fallacy is that if you screen using correlation, you can’t make such a comparison. (Or, to do so, you need to correct for the mathemagical artifacts introduced by screening– which they don’t. I suspect it’s not even possible because they threw away the data that would permit you to figure out the correction!)

  30. These are the correlation coefficients vs CRU and their own reconstruction. I would only give 14 of them house room.

    r2 (vs CRU)  r2 (vs recon)  location
    0.361 0.448 Mt Read
    0.0943 0.13 Oroko
    0.1542 0.3564 Buckley’s Chance
    0.1566 0.2556 -Palmyra
    0.1835 0.3677 Celery Top Pine East
    0.2631 0.331 Kauri
    0.1956 0.3107 Pink Pine South Island
    0.2238 0.31 Mangawhero
    0.0973 0.1347 -Urewera
    0.0639 0.1462 NorthslandComposite_1
    0.3333 0.5616 Takapari
    0.1231 0.1911 Stewart_Island_composite
    0.2248 0.3035 -Fiji_AB
    0.0639 0.0951 -New_Caledonia
    0.1564 0.2839 NI_LIBI_Composite_2
    0.2342 0.4003 -Rarotonga
    0.1632 0.1915 Vostok_d18O
    0.0762 0.1187 Vostok_Accumulation
    0.0898 0.1702 -Fiji_1F
    0.0985 0.1506 Bali
    0.0664 0.1419 -Abrolhos
    0.0748 0.241 -Maiana
    0.1244 0.213 Bunaken
    0.2197 0.46 Rarotonga.3R
    0.22 0.205 Ningaloo
    0.0432 0.0072 Madang
    0.1775 0.2264 Laing

  31. Doc–
    Those are the r2’s of the ones they used? Are they R2 or Rs? And does the ‘-‘ mean a minus sign? (If it does, then you must mean R, right?)

  32. For my own clarification

    1) The relative size of peaks of the shaft and the blade of these reconstructions can be biased by the methodology. This is what Lucia is showing here and Nick seems to agree with that.

    2) The reconstruction is scaled to the instrumental record at the region of the blade, the shaft being scaled proportionally.

    3) It makes no difference whether you compare the shaft with the instrument record or with the blade of the reconstruction. The possible methodological bias has been introduced into the comparison of the instrument record with the shaft of the reconstruction by using the blade as the region for scaling between the two.

    Sorry if this is repeating what I said before but I’m trying to clearly write what seems so obvious in my head. The fact that Nick has persisted with his argument makes me wonder if I’m missing something.

  33. lucia
    Doc–
    Those are the r2′s of the ones they used? Are they R2 or Rs? And does the ‘-’ mean a minus sign? (If it does, then you must mean R, right?)”

    I have just been plotting the data from Steve McI’s linked excel website.
    They are r2. The (-) is a memory aid for me for inverted datasets; the last five are also inverted.
    The Maiana dataset doesn’t correlate well to temperature, 0.0748, but does to the reconstruction, 0.241.
    The data is, forgive my native Anglo-Saxon, shit.
    The fact they all correlate so much better with the recon than with the temperature is odd.
    The fact they didn’t try to get the best fit for the recon over the whole available 1900-1990 range is odder.
    The Palmyra is only included because a chunk of discontinuous data gets rid of the warming that is the same as the modern era. Without that, no hockey stick.

    I think Vostok_Accumulation is trying hard to tell us about rainfall; but the ice record only goes back to 1777; which is a bit odd when you think about it.

  34. Brandon:

    This may be intuitively obvious to many people, but I wanted to work through some things like this since I started looking at the Ljungqvist reconstruction. It uses this type of rescaling by comparing one series from 1000-1900 to another series from 1850-1989. That struck me as strange, and I wanted to get a feel for how selecting different periods could affect results.

    This realization was where I recognized that you couldn’t treat the reconstructed quantity as temperature, so I’ve used the term “pseudo-temperature” since then. If you remember when I did intercomparisons I used Pearson’s r, in part to remove the different scalings between the series.

    What is really interesting about the lack of an absolute scale factor for proxy temperature is you can’t conclude that the MWP is actually cooler than the current temperature, until you’ve properly “rescaled” the series for its loss of variance.

    By the way, Jeff ID has done a bit of work on this too. See this link.

    Von Storch was, I think, the first to point out this loss of variance. Here’s his abstract:

    The performance of two methods to reconstruct Northern Hemisphere temperature histories of the past millennium is analyzed by applying them to the known development of an extended climate model simulation. The MBH-method underestimates low-frequency variability significantly, whereas Moberg’s method operates satisfactorily.

    Moberg as you may remember proposes to use “known” temperature proxies where you can apply the uniformity principle to reconstruct the low-frequency component, however, these are plagued by being available in only a few remote geographical regions and have very low sampling rates, and then “back-fill” with tree-rings to get high resolution reconstructions and to use tree-ring width as an interpolating variable to increase the geographical coverage of the series.

  35. It would be interesting to put the screening period in the middle of the time sequence. I bet if you did you would not only get the hockey stick but would also get a “divergence problem” after the screening period.

    It seems clear to me that if one had 1000 sequences that were random, constrained to be between -1 and +1, and extended for 2000 years, each with some autocorrelation (I believe that means that last year’s value is correlated to this year’s value; correct me if I am wrong, but that is what I mean), then if you were to just average the sequences you would get essentially a horizontal line at 0.

    If one were to then screen for the 500 sequences that were increasing over the period 900-1000 and then average the values, one would get a curve that started at 0, declined by year 900 to a value of -0.5 or so, then increased rapidly to year 1000 to a value of somewhere between +0.5 and +1 (there is the hockey stick), and then stopped increasing and began to gradually decrease (here’s the divergence problem) until again reaching a value of 0.

    If the screening retained a small number of sequences one would get essentially the same pattern but with more variation in the final result. If the number were sufficiently small the result would be very noisy and the general pattern may not even be evident. It would be dependent on which specific sequences happened to be retained.

    In any case, if one were to do some type of screening and get a result like I described, I think you should conclude that you were dealing with essentially random data with no actual relationship to the variable used for screening. You certainly could not conclude that the shaft of the stick had any meaning whatsoever.
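
    A rough R sketch of that thought experiment (every parameter here is illustrative):

    set.seed(1)
    n_yr <- 2000
    n_seq <- 1000
    ar1 <- function(n, phi = 0.9) {          # simple autocorrelated sequence
      x <- numeric(n)
      for (i in 2:n) x[i] <- phi * x[i - 1] + rnorm(1, sd = 0.1)
      pmin(pmax(x, -1), 1)                   # keep values roughly within [-1, 1]
    }
    seqs <- replicate(n_seq, ar1(n_yr))
    window <- 900:1000
    slopes <- apply(seqs[window, ], 2, function(s) coef(lm(s ~ seq_along(s)))[2])
    keep_up <- slopes > median(slopes)       # the ~500 sequences "increasing" over 900-1000
    plot(rowMeans(seqs[, keep_up]), type = "l")   # dip, blade at year 1000, then "divergence"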

  36. Jacoby: a few good men

    http://climateaudit.org/2005/02/06/jacoby-1-a-few-good-series/

    david stockwell’s paper

    http://landshape.org/images/script.pdf

    http://noconsensus.wordpress.com/2008/09/23/the-flaw-in-the-math-behind-every-hockey-stick/

    http://noconsensus.wordpress.com/2008/09/29/simple-statistical-evidence-why-hockey-stick-temp-graphs-are-bent/

    ##################

    I think the challenge is pretty clear.

    Whenever an analyst makes a decision to throw data out, the effect of that decision must be documented. Does it introduce a bias in the mean? Does it reduce or inflate variance?

  37. Lucia (#97881)

    “I don’t know what point you are trying to make by explaining why you asked your previous question. Could you try to communicate whatever point you are trying to say directly?”

    I tried to explain it. The statistical artifact that you describe can produce decent hockey sticks only if the selected proxies are bad. And they are bad because the selection criteria (instrumental temperatures) are bad. Good proxies exist but they are eliminated or given only low weight.

  38. Isn’t the argument being used that you have more than one variable affecting the data, and the second variable can overwhelm the signal being sought relative to the first variable? In the tree ring case, we’re looking for a relationship between the rings and temp. The argument is that if the rings respond well to temps in the calibration period, then we can assume that they will also respond well outside of it. If trees of the same type do not respond the same during calibration, then there must be a second (or more) variable that is overriding the ring/temp relationship, and those trees can be removed. So we have the following in the raw data set:

    1. Trees that calibrate to temps both inside and outside of the calibration period
    2. Trees that calibrate inside but not outside because of other variables affecting the tree outside of the period (drought etc.).
    3. Trees that don’t calibrate inside the calibration period but do outside
    4. Trees that don’t calibrate inside or outside
    (by calibrate outside I mean that they would if we had the temp data)

    So given the selection methodology, trees from 1 and 2 would be kept and 3 and 4 would be rejected. The kept set may still not represent temp response because the correlation may still be accidental where the second variable may still be driving the response but happens to look like a temperature response. Rejection of 3 and 4 assumes that the second variable is overriding the temp response and not that the temp response is not significant.

    So to model the methodology I think you would need a two variable function that can recreate a data set that covers these situations (a “temperature” variable and a non temp that can cause rejection). Doing that would validate whether the selection criteria actually can be trusted to provide a temperature by proxy.

    My take is that the rejection criterion is only valid if you can identify the other variable(s) and reject based on their existence relative to a specific tree you are going to accept or reject. Otherwise you can just as easily be grabbing data from a tree that randomly correlates in the calibration period and has no real identifiable response to the temps.
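
    As an illustration only, a two-variable toy of the sort described might look like this in R (it reuses the synthetic temperature from the sketch in the post; every parameter is made up):

    set.seed(3)
    n_tree <- 1000
    other <- replicate(n_tree, cumsum(rnorm(n_years, sd = 0.05)))   # a non-temperature driver, per tree
    w_temp <- runif(n_tree)                                         # each tree's temperature sensitivity
    proxies2 <- sapply(seq_len(n_tree), function(i)
      w_temp[i] * temperature + (1 - w_temp[i]) * other[, i] + rnorm(n_years, sd = 0.5))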

  39. Barry,
    It’s not clear if you’re talking about the toy problem. With real recons, you don’t know what calibrates outside the calibration period, unless you’ve withheld temp data – but you can’t afford to withhold much.

  40. Nick let me correct your statement.

    Barry,
    It’s not clear if you’re talking about the toy problem. With real recons, you don’t know what calibrates outside the calibration period.

    There! a correct statement by Nick, a rarity I admit.

  41. BarryW:

    A more subtle line of reasoning can be made. For Nick, the source is Geographical Ecology: Patterns in the Distribution of Species, Robert H. MacArthur, Harper and Row, 1972.

    From page 154: “Perhaps the most important question we can ask about communities is ‘Does the environment dictate the structure of the community, or are the species a fairly random assemblage?’ Then we ask, ‘Are the boundaries sharp, with many species dropping out synchronously, or do species drop out independently?’ The second question is clearly different from the first: The environment could exert a profound control over the morphologies, physiologies, and number of species, and the first question would be answered in the affirmative; yet as the environment varied continuously in space, the different species might drop out independently.”

    This excerpt and the following discussion are generally about habitat, environment and their effects. The book is quite good, and has a remarkable amount of information that shows one can make determinations that Nick has indicated can’t be made, and actually demonstrate something such as temperature sensitivity. But as the author points out, it is hard, long and involved, and requires a large number of samples and sometimes many years of effort.

  42. Phi

    I tried to explain it. The statistical artifact that you describe can produce decent hockey sticks only if the selected proxies are bad. And they are bad because the selection criteria (instrumental temperatures) are bad. Good proxies exist but they are eliminated or given only low weight.

    That depends on what you mean by “bad”. But if the correlation were very high, it’s hard to screw things up. Screening wouldn’t “help”– but it wouldn’t hurt much. The mathemagic that creates the artifacts acts on the noise.

  43. Nick:

    Steven,

    “Does it introduce a bias in the mean? does it reduce or inflate variance?”

    Mean of what? variance of what?

    In the limit of lots of data, does the reconstruction estimate <y> for the thing you want to reconstruct match y? In this case: it does not.
    1)“Does it introduce a bias in the mean?” means: “does the mean of y match y”?
    2) Does the variability of y about the mean match the real variability about y.
    In my toy problems above, the answer is “no” for (2) in both cases. In the case of not screening, it would be possible to re-inflate correctly. The reason you could do this is that the correlations from the training period are not biased by screening. So you can use those to estimate the fraction of the signal lost and you could reinflate based on your data.

    In the case of screening, you’ve lost the information that would let you reinflate properly. The reason is that your estimate of the correlation for your proxies based on the training period is wrong. It will always be wrong. No amount of invoking the “uniformity principle” will fix the fact that you threw away more than 1/2 the samples because of low correlation, and as a result the batch you have misrepresents the correlation between the treenometers and the temperature during the reconstruction period.
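
    To make that concrete with the toy sketches from the post (same variable names; this only illustrates the point, it is not the code used for the figures):

    r_full <- apply(proxies, 2, cor, y = temperature)   # correlation over the whole record
    mean(r_calib[keep])   # calibration-period correlation of the kept proxies: inflated by selection
    mean(r_full[keep])    # their actual correlation over the full record: lower
    mean(r_calib)         # unscreened estimate: roughly unbiased, usable for re-inflation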

  44. “It’s not clear if you’re talking about the toy problem. With real recons, you don’t know what calibrates outside the calibration period, unless you’ve withheld temp data – but you can’t afford to withhold much.”

    You could withhold data if you use a longer temp record such as CET, but uncertainty creeps in then.

    Wouldn’t a measurement of variance work in a similar way? When you screen against the instrument record you generate a subset of a larger database that presumably contains better proxies for temperature. This narrows the spread of readings around the real temperature. When you extend the record back outside the instrumental record you don’t know the real temperature, but you should still see a narrowing in the variance around the real (but unknown) temperature if the proxies remain good. Is this experiment generally done in these papers? It’s an argument for release of both the screened subset and the original larger database as well.

  45. HR, you are supposed to come up with an a priori reason, use a statistical model that will allow you to test your hypothesis by taking samples that will confirm or invalidate it, take the samples, correlate per the model, and discuss findings. One does not have the degree of freedom to sort, use an assumption of nonvariance of just cause, and discuss outside the training or calibration period. Since we have decent thermometers, the exercise of such proxy reconstruction is trivial. However, to make conclusions about the past is unsupported, as in it may be erroneous. Don’t know means don’t know.

  46. “It’s not clear if you’re talking about the toy problem. With real recons, you don’t know what calibrates outside the calibration period, unless you’ve withheld temp data – but you can’t afford to withhold much.”

    Think about what you said:
    “you don’t know what calibrates outside of the calibration period”.

    If trees you’ve selected based on the fact that they calibrated with temps would not calibrate outside of that period, then how can they be expected to be proxies for temps outside of the calibration period? You just stated that they may or may not represent the temps (they can’t be calibrated).

    But back to the toy problem. After thinking about it, Lucia uses one correlation value for the entire data set. Some microclimates may have a consistent correlation over their entire history, but some will correlate well at some times and at others not (they will be affected by another variable, nutrients, water). So the correlation value may change over time on a proxy to proxy basis. Maybe that’s a better statement of what I’m trying to say couched in the terms of this thread.

  47. Carrick:

    What is really interesting about the lack of an absolute scale factor for proxy temperature is you can’t conclude that the MWP is actually cooler than the current temperature, until you’ve properly “rescaled” the series for its loss of variance.

    Indeed. Unfortunately, there seems to be little interest in the issue in the paleoclimate community. For as common as rescaling is with them, you’d think* they’d want to examine how it affects things, but that doesn’t seem to happen.

    By the way, Jeff ID has done a bit of work on this too. See this link.

    Von Storch was, I think, the first to point out this loss of variance.

    Thanks for the links. I had seen that Von Storch paper before, but not the post by Jeff Id. By the way, I think Von Storch was the first person to explicitly demonstrate the issue of variance deflation, but not the first to discuss it. It was a core part of the criticisms of MBH.

    Moberg as you may remember proposes to use “known” temperature proxies where you can apply the uniformity principle to reconstruct the low-frequency component, however, these are plagued by being available in only a few remote geographical regions and have very low sampling rates

    And by not always being known temperature proxies!

    *Or at least, you would like to think that.

  48. BarryW–

    So the correlation value may change over time on a proxy to proxy basis. Maybe that’s a better statement of what I’m trying to say couched in the terms of this thread.

    You are discussing a problem this ‘toy’ problem doesn’t capture. You are discussing whether the ‘non-uniformity’ assumption makes sense for trees– and that’s worth discussing. If trees’ response to temperature actually does change (for some reason) and differs in the pre-calibration vs. calibration periods, then obviously the treenometer method won’t work at all. We don’t need any ‘toy’ problems to show that.

    My toy problem shows what happens if we do assume the trees behave the same inside and outside the calibration period. In that case, screening causes “artifacts”. That is: even assuming we all accept (for some reason) that the trees will behave the same inside and outside the calibration period, the screening artifacts remain.

    My toy problem is set up to show that there are problems even if the treenometers’ response to temperature is completely stable and doesn’t change one jot during the entire reconstruction period.

    So the correlation value may change over time on a proxy to proxy basis. Maybe that’s a better statement of what I’m trying to say couched in the terms of this thread.

    It is true that in reality, the correlation may change over time. But that is an entirely different problem from the one revealed by this toy problem. So, it’s best to avoid introducing that discussion. Otherwise, it will be difficult to keep reminding people that this toy problem shows a difficulty in the optimal circumstance where that problem does not occur.

  49. BarryW, #97978
    “You just stated that they may or may not represent the temps”
    Not at all. I said you don’t know what calibrates

    You described a procedure that required the user to select proxies based on correlation outside the calibration period. I made the elementary observation that you just don’t know that. You don’t have instrumental temperatures to calibrate with. If you did, you wouldn’t be bothering with proxies.

  50. Lucia,
    “because you threw-away more than 1/2 the samples because of low correlation”
    Why I asked about “mean of what?” is that I think it is a recurrence of the fallacy that this is a sampling problem, and you were trying to get a mean of a population. So the selection would be unrepresentative.

    But it is the very opposite of a sampling problem. You are trying to get very unrepresentative trees. You go to the ends of the earth to find trees that are unlike the others. And in the last stage of selection you reject trees because they still seem too representative.

    And when you’ve finally found a tree that correlates, then you can think about whether, for that tree, it’s reasonable to assume that the correlation that you see would have applied over millennia. The answer to that question has nothing to do with the trees you rejected.

  51. This result is predicated on all your proxies being valid treemometers both during calibration and back in time. It shows the effect of screening absent other confoundings. But there are other confoundings. Law Dome could have been rising in elevation due to the huge amount of snow over 2000 yrs. The coral data could have been affected by 2000 yrs of sea level rise. Tree rings could have been affected by changes in precip over time. In all these cases a good correlation with recent temperatures could be totally spurious. In this case, the curves prior to the calibration period could be any random shape and averaging them will give a flat shape (straight line) and a recent uptick: a hockey stick.
    But I see something funny in your result. If I take n sin waves, sample them every t years with white noise, and average them using Mathematica, I get close to the original curves, not the purple curve. I can only get the reduced-variance (purple) curve if there is DATING error. See my paper: Craig Loehle, “A mathematical analysis of the divergence problem in dendroclimatology,” Climatic Change, Volume 94, Numbers 3-4 (2009), 233-245, DOI: 10.1007/s10584-008-9488-8.

    As a followup, the value for temperature in the past from n proxies with white noise is (where the Sum is over the n samples of white noise):
    n*Sin[t]/n + Sum(Random[0, sigma])/n

    The first term simplifies to Sin[t] and the second goes to 0 as n becomes large. Thus the mean of the n proxies back in time should be an approximation of the true signal under these assumptions (i.e., no red noise, no systematic error, no dating error).
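
    A tiny numerical check of that claim (in R, under exactly those assumptions: a known signal plus pure white noise, no rescaling; the parameters are arbitrary):

    set.seed(2)
    t_axis <- seq(0, 4 * pi, length.out = 500)
    signal <- sin(t_axis)
    avg <- rowMeans(replicate(1000, signal + rnorm(length(t_axis), sd = 2)))
    sd(avg - signal)   # residual noise of order sd/sqrt(n), here about 0.06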

  53. Seems pretty clear what to do.

    Withhold an early part of the temperature record and a late part.
    Calibrate with the center part and then verify in the withheld periods.

    so you end up with a few good men.. and big old uncertainty bars.

    who cares.

  54. Craig Loehle (Comment #97983)

    > But I see something funny in your result.

    Yes, what Craig said. If sampling is accurate on the x axis and also accurate on the y axis, then the purple curve should recapitulate the black curve in the second figure, if all of the proxies are used for the calibration step in the final 50 years, and then used to backcast the earlier 950 years.

    Adding noise should, well, just add noise.

    The divergence from “true” temperature could arise when the 44% “best” proxies are selected in the 50-year calibration period. Even then, those final 50 years should be “predicted” accurately by the proxies — since they were calibrated to do just that.

  55. Craig,
    I think Lucia has rescaled in some unspecified way after adding noise.

    I think (guess) that may be the explanation for the odd discrepancy in the post-950 green curve which shoots up. It isn’t trying to follow the cooling. It’s trying to follow the rescaled sine (plus cooling). That’s why it diverges down and then up.

    If that guess (and that’s all it is) is right, you’d get much the same result with the sign of cooling turned to warming.

  56. I have always suspected that when a major volcanic event occurs, what with the ash and SO2 injected into the stratosphere, the hemisphere the volcano is in should cool.
    Big volcanos = big DROPS in temperature. Thus, trees and corals which are temperature sensitive should have a drop in growth.
    Now, when Mount Tambora did the big firework in 1815, the ejected matter caused global cooling and worldwide harvest failures: the Year Without a Summer.

    So of course this provides researchers with an internal control. Temperature proxies will show a fall in temperature in 1815, and the rebound will be slow.

    In the 11 long proxies – Mt Read, Oroko, Buckley’s Chance, Celery Top Pine East, Kauri, Pink Pine South Island composite, Mangawhero, Urewera, North Island_LIBI_Composite_1, Takapari and Fiji_AB – there is no sudden drop in temperature.
    What about Krakatoa? Big bang in 1883.
    Not a sausage. No drop in temperature in the main proxies, or in the recon.

    What complete bollocks.

  57. Nick:

    Mean of what? variance of what?

    Bloody hell, Nick.

    Are you being deliberately obtuse, or are you just lost as a goose?

    Let’s see, I think we were talking about cheese in China. 🙄

    pftttt done.

  58. Seems pretty clear what to do. Withhold an early part of the temperature record and a late part. Calibrate with the center part and then verify in the withheld periods. So you end up with a few good men.. and big old uncertainty bars. Who cares.

    If it were that easy. The early part *and* the late part of the temperature record constitute anywhere from 1% to 10% of the series being analyzed, so you’re still at anything from the end to the very edge of the series being analyzed. Your [Withhold] groups cannot be more than 0.3 to 3.3% of your treemometer/proxy series being analyzed. So can we agree you’d still be calibrating against the end of the series available for calibration?

    When you filter proxies to correspond to the last segment of the series, you cannot tell yourself and the world (correctly or honestly) that the values in the period that you just filtered for (now) are either unusual or unprecedented compared to the rest of your selected series.

  59. Amac and Craig, Nick is right on this one. With CPS you are calibrating the series to the instrument series during the training/calibration period, so you are rescaling the series, not simply adding them together.

    If you preselect the series based on some external criteria (metadata driven proxy selection) and the series have been previously calibrated to temperature and then add them together (as with Craig’s paper) you don’t get this deflation in your overall scaling.

    I think this is part of why the non-rescaled version of Craig’s reconstruction shows more variability than the CPS version.

    It’s the fact that you’re screening with a signal that has noise that causes the deflation. When people use a variant of that method in my field for frequency-based calibrations, in the regions where the SNR is poor there is always a “dip” in the apparent calibration constant. It works great when the SNR is good, though, and the physical link between the quantities you are correlating (pressure and voltage in my case) is well studied and well understood.

    The reduction in the amplitude of the time-domain signal is the equivalent of a descaling of the low-frequency content of the proxy reconstruction.

  60. Nick
    “You described a procedure that required the user to select proxies based on correlation outside the calibration period.”

    No, that wasn’t what I meant. If you select based on correlation inside the calibration period, you are assuming that the correlation still holds outside of that period, because what you’re trying to discern is the temps, based upon the proxy still representing the temps outside of the calibration period.

    Lucia
    “My toy problem shows what happens if we do assume the trees behave the same inside and outside the calibration period. ”

    Yes, but your problem assumes all trees have the same correlation over their entire history doesn’t it? In other words, all the data is “good”. If you knew that a priori then throwing away “good” data is a bad thing as you showed. I’m just wondering what happens if there is a random distribution of good and “bad” data. Which is the argument that seems to be made for the screening process. Does the screening still do worse than the whole set?

  61. Brandon:

    Moberg as you may remember proposes to use “known” temperature proxies where you can apply the uniformity principle to reconstruct the low-frequency component, however, these are plagued by being available in only a few remote geographical regions and have very low sampling rates

    And by not always being known temperature proxies!

    Let me change that to “the conceptual idea behind Moberg 2005 is to use “known” temperature proxies…”

    It’s been shown by von Storch btw that Moberg 2005 doesn’t have the deflation of variance present in MBH.

  62. Carrick:

    Amac and Craig, Nick is right on this one. With CPS you are calibrating the series to the instrument series during the training/calibration period, so you are rescaling the series, not simply adding them together.

    Carrick, I think you’re confused. This isn’t CPS, and Nick isn’t right at all. He said he thinks “Lucia has rescaled in some unspecified way after adding noise,” but she has done nothing of the sort. What she did is exactly as she described in her post.

    The “something funny” being talked about is exactly what one would expect from adding noise like lucia did. Adding white noise to a group of series will attenuate the signal you get from averaging them. The only thing remotely mysterious about what she did is the distribution of her white noise. It’s normally, not evenly, distributed, something you might not realize from her post. That difference affects how much the variance is deflated (and the correlation calculations), but that’s just an issue of noise strength.

    Craig, AMac and Nick, I’m confident you can figure out what’s going on if you think about it, but if you’d like, I can type up an explanation after I eat dinner.

  63. Brandon:

    Adding white noise to a group of series will attenuate the signal you get from averaging them.

    Er, no it won’t.

    If you let y = a cos(w t) + n(t)

    Then E[y^2] = a^2/2 + E[n^2].

    (But that’s not the right problem. Let me diddle on paper and see if I can write down the correct formulation to this problem.)

    I haven’t looked at Lucia’s code I admit to see what she’s done. I assumed she was scaling the series to match a(w t) + n(t) to her calibration signal.

    The only time I’ve ever seen deflation of the scaling constant is when you are using a correlational based method to compute the sensitivity of an instrument with respect to a measured signal, and you have a low SNR present.

    NB: No need to reply to the above! I’m going to write my own simulation to make sure I understand what she’s done. I will get my terminology right here at some point.

  64. Carrick:

    Let me change that to “the conceptual idea behind Moberg 2005 is to use “known” temperature proxies…”

    What you said was fine since you said he “proposes” to do it, not that he managed to do it. I was just ribbing you because of our earlier conversation about the proxies.

    Besides, when a series is said to be a temperature proxy because someone has “undoubtedly” confirmed it is, I have to make jokes.

    It’s been shown by von Storch btw that Moberg 2005 doesn’t have the deflation of variance present in MBH.

    Technically, he only discussed low-frequency variance while MBH had both low and high. Nobody seems to care about Moberg’s high-frequency efforts.

    But yeah, Moberg definitely shows the problem can be avoided.

  65. Nick;

    I think Lucia has rescaled in some unspecified way after adding noise.

    Each proxy is correlated by a specified value of “Rknown” with temperature. The method of generating it results in individual proxies with the same variance as the temperature. I could rescale. But in both cases, the green will show a “record” temperature at the end (which is wrong) while the purple will not (which is correct).

    I think (guess) that may be the explanation for the odd discrepancy in the post-950 green curve which shoots up. It isn’t trying to follow the cooling. It’s trying to follow the rescaled sine (plus cooling). That’s why it diverges down and then up.

    Both the green and the purple are correlated with the actual temperature (black). At the end, that temperature is sin+cooling– so that’s what both try to follow. The screened method introduced a bias in the shape, showing a “record” near the end.

    If that guess (and that’s all it is) is right, you’d get much the same result with the sign of cooling turned to warming.

  66. But I see something funny in your result. If I take n sin waves, sample them every t years with white noise, and average them using Mathematica I get close to the original curves, not the purple curve.

    To get the correlation to be “R”, I do:

    width = R*(sin[ω*T]) + sqrt(1-R^2)*{ N*sd(sin[ω*T]) }

    where N is white noise. This gives a proxy whose correlation with T is equal to R. You can multiply by any number and the correlation is still R.

    I can rescale to make plots I would get if I did

    width = (sin[ω*T]) + {sqrt(1-R^2)/R} * { N*sd(sin[ω*T]) }

    We can do this all sorts of ways. I can talk about rescaling a bit later– Brandon wanted me to talk about that a little. But I don’t entirely know how people do rescale these. So, I’ll just do it a way that might be done.
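
    For readers who want to play with the construction, here is a minimal numpy sketch of the formula above. The signal, period, and seed are placeholders of my own choosing, not Lucia’s actual code; the point is only that the formula yields proxies whose correlation with the signal is (approximately) R, and that multiplying by a constant leaves that correlation unchanged.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Placeholder "temperature" signal; any series works here.
    n_years = 1000
    t = np.arange(n_years)
    temperature = np.sin(2 * np.pi * t / 250.0)

    R_KNOWN = 0.15  # the specified proxy/temperature correlation ("Rknown")

    def make_proxy(signal, r, rng):
        """width = R*signal + sqrt(1 - R^2) * N * sd(signal), N being unit white noise.

        The population correlation between width and signal is R, and multiplying
        width by any constant leaves that correlation unchanged.
        """
        noise = rng.standard_normal(signal.size)
        return r * signal + np.sqrt(1.0 - r * r) * noise * np.std(signal)

    proxies = np.array([make_proxy(temperature, R_KNOWN, rng) for _ in range(1000)])

    # Sample correlations scatter around R_KNOWN purely because of sampling noise.
    sample_r = np.array([np.corrcoef(p, temperature)[0, 1] for p in proxies])
    print(round(sample_r.mean(), 3), round(sample_r.std(), 3))
    ```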

  67. Brandon:

    Nobody seems to care about Moberg’s high-frequency efforts.

    That’s because there isn’t any way to verify it’s right or has anything to do with temperature. Still it’s useful as a proxy to study ENSO, which is what I’m guessing that Moberg was interested in, in making the high frequency reconstruction.

    (I’m in dangerous territory here. I was tearing apart my kitchen all day as part of a remodel, followed by a 22oz beer. I feel like I think bender would feel after cleaning mosher’s pool.)

  68. BarryW

    Yes, but your problem assumes all trees have the same correlation over their entire history doesn’t it? In other words, all the data is “good”. If you knew that a priori then throwing away “good” data is a bad thing as you showed. I’m just wondering what happens if there is a random distribution of good and “bad” data. Which is the argument that seems to be made for the screening process. Does the screening still do worse than the whole set?

    It’s a bit difficult to answer those questions because first we have to look at what a particular paper that uses correlation screening might be trying to do.

    First: There is the possibility that someone really goes out and picks out every single possible proxy in the world even though they have no reason whatsoever to think anything is correlated at all. In this case, your “null” hypothesis about the proxies is they don’t correlate with temperature– and you want to see what happens if you screen. That case is discussed here — and it’s basically the “you can make a hockey stick out of trendless red noise” case.

    But the rebuttal to that is what motivates this post. That rebuttal is “But you don’t just go around fishing for a correlation. Instead, your first step is to use metadata to find cases that you think, based on phenomenology, do correlate with temperature.”

    In this case, though, somehow a person still wants to screen. They want to say that even though we used meta-data to find cases that ought to have a correlation 0<R (though small), we’re going to throw some away.

    But in this second case, where your meta-data really, truly do suggest a bunch of trees are treenometers, your null hypothesis ought to be that the trees really all are treenometers. And in that case– this is the bounding situation. It’s basically showing what happens if the notion behind the first screening based on metadata is right, and it teases out what screening does.

    If you’re interested in the third case: What does screening do if using meta-data is really only sort of hit and miss? Well… that depends. If your pre-screening based on meta-data is hit and miss, you really need to improve your ability to pre-screen using meta-data so that you aren’t trying to fix that up by doing a second screening with correlation. Or, at least, if you are going to do the 2nd screening, you should do it so that you only throw things away if you can argue that, assuming the correlation for all the proxies is equal to some constant value R that matches the batch value, a particular proxy is an outlier. That would result in throwing away very few cases. You wouldn’t throw out more than 5%. (In fact, you should throw out fewer.)
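
    To illustrate what such a restrained second screening might look like, here is a sketch assuming, per the bounding case above, that every proxy truly shares one correlation R and the spread in sample correlations is pure sampling noise. It flags a proxy only when its calibration-period correlation is an outlier under a Fisher z test against the batch mean; the proxy count, record length, and cutoff are illustrative, not from any published method.

    ```python
    import numpy as np
    from scipy import stats

    def outlier_flags(sample_r, n_obs, alpha=0.05):
        """Flag proxies whose calibration-period correlation is inconsistent with a
        single correlation shared by the whole batch.

        Fisher z-transform: atanh(r) is approximately normal with sd 1/sqrt(n - 3),
        so deviations from the batch-mean z can be judged against that scale.
        """
        z = np.arctanh(np.asarray(sample_r, dtype=float))
        z_common = z.mean()                    # batch estimate of the shared correlation
        se = 1.0 / np.sqrt(n_obs - 3)          # approximate standard error of each z
        p = 2 * stats.norm.sf(np.abs(z - z_common) / se)
        return p < alpha                       # True = candidate outlier

    # Illustrative use: 200 proxies, 100 overlap years, all truly sharing R = 0.15.
    rng = np.random.default_rng(1)
    fake_r = np.tanh(rng.normal(np.arctanh(0.15), 1 / np.sqrt(100 - 3), size=200))
    flags = outlier_flags(fake_r, n_obs=100)
    print(flags.sum(), "of", flags.size, "flagged")  # roughly the false-alarm rate, ~5%
    ```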

  69. Carrick:

    E.r., no it won’t.

    You’re right. It’s actually the standardization process which causes the observed attenuation. The individual proxy series are all standardized before being averaged. Because that standardization is based upon the standard deviation of the proxies (which increases as you add white noise) it causes the resulting signal to be smaller. That is a rescaling step, but I didn’t think of it as one as it is such a common part of averaging series my mind grouped the two together.

    It probably doesn’t help for me to use “attenuate.” The amplitude of the series did decrease, and thus it was attenuated, but that’s the absolute amplitude. The relative (e.g. unit-invariant) amplitude isn’t affected. I imagine that could make my meaning less clear.

    Sorry for any confusion I caused!
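
    A minimal sketch of the mechanism Brandon describes: if each noisy series is divided by its own standard deviation before averaging, the recovered signal shrinks, because that standard deviation includes the noise. The signal, noise level, and series count below are arbitrary choices of mine.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    t = np.arange(1000)
    signal = np.sin(2 * np.pi * t / 250.0)    # unit-amplitude stand-in signal
    noise_sd = 2.0                            # illustrative noise level

    series = signal + noise_sd * rng.standard_normal((500, t.size))

    plain_average = series.mean(axis=0)       # simple average: amplitude stays near 1
    standardized = series / series.std(axis=1, keepdims=True)
    standardized_average = standardized.mean(axis=0)

    # Each standardized series was divided by roughly sqrt(var(signal) + var(noise)),
    # so the averaged signal comes back with a reduced absolute amplitude.
    print(round(plain_average.max(), 2), round(standardized_average.max(), 2))
    ```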

  70. Carrick:

    That’s because there isn’t any way to verify it’s right or has anything to do with temperature. Still it’s useful as a proxy to study ENSO, which is what I’m guessing that Moberg was interested in, in making the high frequency reconstruction.

    I honestly don’t know enough to say whether you’re right or wrong. I never had anything come up which made me look into the high frequency component of Moberg’s reconstruction. It’s probably bad I didn’t bother.

    (I’m in dangerous territory here. I was tearing apart my kitchen all day as part of a remodel, followed by a 22oz beer. I feel like I think bender would feel after cleaning mosher’s pool.)

    Don’t let me push you into anything. I can always find more subjects or details to discuss, but I never expect others to be as interested.

  71. I assumed she was scaling the series to match a(w t) + n(t) to her calibration signal.

    As I explained: I scaled to make the variance of the individual proxy match the variance of the temperature over the full period.

    I wanted to show something qualitative– we could rescale various ways and make various arguments. The important thing I wanted to show is:

    “Real” temperature does not have a record at the end.
    End of “green” series shows record. This is wrong.
    End of purple doesn’t. This is right.

    These results are invariant to multiplication by any arbitrary constant “c”.

    Monday, I’ll discuss hypothetical rescalings based on what you would know from the retained proxies in an empirical study.

  72. Brandon:

    Don’t let me push you into anything. I can always find more subjects or details to discuss, but I never expect others to be as interested.

    Nah this is what entertainment looks like to me!

    I had assumed something like standardization was the cause of the problem.

    I think it also can happen if you are using e.g. simple regression to compute the scaling factor for each series.

    P(t) = alpha * T(t) + n(t)

    where P(t) is the proxy value and T(t) is temperature.

    If T(t) has an upwards slope to it and n(t) is red correlated noise, you will end up with a bias in alpha.

    “I think.”

    That’s something that I’m going to check now!
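
    A minimal Monte Carlo sketch of the kind of check Carrick describes: regress a synthetic proxy P(t) = alpha*T(t) + n(t) on a trending T(t) with AR(1) (“red”) noise many times and look at the distribution of the estimated alpha. The trend, redness, and record length are placeholders; whether any bias appears will depend on the setup (for instance, on whether the noise is independent of T and whether a selection step precedes the regression).

    ```python
    import numpy as np

    rng = np.random.default_rng(3)

    def ar1(n, phi, rng):
        """Red (AR(1)) noise with unit innovation variance."""
        x = np.zeros(n)
        eps = rng.standard_normal(n)
        for i in range(1, n):
            x[i] = phi * x[i - 1] + eps[i]
        return x

    n, alpha_true, phi = 50, 1.0, 0.8          # illustrative record length, slope, redness
    T = 0.02 * np.arange(n)                    # temperature with an upward trend

    alpha_hat = []
    for _ in range(5000):
        P = alpha_true * T + ar1(n, phi, rng)      # proxy = alpha*T + red noise
        alpha_hat.append(np.polyfit(T, P, 1)[0])   # ordinary least-squares slope of P on T

    print("mean estimated alpha:", round(np.mean(alpha_hat), 3),
          " spread:", round(np.std(alpha_hat), 3))
    ```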

  73. Lucia:

    As I explained: I scaled to make the variance of the individual proxy match the variance of the temperature over the full period.

    Thanks!

    I did manage to work it out.

  74. Lucia, in your toy, what happens if a volcano or three alters the line shape with a swift drop and an exponential rebound?
    Can you recover a blip if you recalibrate the right hand side or does it get buried in the mud.

  75. Lucia, in your toy, what happens if a volcano or three alters the line shape with a swift drop and an exponential rebound?
    Can you recover a blip if you recalibrate the right hand side or does it get buried in the mud.

    If the volcanic eruption happens in the past, you should observe the qualitative effect in the pre-calibration period.

    If it happens in the calibration period– depends.

    The only purpose here is to show that if you end on an uptick, and you screen, screening will exaggerate the final uptick relative to what’s happened in the past. Screening doesn’t interfere with getting the qualitative shape of what happened in the past. Monday I’ll show that it can affect your estimate of the actual magnitude of variability in the past.

  76. Lucia:

    If the volcanic eruption happens in the past, you should observe the qualitative effect in the pre-calibration period.

    Yep that’s what I think too, though how much attenuation you get probably depends on the frequency content of the signal. I suspect some frequencies get more attenuated than others (almost has to happen).

    That’s why I’ve taken to calling the quantity these reconstructions produce “pseudo-temperature”, though we’ve no guarantee that the quantity has anything to do with temperature, at least for trees—it might be better to just call it a “growth favorability index”, because it really reflects how rapidly the trees grew in different periods of climate (assuming there’s even a signal in it).

    As I pointed out to Brandon, it’s possible that one can extract useful information about climate from such an index even if it’s not temperature per se we’re looking at….

  77. Nick Stokes (Comment #97982)
    June 16th, 2012 at 3:18 pm
    “Lucia,
    ‘because you threw-away more than 1/2 the samples because of low correlation’
    Why I asked about “mean of what?” is that I think it is a recurrence of the fallacy that this is a sampling problem, and you were trying to get a mean of a population. So the selection would be unrepresentative.
    But it is the very opposite of a sampling problem. You are trying to get very unrepresentative trees. You go to the ends of the earth to find trees that are unlike the others. And in the last stage of selection you reject trees because they still seem too representative.
    And when you’ve finally found a tree that correlates, then you can think about whether, for that tree, it’s reasonable to assume that the correlation that you see would have applied over millennia. The answer to that question has nothing to do with the trees you rejected.”

    Nick, when you have finally selected those rarer-than-hens-teeth, “…very unrepresentative trees”,
    do you think that you have anything other than the very rare trees that correlate somewhat/slightly with the instrumental record? The answer *is* in the number of trees that you rejected. Monkeys and Shakespeare…

  78. Carrick:

    Nah this is what entertainment looks like to me!

    Glad to hear it. I’m always happy to entertain!

    If T(t) has an upwards slope to it and n(t) is red correlated noise, you will end up with a bias in alpha.

    Once you start introducing red noise, things get a lot more complicated. I wouldn’t care to speak on the matter unless I had sat down and tested/examined it first.

    As I pointed out to Brandon, it’s possible that one can extract useful information about climate from such an index even if it’s not temperature per sé we’re looking at….

    As I recall, that was the premise behind the S&B paper which caused such an uproar, wasn’t it?

    Now, I’m off to see what sort of effects happen if you filter/smooth a series before rescaling it.

  79. Carrick (Comment #97999) -“It’s been shown by von Storch btw that Moberg 2005 doesn’t have the deflation of variance present in MBH.”

    Please clarify: obviously it doesn’t have the same deflation of variance, but does this mean Moberg has zero deflation of variance, or just that it largely avoids the kind and extent of variance deflation present in MBH?

    To my mind, some variance loss seems almost inevitable. It is more a question of, can we minimize it to be negligible? Has it been shown that Moberg’s method should achieve that?

  80. Andrew_FL:

    Please clarify: obviously it doesn’t have the same deflation of variance, but does this mean Moberg has zero deflation of variance, or just that it largely avoids the kind and extent of variance deflation present in MBH?

    If you accept the results of the tests by von Storch, there is no deflation in the low-frequency component of Moberg’s reconstruction. He didn’t test the high-frequency component, and one could (at least theoretically) argue the tests weren’t adequate, but it’s basically as Carrick says.

    To my mind, some variance loss seems almost inevitable. It is more a question of, can we minimize it to be negligible?

    There’s no reason variance deflation should be inherent to reconstructions. You’d basically just need to ensure all processes used are consistent across time periods. Screening over one period causes variance deflation because it operates on only one segment of the data.

  81. Andrew_FL, here’s von Storch’s summary:

    The result of this exercise is shown in Figure 4 – several random cases as light lines and an average across all random cases as grey line. The reconstructions do not systematically over- or underestimate low-frequency variability; instead the method operates without obvious biases and reproduces the low-frequency variability faithfully. However, in the periods with lowest temperatures (Late Maunder Minimum), the method shows a slight overestimation of the temperature anomalies.

    How well it does in reproducing temperature depends on how well calibrated the proxy is (whether it has a flat frequency response etc). As I mentioned, one of the “standard” calibration methods (ominously referred to as the correlation method) systematically underestimates the calibration coefficient in [frequency] regions with poor SNR, but there are variations on this method which don’t have this problem.

  82. Lucia, although I find the discussion about the last bit of the reconstruction interesting, your first point that “both reconstructions have lower amplitudes than the true oscillations” appears to overwhelm everything else in terms of consequence. Virtually all real proxy temperature reconstructions seem to show that little has happened to the climate except recently. Your little demo shows clearly why not only any form of splicing, but even just drawing a recent temperature record in the same graph with proxy reconstructions, is highly misleading.

  83. DocMartyn (Comment #97931)

    They are r^2. The (-) is a memory aid for me for inverted datasets.

    I thought this meant they have an imaginary component – no surprise there really.

  84. Cees–
    To plot with temperature you have to re-inflate to get your best estimate of temperature. Right now, my “proxies” should be labeled something like “width” to make it clear that I haven’t done the step to make them temperature.

    With this property of the groups, I can do a step that scales them up. What will remain: the green will appear to suggest the recent temperatures are “a record”. The purple will be OK.

  85. jim

    And in the last stage of selection you reject trees because they still seem too representative.

    It’s this final step that introduced the screening fallacy. Going to the ends of the earth is fine. There are things you need to do correctly to make that process fine– but if, based on past information, you’ve come to believe that “trees of the species ‘warmius’ living in a ‘moist’ valley on the south face of mount ‘Imtoocold’ ” will be treenometers, you say in advance your reconstruction will use those, and then you collect the cores and use them, that’s ok.

    But if you collect a bunch and then screen by correlation, you introduce a bias in your results.

  86. steveta_uk:

    I thought this meant they have an imaginary component – no surprise there really.

    Actually, you can have negative R^2. Linear regression is the exception, where mathematically it’s positive definite and there is a correspondence to the linear correlation coefficient r, with r^2 = R^2.

    Just thought I’d point that out.

  87. lucia,

    Re “lucia (Comment #98034)”: Yes, you’re right; your point is the same as the one I was making. But I will ask the same question: after saying that trees in special environments grow with temperature dependence (a dependence “known” because a short part of their tree-ring history somewhat correlates with most of a short temperature record), and after selecting some of those histories, the ones that correlate best with some part of an instrumental record, what is the argument that the rest of the tree-ring history is also equally temperature dependent? How is that confidence quantified? (I know that I should be asking this of Nick, not of you.)

    As for an example:

    http://www.ncdc.noaa.gov/paleo/pubs/darrigo2006/fig5.jpg

    (Looking at the individual proxy values, for each year, the confidence intervals of the functions would be large compared to the magnitude of the graph.)

    I’m old, FWIW; this sort of circular argument or deduction used to be confined to social sciences.

  88. Well, here is an intermediate result from my Monte Carlo’ing that shows the bias introduced by pre-screening.

    First here’s what a typical Monte Carlo looks like (the signal is the red line):

    Monte Carlo, slope = 0.01°C/year

    This is using a Monte Carlo model I had developed previously for simulating short-period climate variations. It’s not “adjusted” to realistically simulate proxy noise at this point; rather, it assumes the proxy is a pure univariate representation of temperature plus noise—an “ideal proxy.”

    Here is the cumulative histogram of % of data versus r.

    You use this to work out what r to use to retain X% of the data.

    Here’s the effect of screening on the mean value of the slope.

    Basically retaining 10% of the data means picking the r so that you only keep “proxies” where r ≥ 0.55 (for this case).

    Because of this bias, if you use regression to compute the scaling factor between proxy and temperature, outside of the instrumentation region (which we assume is 1950-2000 in this case), you will be reducing the variance by 0.01 divided by the inferred slope.

    This shows the amount of descaling as a function of percent data retained.

    Note that you can correct your proxy series outside of the calibration region by multiplying back by one over this factor. This will inflate the variance of your signal and noise both. Unlike my speculation, it is not frequency selective, so that part is good.

    Going to more complicated algorithms like CPS isn’t going to save you from this scaling issue. It’s inherent in the correlation-based selection method.
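
    Here is a stripped-down pseudoproxy version of the experiment Carrick sketches, with everything (noise level, 50-year calibration window, 10% retention) chosen for illustration rather than taken from his model. Each pseudoproxy tracks temperature one-for-one plus white noise; screening on calibration-period correlation then biases the regression-inferred scaling upward, which is the descaling factor he refers to.

    ```python
    import numpy as np

    rng = np.random.default_rng(4)

    years = np.arange(50)
    T_cal = 0.01 * years                      # calibration-window temperature, 0.01/yr trend

    def simulate(n_proxies=5000, noise_sd=0.2, keep_frac=0.10):
        """Generate pseudoproxies = temperature + white noise, screen on calibration-period
        correlation, and compare the regression-inferred scaling with and without screening."""
        r_vals, slopes = [], []
        for _ in range(n_proxies):
            proxy = T_cal + noise_sd * rng.standard_normal(years.size)
            r_vals.append(np.corrcoef(proxy, T_cal)[0, 1])
            slopes.append(np.polyfit(T_cal, proxy, 1)[0])
        r_vals, slopes = np.array(r_vals), np.array(slopes)
        kept = slopes[r_vals >= np.quantile(r_vals, 1.0 - keep_frac)]
        return slopes.mean(), kept.mean()

    all_mean, screened_mean = simulate()
    print("mean inferred scaling, no screening:", round(all_mean, 3))        # near the true 1.0
    print("mean inferred scaling, top 10% by r:", round(screened_mean, 3))   # biased high
    print("implied descaling factor:           ", round(1.0 / screened_mean, 3))
    ```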

  89. jim

    And after selecting some of those histories, the ones that correlate best with some part of an instrimetal record history, what is the argument that the rest of the tree-ring history is also equally temperature dependent? How is that confidence quantified? (I know that I should be asking this of Nick, not of you.)

    Those questions have almost nothing to do with the topic of this post. This post is based on the assumption that we believe that selection based on meta-data works. So we believe, based on the metadata, that the correlation exists and will exist back into history.

    Of course, if this assumption is wrong, our results will be meaningless. But the degree to which one believes that assumption is not really something one can quantify with statistics.

    This analysis asked: in the best possible situation, where that assumption is correct, what does screening cause to happen?

    It seems to me that Nick and others have been trying their hardest to divert the subject to discussion of that assumption, so I am going to say that on this thread I want to forbid discussion of what happens if that “uniformity” assumption is false and/or whether or not that assumption makes any more sense for “treenometry” than it does for things like “the length of femur bones of horses over eons”. The reason I want to limit that discussion in this comments block is that that issue has nothing to do with the toy example.

    So on this thread:
    a) Don’t ask Nick what happens if that assumption is false.
    b) Nick– don’t answer that question.

    If you want to discuss things that related to your confidence in “the uniformity” principle, discuss that on some other thread. That way, we can focus on the fact that when this “uniformity principle” actually does apply, if you screen by correlation you introduce a mathemagical feature.

  90. Lucia, if we can assume that we have a “true” temperature proxy, I think the descaling is fixable by rescaling outside of the “training region” by a factor computed using a Monte Carlo.

  91. Because of this bias, if you use regression to compute the scaling factor between proxy and temperature, outside of the instrumentation region (which we assume is 1950-2000 in this case), you will be reducing the variance by 0.01 divided by the inferred slope.

    Yep. My next post is going to show what they look like if you find the inferred slopes. I was in the process of making graphs!
    🙂

  92. A different thought on this subject: Could pattern analysis be used as a somewhat independent confirmation of dendrochronology temperature dependence? (Somewhat independent of algebraic correlation.)

    Pattern recognition regressions could be frequency independent, getting away from the frequency dependence arguments.

    The argument of strong tree-ring size or density dependency on temperature would suggest strong pattern correlation of proxies, to me.

  93. lucia (Comment #98043)

    lucia, thank you for the reply.

    I see that I’m muddying the waters! I’ll be quiet, and wait for the appropriate thread. Thanks!

  94. jim–
    I think “pattern recognition” would fall along the lines of von Storch’s proposed “wiggle method”. It was the method Gergis intended to apply.

    This toy problem is limited to showing that the method Gergis actually applied, and which many in climate science apply and defend, has this particular flaw. The flaw discussed here is a 2nd flaw that adds to the ‘carves hockey sticks out of red noise’ flaw McKitrick and McIntyre found.

    Neither of these toy analyses tells us anything about the ‘wiggle method’. That may work. Or not. I would have to think about it for a while to decide whether I can think up any problems with ‘the wiggle’ method, but I agree it would seem to be promising. Certainly, even if it turns out to have some sort of bias, it’s unlikely to carve hockey sticks out of trendless proxies. In the context of arguments about AGW, that would be a plus.

  95. So basically Nick is saying that trees that correlate to temperature in modern times will correlate to temperature in the past based on the uniformity principle, and that trees that do not correlate or diverge from temperatures in modern times will not correlate and will diverge in the past due to the uniformity principle, and therefore trees that do not correlate in modern times should be thrown out of the sample.

  96. It is perhaps rather late in the game to be asking this question, but I confess to being befuddled as to what are the general time series under consideration. When discussing a proxy, are people generally referring to a site chronology or other reconstruction, such as Yamal tree ring width or Vostok d18O? Or are people referring to time series of ring width from individual trees? Based on some of the arguments, there seems to be no distinction.

    If one focuses on the former then the perverse uniformity assumption becomes less relevant.

  97. Earle–
    This is a cartoon analysis. It is an extreme example of any of those.

    If one focuses on the former then the perverse uniformity assumption becomes less relevant.

    The bias discussed here assumes uniformity does apply.

    I think it is useful to discuss each contribution to uncertainty (or bias) in a method separately. To get people to focus on the specific bias highlighted here, on this thread, I am trying to get people to not discuss what additional errors are introduced when uniformity does not apply.

  98. Carrick– I wonder if there is some non-dimensional parameter that could be concocted to collapse all of those?

    Certainly, if people want to screen based on (not detrended) temperature and ‘tree property’, they should correct for this effect — though of course, any correction is in the end going to be tenuous because it’s based on an assumption about whether the ‘treenometers’ really all have the same R and the spread is due to sampling, or whether at least some of the spread is due to real variations in tree response. (This is something that could be tested or estimated statistically.) That, and the magnitude of “R” itself, will affect things. With some thought, someone could correct — possibly providing a closed-form estimate of the correction. (Certainly, with Monte Carlo, all is possible! 🙂 )

    But it may turn out no one “wants” the correction because after doing it the mathemagically concocted “reportable” results disappear!

  99. There’s a couple of issues worth exploring, I think, which are

    1) is there a confound in the screening process between lower signal sensitivity and higher noise level (naively, I’d expect it to select out signals with higher than a given SNR),
    2) and if so, how do you accurately “rescale” your proxies for the selection bias effect?

    Since the trees don’t have the same sensitivity to temperature, it’s a foregone conclusion that you’d have to use different correction factors for each proxy. So how do you construct a test that self-consistently applies the correct “inflation” factor for each proxy?

    It seems to me to be better to use proxies, as advocated by Moberg, that you have some faith are true temperature proxies over the entire temporal range of the reconstruction, and use these as a basis for calibrating the tree-ring proxies.

  100. Carrick

    Since the trees don’t have the same sensitivity to temperature, it’s a foregone conclusion that you’d have to use different correction factors for each proxy.

    Assuming by “each proxy” you mean to call “the proxy” the collection of “treenometers” from a particular site used to create a sort of reconstruction for that site.

    (I’m not always sure about vocabulary because with trees we could conceivably have:
    1) N trees for a site.
    2) M sites (possibly in a ‘region’).
    3) O regions on the “world” (or hemisphere).)

    You could, hypothetically, concoct a “method” that screens using correlation at any of these levels. In the process, you bias whichever sub-sub-reconstruction, sub-reconstruction, or reconstruction was affected by this screening.

    The quantitative impact (and magnitude of correction required) all depends on details that I don’t explore in my posts here. The correction methodology might too!

    It seems to me to be better to use proxies, as advocated by Moberg, that you have some faith are true temperature proxies over the entire temporal range of the reconstruction, and use these as a basis for calibrating the tree-ring proxies.

    It would certainly be better to use proxies whose temperature dependence is based on something everyone accepts as existing and which everyone believes would persist throughout the entire range of the reconstruction. That at least eliminates everyone’s qualms about this “uniformity principle”.

  101. Lucia, it certainly is possible to construct a realistic network of tree-ring proxies using a Monte Carlo method, and by realistic I mean one that has all or most of the blemishes of a real tree-ring proxy network.

    The question is why would you want to???

    Wouldn’t it be better to adopt an approach that avoids the fallacy that your pre-screening operation is statistically neutral? If you use “real” temperature proxies that are based on well-defined physical processes, that is using the “real” uniformity principle and isn’t something that should be eschewed.

  102. Carrick

    The question is why would you want to???

    I suspect no one would want to and no one will want to.

    I think the problem is that:
    1) Some people don’t realize that screening can introduce bias. Instead they imagine all it does is throw out “bad” data.
    2) It just so happens that the bias introduced matches their preconceived notion and results in an “exciting” publication. So, it gets published.
    3) Then others have to explain why they don’t believe the result– because of the screening fallacy.

    If “correcting out” the mathematical artifact required doing lots of Monte Carlo runs matching loads and loads of details, those who “like” the mathemagical artifact might as well just revert to not screening. So we are back to: if a result is real, you should be able to get it without correlation screening.

  103. Lucia, you are addressing the reasons people do use prescreening, and I think this is about right. It’s because of a) ignorance of the statistical consequences, b) expectation bias, c) erroneous understanding of empiricism (e.g., misapplication of the uniformity principle).

    [I could use this as an opportunity to excoriate the bristly, incompetent, arrogant, conceited and hard-headed in the climate science community. Instead, we can keep this positive and talk about what does work.]

    By the way I’m more convinced the “result” is real than I’m convinced it’s a pure temperature reconstruction. Soon and Baliunas come to mind at this point.

  104. Carrick (Comment #98091) —

    I have been thinking of “prescreening” as selecting trees on the basis of some factor prior to coring. E.g. species, location within 200′ of the treeline, thickness of trunk…

    It seems that this isn’t the way you are using the word, or perhaps the way the word is generally used. Screening (or prescreening) on the basis of tree-ring characteristics (correlation to temp record) is the issue being discussed in these threads.

    Others might have this vocabulary problem, too.

  105. Amac, sorry, I was using short-hand here. By pre-screening, I mean “selection based on correlation” before applying the algorithm that performs the reconstruction.

    There’s nothing wrong with, and in fact I advocate, selecting/screening trees based on meta data.

  106. I think Nick has a point when he asks for a precise definition of the fallacy. Lucia is correct in that he has been told a thousand times, but unfortunately he has not been told the same thing each time. I think it would be good for the discussion if she committed herself to one particular definition.

    The toy example is constructed so that the very problem the screening tries to solve is artificially eliminated. Fair enough, it’s reasonable to ask what happens when you screen in this situation.

    One of the several possible definitions of the fallacy would be that “the screening introduces a bias in the time series”. And this toy model shows how that can happen.

    Great, now we have a (tentative) definition and an example.

    It is certainly true that the screening introduces a bias in the part of the time interval consisting of recent temperatures. These data tell us absolutely nothing about recent temperatures, since they have been contaminated and are now biased. Got it. No question about that.

    But we are not particularly interested in recent temperatures, since we have better methods for determining those. The whole purpose of this is to study ancient temperatures, not to study recent temperatures.

    And even in the toy example screening does not seem to introduce extra bias in the time interval that we are interested in. It might be that the time series becomes noisier than without screening, but that is a different discussion. That’s not about bias any more, which is what the tentative definition deals with.

    But maybe the true definition of the fallacy is something else? You see how important a precise definition is for communication.

  107. And even in the toy example screening does not seem to introduce extra bias in the time interval that we are interested in.

    Actually, it does introduce a bias when you try to rescale to get temperatures. See the next post where that problem is addressed.
    http://rankexploits.com/musings/2012/screening-fallacy-so-whats-the-past-really-like/

    Screening can introduce a bias– and does in many cases. That by itself is not the fallacy. Lots of things introduce bias– and that’s ok provided you know that they do and interpret your results accordingly.

    The fallacy arises when you either a) forget, b) never knew, or c) ignore the fact that screening introduced a bias, and you interpret the features introduced by the bias as “real” and draw conclusions about what happened under the assumption that the bias does not exist.

    There are different ways to word this– but as far as I can tell, the same idea is being stated over and over and over.

    But maybe the true definition of the fallacy is something else? You see how important a precise definition is for communication.

    The problem is not lack of precision of language. You understood that it’s a claim about bias. The problem is that you aren’t recognizing that the bias in the proxy is introduced by screening and it was introduced in this example.

  108. How about we avoid discussing the fallacy and talk about the real issue instead? This being (IMO) the consequences of screening proxies based on correlation.

    That’s what is at the heart of this. Whether it is even a “fallacy” or “erroneous” to screen data this way depends on what are the consequences of that screening (and implicitly what the researchers do with the screened proxies), agreed?

    Why not stick with that?

  109. Carrick

    There’s nothing wrong with, and in fact I advocate, selecting/screening trees based on meta data.

    I agree.

    I would even go so far as equating this to saying “There is nothing wrong with picking the calibrated thermometers instead of the uncalibrated ones”, or “There is nothing wrong with saying we’ll only use temperatures technician Jane measured, because technician Joe is a cross-eyed, bleary-eyed drunk who hasn’t gotten a new eyeglass prescription in two decades and can’t read a thermometer.”

  110. Yep!

    There’s also nothing wrong with cherry picking data in general, as long as you have a model reason for picking and know how to handle the effect that has on uncertainty. (Cherry picking trees based on meta data is “good”, cherry picking trees based on temperature correlation is “bad”.)

  111. Lucia, your proxies all contain an actual temperature signal, even the same signal-to-noise ratio, so it is natural that screening will not do much good and that using them all would describe the signal at least as well.

    But what if only 15% of the proxies carried any actual signal or any useful signal, and the signal to noise ratio differs between the individual proxies?

    Would not screening better describe the actual signal in such a case by increasing the signal to noise ratio in the remaining proxies?

  112. ToddT–

    Would not screening better describe the actual signal in such a case by increasing the signal to noise ratio in the remaining proxies?

    Not necessarily. I haven’t done that problem because it’s a mixed one. But it’s now well known and accepted that if none of the proxies contain any temperature signal, screening can create a hockey stick out of nothing. So, you are wondering what happens if
    * 85% contain no signal at all and
    * 15% contain a signal.

    Well… if 100% contained no signal: Screening can create a hockey stick out of nothing.
    If 100% do contain a signal: Screening exaggerates the uptick at the end.

    So what happens if we have 85% no and 15% signal? Something in between!

    The reason I’m not discussing that yet is that once you have a mixed problem like that, we have a big range of what could happen. So, you could have 15% with R=0.80 and the rest with no signal. In that case, I think you will be able to devise a screening that improves the result. But if the 15% contain a very weak signal– it’s going to be very, very difficult to come up with a screening method that improves stuff.

    BTW: I thought I was going to discuss this today. But I just created a decline!!!!! By screening!!!! I’m excited and I’m going to blog that today.
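
    As an illustration of the mixed case ToddT asks about, here is a sketch in which only a fraction of the proxies carry any signal and the rest are pure noise; all parameters (fraction, correlation, calibration window, retention rate) are arbitrary choices of mine. Depending on those choices, screening can either enrich the retained set enough to help or mostly reward lucky noise, which is the “something in between” point.

    ```python
    import numpy as np

    rng = np.random.default_rng(5)

    t = np.arange(1000)
    signal = np.sin(2 * np.pi * t / 250.0)       # stand-in "temperature"
    cal = slice(900, 1000)                       # calibration window (illustrative)

    def make_proxies(n, frac_signal, r, rng):
        """A fraction of the proxies carry the signal at correlation r; the rest are pure noise."""
        carries = rng.random(n) < frac_signal
        noise = rng.standard_normal((n, t.size)) * np.std(signal)
        with_signal = r * signal + np.sqrt(1 - r * r) * noise
        return np.where(carries[:, None], with_signal, noise), carries

    proxies, carries = make_proxies(1000, frac_signal=0.15, r=0.15, rng=rng)

    # Screen: keep the top 25% by correlation with "temperature" in the calibration window only.
    r_cal = np.array([np.corrcoef(p[cal], signal[cal])[0, 1] for p in proxies])
    kept = r_cal >= np.quantile(r_cal, 0.75)

    pre = slice(0, 900)   # pre-calibration period, where we care about getting the shape right
    corr_all = np.corrcoef(proxies.mean(axis=0)[pre], signal[pre])[0, 1]
    corr_kept = np.corrcoef(proxies[kept].mean(axis=0)[pre], signal[pre])[0, 1]

    print("signal-bearing fraction among kept proxies:", round(carries[kept].mean(), 2))
    print("pre-calibration correlation, all averaged: ", round(corr_all, 2))
    print("pre-calibration correlation, screened:     ", round(corr_kept, 2))
    ```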

  113. Thanks Lucia, I have been thinking along similar lines:

    1. There is no signal – screening introduces one.
    2. All proxies contain signal – screening distorts signal.
    3. Some proxies contain signal – the right screening method MIGHT better describe signal.
    4. Borderline cases (95% contain signal – 3% contain signal) – ?

    The problem is that there is no way of knowing which one of these is actually the case for a particular series.

    So the result is meaningless – it might just as well be an artifact of screening as any actual signal.

    No one knows.

  114. ToddT–

    The problem is that there is no way of knowing which one of these is actually the case for a particular series.

    There are statistical ways to diagnose the problem. They are based on the distribution of correlation coefficients R in your sample populations.

    They are probably even pretty good — or at least good enough to detect strong bimodality or just a little broadening etc.. Like all statistical tests one never ‘knows’ but one can test the hypothesis that all have the same R, etc. That’s what I planned to look at today. Robert Way wants to see it too– and so do lots of people.

    But I just created a decline by screening!!!! I usually avoid reading all the proxy papers and only do toy problems. But now I have to read them.
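
    One standard way to do the diagnosis Lucia mentions is a homogeneity test on the Fisher-transformed sample correlations: under the hypothesis that every proxy shares one true R, the weighted sum of squared deviations of the z-values is approximately chi-square. A sketch with made-up numbers:

    ```python
    import numpy as np
    from scipy import stats

    def common_r_test(sample_r, n_obs):
        """Test the hypothesis that all series share a single true correlation R.

        Fisher z-transform each sample r; under the common-R hypothesis the weighted
        sum of squared deviations from the weighted mean z is ~ chi-square(k - 1).
        """
        z = np.arctanh(np.asarray(sample_r, dtype=float))
        w = np.full_like(z, n_obs - 3.0)           # weight each series by n - 3
        z_bar = np.sum(w * z) / np.sum(w)
        q = np.sum(w * (z - z_bar) ** 2)
        return np.tanh(z_bar), stats.chi2.sf(q, df=z.size - 1)

    # Illustrative use: 200 proxies, 100 overlap years, all truly sharing R = 0.15.
    rng = np.random.default_rng(6)
    fake_r = np.tanh(rng.normal(np.arctanh(0.15), 1 / np.sqrt(100 - 3), size=200))
    r_pooled, p = common_r_test(fake_r, n_obs=100)
    print("pooled R estimate:", round(r_pooled, 3), " p-value for 'same R':", round(p, 3))
    ```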

  115. So, I think I am starting to understand the mathematics of this model, and how it could lead to a “fallacy”. I’m still not convinced that this model is relevant for tree chronology; that is, I’m not convinced that this particular fallacy can show up there.

    The reason is the following. I think that the premise is that there is a relation between temperature T and ring width w, and that if “everything else were equal”, this relation could be written as w = f(T), for some suitable function f. Possibly f is linear, at least in a range. It doesn’t really matter.

    Now, not everything will be equal for these trees. There will be random fluctuations. We model this by replacing f(T) by some random variable f(T) + Q, where Q is stochastic. That is, we put

    w = f(T) + Q.

    Now, I think that a reasonable assumption is that the expected value of Q is 0. If we use a Q with expectation value very different from 0, I would like to see an argument for why this is reasonable.

    In the toy model, we write w = R f(T) + sqrt(1-R^2) N, where R is a constant and N is a stochastic variable with average 0. This means that we are using

    Q = (1-R)f(T) + sqrt(1-R^2) N

    The thing that worries me is that this Q does not have average 0, it has average (1-R)f(T), which could be anything.

Comments are closed.