Method 1: Unbiased but (maybe) shared error.

I have been participating in an email exchange with Bo Christiansen about whether the methods I call “method 1” and “method 2” are unbiased, based on my understanding of the definition of that word. It has been my contention that ‘method 1’ provides an unbiased estimate of the average temperature at M proxy locations, that it does so whether or not all proxies share the same correlation with the applied temperature field, and, moreover, that the method is unbiased for both finite and infinite calibration periods. I’m going to show this in this post.

But in the course of discussion, I had an “aha” moment based on an argument Bo made. I realized there is an issue with Method 1, which he calls bias, but which I would call ‘unbiased error shared by every point in the reconstruction’. I leave it to statisticians to tell me if I’m using the word correctly. But at least for this post, the way I am using (or possibly misusing) the terms is this:

  1. Bias: An error whose mean will not be zero if we repeat an experiment many many times.
  2. Shared error: If an ‘experiment’ results in a time series with N points, a shared error is one that is shared by all N points.
  3. Other sorts of weird errors that aren’t precisely ‘bias’ errors.

Note that with my usage a shared error may be biased or unbiased.

Suppose, for example, we do an experiment where the “truth” is known to be T=(1,1,1). If during one experiment we get T=(1+ε,1+ε,1+ε), so the entire series is shifted by ε, I consider that a shared error because all data share the same error. But we can’t tell yet whether it is biased. To determine whether the error is biased, we must repeat the experiment Q times and determine whether the shared error ε has a mean of 0 over the repeated experiments. If the mean is 0, this would be an unbiased shared error.

I don’t know if this nomenclature is correct or incorrect. But I do want terms that distinguish between the different possible errors because they are different, and sometimes we need to know which type we are talking about. Any real statisticians who know the correct terminology, please let me know!

  ——   ——   ——

I’m now going to show that the method I called “method 1” is unbiased but potentially suffers from a ‘weird’ sort of error. (For a time, I thought it was shared. But I did a little more algebra and convinced myself it’s just weird– but small and non-biased.)

I’m going to do some math to discuss this. If someone can find errors (or highlight typos, which generally do exist when I interlace HTML and LaTeX), let me know. Now for the math.

(The nomenclature for method 1 is described here.)

The organization is as follows:

  1. Describe the forward equation describing proxy response to temperature.
  2. Discuss how $latex \lambda_{i} $ is estimated from data.
  3. Discuss how we obtain a reconstruction using method 1 and show the result is unbiased.
  4. Discuss the shared error in method 1 (and how it can be corrected.)

The forward equation: P v T.
(1)$latex \displaystyle P_{i}(T) = \lambda_{i} T + w_{i} $

where $latex \lambda_{i} $ is a property of the $latex i^{th} $ proxy, $latex w_{i} $ is random white noise with mean zero and standard deviation $latex \sqrt{(1-\lambda_{i}^2)} $, $latex P_{i} $ is the reading from the proxy, and $latex T $ is the temperature applied to the proxy. (Note: The magnitude of the noise is selected such that $latex \mathrm{var}(P_{i}) = \mathrm{var}(T) $, which also assures that the magnitude of $latex \lambda_{i} $ equals the correlation coefficient between T and P.)
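For anyone who wants to try this in R, here is a minimal sketch of equation (1) for a single proxy; the particular values of N and $latex \lambda_{i} $, and the choice of gaussian unit-variance temperatures, are made up for illustration only.

# Equation (1) for one proxy: P = lambda*T + w, with T scaled to unit
# variance and sd(w) = sqrt(1 - lambda^2), so var(P) ~ var(T) and
# cor(P, T) ~ lambda.
set.seed(1)
N      <- 100                                  # length of the record (arbitrary)
lambda <- 0.4                                  # assumed proxy property
temp   <- rnorm(N)                             # applied temperatures, unit variance
w      <- rnorm(N, sd = sqrt(1 - lambda^2))    # white noise
P      <- lambda * temp + w                    # proxy readings
c(var_P = var(P), var_T = var(temp), cor_PT = cor(P, temp))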

When discussing proxy reconstructions, it will later become necessary to distinguish the temperatures in the time series measured at proxy ‘i’ from those that were applied to some other proxy. To denote these, we can rewrite equation (1) as

(2)$latex \displaystyle P_{i}(T^{i}) = \lambda_{i} T^{i} + w_{i} $

with $latex i $ as a superscript on “T” to indicate that this temperature was applied to the $latex i^{th} $ proxy, while emphasizing that the magnitude of $latex \lambda_{i} $ is independent of the temperatures that happened to be observed during any particular experiment. This notation is selected because I wish to emphasize that the quantities $latex \lambda_{i} $ and $latex T^{i} $ are statistically independent, and I would like to draw attention to this now rather than later when the statistical independence is invoked to simplify a final result.

Linear regression to estimate $latex \lambda_{i} $
If we wished to determine the magnitude of $latex \lambda_{i} $, we could perform an experiment in which we collect N temperature–proxy pairs, denoting each of the readings with a subscript ‘j’. In this case we would have N $latex ( T^{i}_{j}, P_{ij} ) $ pairs,

(3)$latex \displaystyle P_{ij}(T^{i}) = \lambda_{i} T^{i}_j + w_{ij} $

and apply ordinary least squares linear regression to the $latex N $ (j-subscripted) data pairs for the ith proxy to obtain an estimate $latex \lambda^s_{i,N} $.

We know that the estimate $latex \lambda^s_{i,N} $ would rarely be exactly equal to the target value $latex \lambda_{i} $, but we expect the error to be unbiased. That is, if we expand the estimate into the target value and an error term $latex \zeta_{i} $ as

(4)$latex \displaystyle \lambda^s_{i,N} = \lambda_{i} + \zeta_{i} $

and repeat the experiment Q times to obtain new estimates of $latex \lambda^s_{i,N} $, we will find the mean of $latex \zeta_{i} \to 0$ as $latex Q \to \infty $.
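A minimal R sketch of this check, with made-up values of N, Q and $latex \lambda_{i} $:

# Estimate lambda by OLS from N (T, P) pairs, repeat the experiment Q times,
# and check that the error zeta = lambda_est - lambda has mean ~ 0.
set.seed(2)
N <- 100; Q <- 5000; lambda <- 0.4
zeta <- replicate(Q, {
  temp <- rnorm(N)
  P    <- lambda * temp + rnorm(N, sd = sqrt(1 - lambda^2))
  coef(lm(P ~ temp))[["temp"]] - lambda
})
mean(zeta)   # close to 0: the OLS estimate of lambda is unbiased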

Estimate the mean value of $latex \lambda_{i} $

Now, suppose we take $latex M $ independent samples from the population of all possible proxies that respond to temperature as in (1). For each proxy, we obtain a time series of $latex ( T^{i}_{j}, P_{ij} ) $ at times $latex t_j=(1:N) $. From each time series we obtain an estimate $latex \lambda^s_{i,N} $ for each of the $latex M $ proxies.

If we wished, we could estimate the mean value $latex \langle \lambda \rangle $ for the larger population by computing the average over the $latex M $ estimates based on time series with $latex N $ points (i.e. $latex \lambda^s_{i,N} $ ). We will denote this $latex \langle \lambda^s_{i,N} \rangle_{i,M} $; here the angle brackets and subscripts $latex \langle \rangle_{i,M} $ represent a sample average over $latex M $ realizations of any random variable contained inside the brackets; the redundant index i is retained to remind us that the average is taken over the ‘ith’ proxy value.

Mean of P and T values at time ‘j’
We could similarly estimate the mean proxy response and mean temperatures observed at time period ‘j’ of our experiment. This can be done by taking a simple average over the $latex M $ proxy values at the $latex j^{th}$ point in the time series resulting in $latex \langle P_{ij} \rangle_{i,M}$ , $latex \langle T^i_j \rangle_{i,M} $.

More could be said of these variables, but my goal is merely to define both. They are merely the average values over the ‘M’ proxies at point ‘j’ in the time series.

Demonstrate that reconstructed temperatures obtained using Method 1 provide an unbiased estimate of $latex \langle T^i_k \rangle_{i,M} $.
When performing a paleo reconstruction, the ideal goal is to obtain the best linear unbiased estimate (BLUE) of some regional average temperature at time ‘k’ based on knowledge of the proxy values P at time ‘k’. (I have switched indices to emphasize that our interest is to estimate values outside the calibration period; those times were denoted ‘j’.) To achieve this goal, all other things being equal, we should prefer unbiased estimates over biased ones.

For our purposes we will assume the proxies have been selected such that, if we could measure the temperature at time ‘k’, the average $latex \langle T^i_k \rangle_{i,M} $ over those proxies would be an unbiased estimate of the target temperature at time ‘k’. (How to correctly select proxies is beyond the scope of this blog post.) In this case, we wish to use a method of computing the reconstructed value of T at time k that provides an unbiased estimate of $latex \langle T^i_k \rangle_{i,M} $.

One possible method is method 1, discussed previously. Using method 1 we estimate the temperature at time ‘k’ as follows.

(5)$latex \displaystyle T^{rec}_k= \frac {\langle P_{ik} \rangle_{i,M} } { \langle\lambda^s_{i,N} \rangle_{i,M} } $

It is possible to show this is an unbiased estimate of the average temperature over the M proxies, i.e. $latex \langle T^i_k \rangle_{i,M} $.

Substituting in (3), we obtain

(6)$latex \displaystyle T^{rec}_k= \frac {\langle \lambda_{i} T^{i}_k + w_{ik} \rangle_{i,M} } { \langle\lambda^s_{i,N} \rangle_{i,M} } $

which is the sum of two terms.

Turning to the second term in the sum: because every value of $latex w_{ik} $ represents an error with zero mean, the average of this noise over all M proxies also has zero mean. Its standard deviation will diminish as $latex 1/\sqrt{M} $. So this term represents an unbiased error whose variance decreases as the number of proxies M increases. If the original noise in each time series was white, without loss of generality we can represent this term as white noise at point ‘k’. It’s worth noting that the noise $latex w_{ik} $ at point ‘k’ is then independent of the noise at all other points in the time series. This is a nice property for a paleo reconstruction where real variations may occur at low frequencies; in this case, this noise can potentially be reduced by smoothing over time.

The first term $latex \frac {\langle \lambda_{i} T^{i}_k \rangle_{i,M} } { \langle\lambda^s_{i,N} \rangle_{i,M} } $ is a bit more difficult to simplify.

Let us begin by examining the numerator: $latex \langle \lambda_{i} T^{i}_k \rangle_{i,M} $ .

Both $latex \lambda_{i} $ and $latex T^{i}_k $ are random variables. But recall that $latex \lambda_{i} $ is a property of the ith proxy. At the outset of the analysis, when we wrote equation (1), we assumed this property is unaffected by the temperature $latex T $ one might apply to the proxy at any particular time. Consistency requires that we continue to assume so despite any intervening steps in which the magnitude of $latex \lambda_{i} $ was estimated from sampled $latex ( T^{i}_{j}, P_{ij} ) $ values obtained at times (1:N).

Also the temperatures $latex T^{i}_j $ that might be applied to any proxy are utterly uninfluenced by the magnitudes of the proxy variable $latex \lambda_{i} $ . This means $latex \lambda_{i} $ and $latex T^{i}_j $ are independent random variables. As such, the two are uncorrelated. For any two uncorrelated random variables,

(7) $latex \langle \lambda_{i} T^{i}_k \rangle_{i,M} = \langle \lambda_{i} \rangle_{i,M} \langle T^{i}_k \rangle_{i,M} + \eta_{k} $

where the term $latex \eta_{k} $ is a random variable with mean zero; its standard deviation will approach 0 as the number of proxies M increases.

Collecting together the error terms, the result is

(8)$latex \displaystyle T^{rec}_k = \langle T^i_k \rangle_{i,M} + v_{k} $

where now $latex v_k $ is an error term with mean 0 and a standard deviation whose magnitude approaches zero at a rate approximately proportional to $latex \frac{1}{\sqrt{M}} $.

So, method 1 provides an unbiased estimate of the unweighted average of the temperature at the M proxies, where I am using unbiased in the following sense: if this experiment could be repeated with a fresh set of proxies over and over, the distribution of errors in the determination of the temperature at any point ‘k’ would have a mean of zero. The standard deviation of these errors would decline to zero as the number of proxies M increased.
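For anyone who wants to check this numerically, here is a Monte Carlo sketch of method 1 in R. The spread of $latex \lambda_{i} $ across proxies, the spread of temperatures at time ‘k’, and the values of M, N and Q are all made up for illustration.

# Method 1 (equation (5)): reconstruct <T_k> over M proxies from
# mean(P_k)/mean(lambda_hat) and check that, over repeated experiments with
# fresh proxies, the reconstruction error has mean ~ 0.
set.seed(3)
M <- 30; N <- 100; Q <- 2000
err <- replicate(Q, {
  lambda     <- runif(M, 0.2, 0.8)                 # proxy properties
  lambda_hat <- sapply(lambda, function(lam) {     # calibrate each proxy by OLS
    temp_cal <- rnorm(N)
    P_cal    <- lam * temp_cal + rnorm(N, sd = sqrt(1 - lam^2))
    coef(lm(P_cal ~ temp_cal))[["temp_cal"]]
  })
  temp_k <- rnorm(M, mean = 1, sd = 0.3)           # temperatures at the M proxies at time k
  P_k    <- lambda * temp_k + rnorm(M, sd = sqrt(1 - lambda^2))
  mean(P_k) / mean(lambda_hat) - mean(temp_k)      # reconstruction error at time k
})
c(mean_error = mean(err), sd_error = sd(err))      # mean error ~ 0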

Method 1 has shared error.
It’s worth noting, however, that there is a small hitch in method 1, and that hitch is worth considering. To understand this hitch, I will return to equation (7):

(7) $latex \langle \lambda_{i} T^{i}_k \rangle_{i,M} = \langle \lambda_{i} \rangle_{i,M} \langle T^{i}_k \rangle_{i,M} + \eta_{k} $

Let us now examine $latex \eta_{k} $.

Without showing the algebra, I’m going to claim that if we center the temperature at all proxies so the mean is zero over the calibration period, then $latex \eta_{k} $ will consist of an ordinary, garden variety white noise term of the sort we expect in all experiments plus an error of the form:

(9)$latex \displaystyle + \frac {\langle (\lambda_{i} - \langle \lambda_{i} \rangle_{i,M}) ( T^{i}_k - \langle T^{i}_k \rangle_{i,M} ) \rangle_{i,M} } { \langle\lambda^s_{i,N} \rangle_{i,M} }$

I claim it is unbiased in the sense I discussed above. That is: if we select proxies at random from the vast universe of potential proxies and repeat the reconstruction using this method, the mean of this error will be zero. Its standard deviation is also proportional to 1/M, where M is the number of proxies used in the reconstruction.

The other thing I can say is that if the true temperature at all proxies rose (or fell) uniformly relative to the calibration period, this error would be identically zero. So, this is an error whose magnitude is proportional to the divergence in warming (or cooling) across the proxies. It is also proportional to the divergence in response to temperature values across the proxies. So if all proxies exhibited the same value of $latex \lambda_{i} $, the error would be zero.
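A quick R check of these two limits (the lambdas and the per-proxy temperature anomalies below are made up):

# The numerator of the extra term in (9) is the covariance-like quantity
# <(lambda_i - <lambda>)(T_k^i - <T_k>)>.  It is nonzero when both the lambdas
# and the warming differ across proxies, and exactly zero when either is uniform.
set.seed(4)
M      <- 1000
lambda <- runif(M, 0.2, 0.8)
weird_num <- function(lam, temp_k) mean((lam - mean(lam)) * (temp_k - mean(temp_k)))
weird_num(lambda, rnorm(M, mean = 1, sd = 0.3))       # divergent warming: nonzero
weird_num(lambda, rep(1, M))                          # uniform warming: exactly zero
weird_num(rep(0.5, M), rnorm(M, mean = 1, sd = 0.3))  # identical lambdas: exactly zero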

So this is a bit of a weird error that needs to be considered. I hadn’t thought of it before, but the email discussion with Bo Christiansen caused me to explore what happens when the behavior of each proxy is different and the temperature experienced by each proxy is different.

(I still tend to think this method is likely better than method 2 which is biased and ‘hella noisy, but more thought might be required.)

161 thoughts on “Method 1: Unbiased but (maybe) shared error.”

  1. Lucia
    On semantics, the international guides for measurement uncertainty prefer to use “systematic” over “bias” (since “bias” has a definition for instruments.)

    D.1.1.6 systematic error [VIM 3.14] mean that would result from an infinite number of measurements of the same measurand carried out under repeatability conditions minus the value of the measurand.

    NOTES

    Systematic error is equal to error minus random error.

    Like the value of the measurand, systematic error and its causes cannot be completely known.

    For a measuring instrument, see “bias” ([VIM] 5.25).

    TN 1297 Comments:

    1. As pointed out in the Guide, the error of the result of a measurement may often be considered as arising from a number of random and systematic effects that contribute individual components of error to the error of the result.

    2. Although the term bias is often used as a synonym for the term systematic error, because systematic error is defined in a broadly applicable way in the VIM while bias is defined only in connection with a measuring instrument, we recommend the use of the term systematic error.

    TN1297 Appendix D Clarification and Additional Guidance
    Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results TN1297 pdf

  2. Lucia,
    This is a very interesting post. Ive been trying to do something similar. I’ll work my way through it.

    A sticking point for me has been something implicit in your Eq 1. You effectively assign a variance to T which you add to that of the noise to come to 1. My problem is that T isn’t at all stationary. You can calculate a variance for a period, but can’t really assume that it’s a property of the series as a whole. You’ll get a different value for each time segment that you look at.

    In real proxy, the calcs you describe following from (1) can only be done for the calibration period.

    Even though T isn’t stationary, I agree that the calc done assuming it is stationary is worth doing.

  3. You can calculate a variance for a period, but can’t really assume that it’s a property of the series as a whole.

    I agree the variance in T or P we compute for a period isn’t the true variance. But this has nothing to do with whether T is stationary. Even if both are perfectly stationary, the sample variance is an estimate of the true variance.

    In real proxy, the calcs you describe following from (1) can only be done for the calibration period.

    I could place an arbitrary multiplier in (1) to generalize and work through to see what that does. But I think the problem has nothing (or little) to do with “stationarity”, but rather just finite number of proxies and time period.

  4. On latitude dependence, I should clarify that such dependence would come under “systematic error” if it has not been included, but it becomes part of the model once it is recognized and quantified.

  5. Lucia,
    Stationarity of T is an issue. If you calculate the sd of a linear function, the result is proportional to the segment length. I did a calc for Hadcrut 4 NH for time periods of (50, 60, 70, 80) years going back from 1980 (plausible calib periods) and the sd’s were (0.116, 0.127, 0.172, 0.189).

  6. Lucia,
    In your Eq (5), you say you have an unbiased estimator. I think that’s true as far as summing the proxy values is concerned. But you’ve inverted λ. Now you had a claim to unbiased λ, but that doesn’t mean unbiased 1/λ. Inversion turns noise into bias. And producting with P will also create bias. Any nonlinear op creates second order terms from the noise that don’t sum to zero.

  7. Not sure I follow Nick. If you compute E[lambda], then take 1/E[lambda], if E[lambda] is unbiased then so is 1/E[lambda].

    What’s wrong with that argument?

  8. Carrick,
    It’s basically your Taylor series argument. If λ=E(λ)+w
    where w is noise, E(w)=0, then
    E(1/λ) = E(1/(E(λ)+w)) ≈ 1/E(λ) - 0 + E(w*w)/E(λ)^3 + …

  9. Nick–
    But in (5) it’s not E(1/λ). It’s just 1/E(λ). The expectation is already taken.
    The E(1/λ) issue happens in method II. That’s why method II is biased for finite numbers of samples.

  10. This is getting quite interesting from a theoretical point of view. It seems to me that there must be some statistical literature on this question. It is fine to do testing using software, but the normal distribution has been studied so thoroughly that it should not be necessary to use this approach, one should be able to calculate the answers analytically. That is, one should be able to get expressions for the bias of various methods depending on the screening etc. expressed as functions in the amount of screening.

    I tried browsing around arxiv.org, but didn’t find much. There is some stuff there on screening, but for a non-expert like me, it’s not obvious that it’s directly relevant. What I did find was work on linear regression for a sum weighted by coefficients of normally distributed stochastic variables, where the problem was to estimate the coefficients. Maybe that can be reformulated to our problem somehow, but I don’t see how.

    Where do statisticians usually publish theoretical work?

  11. Nick, see Lucia’s comment, except she meant to say ” The E(1/λ) happens in method II.”

    In method 1, you take the expectation value of λ before reconstructing the series.

  12. mb, you don’t necessarily need to go to sources of authority. The method one uses is pretty well studied already.

    For example, let $latex \lambda = \lambda_0 + n$ where $latex n$ is a normally distributed number obeying the probability distribution $latex {\cal P}(n) = {1\over \sqrt{2\pi}} e^{-{1\over2}(n/\sigma)^2}$, then:

    $latex E[\lambda] = \int_{-\infty}^\infty {\cal P}(n) (\lambda_0 + n) dn$.

    But $latex \int_{-\infty}^\infty {\cal P}(n) dn = 1$ by definition for any $latex {\cal P}(n) $, and $latex \int_{-\infty}^\infty {\cal P}(n) n dn = 0$ by assumption.

    The problem you run into is that

    $latex E[1/\lambda] = \int_{-\infty}^\infty {\cal P}(n) {1\over \lambda_0 + n} dn \rightarrow \infty$,

    plus it diverges logarithmically, so it isn’t readily apparent with Monte Carlo simulations that there is even a problem with this method.

    When computing $latex E[1/\lambda]$ you have to pre-screen the data or the method is ill-defined. (And of course, you can compute the expectation value if you for example require $latex \lambda \ge \lambda_{cut} > 0$, or as I’ve suggested, $latex \lambda_{min} < \lambda < \lambda_{max}$, which would be closer to unbiased.)

    It's possible to build up other methods, like the RMS value of $latex 1/\lambda$, that is $latex \sqrt{E[1/\lambda^2]}$.

    [Note on the other hand if $latex P(n) = {n\over \sigma^2} e^{-{1\over2}(n/\sigma)^2}$, with $latex P(n) = 0$ for $latex n < 0$, then $latex E[\lambda]$ and $latex E[1/\lambda]$ for $latex \lambda_0 > 0$ are both defined.]
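    A quick R sketch of the contrast (the values of $latex \lambda_0 $, $latex \sigma $ and M below are made up):

    # Compare 1/E[lambda] (average first, then invert) with E[1/lambda]
    # (invert each, then average) when lambda = lambda_0 + gaussian noise.
    # The second quantity is wildly unstable once lambda_0/sigma isn't large.
    set.seed(5)
    lambda0 <- 0.3; sigma <- 0.15; M <- 100
    one_run <- function() {
      lam <- lambda0 + rnorm(M, sd = sigma)
      c(inv_of_mean = 1 / mean(lam), mean_of_inv = mean(1 / lam))
    }
    summary(t(replicate(5000, one_run())))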

  13. mb

    That is, one should be able to get expressions for the bias of various methods depending on the screening etc. expressed as functions in the amount of screening.

    I only want to show it exists and to get order of magnitude calculations first. I’m not sure the exercise of getting closed form solutions for the cartoon problem is worth the effort. Closed form solutions are worth developing if they are someday going to be applied. But if the goal is to first identify potential problems so as to avoid a method that is problematic, and then you actually decide not to use that method, then you don’t want to do all the work to develop equations that estimate the magnitude of the problem under a huge variety of circumstances under the assumption the noise is gaussian white noise.

    Whether the issues with ‘method 1’ are fully explored likely depends on whether the method is widely used (or even has ever been used). If it has not been used, you are unlikely to find much literature on the method.

  14. Carrick – in your 8:19 am comment 98920 is your first expression for P(n) correct? Should you have the sigma denominator in the exponent? Apols if I’m off track as I’ve not been following this closely. mb’s comment caught my eye so I read your follow up. I’ve not followed the development of the arguments but mb’s comment reflected my own pov.

  15. Curious:

    Carrick – in your 8:19 am comment 98920 is your first expression for P(n) correct?

    You’re right. I left off a factor of $latex \sigma$. Latex formatting makes it easy to mess up formulas. This is correct (*coughs*):

    $latex \displaystyle P(n) = {1\over\sqrt{2\pi} \sigma} e^{-{1\over 2} {n^2\over\sigma^2}}$.

  16. Lucia, about the only thing I would add to your comment, is sometimes it’s difficult to notice obvious problems with a method if you don’t do sanity checking. E.g., does E[…] exist?

    Logarithmic divergences can be really hard to spot when you Monte Carlo. I wonder if Bo is aware of the problems with $latex E[1/\lambda]$ for example.

  17. Carrick> I suppose the integral is not even diverging towards $latex +\infty$, since the function is both positive and negative close to $latex -\lambda_0$. It’s just undefined.

    If we do the screening, it will be defined, but there is (probably) a bias. The bias will depend on $latex \lambda_{cut}$ or on your $latex \lambda_{min} Well, we do want to understand the emergence of bias in various situations? A general theoretical framework would be helpful there I think. And maybe there are also other ways of estimating T, that has been studied in the literature?

    Well, maybe not, but there’s no harm in asking…

  18. Sorry, it seems that part of my last comment got eaten and my sad attempt at Latex failed completely.

    Anyhow, isn’t it a well defined problem to compute the bias of say method 2 in dependence of the screening rule? We agree that there is a problem here, but it’s not obvious (to me) how big the problem is.

  19. Lucia> Why can you assume that the variance of the white noise w is related to lambda? I understand that it is convenient, but isn’t it a very special case? Or am I missing something?

  20. Logarithmic divergences can be really hard to spot when you Monte Carlo. I wonder is Bo is aware of the problems with E[1/\lambda] for example.

    He is now…. 🙂

    Actually, I think he always was. But I’m not sure if the awareness was right out in front of him (if that makes any sense?) Sometimes you are more aware of the points someone already brought up and so you are set to counter argue those. It can make it difficult to recognize that a stranger is bringing up something else.

    I suspect (but I’m not sure) that when Christiansen read my first post, he sort of thought I was bringing up the topics those who wrote comments to the journals brought up. But really, those issues were in some way “more sophisticated”. I’m worrying about something that is in some ways “simpler”.

    We’re discussing back and forth by email. One difficulty is equations in email. Another is terminology and precisely what one means by a word.

    I agree on the problem with Monte Carlo. It is good for quantifying issues but you can miss things too. Also– while I quite like Christiansen’s use of pseudo proxies, I think that to clarify issues, a series of “limiting cartoon” problems is useful. Limiting cartoons that are useful include:

    1) “the truth” is featureless & trendless + noise. The noise may be white, red, complex, but other than that, what happens.

    2) “the truth” has large scale features. It might have noise or not. But this is the case I’ve been looking at with the oscillation cartoon.

    3) “the truth” is piecewise linear with 1 patch. It might have noise or not.

    4) The truth is like (2 or 3) but individual proxies diverge in mean temperature.

    There may be others. But we know the cartoons have been useful for showing:

    1) Some methods can carve hockey sticks out of trendless noise.
    2) Some methods can exaggerate or suppress the max/min associated with low frequency changes in the past.

    I’d suggest that the 3rd and 4th cartoons would be useful for showing:
    3) The same thing that (2) shows, but to people who don’t ‘get’ the fact that problems with (2) translate into getting the rise over the past millennium wrong. and
    4) Some features of the sort of noise described by equation (9) above. I consider that ‘unbiased’ but it is a little bit ‘weird’.

  21. mb– You can’t just wrap in $ signs. You have to put the word latex after the first $ sign. (I can’t show this in a comment because it will latexify everything after!)

  22. Re: lucia (Jun 29 10:50),

    Actually, you can. If you use HTML code for symbols they don’t work like they would if you type them correctly. HTML symbols start with an ampersand and end with a semicolon. So if you want to show latex code, use ampersand #36 semicolon which displays as $. The # defines the number following as decimal rather than hex. So to show the code for $latex \lambda_{min}$ would be $latex \lambda_{min}$ (If I got the latex correctly).

  23. “The other thing I can say is that if the true temperature at all proxies rose (or fell) uniformly relative to the calibration period, this error would be identically zero. So, this is an error whose magnitude is proportional to the divergence in warming (or cooling) across the proxies. It is also proportional to the divergence in response to temperatures values across the proxies. So if all proxies exhibited the same value of \lambda_{i} , the error would be zero”

    Spot on. However, we know from the thermometer data that about 50% of the readings, within a smallish area, record rates of temperature change between Mean/2 and 2xMean; a factor of four.
    So one proxy may ‘see’ four times the temperature change as another proxy, and yet have the same signal to noise ratio.

  24. Doc

    So, this is an error whose magnitude is proportional to the divergence in warming (or cooling) across the proxies.

    Yes. In a single instance or a reconstruction, that’s what you get.

    So one proxy may ‘see’ four times the temperature change as another proxy, and yet have the same signal to noise ratio.

    Hmmmm… I have to think about this.

    Do you think that proxies that warmed the most over the full period will tend to have higher signal to noise ratios overall? I assumed there is no such association. That’s why I claim this isn’t biased overall. That is: I would expect that if proxies abounded and you could pick a whole bunch out of a hat, the ones in places that warm more won’t have higher signal to noise ratios than the ones in places that warm less.

    But if my assumption is not true and proxies in places that warm more have greater signal to noise ratio than proxies in places that don’t warm…well… this could be biased.

    Bear in mind, we have many different types of proxies, i.e. different trees, isotopes, varves etc. Also, as far as calibration goes, a good deal of the variability has not been in any linear trend but in the “weather noise”. These factors favor my assumption.

  25. Please do not be annoyed Lucia, but I have a question about the n.

    Imagine you have 100 proxies. One measures the correlation with the temperature and ranks them from best to worst fit.

    Then you do a pseudo-Jack-knife.
    You start putting the series together using the ‘best’ proxies first and after adding a new series, you estimate lambda.
    A plot of lambda vs n, using a real set of proxies, will have a phenomenological lineshape.
    A plot of lambda vs n, using synthetic proxies, will have a lineshape that is a function of how you added noise to signal.
    Can you see if you are adding, at each n+1, signal plus noise or noise?

  26. DocMartin

    Can you see if you are adding, at each n+1, signal plus noise or noise?

    I have no idea. Why do you ask? On this:

    You start putting the series together using the ‘best’ proxies first and after adding a new series, you estimate lambda.

    You now have an estimate of lambda based on the ones you kept. Those are the ones that had the best correlation during the verification period. They may or may not really be the ‘best’ ones.

    A plot of lambda vs n, using a real set of proxies, will have a phenomenological lineshape.
    A plot of lambda vs n, using synthetic proxies, will have a lineshape that is a function of how you added noise to signal.
    Can you see if you are adding, at each n+1, signal plus noise or noise?

    Out of curiosity, are you trying to suggest that we could infer something based on the difference between the shapes of the two lines? Maybe we could. Or maybe we couldn’t. I suspect we would have to make assumptions about the type of ‘noise’ in equation (1) or have some theory about the distribution of true λs.

    Do you have a proposed answer to your question?

  27. “Out of curiosity, are you trying to suggest that we could infer something based on the difference between the shapes of the two lines?”

    Yes I was. I suspect that if you generated different simulated proxies, with known ‘noise’, then you would get different lineshapes for different noise colors.
    With different types of noise as you increase n, the change in lambda ‘should’ be different.
    By ‘should’ I mean that I am making this up as I go along by imagining what the plots should look like instead of actually getting round to learning ‘R’ so I could do this sort of thing myself.

  28. Yes I was. I suspect that if you generated different simulated proxies, with known ‘noise’, then you would get different lineshapes for different noise colors.
    With different types of noise as you increase n, the change in lambda ‘should’ be different.

    It would not surprise me if the shape of lambda v. n would be different for different types of noise. I have no idea what I would learn from this.

  29. Lucia, I have not been following closely your toy models and thus should probably not comment on this thread – but I will anyway.

    I assume you have simulated cases where the proxies are truly responding to temperature in the calibration and historical periods and thus the proposition is: given a valid proxy thermometer if you select based on correlation of the proxy response to temperature in the calibration period what happens in the historical period. I also assume that you are not dealing with a deterministic trend in the calibration period and thus detrending is not an issue.

    While I agree that these toy models are very useful in understanding the weaknesses (strengths) of the selection process given a valid proxy thermometer, I still think that the bigger issue is demonstrating an a priori criterion for what could be a valid proxy thermometer based on the physics of the matter and then testing that hypothesis. In other words, when you mix your discussion and analysis with someone who is truly either neglecting the selection fallacy or not aware of it in assuming that the proxies are valid thermometers, I think you give the poser a step up to your point where you know the truth and the connection between the proxy and temperature for all time.

  30. Kenneth Fritsch

    I also assume that you are not dealing with a deterministic trend in the calibration period and thus detrending is not an issue.

    Why would you assume that? If the only data we had was a deterministic trend, the data would appear to have a deterministic trend in the calibration period.

    I still think that the bigger issue is demonstrating an a prior criteria for what could be a valid proxy thermometer based on the physics of the matter

    I don’t disagree with you. But it’s not the one I’m looking at.

    In other words, when you mix your discussion and analysis with someone who is truly either neglecting the selection fallacy or not aware of it in assuming that the proxies are valid thermometers, I think you give the poser a step up to your point where you know the truth and the connection between the proxy and temperature and for all time.

    I’m not sure what you are saying. Are you trying to say you are uncomfortable with someone discussing methods that would apply if treenometers were valid proxies because you think we should be discussing whether treenometers even are valid proxies?

    I realize many people would rather talk about the problems inherent in the assumption that treenometers even are treenometers. There really isn’t much “analysis” to be done on that though. If one cannot count on trees to behave uniformly over time, then you will not believe any paleo reconstructions that use trees. There isn’t much more to be said on that.

    My posts don’t happen to be discussions of why we should not believe treenometers. We are seeing more and more attempts to create reconstructions using something other than trees. In that regard, I think it’s worth looking at the methods.

  31. “Why would you assume that? If the only data we had was a deterministic trend, the data would appear to have a deterministic trend in the calibration period.”

    Do your simulations have deterministic trends? The one I looked at appeared to be a sine wave.

    “I’m not sure what you are saying. Are you trying to say you are uncomfortable with someone discussing methods that would apply if treenometers were valid proxies because you think we should be discussing whether treenometers even are valid proxies?”

    I see problems with proxies other than those produced from TRW and MXD. Further, I do not see where screening would be used for proxy thermometers that were validated by proper methods. It would appear that screening is the lazy person’s approach: I do not understand the physics of why this proxy is a thermometer, therefore I shall look for correlations of my proxy to temperature and that will be my basis for a good thermometer during calibration and all of history. Never mind that that correlation is low and thus explains only a small part of the effect that could be due to temperature, and thus ignored are all those other effects at work and whether they might change over time. And never mind that those correlations could happen by chance.

    There is often a disconnect between those providing the proxy data and better understanding it and those who use it to construct temperature reconstructions as they are more into data manipulation and attempting to tease out a signal no matter how weak it might be.

  32. I agree that we should look closely at the reconstruction methods. I would question the whole idea of using least-squares linear regression to estimate any relationship based on noisy data. The choice of the dependent variable introduces an enormous bias, since the slope of the derived line is artificially compressed towards the independent axis.

    In other words, if you choose a regression based on an equation of the form T = m * P + b, where P is the proxy, the slope m will be lower than the actual relationship. But if you use P = s * T + i, the slope s will also be too small. Instead of s = 1 / m, ie the inverse equation, it is actually s = r^2 / m, where r is the correlation coefficient. With no noise, the two slopes are inverses, and m * s = r^2 = 1.

    A truly unbiased estimate of the actual slope would be m / r, which would be the same no matter which variable was chosen as dependent.
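    A quick R illustration of the attenuation and of the s = r^2 / m relation (the true slope and noise levels are made up):

    # With noise in both variables, both OLS slopes are attenuated and their
    # product equals r^2.  True relation here: T = 2 * P0.
    set.seed(6)
    n    <- 10000
    P0   <- rnorm(n)                       # noise-free proxy signal
    Tobs <- 2 * P0 + rnorm(n)              # temperature observed with noise
    Pobs <- P0 + rnorm(n)                  # proxy observed with noise
    m <- coef(lm(Tobs ~ Pobs))[["Pobs"]]   # T regressed on P: slope < 2
    s <- coef(lm(Pobs ~ Tobs))[["Tobs"]]   # P regressed on T: slope < 1/2
    c(m = m, s = s, m_times_s = m * s, r_squared = cor(Pobs, Tobs)^2)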

  33. mb:

    Carrick> I suppose the integral is not even diverging towards $latex +\infty$, since the function is both positive and negative close to $latex -\lambda_0$. It’s just undefined

    No, it’s actually logarithmically divergent.

    The integral involves the exponential integral function Ei(x), which has an expansion about $latex x = 0$ with $latex x > 0$,

    $latex \hbox{Ei}(x) = (\log (x)+\gamma)+x+\frac{x^2}{4}+\frac{x^3}{18}+\frac{x^4}{96}+\frac{x^5}{600}+O\left(x^6\right)$

    It’s the $latex \log(x)$ term that gets you in trouble.

  34. Craig,
    “The choice of the dependent variable introduces an enormous bias, since the slope of the derived line is artificially compressed towards the independent axis.”

    That’s why I tried Deming regression. Hegerl et al use this under the name of TLS. It doesn’t solve everything though.

  35. Craig:

    In other words, if you choose a regression based on an equation of the form T = m * P + b, where P is the proxy, the slope m will be lower than the actual relationship. But if you use P = s * T + i, the slope s will also be too small.

    If I get what you are saying, both T and P have noise associated with them, and so no matter which variable you regress against you get a compressed scale. If P has measurement noise, but not T, “s” in P = s * T + i would be unbiased, at least if the measurement noise in T were uncorrelated and normally distributed (this is a sufficiency condition).

    If both P and T have measurement error, and the errors can’t be neglected, I think I agree with this. For typical paleoclimate reconstructions, P typically has much larger variability than T, so I surmise that the bias associated with noise in T would be small compared to the uncertainty introduced by the noise in P.

    Something that would need checking using “realistic” synthetic proxies.

  36. Sorry make that:

    If P has measurement noise, but not T, “s” in P = s * T + i would be unbiased, at least if the measurement noise in P were uncorrelated and normally distributed (this is a sufficiency condition).

  37. Carrick:
    I think that noise in either variable will result in an estimated slope biased toward the independent axis. This is because the regression procedure minimizes the square of the differentials between the observed P and the calculated one.
    I don’t know that I agree that the proxy measurement is inherently less accurate than the temperature, since we are not actually measuring the temperature at the time and place that the proxy was formed.

    Nick:
    It looks like the Deming regression should be unbiased. A long time ago, I tried to derive a method to minimize the orthogonal distances to the line, but couldn’t hack the math. If the noise on both variables is similar, the calculated slope should approach the m/r value.

  38. Craig–
    The regression issue is one that can be explored later. Right now, my toy problems have no noise in the ‘measurement’ of the Temperature variable. But later, I can add noise to that and use Deming.

    I’m not familiar with Deming– but does that assume you want to minimize distance perpendicular to the line? Would that be assuming the errors in both variables are equal? (Nick you looked at it. Is it?)

    It seems to me which error has the largest magnitude can go either way. (Certainly, if the correlation is with global temperature, it can be very large on both.)

  39. Craig/Carrick
    This following is incorrect:

    I think that noise in either variable will result in an estimated slope biased toward the independent axis.

    The following is correct:

    If P has measurement noise, but not T, “s” in P = s * T + i would be unbiased, at least if the measurement noise in P were uncorrelated and normally distributed

    You can run monte carlo and see that what Carrick said is correct. Also, even if the noise is red, arima etc. Carrick’s result holds.
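    A minimal R version of that Monte Carlo (the slope, AR coefficient and noise level are made up):

    # With noise only in P (T known exactly), the OLS slope of P ~ T is
    # unbiased even when the noise is red (AR(1) here).
    set.seed(7)
    N <- 100; true_m <- 0.4; Q <- 5000
    est <- replicate(Q, {
      temp <- rnorm(N)
      P    <- true_m * temp + arima.sim(list(ar = 0.6), n = N, sd = 0.5)
      coef(lm(P ~ temp))[["temp"]]
    })
    mean(est)   # ~ 0.4: no bias in the slope estimate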

  40. Lucia,
    I have been traveling for a couple of days and so have not followed closely. It appears that there is (maybe) a consensus on at least several key points from your synthetic data models. Might a brief summary of the main conclusions make sense at some point?

  41. SteveF–
    A brief summary would make sense– but I need to collect together a number of points. Much of this is so I can understand what features are introduced by methods, so I just have to apply them and then have a look.

    People will be most interested when I apply this to data that is more similar to paleo data. But I prefer the order of understanding methods first. (It seems to me a lot of people want to jump to applying methods to the paleo data first. But I’m not comfortable doing that. I want to have some idea what the methods are susceptible to first. )

  42. Bias: An error whose mean will not be zero if we repeat an experiment many many times.
    =========
    Bias results from a change in the probability distribution. It can thus affect the mean, the deviation, and the error estimate.

    Thus, if your method leaves the mean unchanged, but changes the error estimate, then it has introduced bias.

  43. DeWitt Payne (Comment #98943)

    Thanks for the link, DeWitt. That hockey stick – even if pointed upward – and series would have little visual appeal in a showcasing by the IPCC.

  44. SteveF (Comment #98951)

    “Might a brief summary of the main conclusions make sense at some point?”

    I agree that such a summary would be helpful. It is something that I find Lucia better at than many who present similar analyses on blogs, but I was going to make the same suggestion.

  45. Lucia,
    “Would that be assuming the errors in both variables are equal?”
    You need to supply an estimate of the ratio of error variances. Which generally means you estimate the variances. I think it is effectively the minimum dist line in normalised variables.

    I don’t think “error” for this purpose means something identifiable like measurement error, or noise you added while constructing. The algorithm doesn’t know anything about where what it sees as noise came from, but still creates bias.

    I think it’s more a frequency issue. Nonlinear processing shifts frequencies and turns noise (or oscillation) into bias, like the 1/λ I described above. Similar to audio demodulation in radio.

    I estimated the “noise” for Deming ratio purposes by the deviation of each from a loess smooth (f=0.4). I’m sure just deviation from the mean is used (sd) but I wanted to avoid the stationarity issue. Fancy detrending.
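    For what it’s worth, a minimal Deming sketch in R with the error-variance ratio supplied as a parameter (the data and the ratio below are made up):

    # Deming slope for an assumed ratio delta = var(error in y)/var(error in x).
    deming_slope <- function(x, y, delta) {
      sxx <- var(x); syy <- var(y); sxy <- cov(x, y)
      (syy - delta * sxx + sqrt((syy - delta * sxx)^2 + 4 * delta * sxy^2)) /
        (2 * sxy)
    }
    set.seed(8)
    n     <- 2000
    temp0 <- rnorm(n)                          # "true" temperature
    temp  <- temp0 + rnorm(n, sd = 0.5)        # temperature observed with error
    P     <- 0.4 * temp0 + rnorm(n, sd = 0.5)  # proxy observed with error
    c(ols    = coef(lm(P ~ temp))[["temp"]],   # attenuated
      deming = deming_slope(temp, P, delta = 1))  # closer to the true 0.4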

  46. Nick–
    Thanks. I guess when (or if) I look at Deming, I’ll run some synthetic tests to see what it does and also consider methods to estimate the ratio of the variance. Obviously, the choice of least squares is estimating that all the variance is in P and none in T. We might be able to get an estimate of the variance of error in T from what we think we actually know about the magnitude of the measurement error and what we think we know about correlation of R2dx=e[T(x)T(x+dx)]. People creating temperature series seem to think they have some clue about both.

    Pegging one error might give a sufficient constraint to permit estimating the ratios. (One might need to iterate in application. Or it might be possible to do it in closed form. I don’t know enough about Deming to say for sure.) But it would be a constraint based on external estimates rather than fitting the (P,T, time) data itself to a “smooth” whose smoothing parameter is selected somewhat arbitrarily. (After all, the smooth of T v time smooths, but that doesn’t mean that the smoothed T(t_i) is the actual temperature in year ‘i’. The tree did respond to the actual temperature. Weather really is noisy. So, the variations about the smooth will include both measurement error and “weather noise”. With respect to fitting P v T, only the ‘measurement’ error is error. If you take out the ‘weather noise’ from the reported value, you introduce error into P,T.)

    FWIW: I suspect “measurement error” is close to white.

  47. Kenneth–
    Yes. But I can only ‘summarize’ when I’m at a point where I think a ‘summary’ makes sense. Right now, where we are is:
    1) There are many ‘parts’ to putting together a reconstruction.
    2) If we set aside the question about the uniformity principle, we are left with the question of ‘picking methods’.
    3) Each method has at least these parts each of which has “features”:

    Method Choice I: Pre-selection method for choosing proxies. (i.e. use meta data etc.)

    Method Choice II: Regression method for P v T. (i.e. do we regress P = mT or T = aP or use Deming? If Deming, how do we estimate the variance in the noises? Etc.)

    Method Choice III: Choice of method for estimating the global or regional temperature in the past based on those regressions. (So is T=mean(P)/mean(m) or T=mean(P/m) or something else?)

    Method Choice IV: After having made choices on meta-data, will we downselect based on something? (i.e. Throw away outliers? Only keep “really good” ones? How do we do that?)

    The summary so far is only this:
    Related to choice IV: We can show that down-selecting based on correlation introduces bias, and that the bias is to reduce the low frequency variability in past reconstructions.

    Related to choice III: We can show that different methods of estimating based on (P,T) from the proxies can also bias the results. Also, even when some of the results are not ‘biased’, the unbiased result can have some ‘features’ that will affect our thinking.

    That’s the ‘summary’. I can’t do any more of a summary than that. And I want to spend time looking at things related to II, III and IV. To have time to do that, I don’t want to stop to “summarize”, because there is no “summary” beyond what I wrote above.

    Sometimes, you do need to go into the forest and focus on the individual trees before you “summarize” your findings about the health of the forest. While some might want me to dash out of the forest and give a panoramic view, right now I am focusing on ‘the individual trees’. (But that is a metaphor. The literal truth is I am absolutely not going to spend time looking at what I think of a bristlecone pine yet!)

  48. If bias is defined as an expected error approaching zero, would not all lines going through the point (average x, average y) be unbiased? I was using the term unbiased to mean invertable, where the derived line was the same no matter which variable was chosen as the regressor. As far as I can see, the only line with this property has the slope m/r, where m is the slope calculated by linear regression.

  49. Craig Hamilton

    If bias is defined as an expected error approaching zero, …

    That’s how I define it. If you create data sets of (P,T) using

    P=mT+ e

    with T known perfectly and e gaussian white noise, and estimate mest using ordinary least squares, then the expected value of m is E(mest)=m. This is a well accepted theoretical result, taught in numerous undergraduate statistics courses, and if you don’t believe it you can run Monte Carlo in the statistics package of your choice and see that when you do the thing N times to get N estimates of ‘m’, the average over all “N” fits will converge to the true value of ‘m’.

    would not all lines going through the point (averge x, average y) be unbiased?

    I have no idea about the second part of this question.

    unbiased to mean invertable,

    Unbiased doesn’t mean invertable. As far as I am aware, the two things have nothing to do with each other.

    Unbiased means if E(mest)=m. Unbiased is a very desirable quality in a statistical fit. Invertability — not so much.

  50. “While some might want me to dash out of the forest and give a panoramic view, right now, I am focusing on ‘the individual trees’. (But that is a metaphor. The literal truth is I am absolutely not going to spend time looking at what I think if a bristlecone pine yet!)”

    Lucia, thanks for the update. I would not expect you to spoil the ending.

    As for endings I was thinking more of finding that single very old tree that can be shown to respond to the global mean temperature with an r=0.999 (trended or detrended) and a p value=10^-100. No selection is required because it is the TREE. The only debate is how to use it to reconstruct past temperatures and a toy model shows the way.

  51. “with T known perfectly and e gaussian white noise, and estimate mest using ordinary least squares the expected value of m E(mest)=m.”

    Which, of course, means that a proxy even with a very low but statistically significant correlation with temperature would make an excellent thermometer providing the noise is white and gaussian. Which in turn would mean that all the factors known to affect a proxy response, other than temperature, occur randomly over the shortest period of valid temperature measurement claimed for the proxy thermometer.

    That would be a good example to use for a beginning statistics student who might not be yet prepared to deal with the harsh realities of that other world out there.

  52. Kenneth–

    Which, of course, means that a proxy even with a very low but statistically significant correlation with temperature would make an excellent thermometer providing the noise is white and gaussian.

    You would need many replicate independent samples to average over to achieve “excellent”.

  53. Somewhat OT

    Lucia stated: “Obviously, the choice of least squares is estimating that all the variance is in P and none in T.” Not to detract from the “toy” modelling, but biological proxies integrate the temperatures in a non-linear way from T(grow min T) to T(grow max T), reflecting the minimum and maximum temperatures at which their proteins and enzymes can work. Each P(i) or P(i,j), depending on definition, can have a different “ideal” range of temperature response. But this should be a later toy model. In his paper, or at tAV, IIRC, Christiansen (of Christiansen & Ljungqvist 2012) pointed out that even if one gets rid of the bias (compression) in some methods, least squares AND phenotypic expression will still mean the extracted signal is compressed.

  54. This is completely off topic 🙂 Though you might end up understanding why there are issues with paleo and modeling.

    http://redneckphysics.blogspot.com/2012/07/plumbers-nightmare.html

    Plumber’s Nightmare is my basic model of the hydrology cycle that is core to the Atmospheric effect. No radiant model will provide consistent results until it properly considers the core hydrology cycle.

    Since paleo reconstructions are generally derived from proxies on the edges of the frozen boundary, they would provide valuable information only if they consider the advance and retreat of ice sheets. I think Steven McIntyre mentioned timber line would be better than tree rings? He was correct, but combining both, the trees do a pretty good job. This basic model also includes two ocean layers, but is modified to allow bi-directional heat flow that results with sea ice advance and decline.

    “You would need many replicate independent samples to average over to achieve “excellent”.”

    I should have added what you say here, Lucia, to my post above. I do think it is important to point out that, either implicitly or explicitly, behind the thinking of those doing temperature reconstructions is the proposition that this averaging over noisy proxies (even with low correlations to temperature in the calibration/verification period) is what (they think) gives at least decent proxy thermometers.


  56. Are you saying that green was obtained by adding white noise to the red trace and smoothing? Or what? If it’s just white noise added and smoothed, why does it look the way it does?

  57. Lucia,
    Sorry, I got the colors wrong on the legend. Black is temp, red is recon. I’ll fix it.

    The proxies were made by adding white noise to the black curve. The green is the average (normalized) so it should be temp plus white noise too.

  58. Nick–
    It’s 10:30 pm here. So… maybe I’m a bit dim. Did you fix the graph while I was fiddling with my script? I’ve still got the other tab opened… and the colors were different.

    Ok.. on this tab (new graph?)

    1) You had the black trace.
    2) You added white noise to the black.
    3) You loess smoothed. That’s green.

    So, green is close to black as it should be.

    Red (CPS reconstruction) is consistently high in past? Is that what you are getting?

  59. Lucia,
    I’ve posted the code now. Yes, I did post new figs – there’s an explanation in the text that when I reran to get the legend right, it calculated new noise for the proxies. Without selection, the red can go either way from run to run, though through the years the bias factor is the same. With selection, the bias is almost certain to be up (red above black).

  60. Nick

    Without selection, the red can go either way in any one run, though through the years the bias factor is the same.

    That’s the way the one I called “method 1” is mostly going to go too. Of course, I’m doing my ‘toy’ things because I find it easier to ‘see’ method issues.

  61. Lucia,
    Just to summarize the theory – the recon (red) is
    (T+e)/(1+T.e)
    e is a scaled white noise. The dot prod is over the calibration years. T is normalized wrt calib mean and sd.
    The green curve is T+e, smoothed. The bias factor is 1/(1+T.e). For unselected, that can go either way; for selected, E(T.e)>0.
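    A toy R check that screening on calibration correlation makes T.e positive on average over the kept proxies (the noise level and cutoff are made up):

    # Proxies P = T + e with unit sensitivity.  Screening on calibration
    # correlation favours proxies whose noise happens to align with T, so the
    # mean of T.e over the kept proxies is > 0 and the factor 1/(1 + T.e)
    # shrinks the pre-calibration variance.
    set.seed(9)
    ncal  <- 75; nprox <- 500
    temp  <- as.numeric(scale(rnorm(ncal)))                    # toy calibration temperature
    e     <- matrix(rnorm(ncal * nprox, sd = 4), ncal, nprox)  # proxy noise
    P     <- temp + e
    Tdote <- colMeans(temp * e)                                # T.e for each proxy
    keep  <- apply(P, 2, cor, y = temp) > 0.3                  # screen on calibration r
    c(frac_kept = mean(keep), Tdote_all = mean(Tdote), Tdote_kept = mean(Tdote[keep]))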

  62. Nick Stokes, could you clarify something for me. In that post, you say:

    Pseudo-proxies are created by taking some computed temperature curve going back centuries. Here I use output from CSM from 850AD to 1980, as described in Schmidt et al. Then noise is added. It could be red noise thought to be characteristic of real proxies, but for present purposes, to focus on analytic approximation, I’ll use white noise.

    You talk about adding noise to pseudoproxies, pointing out it is white, not red. However, I thought the CSM output you refer to already uses red noise. Is my memory mistaken, or am I perhaps missing something else?

  63. Brandon,
    This time I only used the CSM modelled NH temperature (T), not the proxies. I added white noise:
    P_i=F_i*T+w_i

    Deep Climate believes that the 59 proxies that I (and RomanM) used earlier (from Schmidt et al, red noise) are constructed not with NH_CSM but with local grid temperatures, which complicates their interpretation.

  64. Nick

    which complicates their interpretation.

    In what sense? At least for the current posts where we are focusing on methods, it seems to me it doesn’t matter what we use as a temperature. That’s why I’m using the oscillations (and also varying number of proxies etc.)

    Does Deep mean it affects interpretation of what the final reconstruction would mean? (If so, yes. One of the reasons I use the obviously contrived “Temperatures” is to avoid having people ‘interpret’ anything other than ‘the method does X’).

  65. Nick Stokes, based on the temperature/recon series you have presented above what can you conclude?

  66. Oh, I see. When I quickly skimmed your post, I assumed you used the 59 pseudoproxies from the csmproxynorm_59red file for your 59 proxies. Sorry about that.

  67. nick,

    have a look at Markdown and knitr for R.

    Cool way to do posts in R. I’ll post bits later,
    but it’s now integrated in RStudio.

    Lucia you should look at it as well

  68. Kenneth,
    Conclusion – for CPS at least, selection does have some effect. The selection is basically on F_i+T.w_i, where the dot is mean product over calibration years (F is like S/N). The intent is to select for high F, which improves average S/N. But proxies can pass with a favorable noise/temp alignment in calibration. This means they are credited with more temperature sensitivity than they really have, and pre-cal, the truth emerges, and there is less variation reported.

    It’s hard to quantify, because it depends on how variable F is between proxies. Uniform F means all the selection is via T.w – much bias. If F varies a lot more than T.w, the selection will work better.
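
    A quick Monte Carlo sketch of that selection effect (all numbers hypothetical): each proxy's true correlation with temperature is set by F, but the screen acts on the much noisier 80-year calibration estimate.

    set.seed(4)
    ncal  <- 80
    Tcal  <- as.numeric(scale(rnorm(ncal)))       # toy calibration temperature
    F     <- runif(5000, 0.1, 0.4)                # true sensitivities (F is like S/N)
    Rtrue <- F / sqrt(F^2 + 1)                    # true proxy/temperature correlation
    Rhat  <- sapply(F, function(f) cor(f * Tcal + rnorm(ncal), Tcal))
    sel   <- Rhat > 0.3                           # screen on the calibration-period estimate
    c(mean(Rhat[sel]), mean(Rtrue[sel]))          # selected proxies look more sensitive than they are
    mean(Rtrue[!sel] > 0.3)                       # and some genuinely good ones get dropped by chance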

  69. Humm… OK, so selecting based on correlation with the calibration period makes a CPS reconstruction understate the pre-calibration variance. I thought Lucia (and several others) had already demonstrated that several times; some have even had the temerity on earlier threads to classify this loss of variance as an example of “screening fallacy”. Are additional confirmations of this fact in some way needed or useful?

  70. Steven Mosher (Comment #98993)
    I assume that is a report no one would read? Someone would then have to write the real report based on that report.

  71. SteveF,
    Yes, the analysis is useful – it quantifies it, and also shows how it depends on variation in F – earlier demos tended to use uniform F. But remember, it’s only shown here for CPS which is a method where you don’t usually select by correlation.

    As for “selection fallacy” – well, again, it was never specified. There are other sources of bias too, and noise is a big problem. Selection makes a trade-off between bias and noise, which is a legitimate deal – not a “fallacy”.

  72. Humm… OK, so selecting based on correlation with the calibration period makes a CPS reconstruction understate the pre-calibration variance.

    For CPS, I get the same selection bias that I get with methods 1 and 2. This is best shown with toy problems where the variable Nick calls F is the same for all cases. You still get selection bias.

    If F varies a lot more than T.w, the selection will work better.

    It would be better to say “might”. If you had a batch of high F and a batch of duds, selection might be useful. I’m trying to organize to show things better and continue proofreading. Because my code drew in new “junk” as we discussed, it was terribly disorganized. So…. I reorganized. I’m checking stuff now. I think it might be ok, but I thought that before… and… found a bug.

    I need to go buy burgers now. My brother’s in-laws lost power when the storm passed right through Wheaton (and skirted Lisle) and everyone is eating here. (http://www.suntimes.com/news/13542771-418/comed-power-might-not-be-restored-to-many-for-days.html) They are going to need to get an electrician because evidently the powerline to their house is down. A tree might have fallen on their roof. I’m waiting to hear more. (They are fine. They decided to sleep at their parents’ house to have AC, lights and refrigeration!)

  73. Selection makes a trade-off between bias and noise, which is a legitimate deal – not a “fallacy”.

    As far as I can tell, if you start out by picking a crappy method, then you can reduce noise by using selection. But if you use a less crappy method to start with, you don’t need to. But… I need to check some stuff still.

  74. Thanks, Steven,
    Timely – there’s a monthly temp report coming up. Markdown could help, though I’m using quite a lot of R automation already.

  75. Lucia,
    “If you had a batch of high F and a batch of duds, selection might be useful.”
    I think a good test case would be a regular progression of F values – say 10 dB, 12 dB, 14 dB etc. Then a test would be how often noise causes proxies to be selected out of sequence.

  76. Nick Stokes, I decided to download your code, and when I did, I immediately had a problem. I’m not sure how you got the values you set for the SNR for your proxies, but one thing I’m sure of is you only set values for 24 proxies. You then used the length of the variable containing those values (fp) to determine the size of your noise matrix (ep) and number of proxies (stored in p). This means you only made 24 proxies.

    I went ahead and added 35 more values to your fp variable (I just copied the earlier values over) to see what effect it has. When I did, I got radically different results for your Figure 2 and 3.

    I can post images/my code if you’d like, but I think you should easily replicate what I describe.

  77. Brandon,
    The proxy values are set in the statement that starts:
    fp=0.15*c(0.993,1.186,…
    There are 59 there. The program loops (for(ip in 1:2)) and on the second pass it cuts down to 24, so it ends up looking as if you only had 24.

    Also each run generates different noise.

  78. Nick Stokes, yeah, after I said that, I went back and realized what the problem was. I didn’t notice you trimmed down the variables, so when I went back to run segments of your code, I didn’t realize I’d need to reinitialize them.

    It seems this week is a week for me to miss the obvious.

  79. Nick Stokes (Comment #98998)

    > As for “selection fallacy” – well, again, it was never specified.

    Frowny face 🙁

    Over the past few threads, the concept has been discussed in ways that are specific enough to be useful.

    If the deeper meaning of your comment is to remind others that consensus (definition 1a, unanimity) on this point will not be attained — I agree (obviously).

    On the other hand, on this point, consensus (definition 1b, “the judgment arrived at by most of those concerned”) has been satisfied.

  80. I think a good test case would be a regular progression of F values – say 10 dB, 12 dB, 14 dB etc. Then a test would be how often noise causes proxies to be selected out of sequence.

    What’s the purpose of this test? I don’t care specifically about ‘out of sequence’.

    It seems to me that once we make a general observation, the question of whether screening helps or not would need to be answered based on reasonable assumptions about the distribution of F values in the proxy set for a particular study. There isn’t going to be any “one” answer that applies to everything. I suspect a regular progression of F would rarely be useful for learning much more than any other ‘cartoon’ distribution. Certainly, I don’t see how you are going to learn much more than by using something like a combination of two batches, one “good” and one “dud”. But if someone wants to do that and it helps them, I guess they can.

  81. Lucia,
    I think selection effect is a bit like the trees themselves and temp – it’s important in a critical region. Proxies with F well above the cutoff were always going to be selected anyway. And proxies way below were never going to be (OK, maybe if there is enough noise). It’s the proxies that are close enough that noise gets them selected when they shouldn’t be that are the issue.

  82. Nick–
    A) There is a consensus as to what the selection fallacy is,
    B) There is a consensus that you have some sort of mental block preventing you from knowing what it is,
    C) There is a consensus that you are incapable of perceiving that most people agree the definitions you claim differ from each other are, in fact, similar definitions, and
    D) I suspect there is a consensus that you are never going to “get it”.

    Please go back and re-read the discussion of apples, cherries, bananas and fruits.

  83. Nick

    It’s the proxies that are close enough that noise gets them selected when they shouldn’t be that are the issue.

    Proxies that don’t get selected when they should be are also “the” issue.

    (Actually, I think you need to learn that sometimes there is more than one issue, and consider using the indefinite article “an” instead of reducing the issues down to “the” issue.)

  84. Lucia,
    “Does Deep mean it affects interpretation of what the final reconstruction would mean?”
    I think he just pointed out the fact. My interpretation is that it would add to the noise the discrepancy between local and target temperature, which might not have the same noise-like behaviour.

  85. Nick

    I think he just pointed out the fact.

    I guess my question is which fact did he point out?

    As far as I can tell, the current purpose is to merely see how the “red, black and green” lines compare to each other on your graph. I don’t know how the provenance of the black line affects our interpretation of what factors make the green one differ from the black one and what factors make the red one differ from the black one.

  86. Lucia,
    The important properties I wanted of pseudoproxies were that the deviation from T be zero-mean and stationary. DC put that in doubt in my mind wrt the Schmidt proxies.

    It’s also very handy to know what F is.

  87. It’s also very handy to know what F is.

    In a synthetic experiment one can always know. That’s why they are useful. It would be handy to know for a real reconstruction, but alas, that is unknowable.

  88. Nick,
    “As for “selection fallacy” – well, again, it was never specified. There are other sources of bias too, and noise is a big problem. Selection makes a trade-off between bias and noise, which is a legitimate deal – not a “fallacy”.”
    It does more than that of course; it opens the entire reconstruction to questions about uniformity of response to temperature over extended periods. You have been given several clear examples of the negative consequences of selection based on correlation with the variable you are trying to reconstruct, and even verified at least one of those yourself. My personal evaluation is that this kind of selection is, for lack of a more eloquent description, incredibly dumb data snooping, and a nearly perfect way to fool yourself into believing things which may well not be true.

  89. Lucia 98999,
    We sensible Floridians usually have a backup generator available. 😉
    Avoids lots of problems.

  90. SteveF,
    “it opens the entire reconstruction to questions about uniformity of response to temperature over extended periods”

    No, it doesn’t. It tends to bias low one of the scaling factors. That is applied uniformly for all times past.

    And it has nothing to do with data snooping. You aren’t looking at the data you are trying to predict, which is pre-calib. You can’t.

  91. Lucia.

    Markdown is a style of creating a document where you write text and code and LaTeX, and the document “executes”. So you don’t have code in one document, text in another and graphics in a third. You write one document.

    That document is executable.

    Think of a program: you have code, and text that you “comment out”.

    Flip that paradigm: you have text, and then “markdown” tags that say “run this text as code and put the graph in the document”.

    easy peasy.
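
    For what it’s worth, a minimal sketch of that workflow (the file name is hypothetical): “report.Rmd” holds ordinary text plus tagged chunks of R code, and knitr runs the chunks and splices their output, including figures, back into the finished document.

    library(knitr)
    knit("report.Rmd")   # writes report.md with the code, results and graphs inlined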

  92. SteveF, I’m up in Pocatello ID tonight… there is a lot going on around here.

    Major forest fires and people displaced from their homes, many with nothing to return to. I was listening to refugees at the restaurant talk about leaving their homes behind, and hearing propane tanks exploding and stuff like that. There were probably 30-40 people at this restaurant, which was by my hotel (they all left together in a crowd when the restaurant closed).

    Over in Wyoming the skies were overcast all the way from Cheyenne to its western border due to smoke from fires. I also got to listen to firemen talk on the radios about not being able/permitted to bring in C-130s to fight fires in some of the high canyons that are too risky to send people into to fight.

  93. Mosher–
    It looks like an ok thing. It’s certainly an advantage to be able to have stuff linked. It’s just that it isn’t really a “report” of the sort that ought to be circulated beyond two or three people who are working on something and who either
    a) don’t need a real report or b) are working on a real report.

    Right now that looks ok as possible course lecture notes in a CS class or as very well documented code. I’ll believe it can be used to write an informative report that doesn’t waste the reader’s time when it is used to write an informative report that doesn’t waste a reader’s time.

    Carrick– That sounds grim!

  94. Nick 99020,
    We have been down this road before. For me at least, collecting data and then comparing and selecting which data to use for your analysis based on how well they satisfy your expectations (in this case, correlate with recent measured temperature) is a perfect example of data snooping. The selection casts doubt on the entire analysis because it demands the “uniformity principle” be applied to processes to which, as the need for selection itself indicates, it may well not apply.
    And no, I don’t expect you will ever see that problem.

  95. Carrick,
    Let’s hope that more sensible fire policies are put in place to avoid the buildup of underbrush which leads to the most destructive of fires where the canopy burns. Of course, sensible policies may be difficult to adopt, especially since those policies may conflict with protecting homes which could be destroyed as a result of allowing frequent natural brush fires to burn.

  96. SteveF,
    I think that what many people understand by the “selection fallacy” is what is shown in the cartoons by Josh and Jeez – that if you select for HS shape in the last century, then you’ll get HS shapes throughout.

    The reality here is quite different. Selection by correlation aims to select proxies with good S/N. But in 80 years, noise means that the underlying S/N differs from the average estimated over that fairly short time. Without selection, errors in either direction largely cancel.

    However, when you impose a cutoff on what you think is correlation, some with low real S/N get in by chance. Conversely, some with high S/N are lost by chance. Both these effects lead to actually having less S/N than you think. You scale to what you think (no better choice), and so variation is reduced.

    It’s a scaling issue relating to having just 80 years to observe and a lot of noise. It has nothing to do with uniformity. In the demos, we’re imposing perfect uniformity.

  97. Nick-

    I think that what many people understand by the “selection fallacy” is what is shown in the cartoons by Josh and Jeez – that if you select for HS shape in the last century, then you’ll get HS shapes throughout.

    I have no idea why you keep trying to conflate the “selection fallacy” with a single example and even more with the cartoon of a single example.

    The reality here is quite different.

    No. The reality is not ‘quite different’ from Josh and Jeez’s cartoon.

    But in 80 years, noise means that the underlying S/N differs from the average estimated over that fairly short time.

    Actually, the underlying S/N does not differ from the average estimated over the short time. Reality only differs substantially from the S/N estimated over a short time when you also screen. The fallacy is to believe that screening must improve things when, in fact, it can degrade them.

  98. Lucia:

    The fallacy is to believe that screening must improve things when, in fact, it can degrade them.

    In fact, I’ve yet to see proof (either by Monte Carlo or more formal methods) that selection by correlation is actually improving SNR.

    Many of the early papers that used it did not understand that they had simply rescaled their reconstructed series, so that a “quieter looking” reconstruction which they improperly interpreted as having improved the SNR had simply deflated the scaling constant between their reconstructed “temperature” and real temperature.

    (As with your sine wave examples.)

    In fact, if all of the proxies really belong to the same ensemble group, it’s my prediction that selection by correlation won’t improve the SNR but will reduce it (you’re tossing out perfectly good data that happen, over the selection interval, not to correlate well with temperature due to noise), and it will also generate a difficult-to-repair distortion of your reconstructed series.

    (This same problem likely applies to using correlation for scaling, as shown by von Storch.)

    I’ve pointed out before that if you compare the different reconstructions against each other using Pearson’s correlation (which ignores the scaling constant and therefore sidesteps the problems with reduction in scale), most of the “better” reconstructions agree pretty well with each other.

    That’s fine if you want to know relative information about different periods. Not so good if what you are trying to argue is “unprecedented warming”. And in fact we now know beyond any doubt that people have used screening methods to produce flattened hockey sticks (during the reconstruction period), and have improperly interpreted a bone-headed statistical error as evidence for “unprecedented warming”.

    Is that the “screening fallacy”? I don’t care. It’s wrong regardless of the label. Nick seeks to distract with cheap theatric tricks that I don’t think even he takes seriously, but that’s the real issue. They made a mistake, they were wrong, and they don’t want to admit it.

  99. SteveF, you’d hit on one of the problems when you mentioned protecting people’s properties.

    After the 1910 giant blow up, there was a big movement to protect forests, not just for natural conservation, but because loggers were afraid of losing tree stock. Allowing people to build on some of these ridges is reminiscent of the people who are allowed to build on barrier islands on the East Coast. People are putting their homes in beautiful, scenic places, but let’s face it, surrounding your home with massive amounts of natural kindling isn’t exactly the most sensible approach you can take in home site selection.

  100. Another debate trick is to say, “oh yeah, but we already knew that. It’s nothing new.”

  101. Carrick

    In fact, I’ve yet to see proof (either by Monte Carlo or more formal methods) that selection by correlation is actually improving SNR.

    I think there is no doubt it would improve SNR in the extreme case where we had 10 “perfect” proxies that were perfectly correlated (R=1) with the target temperature and 10 “utterly broken” proxies that reported absolutely nothing but noise (R=0.)

    The difficulty is that proxies with very high R are difficult to find (and likely do not exist.) If any had been found, scientists wouldn’t be spending lots of time trying to screen based on correlation. Instead, they would discover the meta-data that resulted in high R, and schedule trips to go out and take more ‘cores’ (or whatever they needed) to assemble more proxies of the correct type, and then report that– as anticipated– the freshly gathered proxies also have high R. Then they’d create a system based on the proxies whose meta-data consistently gave high R.

    Of course, in this case, they could also take the proxies whose meta-data says “high R”, add a lot of R=0 junk (for god knows what reason), then use correlation screening and say “Look! Using a screen we can eliminate most of the junk proxies! Screening works!!!!”

    But why would they do it when they already had — or could easily obtain– a whole bunch of reliable high R proxies? No one would.

    If it were possible, selecting known high R proxies based on metadata would be a much more sensible course than trying to deal with the low signal-to-noise proxies we have and screening. So, the issue is: given that the proxies do have fairly low S/N, and none have been shown to have high S/N, can we increase S/N by screening? More importantly, can we know we’ve increased it? It’s sensible to try to explore that. But it will take a bit to tease it out. (I suspect the answer to the second question is that even if we can increase S/N, we can’t have any certainty that we did so. So, our uncertainty intervals are, for all practical purposes, dictated by proxies picked based on meta-data and then not screened based on estimates of R from the calibration period.)

    I’m setting things up to report back:

    1) Mean difference relative to target. (bias if Nproxy->infinity.)
    2) RMS difference relative to target. (bias & noise together.)

    I’ll have a few curves today, and then a few more tomorrow. Mostly, I need to report because I revamped my code organization!
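
    For concreteness, the two metrics above in a few lines of R (recon and target here are just stand-in vectors; in the real runs they would be the reconstruction and the known target series):

    recon  <- rnorm(100, mean = 0.1)             # stand-in reconstruction
    target <- rnorm(100)                         # stand-in target series
    mean(recon - target)                         # 1) mean difference: the bias part
    sqrt(mean((recon - target)^2))               # 2) RMS difference: bias and noise together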

  102. Carrick,
    “Allowing people to build on some of these ridges is reminiscent to the people who are allowed to build on barrier islands on the East Coast.”
    Yup. And along with that problem comes “public” outcry (at least the wealthy public that owns ocean-front properties) for subsidized loss insurance on homes built where only a fool would build. Similar problem, similar solution: build in a foolish location at your own risk; and don’t bitch that you can’t get cheap insurance.. or bank financing for your foolishness.

  103. Nick,
    “It has nothing to do with uniformity. In the demos, we’re imposing perfect uniformity.”
    Yes, of course you are imposing perfect uniformity. That is my point. Under a set of unsupportable assumptions about uniformity you carry out a statistical analysis and confirm problems with loss of variance, then go on to say, “see, not all that bad”, and declare that there may be ways to resolve that problem by changing the statistical methods. Even were this true (and I suspect it is not, since the uncertainty will probably inflate along with whatever ‘improvements’ you make), that doesn’t address the uncertainty in the underlying assumptions. The way out of the morass is to get data with better S/N based on your understanding of the physical processes involved (that may mean more research on the underlying processes and data collection based on more rigorous metadata constraints), then use all the data.
    That would eliminate the need for contorted analysis and indefensible decisions like choosing to ignore inconvenient reality like “the divergence problem”. If trees responded to temperature in a meaningful way in cooler 1910, but not in warmer 2010, then absent a clear (and clearly proven) explanation for that observation, there is no defensible reason to believe they did not stop responding during the warmer MWP as well. I find it risible that you (and apparently many others) can’t see this as prima facie proof that trees can’t be used in long term temperature reconstructions. The risible efforts in treenometry seem to continue unabated.

  104. SteveF

    The way out of the morass is to get data with better S/N based on your understanding of the physical processes involved (that may mean more research on the underlying processes and data collection based on more rigorous metadata constraints), then use all the data.

    I suspect that you are suggesting that to the extent that the S/N is low during the observational record, it’s unlikely that the response is “uniform”. This is plausible– but not entirely certain.

    For example: Suppose a tree-nometer grows at location X, and its proxy value over time P(X,t) is perfectly correlated with the temperature at X, T(X,t). But we don’t have thermometer measurements at X. So, we don’t know T(X).

    Instead, we compute the correlation between P and T_m, a temperature that is only correlated with (not identical to) the temperature at X. Now, the correlation between P(X,t) and T_m(t) will be less than 1. It will be equal to the correlation between T(X,t) and T_m(t).

    But we do understand there is a physical basis to explain the correlation between T(X,t) and T_m(t). So, we might expect this to be uniform. (Whether that would really be true is another matter, particularly if the calibration period exhibited a rising trend in global temperatures and outside the calibration period we didn’t see that. If the whole thing is ENSO related, T(X,t) and T_m(t) could be negatively correlated outside a calibration period but positively correlated during a calibration period! That’s the odd way these stupid things work– and there is no violation of physics involved!)

    Still, if we have a long enough calibration period, and we merely wish to pick up things with longer periodicity than something like ENSO, we expect the correlation between T_m and T(X) to persist well enough that we can create a reconstruction under the assumption of “uniformity”. But there are problems with it!
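
    A tiny synthetic example of that point: a proxy that tracks T(X) perfectly can only correlate with the measured temperature T_m as well as T(X) itself does (the series and the 0.6 correlation are made up).

    set.seed(3)
    n  <- 100
    Tm <- rnorm(n)                                # measured temperature, away from the tree
    Tx <- 0.6 * Tm + sqrt(1 - 0.6^2) * rnorm(n)   # local temperature at X, correlation ~0.6 with Tm
    P  <- 2 * Tx + 5                              # proxy perfectly (linearly) related to T(X)
    c(cor(P, Tx), cor(P, Tm), cor(Tx, Tm))        # 1, then two nearly equal values near 0.6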

  105. Yup. And along with that problem comes “public” outcry (at least the wealthy public that owns ocean-front properties) for subsidized loss insurance on homes built where only a fool would build. Similar problem, similar solution: build in a foolish location at your own risk; and don’t bitch that you can’t get cheap insurance.. or bank financing for your foolishness.

    I’m always rather amazed at the luxurious condos in hurricane prone regions in FLA. Way back when we would visit Popsie-Wopsie, whose refusal to get a hearing aid meant we didn’t want to go deaf staying at his place, we stayed in a nice rental unit in a condominium where the vast majority of units were owned by snowbirds. Near the pool, they had various discussions of trying to get insurance at ‘reasonable’ prices.

    Well… of course the real difficulty is that to a large extent, it is not ‘reasonable’ to have a large number of very, very nice condominiums owned by people who — when it comes down to it– are well-off but not wildly wealthy retirees, who just don’t want to face the notion that they really, honestly made a decision to sink a large chunk of money into something that has a fair risk of getting heavily damaged.

    (My Dad, who lived in FL year-round, actually used to sort of chuckle at some of the ‘snowbirds’ in his condo, many of whom would a) redecorate often, b) eat in restaurants every day, c) fly home to NY or the midwest in the summer, and d) bitch incessantly about how they could barely afford their prescription drugs.)

    He was like, “whoa! Obviously, if you are going out to eat every day and getting new wallpaper, painting etc. pretty regularly, your drug costs can’t be impoverishing you!”

    Dad was on the condo board too – he had funny stories about what some of the people who didn’t sit on the board and didn’t call insurance agencies would say about “what he should do” to find cheaper insurance.

  106. “Is that the “screening fallacy”? I don’t care. It’s wrong regardless of the label. Nick seeks to distract with cheap theatric tricks that I don’t think even he takes seriously, but that’s the real issue. They made a mistake, they were wrong, and they don’t want to admit it.”

    Yes, instead of admitting or correcting the mistake, they point to a “mistake” about their mistake (the cartoon version is not the “same thing”) and think that they are impressing people who

    A) understand the mistake
    B) understand that the cartoon is just a cartoon.

    Now, of course, we also realize that Trenberth’s cartoon, for example, is just a cartoon meant to get the idea across. And we realize that the greenhouse is just a mistaken metaphor to get an idea across… but those “mistakes” about a truth are ok, while a cartoon about the truth of a mistake is somehow not ok…

    It’s like thinking you destroy the theory of AGW by nitpicking Trenberth’s cartoon or the greenhouse metaphor.

    Cartoons are not proof. No serious person takes them as proof. They are attempts to do in pictures what can only be done in math. They are attempts to simplify (and will always of necessity get it wrong in some detail) and communicate an idea.

    The IPCC did not object to the hide-the-decline cartoon. That, in the final analysis, is what Briffa’s chart is: a cartoon. And no one objected because they liked the story it told. It played well. Well, if you wanna fight on the cartoon playing field, then my bet is you are going to lose. Why? Because Josh kicks the crap out of Jones and Briffa when it comes to drawing cartoons.

    or.. stop dumbing down the science cause your opponent has much more experience talking to dumb people.

  107. Mosher: “or.. stop dumbing down the science cause your opponent has much more experience talking to dumb people.”

    Nick’s opponents in the “selection fallacy” threads have been Lucia, SteveF, Carrick…. more experience talking to dumb people?

    Or are you saying that Nick is really addressing opponents not present in these discussions? You may be exactly right and he is not helping his cause, as you observe.

  108. “while a cartoon about the truth of a mistake are somehow not ok”
    Nothing wrong with cartoons to reinforce something that has already been properly described. The problem is when cartoons are all the specification you can get.

  109. Nick

    The problem is when cartoons are all the specification you can get.

    Then in the case of the definition of “the screening fallacy” there is no problem, since it’s been defined over and over and over in ways everyone except you agrees on.

  110. Re: lucia (Jul 3 11:31),

    I’m always rather amazed at the luxurious condos in hurricane prone regions in FLA. Way back when we would visit Popsie-Wopsie, whose refusal to get a hearing aid meant we didn’t want to go deaf staying at his place, we stayed in a nice rental unit in a condominium where the vast majority of units were owned by snowbirds. Near the pool, they had various discussions of trying to get insurance at ‘reasonable’ prices.

    The thing is, though, it’s not just the condos themselves. There’s the infrastructure like roads, bridges, water, sewer and electricity. That’s likely to go in a big hurricane too and the local governments won’t have the money to rebuild that either. John D. MacDonald’s 1977 novel Condominium should be required reading before signing a contract on a barrier island condo. Buying anything on a barrier island is gambling. If you can’t afford to lose it, you shouldn’t play.

  111. Lucia,
    “I suspect that you are suggesting that to the extent that the S/N is low during the observational record, it’s unlikely that the response is “uniform”. This is plausible– but not entirely certain.”
    .
    For sure not certain; but I fear I have not been clear. The uncertainty due to something like your example of the local temperature versus global temperature is not what I am concerned about. My concern is that the unexplained range of responses in the same geographical area means it is impossible to determine if the same proxies behaved the same ways in other times. Suppose for a moment that the range of sensitivity to temperature in a stand of trees is due to genetic differences between trees. Suppose further that the genetic makeup of the stand changes over time as phenotype ‘coldlove’ is gradually replaced by phenotype ‘floridian’. At some point in the distant past there was an unknown mixture of the two, and so in aggregate a certain average response to a change in temperature. Unless we know how the mix of phenotypes changed over time, any long term reconstruction we do based on correlation to recent temperatures is bound to be wrong. I grant that this example is an extreme case, but the issue is real: when there is substantial local variation in response to temperature without a clear explanation, then there is undefined uncertainty in any analysis based on that data. The uncertainty seems to me to be multiplied when correlation with temperature is used as a selection criterion. Correlation is not causation of course, but large unexplained variation in correlation means you do not understand enough to conclude much of anything about what correlation there is.

  112. Nick–
    I’m not going to spend time googling for you. It’s pretty obvious you’ve read the various definitions by lots of people because you’ve commented on them.

  113. SteveF

    I grant that this example is an extreme case, but the issue is real: when there is substantial local variation in response to temperature without a clear explanation, then there is undefined uncertainty in any analysis based on that data.

    Agreed. And so, to check whether there is ‘substantial local variation in response to temperature’ in a stand of trees, one should at least want to check whether the variations in “R” are consistent with every tree sharing the same “R”. But clearly, as with everything in frequentist statistics, we are setting up a “null hypothesis” (i.e. all bristlecone pines in ‘stand A’ share the same R). Then we test that null. If we have very few samples to test the null, it will be accepted even if it’s wrong.

    Correlation is not causation of course, but large unexplained variation in correlation means you do not understand enough to conclude much of anything about what correlation there is.

    To my mind, generally speaking low correlations with no strong physical principle to suggest the correlation ought not to be zero are dangerous too.
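
    One textbook way to test the “every tree shares the same R” null described above is a chi-square test on Fisher z-transformed correlations; a sketch with made-up per-tree correlations and an 80-year calibration period:

    r    <- c(0.31, 0.18, 0.42, 0.25, 0.05, 0.37)  # hypothetical per-tree calibration correlations
    n    <- 80                                     # calibration years per tree
    z    <- atanh(r)                               # Fisher z; var(z_i) is roughly 1/(n - 3)
    chi2 <- (n - 3) * sum((z - mean(z))^2)         # heterogeneity statistic
    pchisq(chi2, df = length(r) - 1, lower.tail = FALSE)  # large p: cannot reject "same R"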

  114. Re: lucia (Comment #99035),

    I think there is no doubt it would improve SNR in the extreme case where we had 10 “perfect” proxies that were perfectly correlated (R=1) with the target temperature and 10 “utterly broken” proxies that reported absolutely nothing but noise (R=0.)

    The difficulty is that proxies with very high R are difficult to find (and likely do not exist.) …

    That comment is a nice perspective on (part of) (one aspect of) the screening issues under consideration.

    To riff a bit more along this line: the toy examples you have floated offer clarity, because we know beforehand what the characteristics of the proxies are, with respect to correlation to the variable of interest. (We know, because you specified them and told us.)

    In the real world, we know almost nothing about the distribution of the temperature signal carried by the data series we are considering as proxies for the reconstruction period. Well — we know there aren’t any that are very high, say, above 0.5 or so, per your and SteveF’s comments. For a given set of 100 candidate composite treemometer ring-width or latewood density (MXD) series, let A through F be the number of series falling in each correlation band:

    0.4 < R < 0.5 for A
    0.3 < R < 0.4 for B
    0.2 < R < 0.3 for C
    0.1 < R < 0.2 for D
    0.0 < R < 0.1 for E
    R < 0.0 for F

    We can state that A+B+C+D+E+F = 100.

    As far as I can tell from this and other discussions, we can hardly say anything more than that, about the distribution of signal in the data series (or the distribution of noise, if you wish).

    If we could characterize that distribution, then it might be possible to compose a screening-based strategy to improve the S/N ratio, and thus decrease the size of the uncertainty error bars we must add to the recon period.

    But we can’t. Meaning that we can’t know whether our screening strategy has removed noise or signal. The error bars on the recon are pleasingly narrower. But these narrow bars result from the fact that they haven’t been widened as a consequence of the uncertainty that we added by choosing to screen. Thus, they don’t reflect a real knowledge of the climate of the past.

  115. Amac

    The error bars on the recon are pleasingly narrower.

    If you compute error bars under the assumption that your screening was totally effective at distinguishing a proxy with 0<R from one with R=0 your computed error bars will be narrower after screening. Also, your reconstruction may look “less noisy” using “method of eyeball”. But the latter is often a bad thing if you interpret the ‘smooth’ as being “closer to correct”. It might not be.

    Of course, to get half way decent error bars, you need to consider the possibility that some deleted proxies were “informative” and some retained were “non-informative”.

  116. There are lots of trees in Central Park, long temperature record. Same with Kew Gardens, England too. You could even core the trees in Armagh and compare those cores to the thermometer records.
    Want to bet you can find a correlation?

  117. DocMartyn, you wouldn’t find such a correlation and I don’t think anyone would expect a detectable temperature signal in those trees.

  118. “Niels A Nielsen
    I don’t think anyone would expect a detectable temperature signal in those trees.”
    That’s right, we need magic ones and only a Shaman can identify the magic trees and they are wrong about half of the time.

  119. Doc, cores are usually collected where cold temperature limits trees, and growth is believed to be sensitive to temperature variations.

  120. Niels:

    Doc, cores are usually collected where cold temperature limits trees, and growth is believed to be sensitive to temperature variations.

    I think you meant cores for temperature reconstructions are usually selected from trees located in regions where cold temperature limits tree growth. “Liebig’s law of the minimum.” Of course, cores from trees are sampled all over the planet, for many reasons (often economically driven).

    However, that doesn’t stop researchers from picking trees that aren’t in any way temperature limited, as was done in Mann 08 (most of his specimens are from the central US). In this case, the hypothesis is that, long-term, there is a regional correlation between temperature and precipitation, and that the proxies are really precipitation limited.

    This is also the justification for applying a 10-year low-pass filter to the data before correlating them with temperature, and why using short-period climate fluctuations would make no sense in screening by correlation for these proxies.
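
    A bare-bones sketch of that low-pass-then-correlate step, using a 10-year moving average as a stand-in for whatever filter a given study actually applies (the series here are synthetic):

    set.seed(5)
    yrs   <- 120
    temp  <- cumsum(rnorm(yrs, 0.01, 0.1))                # toy instrumental record
    proxy <- 0.4 * temp + rnorm(yrs)                      # toy precipitation/temperature-limited proxy
    lp10  <- function(x) stats::filter(x, rep(1/10, 10), sides = 2)   # 10-year moving average
    cor(lp10(proxy), lp10(temp), use = "complete.obs")    # correlate the decadal variations only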

  121. Niels, trees are biological organisms and exist to procreate. The function of a tree is not to have big fat tree rings, but to scatter its seed as wide as possible. Why, in a good year, should a tree invest in self (tree ring thickness) instead of offspring (seedlings)?
    The optimal strategy for a long lived species like a tree is to generate as many seeds as possible when the climate favors the chance of its seedlings being successfully transformed into viable trees. The worst strategy is to miss the small window – letting it fill up with the seeds of other species – while investing in wood.
    So as a biologist I must ask why you think that trees squander an opportunity for genetic immortality to make a thicker trunk?

  122. As a biologist, you would know that biological organisms have strategies for procreation. Some just do it as often and as much as possible, others take a long time to do it. It all depends on how they balance longevity and procreation.

  123. bugs, there is a 4 to 6 month period between when trees must begin investing in fruiting bodies and when seeds are released. My guess is that immature trees ‘guess’ what the conditions in the late summer/early fall are going to be like based on the winter/early spring.
    Some plants have a system for detecting temperature, storing the signal, and reading the information and acting on it.

    http://www.pnas.org/content/early/2010/06/01/0914293107

    Trees are in it for the long haul, so I would expect a quite complex information storage and information retrieval system. As a guess I would expect a large level of imprinted information, based on a tree’s early years, then slight alterations based on a common, learned, theme.

  124. Bugs, you and DocMartyn are both right. WRT Carrick’s comment on Liebig’s law, it depends on the temperature response of the plant species’ and individual’s enzymes and proteins. I don’t think the assumptions they make wrt tree rings hold much truth. I would expect a good study would falsify many of the assumptions in such works as Mann 08. In fact a good thought experiment shows that the underlying assumption of most paleo dendro work is falsified from the get go. Or you can just read Loehle.

  125. Re: DocMartyn (Jul 4 13:29),

    The optimal strategy for a long lived species like a tree is to generate as many seeds as possible when the climate favors the chance of its seedling being successfully transformed into viable trees.

    Flowering plants generally flower more when they are stressed. I dunno if that applies to trees. But consider the alternate. When growing conditions are good, a plant would benefit more by trying to outgrow anything around it instead of creating more things that will compete. During hard times, survival of the species means seeds are needed in case the plant doesn’t survive. The seeds will sprout when things get better.

  126. A useful experiment occurred to me while sweating during yesterday’s 4th of July festivities, as a step towards estimating the extent to which certain trees’ rings reflect temperature. (The same design could be used to look at precipitation, too — likely useful to look at temp and precip simultaneously.)

    1. Pick an area (“A”) where you think a species’ treerings contain temperature information. This area has to have a reliable long-term (say 100 years) temperature record, and the temperature has to show a significant warming trend over the period of the record. E.g. bristlecone pines in the mountains of Eastern California, larches in northern Siberia, list of other choices at NOAA.

    2. Pick another area (“B”) where the conditions are as for area A, except there is no significant long-term warming trend at that locality.

    3. Read ahead, and pre-determine the methods for the remainder of the experiment. This is a prospective study, not a post-hoc analysis. No data snooping allowed.

    4. As discussed on prior threads by Carrick, SteveF, & others, choose meta-data criteria that is likely to select those trees with treerings that correlate to temperature. E.g. location just below treeline, no evidence of being shaded, no evidence of insect infestation, expert opinion.

    5. Core the selected trees. Sample size determined in #3 by a power analysis.

    6. Perform the standard analysis to obtain the treering data series of interest (e.g. ring width, MXD), as determined in #3.

    7. Compare the trees’ records with the temperature record, again as determined in step 3. Calculate the metrics of interest (e.g r^2, Pearson’s, RE) for each data series.

    8. Mix & Match Sanity Check: Analyze the treering data for area A against the temperature record of area B, and B’s treering data against A’s temp record. To the extent that treerings are driven by temps, A-to-A and B-to-B should give better results than A-to-B and B-to-A. Only A-to-A should have a treering record that suggests “long-term warming.” To the extent that treering variability is due to white noise, red noise, local factors, non-temp-climate factors, etc. etc., the four comparisons should resemble each other.

    9. Display Results (tables, graphs) and Discuss.

    Once this is done, you will have an idea of the contribution of temperature to treering characteristics. An upper bound, actually — you still might be fooled by red noise that happens to look like the temperature trend. To tackle that, you could repeat the experiment at other locations.

    The downside of this experiment is that it would be a lot of work. The upsides are multiple: it wouldn’t be a huge task (many experiments that are done seem to require much more effort); it doesn’t require any novel or expensive-to-design methods; it would provide solid data for analyzing other data acquired for that tree species; it would provide a framework for “selecting on the dependent variable” (i.e. data snooping), if this is still an approach that you want to take with this species’ data series.

    Has something like this already been published?
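
    A sketch of the mix-and-match check in step 8, with synthetic series standing in for the site chronologies and instrumental records (the trends and noise levels are arbitrary):

    set.seed(9)
    yrs   <- 100
    tempA <- cumsum(rnorm(yrs, 0.01))        # site A: warming trend
    tempB <- rnorm(yrs)                      # site B: no trend
    ringA <- 0.5 * tempA + rnorm(yrs)        # rings that really do track local temperature
    ringB <- 0.5 * tempB + rnorm(yrs)
    ring  <- list(A = ringA, B = ringB)
    temp  <- list(A = tempA, B = tempB)
    combos <- expand.grid(rings = c("A", "B"), temps = c("A", "B"), stringsAsFactors = FALSE)
    combos$r <- mapply(function(x, y) cor(ring[[x]], temp[[y]]), combos$rings, combos$temps)
    combos   # A-to-A and B-to-B should beat the cross pairings if rings track temperature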

  127. 2. Pick another area (“B”) where the conditions are as for area A, except there is no significant long-term warming trend at that locality.

    This may be both unnecessary and too stringent. I’d be all for it if this were a lab experiment, but if the planet warmed, it warmed. Finding a location “with warming” and another “with no warming” is going to be tough.

    As it happens, I’d settle for: “We believe this meta data is consistent with a tree responding to temperature. So, we get fresh cores from fresh area with that meta data.”

    Anyway, the test has more power if there was warming.

  128. I think Lucia’s right that you won’t find many places (on a regional scale at least) where temperature trends have been negative over the last 100 years.

    Here’s 1900-2012

    You should be able to correlate the strength of the signal in the proxy with the amount of warming (e.g., a scatter plot horizontal axis is temperature trend, vertical axis is tree-ring index trend.)

    My bet is you won’t get a significant correlation. But it’s certainly something that can be tested.

    As it happens, I’d settle for: “We believe this meta data is consistent with a tree responding to temperature. So, we get fresh cores from fresh area with that meta data.”

    I think something along these lines is probably a more powerful test. Screen first based on metadata decisions, then based on e.g., correlation. See if you can reproduce the correlation with fresh specimens.

    Again, my bet is you won’t be able to reproduce your results with fresh specimens. But it’s objectively testable.

    So we’re again left with the magic trees explanation.

  129. lucia (Comment #99073)

    > Finding a location “with warming” and another “with no warming” is going to be tough.

    You could be right. On the other hand, I’ve often read about the regional variations in temperature patterns that exist, and that are to be expected under an overall AGW regime.

    Here, for instance, is a 120-year temperature record from a weather station in Finland that shows no obvious warming trend. (How did I come across that particular one? 😉 )

    I think it would be worthwhile to try to build Feynman-style tests into an experiment like this, especially given paleoclimatology’s decades-long love affair with confirmation bias. The trend in area B wouldn’t have to be “no-warming”. It would just need to be sufficiently different from that of area A to enable the mix-and-match. E.g. “rapid early warming, then slow late warming” versus “slow early warming, then rapid late warming.”

  130. Carrick (Comment #99074) —

    Curious to contrast that central-Finland record (#99075) with Finland being solidly in the 1C to 2C rise area of the GISS map you link. I haven’t checked to see if nearby stations also show little-to-no increase (via the Mark I eyeball method), or if this record is a one-off outlier that doesn’t reflect regional trends.

  131. On the other hand, I’ve often read about the regional variations in temperature patterns that exist, and that are to be expected under an overall AGW regime.

    Yes. Some exist. But you would need to find some that also have meta-data that suggest the local trees will be treenometers. Evidently, only a small fraction of regions are expected to have trees that are treenometers, so it starts to become difficult to fulfill both criteria. Oh– and you need a thermometer record somewhere near that batch too!

    I’m also not sure it’s necessary. If tree-scientists were able to consistently identify in advance that an as-yet-unsampled region of trees would be sensitive to temperature, then core the trees and be consistently correct with 0.25<R in at least 95% of their experiments, I’d be happy with that. This would show that one can anticipate some correlation with temperature based on meta-data.

    If they can’t correctly anticipate whether R will be significantly different from zero based on meta-data, then they probably don’t know what does or does not make whatever feature of tree growth they want to use for a proxy “temperature limited.”

    But equally important: If they claim they can predict based on meta-data, their statistical methods should be consistent with believing that claim. This means not screening by correlation!

  132. “DeWitt Payne
    During hard times, survival of the species means seeds are needed in case the plant doesn’t survive.”
    Oh dear, oh dear. So not only do trees monitor temperature, they have a system of collective intelligence based on information flowing in/out of individual trees into the ‘hive’ mind.
    When I was a hippy, I used to have hippy-like thoughts on the oneness of all things. I then grew up and actually read some evolutionary biology; I suggest you do the same.

  133. DocMartyn (Comment #99078)

    > Oh dear… [snip] When I was a hippy…

    Um, not to put words in DeWitt Payne’s mouth (or yours), but I don’t think you’ve interpreted his remarks as he meant them.

    For instance, my cousin is an entomologist studying pollination of almond trees in California’s Central Valley. Because the valley extends hundreds of miles on a north-south axis, the date of almond flowering around Fresno comes much earlier than it does near Redding. Both the bees and the trees have “information” about when flowering/pollination activities take place. Further, both species have to get it right, or the insects will starve and the fruit won’t set. My cousin could tell me something about the bee side of the story, but very little seems to be known of the mechanisms that trigger the almonds. I thought it would be interesting to swap Fresno and Redding trees, to see how much control is genetic/epigenetic, and how much the timing is readily altered by local environmental cues. He agreed…

  134. “Both the bees and the trees have “information” about when flowering/pollination activities take place.”

    Design.

    Andrew

  135. AMac – your recent comments make me want to speculate about needing multiproxy information in the same region. I don’t feel like that was done in the early years; maybe it is being done now. It goes back to what Carrick and Lucia were discussing earlier about bootstrapping proxies. Anyway, pollen counts are a proxy, right? Bees and trees…

  136. billc–

    your recent comments make me want to speculate about needing multiproxy information in the same region.

    It would be nice if you could get it….

    Really, if any proxy had consistently very high R’s with what you want to measure, you wouldn’t need multiple ones in the same area. One would do. Given that we have roughly 100 ‘good’ years in the thermometer record, anything with R<0 could benefit from some backup!

  137. “billc
    Anyway pollen counts are a proxy, right?”

    Up to a point, Lord Copper. By studying the levels of pollen from different species, one is able to partially reconstruct the type of ecosystem that was present in the past. Note that as temperature rises, the pollen levels of ‘Plant A’ rise, plateau, and then fall.
    The Chinese have some really nice records of the last 1,000–5,000 years based on pollen.

    http://www.clim-past-discuss.net/6/1/2010/cpd-6-1-2010-print.pdf

    http://epic.awi.de/21625/1/Zha2009c.pdf

  138. Would it really be that difficult to find areas with no temperature increase? About 12,000 stations (1/3 of the total) show a decline over their full length of use (average of 80 years). Presumably some of them occupy a contiguous area, possibly large enough to support treemometers. Lubos showed a Voronoi graph of the stations, which ought to allow identifying the largest areas showing no increase.

  139. Lance Wallace, as I recall, the stations with a cooling trend had a notably shorter average lifespan than the ones with a warming trend.

  140. Lucia:

    But equally important: If they claim they can predict based on meta-data, their statistical methods should be consistent with believing that claim. This means not screening by correlation!

    I think you could use correlation as part of selection though.

    For example, imagine Population A of tree-rings, which based on various assumptions, you postulate to be temperature proxies.

    Gather samples, test using correlation. Pull new samples “Population B” from the same geographic regions as the trees that passed the correlation screening… use only Population B in the reconstruction.

  141. Carrick,
    I don’t see any reason to not do what you suggest, but neither do I see any advantage to it. Am I missing something?

  142. Carrick–
    But you wouldn’t screen B by correlation, right?

    I’m a little concerned at not just using “A” though.

    What I would think is “ideal” would be:

    An analyst/experimentalist team identified meta-data that they think will result in correlation. So, for example: They might think “pines near the treeline on a south facing side of a mountain in a misty region will be treenometers.”

    Then they find at least 3 locations in the world where tree stands match the metadata and which tree-stands have not previously been cored. (This is important. If they have already been cored, then their theory that this meta-data results in treenometers is based on data-peeking.)

    They send people out to all of the identified sites and core some number of trees. (Maybe 20?)

    Then, they create a proxy reconstruction at each site, and test their theory that this meta data does result in positive correlation. That test can be based on the pooled results over all sites. They also test the theory that all three stands have the same correlation with temperature. (With only 3 you can do the test– but it’s going to be a pretty weak test!)

    If they can’t reject the null that all trees with the same meta-data have the same correlation, then going forward, any screening for outliers will at least assume all trees with those same meta-data have the same correlation with temperature. This means: You can’t throw out treestand “2” for low correlation while keeping treestands “1” and “3” and so on.

    Now: once we permit this, the question is whether they should be allowed to also use data from trees that were in the archive before they tested with the fresh stands. I say: yes– with a caveat. Because they did confirm that meta-data using fresh trees (and locations), they should be permitted to draw from the broader archive, provided they add all trees that match those metadata. But this caveat should apply: Before drawing from the broader archive they should test whether the correlation with temperature for the trees in the pre-existing archive matches that for the new sites. If it does not, and especially if the correlation with temperature in the archive is higher than from the new tree-stands, we need to suspect that the archive contains trees that were pre-screened based on correlation. This pre-screening may have been unintentional– but it may have happened nonetheless. In that case, the analysts should revert to only using data from the new sites. (They can, of course, go out and collect more.)

    One of the difficulties is there needs to be a balance between permitting people to use data already in archives and protecting against inadvertent (or intentional) cherry picking by screening against correlation.

  143. Lucia,

    I agree those methods would generate much more robust/credible reconstructions for trees.
    .
    But I suspect there are enough treenometer specialists around to have already considered more robust sampling methods, and they probably already know that robust sampling methods don’t yield statistically convincing results. After Briffa got beat up over the Yamal ‘dirty dozen’ paper, at least he is aware how skeptical people are about the routine data-snooping that takes place in the treenometer field. I would be shocked if much unscreened tree core data, with associated meta-data, is available; that is the kind of data the treenometer specialists seem reluctant to release. It is a very strange field.

  144. SteveF–

    But I suspect there are enough treenometer specialists around to have already considered more robust sampling methods, and they probably already know that robust sampling methods don’t yield statistically convincing results.

    At a certain point, if trees don’t work, people need to admit that trees don’t work. If the people proposing treenometer based reconstructions don’t or won’t do things properly, they really can’t expect that others will believe tree-nometer based reconstructions. The same goes for everything.

    After all: The reason we believe mercury-in-glass thermometers is that everyone everywhere has been able to manufacture these things to reasonable tolerances, and people find they can calibrate and/or test operation by sticking the things in boiling water and ice water baths. Manufacturers know that if their production process is not screwed up, the output of the process will be thermometers that pass a calibration test!

  145. “(This is important. If they have already been cored, then their theory that this meta-data results in treenometers is based on data-peeking.)”

    Lucia, I think you are on the right track here in that you have to use all the data (with exceptions for data shown to be outliers) once you establish criteria for selection of trees and sites. I would also worry about data-snooping the historical results of the tree rings and would want a means of testing only those tree rings within the instrumental period – unless a hard criterion was established a priori for proceeding from the calibration period to the historical period.

    I suspect that properly established a priori criteria, and then measurements in the instrumental period, would end a lot of reconstructions right there.

    I would also insist that any specifications about correlations with months of the year and correlations with local versus regional temperature be made a priori. In the case of dendrochronology you can also fit models with both TRW and MXD, and that needs to be specified a priori.

    Unfortunately the present state of the art in tree ring reconstructions allows for doing all these model fittings after the fact and having them accepted by the climate science community as acceptable statistical practice.

  146. Kenneth–
    Yes. They should be specifying which proxy variable is temperature dependent prior to testing. If they want to leave that open, then they need to adjust their test to account for testing two hypotheses.

    So, for example: If your hypothesis is that either TRW or MXD or both correlate with temperatures in period “N”, then you need to state this. Then, when testing, you need to recognize what happens if you do this:

    1) Test null 1: No correlation with TRW in time period N.
    2) Test null 2: No correlation with MXD in time period N.

    (The time period might be “june, july, aug average” or some such.)

    If both nulls are true and the two test outcomes are independent, the correct result is to ‘fail to reject’ both. But if you set your rejection criterion at p=0.05 for each test, you will ‘fail to reject’ both only at a rate of (0.95^2) ~ 0.9025 — so you will incorrectly reject at least one true null (i.e. report a spurious correlation) nearly 10% of the time.
    If, after not getting a ‘rejection’, you let yourself test period N-1 and period N+1, but really there is no correlation, you’ll now have a chance of incorrectly rejecting a hypothesis at nearly 25%. (Of course, this might overstate things because the rejection in period N-1 might be correlated with rejection in period N, but still…. it’s worth seeing just how bad it can be.)

    So, if you are going to permit squishy theories like “either one, the other or both” correlate with temperature “over any of a number of possible periods”, you need to adjust your rejection criteria so that you don’t get false positives by mining the data to find the ‘convenient’ theory. If, prior to coring, tree-ometer specialists cannot predict which tree stands are responsive to temperature based on meta-data, that means they don’t know.

  147. Bonferroni corrections are used in statistics to account for some of the problems arising from making multiple comparisons. The problem with that method is that it depends on the user understanding it and applying it correctly. One does not need to make a calculation to reject a model or model parameter, so it amounts to recounting all the “mind” choices that were made in your model.

    Another obvious problem arises from not strictly accounting for all the models tested and rejected, or trees and tree sites tested and rejected.
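
    For reference, the family-wise false-positive rates from this and the previous comment, and R’s built-in Bonferroni adjustment, look like this (the p-values and test counts are purely illustrative):

    alpha <- 0.05
    k     <- c(2, 6)                        # e.g. TRW & MXD for one period; both for three periods
    1 - (1 - alpha)^k                       # chance of at least one spurious "pass" if all nulls hold
    pvals <- c(0.03, 0.20, 0.04, 0.60, 0.01, 0.35)
    p.adjust(pvals, method = "bonferroni")  # per-test p-values adjusted for the multiple comparisons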

  148. Lucia, I wouldn’t screen population B.

    I might use screening for site selection, but this would disallow any data used in the screening for site selection to be used in the global reconstruction.

    My own ideas would require a lot more tree-ring samples than usually get collected and analyzed.

    Basically it would involve collecting multiple trees from the same region, average them, then use that to regress against regional temperature.
