Bo emailed me with questions and comments on my previous post. He was curious what happens to some of my graphs if I remove the covariance between the target temperature and the proxy response parameter $latex \lambda_{i} $. I thought you guys might be too, so I created these graphs. Each compares the results screened by correlation during the calibration period against the unscreened results, with the screening done using local temperatures. Each case is done at a different magnitude of correlation between $latex \lambda_{i} $ and $latex T_{i} $. (The convention followed in the legend is that negative is based on the trend during the calibration period, which is a sign error. But I started that way and haven’t fixed the sign error in the legend yet.) I’m showing screening with local values because that’s a lighter level of screening (and so better!)
Below, I’ve compared the synthetic results with $latex \mathrm{cov}[\lambda_{i}, \mathrm{mean}[T_{i}]] = 0 $.
Above you can see that the screening picked out 4123 out of 5000 proxies (falsely rejecting 877 as ‘bad’). The consequence is that the screened results have a slight bias during the reconstruction period; this bias makes the aqua trace consistently higher than the black line representing the target. In contrast, the red line is unbiased.
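For anyone who wants to poke at this themselves, here is a minimal sketch of the sort of toy setup behind the zero-covariance case. This is illustrative numpy, not the actual script used for the figures; the target shape, sizes, noise levels, and the 0.3 screening cutoff are all stand-ins.

```python
# Illustrative sketch only: the target shape, sizes, noise levels and the 0.3
# cutoff are assumptions, not the values used to generate the figures in this post.
import numpy as np

rng = np.random.default_rng(0)
N_PROXIES, N_YEARS, CAL_START = 5000, 300, 150   # calibration = last 150 "years"

# Illustrative target series: flat, then a ramp during the calibration period
target = np.concatenate([np.zeros(CAL_START),
                         np.linspace(0.0, 1.0, N_YEARS - CAL_START)])

# Local temperature at each proxy = target + persistent local offset + weather noise
local_offset = rng.normal(0.0, 0.5, size=(N_PROXIES, 1))
T_local = target + local_offset + rng.normal(0.0, 0.3, size=(N_PROXIES, N_YEARS))

# Proxy response P_i = lambda_i * T_i + noise; here lambda_i is drawn independently
# of the local offsets, i.e. cov[lambda_i, mean(T_i)] = 0
lam = rng.normal(1.0, 0.3, size=(N_PROXIES, 1))
proxies = lam * T_local + rng.normal(0.0, 1.0, size=(N_PROXIES, N_YEARS))

# Screen on correlation with the *local* temperature during the calibration period
cal = slice(CAL_START, N_YEARS)
r = np.array([np.corrcoef(proxies[i, cal], T_local[i, cal])[0, 1]
              for i in range(N_PROXIES)])
keep = r > 0.3
print(keep.sum(), "of", N_PROXIES, "proxies pass screening")
```

The exact number retained depends on the noise level and cutoff; the point is only that correlation screening against local temperature throws out some perfectly good proxies.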
Positive covariance between $latex T_{i} $ and $latex \lambda_{i} $:
I repeated the exact same runs, but this time set up the synthetic runs so that proxies that were “warm” during the pre-calibration period are more responsive to temperature variations and those that were “cold” during the historic period are less so.

Several things to note:
- Looking at the red (unscreened) trace, we can see method I is too warm during the reconstruction period. But also, the reconstruction in method I does not follow the target well during the calibration period. What this means is that when this bias exists, someone is likely to notice the issue. (Inspecting the algebra shows why this happens.) This effect could be called the fundamental bias in method I. It’s in the math. This effect is peculiar to method I (and is a reason one might wish to avoid method I.)
- Looking at the aqua trace we see that method I becomes even warmer when the proxies are screened. This extra warming over and above the amount seen in the red line happens because screening picks out proxies that respond better to the local temperature during the calibration period. Even absent any covariance between local temperature and the proxy response, screening alone produces the warming bias seen in the figure above. Here, since the responsiveness is better at higher temperatures, the 4123 proxies that were retained were on average warmer during the reconstruction period than the original 5000 proxies. This results in an additional warm bias. And on top of that, we have the warming bias introduced by the existence of the correlation itself, which we see in the red trace. So we have three separate effects, all resulting in warming bias, when the correlation between the average temperature at a proxy during the reconstruction period and its responsiveness to temperature is positive.
Note that when this correlation exists and we screen, we will find that the underlying temperature at the selected proxies (i.e. the 4123 proxies retained here) is biased relative to the full 5000 samples. This will happen with any method and is not unique to method I. So, when this correlation is not zero, some bias can be introduced into many methods even if none is introduced absent screening. (Some methods may, however, be immune to this. We would need to examine each one individually.)
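To make that selection effect concrete, here is a small sketch of the kind of check one can do. Again, this is illustrative numpy rather than the script behind the figures; the distributions, the 0.4 coupling coefficient, and the 0.3 screening cutoff are assumptions.

```python
# Sketch of the selection bias described above: when lambda_i covaries with how
# warm a proxy site runs, screening retains sites that are warmer on average.
# All distributions, coefficients and the 0.3 cutoff are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
N, N_YEARS, CAL_START = 5000, 300, 150
target = np.concatenate([np.zeros(CAL_START),
                         np.linspace(0.0, 1.0, N_YEARS - CAL_START)])

# Persistent local offset: how "warm" or "cold" each site runs overall
local_offset = rng.normal(0.0, 0.5, size=(N, 1))

# Positive covariance case: responsiveness lambda_i increases with the offset
lam = 1.0 + 0.4 * local_offset + rng.normal(0.0, 0.2, size=(N, 1))

T_local = target + local_offset + rng.normal(0.0, 0.3, size=(N, N_YEARS))
proxies = lam * T_local + rng.normal(0.0, 1.0, size=(N, N_YEARS))

# Screen on correlation with local temperature during the calibration period
cal, recon = slice(CAL_START, N_YEARS), slice(0, CAL_START)
r = np.array([np.corrcoef(proxies[i, cal], T_local[i, cal])[0, 1] for i in range(N)])
keep = r > 0.3

# Compare the reconstruction-period temperature of the retained subset
# against the full set of proxies -- the bias described in the note above.
print("mean recon-period T, all proxies:     ", T_local[:, recon].mean())
print("mean recon-period T, retained proxies:", T_local[keep][:, recon].mean())
```

The two printed means are exactly the comparison described in the note: the underlying temperature at the retained proxies versus the full set, over the reconstruction period.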
Bo also wants to see how things look if I use fewer proxies– something closer to the number of proxies one might obtain for a real reconstruction. I’ll be posting some of those on Monday. I need to figure out how I want to show that because with fewer proxies, I think it’s best to show several realizations for each method.
I also want to add method IV for rescaling results– it will match the final step in Mann08. I haven’t yet figured out how to do RegEM with proxies, but I’ll think about that. I do want to see how each method works on toy problems. 🙂
This isn’t clear to me:
On the whole, paleoclimatologists are fairly smart and fairly math-adept (at least compared to me). Since I can see the problem with the proxy-based temp estimate (red trace) not matching the instrumental record (black line) during the calibration period, they could, too. Since I would try to fix this problem, I assume they would, too. This sort of readily-visible problem wouldn’t make it past an early draft of a paper being readied for publication.
Or, does the higher level of noise in real-world proxies and the year-to-year variations in the real-world instrumental record obscure the sort of trend being discussed here for Method I? In that case, I could see psychology making this an issue worth talking about.
On the whole, paleoclimatologists seem prone to confirmation bias, and avid nurturers of their careers (at least compared to me). If Method I was used in a manuscript that passed peer review and was published, I would expect most paleo authors to be unable to see the shortcomings in their work, and to be unwilling to deal with the criticisms of outsiders.
Amac–
I agree. For this reason, over time method I would be abandoned. At some point, someone will run across a data set that contains the correlation, see bad results, and recognize that method I has problems.
The only way it would continue to be used is if they consistently got results like those in the upper panel– which is what happens if the proxy response parameters are independent of the temperature during the reconstruction period. This may tend to be the case– but it’s not certain.
That might happen. But I think getting bad agreement during the calibration period would bother enough people that they would switch methods and then just explain that they found a “weakness” and the new method was an “improvement”.
But this method has a bias. Bo was aware of the bias, while I had overlooked it, assuming that these two things must be independent. (But in fact, there is no ‘must’ about it. They could be correlated for a variety of reasons.) But it’s still worth showing the case where the method is unbiased.
I’m thinking I should show the case where the correlation has the opposite sign. It’s interesting because there are three biases. In this case, when there is screening, the biases go (+,+,+); in the other case they go (-,-,+). The first plus is from the algebra, and for the parameters I picked, the second and third (due to screening raising the scaling parameter and the correlation reducing it) exactly cancel each other. So the screened result ends up looking almost exactly like the unscreened one.
Lucia, if you repeat this with the temperature reflected such that you have a (somewhat) U shape, i.e. the value at -1 equals the value at 1, at -2 equals the value at 2, etc., I think you will find that the signal is compressed as well. Or perhaps better: the noise over -1 to -150 is different from the noise over 1 to 150, such that the noise that helped in the calibration period would not necessarily match the noise in the reflected part. Perhaps you have done this?
John F. Pittman
The bias is a multiplier. So $latex T_{recon} = A \cdot T_{true} $, where $latex A $ is related to the bias in the conversion between the proxy values (P) and the temperatures.
I’m not sure exactly what you are saying about the reflection part. Do you want a U in the calibration period? Or in the reconstruction? Oddly, the shape in the reconstruction period shouldn’t really matter. What matters is that we’ve mis-estimated the fit parameter in
$latex P = mT $.
We could use nearly any “shape” for T vs time in the calibration period; the screening will cause us to mis-estimate ‘m’.
The existence of a correlation between “m” and [T(t)] will also result in a bias. I don’t think changing the sign of the ramp in this figure or turning it into a “U” would matter much. The issue is whether, at a given average value of T(t) computed over proxies, the individual proxies have different T_i(t) and those T_i(t) are correlated with the m_i; if they are, you will see this particular bias. The ramp is the least complicated way to generate the effect and illustrate it.
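If it helps to see the fit-and-divide step concretely, here is a rough numpy sketch. It is illustrative rather than the script behind the figures; the helper fit_slope, the noise levels and the 0.3 cutoff are stand-ins, and this version leaves out the m–T correlation, so the only mis-estimation comes from the screening itself.

```python
# Rough sketch of the fit-and-divide conversion (not the actual script):
# fit m_hat for each proxy over the calibration period, then reconstruct
# T(t) as the average proxy value divided by the average m_hat.
import numpy as np

def fit_slope(p, t):
    """Least-squares slope m_hat in p ~ m * t (no intercept), one proxy."""
    return np.dot(p, t) / np.dot(t, t)

rng = np.random.default_rng(2)
N, N_YEARS, CAL_START = 5000, 300, 150
target = np.linspace(-1.0, 1.0, N_YEARS)     # nearly any "shape" works; a ramp is simplest
T_local = target + rng.normal(0.0, 0.3, size=(N, N_YEARS))
lam = rng.normal(1.0, 0.3, size=(N, 1))
proxies = lam * T_local + rng.normal(0.0, 1.0, size=(N, N_YEARS))

cal = slice(CAL_START, N_YEARS)
m_hat = np.array([fit_slope(proxies[i, cal], T_local[i, cal]) for i in range(N)])
r = np.array([np.corrcoef(proxies[i, cal], T_local[i, cal])[0, 1] for i in range(N)])

for label, keep in [("unscreened", np.ones(N, dtype=bool)), ("screened", r > 0.3)]:
    # Conversion: average proxy value at each time divided by the average fitted slope
    T_recon = proxies[keep].mean(axis=0) / m_hat[keep].mean()
    # Effective multiplier A in T_recon ~= A * T_true (regression of recon on target);
    # how far screening moves it from 1 depends on the noise level and the cutoff
    A = np.dot(T_recon, target) / np.dot(target, target)
    print(f"{label:10s}: retained {keep.sum():4d} proxies, effective A = {A:.3f}")
```

Adding a persistent local offset tied to m_i (as in the earlier sketch) would layer the correlation bias on top of the screening effect shown here.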
An example: at time = 150, Temp looks to be about 75; so at time = -150, Temp would be about 75.
What I am thinking is: suppose that our proxies correlate to temperature, but the noise changes, say from white to pink at some point. I would want the reflection from 0 to -150, so I could examine this. Also, as you pointed out with the (+,+,+) and (-,-,+). In particular, for the second graph you have in CPS with variance matching, I would wonder how they change, if at all, with the same noise, then compare with changing the noise with respect to time.
To me it seems like something that should be checked along the lines of what can cause bias. In this case, I am not assuming that the noise has the same structure backwards in time. In fact, I have about convinced myself that it is an assumption. But I am still thinking about it.
John
The idea that noise changes character is too fancy for my toys. My toys are all going to be based on the notion that the responsiveness of the “treenometers” does not change over time. That includes their “noise” not changing from white to pink, red, blue or any other color.
In the current post, I don’t use variance matching. I’m not sure how large the CPS umbrella is. But this converts by taking the average proxy value at (t) and dividing by the average of the coefficient m in the fit $latex P = m T_{local} $ obtained during the calibration period. So… I think not CPS, because that would use a regional or global temperature.
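In symbols, the conversion I’m describing is roughly $latex \hat{T}(t) = \overline{P_i(t)} / \overline{\hat{m}_i} $, where the overbars are averages over the retained proxies and each $latex \hat{m}_i $ comes from fitting $latex P_i = m T_{local,i} $ over the calibration period.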
John:
Do you want the reflection in the calibration period? Or when? I don’t know what you are wanting done. Anyway, the important detail is how the temperatures at individual proxies spread around the mean temperature. A “U” in the mean temperature isn’t going to make any difference– unless you are envisioning some particular change in the way the temperatures at individual proxies spread around that mean temperature.
You need to draw a cartoon involving your “U” and also showing how individual proxy temperatures tend to spread about the “U”.