338 thoughts on “SkS TCP Front”

  1. Interesting to see Robert Way’s inclusion and John Cooks background in physics but I seem to have lost the page. Has it been deleted?

  2. Thanks – this is very helpful. I was able to immediately find Robert Way’s comments (brought in http://climateaudit.org/2013/11/20/behind-the-sks-curtain/) and verify that they were accurate and in no way taken out of context. They are the clearest proof in existence that many top climate scientists (a) acknowledge privately that McIntyre was right all along, and (b) are unwilling to say so publicly.

  3. MikeR, I don’t know. Robert Way is a relative newcomer to the field. Maybe he’s just capable of figuring things out none of the rest are 😛

  4. Look – I’m not a climate scientist, I haven’t studied the math. Mostly I take people’s word for it, and enjoy watching the ping-pong match. But a very frequent meme is that The Science Is Only on One Side. Even a non-scientist like me can tell that meme is not true – because Robert Way says so. I can also identify people who just have no clue – because they quote that meme.

  5. Interestingly, Robert Way commented at my blog on the post right after the one announcing I’d uploaded the forum. I don’t know if he saw the announcement or not, but it amuses me he commented (for the first time) so soon after it.

    Especially since he complained when Steve McIntyre quoted his comments in the forum.

  6. “Especially since he complained when Steve McIntyre quoted his comments in the forum.” As well he might. He was collateral damage in a war. I have no doubt that his friends were quite tolerant of his quibbles – in private. But when they were made public, they became a weapon against the cause that SkS, including Way, supports.
    But that’s how wars work.

  7. The interesting thing about the SKS forum is the frequent meme that accuracy of climate science doesn’t really matter; all that matters is providing justification to institute public policy which forces large reductions in fossil fuel use. This pretty much confirms what has been clear to many for a long time: climate science is a hideous mixture of science and politics, with politics the dominant portion. And climate scientists think all that is needed is better communication? No, what is needed is to get the politics out of climate science.

  8. It seemed that Lacatena’s hack tale is missing an earlier date. On December 19, 2011 a user named xiaoliu posted gobbeldygook with the subject “Michael Kors bags on sale.”. The post was made 2 months before “the German’s” first visit.

    http://www.hi-izuru.org/forum/deleted/2011-12-19-Michael%20kors%20bags%20on%20sale.html

    Googling the first part of the user’s email addresses (xiaoliu201123) shows that he/she/it posts junk all over the web. It looks like a blog spam bot. But apparently this bot found a way to make an account, login to the forum, and then post.

  9. DGH–
    That should certainly been a sign something wasn’t secure.

    OTOH….. one might suggest you are just taking xiaoliu’s comment ‘out of context’ and drawing a conclusion. 🙂

  10. In his response on CA, Robert Way absolutely destroyed the image of McIntyre as a fair, dispassionate commenter on climate science. There was no reason to publish Robert Way’s comments except to feed McIntyre’s ego and to “hurt” Mann–McIntyre’s enemy. Odd thing to do to somebody who supposedly sees the science as you do.

    As for Way’s comments on the science, I think he’s pretty much right. I would argue that MBH98/99 wasn’t completely worthless (and that, remember, is the NAS view). In fact, a reasonable Steve McIntyre might sound a whole lot like Robert Way. Instead we get meaningless and exaggerated statements like “Mann’s methods create hockey sticks.”

    But anyway, back to the wars I guess.

  11. I like this:
    http://www.hi-izuru.org/forum/Moderation/2011-03-31-Cadbury%20creme%20eggs.html
    Daniel Bailey

    Remember to use the “no follow code”!

    Sure is an awful lot of deaf & blind going on over there.

    CadMan is here (no follow code included):

    http://wattsupwiththat.com/2011/02/18/friday-funny-6/#comment-601868

    muoncounter

    As in rel=”nofollow” ? Didn’t know about that.

    Daniel Bailey

    Yup. Will keep Watts from tracking back to a firewall, which would help apprise him on the Forum’s existence (and scope, if everyone didn’t use it).

    Maybe I give him too much credit.

    John posted on the code here (the last comment):

    http://www.skepticalscience.com/thread.php?t=307&r=8

    Uhmmm… ordinarily the only function of rel=’nofollow’ is to tell Google and other search engines who elect to obey it not to follow a link. It has no effect on browsers which go ahead and set the referrer however the person who wrote the browser choses to do so. Ordinarily, they code the browser to set the referrer and ignore any ‘nofollow’ tag. So, unless John Cook coded something to notice ‘nofollow’ and then forced people through a magic “redirection” page, for only those links that contained this ‘nofollow’ had no effect on whether Anthony could ‘discover’ the old forum!

    We do know Cook later did write a poorly thought out redirect. But it would be intriguing to imagine he went to the trouble to look for silly ‘nofollows’ in the html!

  12. Boris,
    I don’t think Way destroyed McIntyre’s image as fair etc.

    . There was no reason to publish Robert Way’s comments except to feed McIntyre’s ego and to “hurt” Mann–McIntyre’s enemy.

    No reason? Your claim is utter nonesense.

    Publishing to show that even the more technically inclined at SkS agree with McIntyre has a function beyond either feeding Mc’s ego or hurting Mann. It can show the less technically inclined who might be unable to plow through the math that — in fact– people do agree with Mc’s math. Whether you “like” this reason or not, it is a reason. It’s i>a reason some — in fact many– people consider a good reason. It is certainly just as “good” a reason as doing something like saying “peer review” or “IPCC says” or pointing to anything that stands in for quality-by-showing people accept things.

    Instead we get meaningless and exaggerated statements like “Mann’s methods create hockey sticks.”

    You may not “like” this statement but it’s neither meaningless nor exaggerated. Some of Mann’s methods do create hockey sticks when noise is contains serial autocorrelation. The data Mann used when creating “his” hockey stick contained noise with serial autocorrelation. It’s simple to show both.

  13. I believe ‘rater fatigue’ is an issue Tol raised
    http://www.hi-izuru.org/forum/The%20Consensus%20Project/2012-03-20-Ari%20hits%203,000!!!.html

    Ari Jokimäki

    arijmaki@yahoo…
    194.251.119.199

    Well, actually, yesterday I did a good day’s effort (100+ ratings) and later I had some time, did some more and then realised that I’m not that far from 3k. So I ended up doing another 100+ ratings in the same day.

    Now, taken out of context, one might think that one stray guy was rating at a breakneck pace but other wiser sorts knowing that rating quality can be affected when someone goes on a red-bull fueled rating binge disapproved and stopped this.

    But lets look at some context.

    2012-03-20 22:07:52
    John Cook

    john@skepticalscience…
    121.222.175.176

    Yeah, what’s another 100 ratings, what the hey?!

    Dana, your cyborg title is in doubt…

    Cook seems to approve. And his comment suggest Dana also may engage in rating binges.

    2012-03-20 22:56:48
    Ari Jokimäki

    arijmaki@yahoo…
    194.251.119.199

    By the way, I think Sarah might still have the lead in most ratings during a day.

    Ari’s response suggests Sarah has rated more than 200 papers in one day.

    Of course on can read the full thread. (Sarah later says “I find it as adictive as some find gaming. And I really want to stamp out those big red numbers!”)

    One has to wonder what this might mean….

    John Cook

    john@skepticalscience…
    130.102.158.12

    Everyone’s suffering rater fatigue. Turns out 12,272 papers is a helluva lot of papers to rate – no wonder noone has cracked 10,000 papers before (the largest survey I know of was 8000 papers in the Netherlands).

    Presumably the admission of rater fatigue on the part of Cook would be an ‘out of context’ quote?

  14. OMG
    Cook:

    I just did a half hour exercise on the cross trainer and knocked off 30 ratings while I exercised

    Jokineen not only speed rated

    I also have had rather pleasant moments with rating; in the other day I practiced my guitar playing and rated papers at the same time. 🙂

  15. It can show the less technically inclined who might be unable to plow through the math that — in fact– people do agree with Mc’s math.

    I thought the NAS report showed that.

    You may not “like” this statement but it’s neither meaningless nor exaggerated. Some of Mann’s methods do create hockey sticks when noise is contains serial autocorrelation.

    Of course it’s meaningless when you use a graph’s resemblance to a piece of sporting equipment as a benchmark.

  16. Boris you’re letting your animosity towards McIntyre interfere with your ability to reason. Nobody is taking what you are seriously as you are just prattling nonsense as a result.

    I don’t think anybody saw this as abusive towards Way. Nor is he the only one to have his emails private correspondence published.

    Of course it’s meaningless when you use a graph’s resemblance to a piece of sporting equipment as a benchmark.

    That term has been around for decades before McIntyre used the term. And it’s a remarkably stupid argument even for you.

    Perhaps you should use Mark Twain’s advise about keeping your mouth closed in this case.

  17. After reading over Robert Way’s comments, I suggest he be careful about public statements which might be considered “unhelpful” by the climate pooh-bahs, or his career in climate science might suffer.

  18. Boris

    I thought the NAS report showed that.

    Nothing wrong with showing the finding is replicated. 🙂

    Of course it’s meaningless when you use a graph’s resemblance to a piece of sporting equipment as a benchmark.

    Your complaint about the meaningless term should be with Jerry Malman then
    http://en.wikipedia.org/wiki/Hockey_stick_graph

    The term “hockey stick graph” was coined by the climatologist Jerry Mahlman, to describe the pattern shown by the Mann, Bradley & Hughes 1999 (MBH99) reconstruction, envisaging a graph that is relatively flat to 1900 as forming an ice hockey stick’s “shaft” followed by a sharp, steady increase corresponding to the “blade” portion.[1][2]

    He’s dead, but perhaps you can use your time machine, go back to the time when he coined it, lodge a protest and persuade climatologists not to embrace the term. Then maybe McIntyre wouldn’t have used the term they were already using!

  19. I don’t think anybody saw this as abusive towards Way.

    Way did. Obviously publishing something that somebody wrote privately against their expressed wishes is “abusive” to some extent.

    That term has been around for decades before McIntyre used the term. And it’s a remarkably stupid argument even for you.

    How does the history of the term make it automatically meaningful and precise? Can you define “hockey stick” for us?

  20. Boris

    How does the history of the term make it automatically meaningful and precise?

    I’m not sure what your objection is. The graph is called a hockeystick. Is that a metaphor? Sure. Are metaphors imprecise. But the graph was called a hockey stick before McIntyre used it. Scientists use metaphors all the time especially to give qualitative descriptions. That’s how this term is used by both McIntyre and by those who applied it to ‘the’ hockey stick graph before McIntyre used it.

    If your complaint is that the term is qualitative— well… uhmmm yeah. The hockeystick metaphor is qualitative– and that’s how McIntyre uses it. There is nothing wrong with people using qualitative terms. Scientists use them all the time. Of course these uses are not “precise” because precision is irrelevant to qualitative descriptions.

    If you are suggesting that for some reason McIntyre is not allowed to use a qualitative term you are going to have to provide is the “special rule” that tells us that McIntyre alone is not allowed to use metaphors or provide qualitative descriptions while everyone else is not only permitted to do so, but are often encouraged to do so. Or possibly elaborate further.

  21. Boris:

    Way did.

    Which doesn’t make it abusive. It just means he saw it that way (possibly).

    He definitely complained about the tone of some of the comments on the thread towards him, and he was right about that, and that was responded to.

    Obviously publishing something that somebody wrote privately against their expressed wishes is “abusive” to some extent.

    Which would be whoever originally publicly released the correspondence, but likely this was not McIntyre.

    Reprinting information that was originally stolen, or just released without the permission of the people who authored, is not—as far as I know—unethical. Perhaps you can find some discussion of this in journalism, since blogging falls under under that category.

    How does the history of the term make it automatically meaningful and precise?

    It doesn’t. It shows that McIntyre didn’t coin the phrase. It’s an existing term of art.

    Can you define “hockey stick” for us?

    It has a straightforward mathematical definition:

    You can define a hockey stick as a horizontal line with a definite breakpoint followed by an ascending or descending straight line.

    One can easily develop metrics based on that definition for how well a hockey stick represents a particular times series.

  22. He’s dead, but perhaps you can use your time machine, go back to the time when he coined it, lodge a protest and persuade climatologists not to embrace the term. Then maybe McIntyre wouldn’t have used the term they were already using!

    It’s one thing to use a term like “hockey stick” as a metaphor for the shape of a graph. It is another to use the term as the conclusion of your argument.

  23. If your complaint is that the term is qualitative— well… uhmmm yeah. The hockeystick metaphor is qualitative– and that’s how McIntyre uses it.

    Right–and that’s what makes it meaningless. Does this metaphorical shape bias the conclusions? By how much?

    The implication is that it does bias conclusions. Skeptics believe this as a matter of faith despite all the evidence to the contrary.

  24. It has a straightforward mathematical definition:

    You can define a hockey stick as a horizontal line with a definite breakpoint followed by an ascending or descending straight line.

    Thank you. So we don’t even know if McIntyre is referring to hockey sticks with an ascending or descending blade. That’s about as meaningless as it gets.

  25. Boris

    It’s one thing to use a term like “hockey stick” as a metaphor for the shape of a graph. It is another to use the term as the conclusion of your argument.

    Huh? Qualitative, descriptive and metaphorical statements are permitted in conclusions of arguments and in summaries.

    Does this metaphorical shape bias the conclusions? By how much?

    The implication is that it does bias conclusions. Skeptics believe this as a matter of faith despite all the evidence to the contrary.

    Uhhmmm… read the rest of the paper to see what claims are made. If you concern is that some people might mis-interpret what the paper claims, possibly they do. But I don’t see how the term “hockeystick” is the cause.

    As for whether people do misinterpret– I guess that depends on what you mean by “bias”. The fact that a method mines for hockey stick does “bias” in the sense that if the person who uses the method is not aware of this when trying to estimate statistical signifince of finding they will be likely to deem a finding “statistically significant”. That is: they are biased toward false positives. Also: The fact a method is prone to creating hockeysticks where they do not exist makes the method “biased” to conclusions that “something remarkable” has occurred.

    So, yes, the fact that a method is prone to “hockeysticks” is biased in some sense of the word. It may not be biased in other senses. It’s not at all clear to me that ‘skeptics’ misunderstand what the paper showed. But even if they do, I don’t see how McIntyres using the “hockeystick” already embraced by the climate community to describe that graph caused any confusion about ‘bias’.

  26. Boris:

    Thank you. So we don’t even know if McIntyre is referring to hockey sticks with an ascending or descending blade. That’s about as meaningless as it gets.

    From context we do.

    You’re being an ignoramus and you’re hijacking this thread. I’m ignoring you for now. Later.

  27. Boris

    Thank you. So we don’t even know if McIntyre is referring to hockey sticks with an ascending or descending blade. That’s about as meaningless as it gets.

    So? I mean… seriously, your complaint is that one of the specific words used in a paper containing many many words does not, in and of itself, communicate every possible result or claim shown in the paper? Gosh, I’m going to find a place where a researcher used “the” in the conclusions and complain that the word “the” is not-quantitative and doesn’t communicate the full findings of the entire paper!

    In fact: given the method, if the historical record is a positive trend, the method gives gives an ascending blade even if the proxies are nothing but red noise. If the historical record is negative, it gives a negative blade.

    So the method tends to bias one to believe that trend in the historic record is confirmed by proxies and to believe it is somehow ‘remarkable’.

  28. Qualitative, descriptive and metaphorical statements are permitted in conclusions of arguments and in summaries.

    My problem is not with qualitativeness, but with meaninglessness.

  29. From context we do.

    And the impact on the recon–we’re supposed to guess that from context too? Oh, I remember a Carrick that cared a lot about evidence, where’d that guy go?

  30. Boris,
    We’ve already established that term of art “hockeystick” is not “meaningless”. It’s use is widespread even outside meteorology (e.g. biology). Application of the term to Mann’s graph is attributed to a meteorologist — who presumably does not consider it “meaningless”. The term was embraced by meteorologists– who clearly do not consider it “meaningless”. All these people think the term communicates something which it does. That a term communicates something ensures it is not “meaningless”.

  31. I mean… seriously, your complaint is that one of the specific words used in a paper containing many many words does not, in and of itself, communicate every possible result or claim shown in the paper?

    My problem is with vague statements–especially when they imply to certain readers things that are completely unsupported.

    If we agree that Mann’s method creates hockey stick shapes, but there is no evidence that these shapes bias the results of the reconstruction and there is evidence that the shapes are too small to have an effect, then it’s clear that leaving off that last vital bit of information is misleading at best.

    At best.

  32. Boris

    we’re supposed to guess that from context too?

    Who said “guess”. If you want to know what any term means “in context” you get the text containing the term and read it. In the cases you claim of these would either appear to be:
    (a) full journal articles containing many terms and citations to other papers.
    (b) blog posts and their comment threads, often linking to other blog posts.

    Telling you to read the surrounding material is not suggesting you guess.

  33. You’re being an ignoramus and you’re hijacking this thread.

    Petty insults and disengagement…classic Carrick!

  34. Lucia, it’s as if you don’t know that McIntyre never provides the effect on the recon.

  35. Boris

    My problem is with vague statements–especially when they imply to certain readers things that are completely unsupported.

    Ok. Previously, you seemed to be concerned about “meaningless”, or “not precise” and so on. Could you elaborate? Specifically:
    1) Which statement do you think is “vague”?
    2) What is the “certain thing” you think this ‘vague statement’ implied?
    3) Why do you think this “certain thing” is unsupported.

    Until you express your concern less vaugely, no one can engage really engage it. (I believe we’ve already engaged the claims of “meaningless” and “not precise”.

    On this:

    but there is no evidence that these shapes bias the results of the reconstruction

    Nonesense. There distorting the shape of a reconstruction in a systematic way is “bias” be definition. The paper shows that the method Mann uses will result in distortions in shape when the data contain “red noise”. The data docontain red noise. I don’t know how you can begin to claim there is “no evidence” these shapes bias the results of the reconstruction!

    there is evidence that the shapes are too small to have an effect,then it’s clear that leaving off that last vital bit of information is misleading at best.

    If you have evidence the bias due to the amount of red noise in the trees is too small to have a noticaeble effect, you should demonstrate this. That is: you should quantify what your claim by bringing in the specifics you claim McIntyre ought to have brought in. If you know your claim to be true, you ought to be able to do this (and have presumably either done it or seen someone do it.)

    As for “misleading”: leaving out information is only ‘misleading’ if
    (a) the author was aware of the information
    (b) knew it made a material change or at least thought others might think so.
    (c) and left it out of his full argument for that reason.

    If you think this occurred you should explain it. But I don’t see how mere use of the term “hockeystick” would amount to having done (a)-(c). I should think you have now moved onto an entirely different complaint.

    As for whether McIntyre’s actual paper was vaugue or misleading: at least at the time the McIntyres paper was written, it appears reviewers disagreed with your assessment. It appears that those in the NSA thought it sufficiently “not vauge” to actually understand and investigate the claims. Robert Way seem to understand what the claims were and so on. So I’m mystified by your suggestion that there is something to vague about the paper. I’m even more mystified that the “vagueness” is somehow due to the use of the term “hockeystick” which has been sued widely by climatologists who don’t seem to have any difficulty understanding what the term conveys.

  36. Boris

    Lucia, it’s as if you don’t know that McIntyre never provides the effect on the recon.

    I’m not sure what you are trying to say here. Are you complaining that McIntyre doesn’t come up with a “correction”? Of course not. His paper is a discussion of a flaw in a method which– owing the the flaw– gives results we can’t have any confidence in. As the claim in the original paper is meaningless unless we have confidence in it, the demonstratin that “we can’t have any confidence in the original result” is the clai in McIntyre. That is an important demonstration.

    I get that you may not like the fact that Mc’s paper shows we can’t have any confidence in “the hockeystick graph”. But that doesn’t make the claim “we can’t have confidence” in that graph either (a) vague, (b) meaningless, (c) misleading, (d) deceptive, (e) imprecise or any other negative adjective you have been applying to here.

  37. Heading out to lunch, but Van Storch and Zorita 2005 shows that the effect doesn’t change Mann’s result.

    AS for the vagueness of ” hockey stick” when you use it to mean simply an upturned graph that can create confusion.

  38. lucia:

    His paper is a discussion of a flaw in a method which– owing the the flaw– gives results we can’t have any confidence in. As the claim in the original paper is meaningless unless we have confidence in it, the demonstratin that “we can’t have any confidence in the original result” is the clai in McIntyre.

    In fact, when we look at newer reconstructions, we find that no only should we have no confidence in MBH98/99, we find that his results are not reproduced by new methods. Including Mann’s own methods.

    Figure.

    If MBH98/99 had gotten the same result as newer reconstructions, it would have been pure luck.

    Regarding Van Storch and Zorita… it’s a mistake to only look at one error at at time, and assume there is no interaction between them.

    Lunch is over so I’ll leave Boris to argue what “context” means and other drivel that he wishes to use to hijack this thread.

  39. As I said before, of course Way doesn’t like McIntyre’s posting his comments; it will harm him in the society of his friends at SkS. That doesn’t make it unethical by McIntyre: in a war you use the tools you have. SkS would do the same if they got dirt on McIntyre: they say so in that same thread I posted before http://climateaudit.org/2013/11/20/behind-the-sks-curtain/
    They (including Way) are just sad they don’t have any dirt yet.

  40. Boris,
    Do you mean the paper whose last sentence is “Other concerns raised by MM05 [see, e.g., Crok,2005] about the MBH methodology have been not dealtwith.” ?

  41. Way never explained how he would be hurt professionally by having his criticism of Mann made public. It’s not like other climate scientists aren’t in that forum. So Mann would already know about it.

  42. MikeN, and more to the point, any harm that occurred happened when Way’s comments were publicly released.

    Moreover, as many of us have pointed out, it’s hypocritical to censure other people for redistributed the unauthorized SKS files but to themselves redistribute the known-to-be stolen Heartland documents.

    I don’t have a problem with Heartland’s documents being redistributed (though I think Gleick is lucky to not have served jail times). I don’t have a problem with the SKS files associated with the Consensus Project redistributed—I think it should have been made available to other researchers, the same as lab books of students are when there is a dispute.

    I also think this whole “I can’t believe you are distributing these stolen documents!” meme is an agreed to strategy by the SKS group for combating the dissemination of the SKS forum documents.

  43. lucia, your response to Boris misses a key point:

    Lucia, it’s as if you don’t know that McIntyre never provides the effect on the recon.

    I’m not sure what you are trying to say here. Are you complaining that McIntyre doesn’t come up with a “correction”? Of course not. His paper is a discussion of a flaw in a method which– owing the the flaw– gives results we can’t have any confidence in. As the claim in the original paper is meaningless unless we have confidence in it, the demonstratin that “we can’t have any confidence in the original result” is the clai in McIntyre. That is an important demonstration.

    Steve McIntyre did say what the effect on the reconstruction was. In fact, following from his examination of the stupid methodology Michael Mann used, McIntyre figured out far more than that. He was able to show what exactly data caused the hockey stick, and why it received so much weight.

    Heck, I’ve done what Boris claims McIntyre never did. So have many others. He’s just making things up.

  44. MikeN,
    ” So Mann would already know about it.”
    Sure, but statements made in a private setting may illicit a different reaction once they become public. Way was straight out saying Mann’s statistical methods in both MBH98 and Steig et al (2009) were rubbish; Mann’s reaction to that sort of thing in the past has not been good.

  45. On damage to Way: Perhaps Mann’s irrational reactions to criticism could harm Way. But the blame for whatever vengeance Mann might seek when learning about criticism should fall on Mann, not whoever happens to quote Way.

  46. I would think that Way would take more damage than just from Mann. Remember, he was part of a secret cabal (SkS forum) with the goal of winning the propaganda battle against the evil deniers. Now McIntyre scored a PR coup by quoting Way saying that he is right all along. Way’s whole secret cabal must have been very upset with him: why did he have to say those things on their public private secret forum?

  47. It sounds like Way has picked a bad set of friends, and made some other poor choices if he is damaged by accurate quotes of what he actually said.

  48. My point is that any reaction by Mann would happen regardless of whether it was publicized by Steve McIntyre. Mann would have already known of the criticisms by Way when he originally made them in the forum. The statements were noticed enough that there was no counterargument to upside-down Tiljander placed at that site. So it is likely that Way is already on Mann’s enemies list prior to McIntyre’s posts.

  49. DGH, the RankExploits post for that is here

    Rather an interesting thread, even if Dana didn’t think it should have been a post. 😉

  50. Possibly my asking Dana whether he would acknowledge that charlie was right and Dana was wrong was “shrill”. Or possibly being correct was being “shrill”. Who knows.

  51. DGH,

    Clicking on your link is the first time I’ve actually looked at any of the “stolen” e-mails.

    Amazing what all those he-men like to brag about.

    It’s interesting that they spend so much time…er…harping about how much “harping” was going on here!

    And didn’t Tamino ban Lucia from his site after she sliced and diced him?

    And then they brag that she doesn’t go over there? (OK, that last one may be rhetorical.)

  52. Carrick,

    Actually, I perused RankExploits and the two SKS posts on this issue before posting. When Dana modified his graphic he created a new post at the SKS site. Lucia kept poking at the follow-up post after s/he made the update.

    And that’s the ironic part. The forum provides context in this instance. The context is that the SKS kidz are quite afraid of Aggro Lucia.

  53. Two quick and minor clarifications.

    John M, these aren’t e-mails. They’re messages posted on a message board.

    DGH, Dana Nuccitelli is a he.

  54. John M,

    The reaction of the SKS kidz to Brandon posting their forum will be interesting. I’ve bugged several of the team members about the fact that they continue to host/link to the documents that Gleick stole while they complain about their forum being “stolen.”

    As for irony, let’s just say that the whole thing is dripping in it.

  55. JohnM,

    And didn’t Tamino ban Lucia from his site after she sliced and diced him?

    And then they brag that she doesn’t go over there? (OK, that last one may be rhetorical.)

    Yes. Tamino posted a “two box model”. Made a claim it was “physical” because he’d imposed 1st law of thermo on system. I asked if he’d checked 2nd. He said yes.

    Then one of my readers asked how one would check. I checked and his system violated 2nd law of thermo. Then he banned me. The announcement is verbal, but I figure if a blog owner says I’m banned, I’m banned. I don’t comment there because I respect Tamino’s right to ban. That’s pretty much it.

  56. Heck, I’ve done what Boris claims McIntyre never did. So have many others. He’s just making things up.

    When you don’t know what is being discussed, it can indeed seem like people are just making stuff up.

  57. Lucia,

    I get that you may not like the fact that Mc’s paper shows we can’t have any confidence in “the hockeystick graph”. But that doesn’t make the claim “we can’t have confidence” in that graph either (a) vague, (b) meaningless, (c) misleading, (d) deceptive, (e) imprecise or any other negative adjective you have been applying to here.

    I’m talking about a specific claim that McIntyre has made repeatedly and that is repeatedly repeated by skeptics. A claim that is technically “correct”, but only because of the extremely vague definition McIntyre uses for the term “hockey stick.”

    Yes, that is the Van Storch paper.

    DCA,

    I object to that on many levels, but none of them are scientific in nature.

    Carrick,

    Regarding Van Storch and Zorita… it’s a mistake to only look at one error at at time, and assume there is no interaction between them.

    Even if we accept this as reasonable, it doesn’t absolve McIntyre from his ridiculous misleading statement.

  58. A sample of forum comments
    2011-09-23 10:03:27 gryposaurus@gmail…
    Zeke left a comment. He’s trustworthy. I’ll fix his link and thank him.
    Albatross Julian Brimelow
    Watch out for Lucia, she is angry and aggressive.
    nealjking dana, The questioner is Lucia, so it’s probably worthwhile to deal with that carefully.
    Dana Nuccitelli Yeah that’s fine. I probably should have looked for the data instead of digitizing the graph
    Julian Brimelow
    Watch out for Lucia, she is angry and aggressive. She was trying to tell Neven how to do stuff. Note though– she stays away from Tamino’s site, so she is just smart enough to cause trouble,
    Alex C
    The problem here is what running average you use.
    Without that smoothing you get a negative trend.
    dana1981
    Dana Nuccitelli
    Alby – correct, it’s a pretty minor change. It changed the model mean trend from 0.12°C to 0.18°C from 2000 to 2010 – basically 0.3°C higher than the observed trend as opposed to 0.3°C lower.
    I’m actually glad for the update, because in this case the digitization wasn’t accurate, and I’m going to put this in my book, so I’d prefer it be reasonably correct. I just hope lucia doesn’t harp too much. She’ll probably write a blog post about what a dolt I am or something 🙂
    Julian Brimelow
    Huh, she says that she is willing to believe you and then goes on to try and insinuate that you should have known that something was wrong but that you forged ahead regardless. That is implying dishonesty and/or incompetance [?]. It is innuendo, and i would not be surprsied if she has a post along those lines soon. For crying out loud he made a mistake, he accepted that he erred and then he fixed it with lightning speed– what more do they want? Oh right, blood….
    Did you mean to say 0.03C/decade? A didfference of 0.3 C sounds too high…
    Could someone please politely deal with her innuendo in that comment. I can’t i’m waaay to crabby tonight
    Dana: Check your mail box, we have some feedback.
    Dana Nuccitelli Yeah Alby, I meant 0.03°C. It’s been a long day 🙂

    [For the record 0.12°C to 0.18°C is 0.06 not 0.03]
    brackets are my comments

  59. They’re doing smoothing before they calculate trends? Do they remember to expand the CI as a result of losing degrees of freedom from smoothing?

    Probably not.

  60. 2011-10-08
    Kevin Cowtan, from York, UK. I’ve got a PhD in computational physics, and am a long standing post-doc with fellowship-in-the-pipeline working on computational methods development in X-ray crystallography.
    maintain
    2010-08-15 Robert Way
    I am a Masters student at Memorial University of Newfoundland in Eastern Canada. at the University of Ottawa in Geography with a minor in Geomatics and Spatial Analysis.
    My primary interests lie in paleoclimatology, remote sensing of techniques for glaciers and ice sheets, and ocean-atmospheric dynamics.
    My 2 poster boys from SS with their brilliant science

    Kenneth Fritsch | July 14, 2014 at 10:55 am linked at Climate etc
    “From the Cowtan Way paper linked by Carrick above we have:
    firstly the station network in the high Arctic is sparse,
    and secondly the Arctic has been warming rapidly at the same time as boreal mid latitudes have shown a cooling trend,”
    Amazing no [sparse] data yet complete knowledge of warming and cooling trends.

    followed by “The close proximity of regions of warming and cooling on both the Eurasian and Alaskan Arctic coasts mean that it is possible for neighbouring stations to show a very different temperature trends. Automated homogenization could potentially introduce unnecessary adjustments to reconcile these trends.”

    So they go against science and Mosher who said it is a fundamental axiom that close things are more similar than distant things and actually include this flaw in scientific thinking in their algorithms

    Kenneth agrees we need to change science and thermodynamics and says,
    “Close proximity of stations with significantly different temperature trends, as calculated from difference series – while still maintaining reasonably good correlations – is something that is not unique to the Arctic region – although the difference might be greater in that region. The weaknesses that Cowtan Way point to in this algorithm are going exist to some extent in other regions and localities. We need a benchmarking test that will demonstrate potential weaknesses in all temperature adjustment algorithms.

    Yes fundamental science is now a weakness.
    Chuck homogenization in the bin.
    Who cares if A is close to B and C is on the other side of the world A agrees with C so B must be wrong.
    Not a bit like Zeke’s breakpoints really.
    sorry if too off topic [these guys post as SS authors] please delete.

  61. @ Angtech,

    Those sample forum comments are timely following Boris’s weak attempt to deflect conversation away from the original topic and follow some semantic wild goose chase. A common tactic at Judith’s blog. SKS forum comments regarding “smoothing before trend” to gin the resulting model mean trend clearly define making stuff up and calling it science.

    No surprises here.

  62. Brandon,

    The s/he was a reference to Lucia’s confusion about Dana’s gender at that time. In light of the Kidz calling her “shrill,” I found her several attempts at avoiding sexist assumptions funny.

  63. P.S. The Very not. phrase reminded me of this excellent bit of logic from The Big Bang Theory:

    Stuart: Oooh Sheldon, I’m afraid you couldn’t be more wrong.
    Sheldon: More wrong? Wrong is an absolute state and not subject to gradation.
    Stuart: Of course it is. It is a little wrong to say a tomato is a vegetable, it is very wrong to say it is a suspension bridge.

  64. Boris

    I’m talking about a specific claim that McIntyre has made repeatedly and that is repeatedly repeated by skeptics. A claim that is technically “correct”, but only because of the extremely vague definition McIntyre uses for the term “hockey stick.”

    Yes, that is the Van Storch paper.

    Could you specify the specific claim you are talking about?
    Ok. So it’s the paper that did not address McIntyre’s full paper, but only a sub-issue. So, it doesn’t actually quantify the effect of all problems collectively.

  65. angech (Comment #130974)

    Interesting ‘secret forum’ discussion. Dana’s claim he changed things at lightening speed is absurd.

    Charlie A asks… Dana sticks with wrong value telling Charlie Charlie is wrong. Bishop Hill blogs… Dana sticks with wrong value. I tell him– and provide a link to an official source. Dana sticks with wrong value. Zeke tells him.

    Finally the ‘secret forum’ thinks someone ‘trustworthy’ has spoken. Now Dana checks. This is rarely called “lightening speed”.

    Hilariously, he says he wants it to be right for his book. Well… If he’s really eager to get things right for his book, checking when Charlie asked might have been a useful thing to do.

  66. Lucia,
    ‘Trustworthy’ in many circles is limited to those who share your personal values and world view. Unfortunate, but I think quite true. I think there are two parts to this behavior: there is real lack of trust (‘that a-hole disagrees with my views, so is probably wrong about most everything.’), and there is a strong desire to never admit your opponent is correct, since this gives credibility to someone you want to discredit.

    Pointing out once again that the fundamental disagreement is about values and world views rather than science. Greater disagreement about values and world view makes constructive technical communication much less likely. SkS is but one of many blogs where technical communication is almost impossible if you disagree with the dominant values and world view.

  67. SteveF,
    The thing is, Dana kinda-sorta should have suspected his numbers might be incorrect and double checked before any “untrustworthy” people were presented his results. He knew the nominal value was 0.20 C/decade. He got a value that was only 60% of that value. Eyeballing the graphs, you could see the trend is almost linear during the first few decades of the century.

    The his “result” has the rather starting conclusion that the observed trend is for faster warming than the projections — and tons of people have been saying the opposite. Yet, Dana posts this. Then when the first person commenting notes that the method of the eyeball suggests Dana’s claim is wrong, Dana persists in insisting he’s right. And so on….

    Also: in background comments, it’s clear that no one at SKS is considering the possibility that they ought to check! They are complaining that I am “shrill” or “just smart enough” or whatever. But it doesn’t seem to occur to them that the people telling them the numerical values in the Dana’s post are simply incorrect nor does it seem to occur to them that they should check!

    Whether or not they consider me “trustworthy”, this appears to be ‘confirmation bias’ in the first place (the mistake gave Dana an answer he ‘liked’) and strange wagon circling in the second.

  68. shrill – (of a voice or sound) high-pitched and piercing.
    Funny. I’ve never heard Lucia’s voice, but somehow I always imagined a smooth contralto.

  69. Lucia,

    For sure Dana suffers from confirmation bias, as most all of us do; his is just a particularly bad case. 😉 He should of course always check a claim of factual error, but that does not mean he will.
    .
    Had it been Zeke, Gavin, or some other more ‘trustworthy’ source who initially pointed out that Dana’s numbers looked wrong, I suspect he would have checked immediately. The wagon circling behavior is mostly what I was talking about: don’t take the word of anyone outside your trusted group, even in matters of substantive fact, because, mostly, you don’t like the way they think about things, and so really don’t want to believe anything they say. It is a common behavior in all kinds of groups, but strongest where the values and world view being ‘defended’ against opposing views are personally important and fervently held, like some religious sects, some political groups, and…… um….. climate scientists. The part which is most humorous (in a dark sort of way) is how blissfully unaware of their own prejudices and misbehaviors the most fervent, of all persuasions, usually are; I find the lack of self awareness amazing.

  70. Mark Bofill,

    Funny. I’ve never heard Lucia’s voice, but somehow I always imagined a smooth contralto.

    Soprano. Chicago-area-nasal. Not good….

  71. SteveF–
    Correct. Choir directory sometimes sat me soprano, but if they divided us up, I usually sang 2nd soprano. Depending on the choir, I was sometimes 1st, sometimes first alto. So…. generally, I sang around “2nd soprano”. Never contra-alto.

  72. I personally think Dana is the one who was sounding shrill on that thread. On this one, it’s obviously Boris.

    lucia:

    So it’s the paper that did not address McIntyre’s full paper, but only a sub-issue. So, it doesn’t actually quantify the effect of all problems collectively.

    That is well put.

    Likely the reasons give by McIntyre and McKitrick contributed to the problems of this paper:

    is that the values in the early 15th century exceed any values in the 20th century. The particular “hockey stick” shape derived in the MBH98 proxy construction – a temperature index that decreases slightly between the early 15th century and early 20th century and then increases dramatically up to 1980 — is primarily an artefact of poor data handling, obsolete data and incorrect calculation of principal components.

    When it is compared to more recent reconstructions.

    If the newer reconstruction are closer to the truth than this old paper that was full of problems (I think there were more issues than M&M addressed in their paper), then fundamentally the result of this paper, whether you want to call it a “hockey stick” or “Mann’s schtick”, is badly flawed and misleading.

    Moreover, the response of the community then was interesting—rather than admit the paper was likely flawed, it got a prominent position in the next two IPCC reports.

    Classic academic circle the wagon. Defend ourselves from outside attack. Truth be damned.

    The phrase “hockey stick” even appears in those technical documents, which is a bit ironic given Boris’s denial that “hockey stick” is a useful descriptor of Mann’s curve or could be defined precisely enough to be used in a scientific context.

    By the way here, here’s McIntyre & McKitrick’s list of errors:

    (a) unjustified truncation of 3 series;
    (b) copying 1980 values from one series onto other series, resulting in incorrect
    values in at least 13 series;
    (c) displacement of 18 series to one year earlier than apparently intended;
    (d) unjustified extrapolations or interpolations to cover missing entries in 19 series;
    (e) geographical mislocations and missing identifiers of location;
    (f) inconsistent use of seasonal temperature data where annual data are available;
    (g) obsolete data in at least 24 series, some of which may have been already obsolete
    at the time of the MBH98 calculations;
    (h) listing of unused proxies;
    (i) incorrect calculation of all 28 tree ring principal components.

    As long as the list is, it is incomplete.

    What is missing from this series is “bias in selection of proxies”.

    I’m not convinced that if you did everything else correctly, but retained exactly the same data set, that you would get anything close to what is obtained with the newer papers.

    So the criticism that McIntyre & McKitrick never published the “correct reconstruction” using MBH’s data set is itself flawed. It assumes that you can have an algorithm that is good enough, that you can input sow’s ears on one side and output silk purses on the other.

  73. Carrick,
    The thing that sticks in my craw with the whole Hockey-Stick controversy is that the historical evidence (written records for goodness sakes!) for the Medieval Warm Period and the Little Ice Age, both extensively studied and carefully documented in hundreds of journal articles, got thrown out of the IPCC report by a newly minted PhD trumpeting his own work…. and the IPCC pooh-bahs applauded. That single episode pretty well sums up for me what is wrong with the entire IPCC process: they start with a politically motivated, desired outcome and ‘do whatever it takes’ to support that outcome, including distortion of whatever accurate information is present in the report when the Summary for Policy Makers is written. In AR5 the desired outcome was ‘defend the plausibility of high climate sensitivity against all contrary evidence’, and that is exactly what AR5 does; not a bit of discounting of the wacko-high sensitivity values that are used to ‘demand’ draconian public action. It would all be laughable if it were not so wasteful.
    .
    The IPCC is an organization that has long since outlived whatever usefulness some may have imagined it once had; it should be disbanded and the money spent on more productive endeavors. Pachauri can then devote full time to writing pornographic novels.

  74. DGH, d’oh! I forgot people were doing that here way back when.

    Carrick:

    So the criticism that McIntyre & McKitrick never published the “correct reconstruction” using MBH’s data set is itself flawed.

    It’s also flawed in that they did publish the effects of correcting the various errors. That’s what led to Michael Mann repeatedly criticizing their “reconstruction” for lacking statistical skill (even as recently as in his book). Showing what happens when you correct mistakes is obviously not the same as saying you get a correct answer when you correct mistakes, but the Team never paid attention to distinctions like that.

    And Boris just seems to ignore the entire thing. According to him, it never happened.

  75. SteveF:

    That single episode pretty well sums up for me what is wrong with the entire IPCC process: they start with a politically motivated, desired outcome and ‘do whatever it takes’ to support that outcome, including distortion of whatever accurate information is present in the report when the Summary for Policy Makers is written.

    Yep. And it was very damaging to their credibility in the larger scientific community.

    Brandon, agreed.

    It occurred to me that if you really wanted to test MBH98’s algorithms, you’d use it on a more modern proxy network, e.g., Ljungqvist for example. I predict you’d get a classic Mannian hockey stick. But it’d be interesting to see.

  76. Ugh. I’m sure you’d get a hockey stick out of the Ljungqvist proxy network. The Christensen and Ljungqvist papers used networks which just recycled proxies used by other paleo reconstructions, doing nothing to ensure their usefulness or representativeness. It’s not cherry-picking, but it’s relying on choices made by people who did cherry-pick.

  77. Brandon, I don’t think Ljungqvist is so terrible when you compare it to Loehle for example, but they do likely overstate the confidence in their results.

    Regardless… if we think the newer methods are less flawed, running Mann’s old algorithms on the the same proxies as a newer reconstruction should yield objective information on how “far off” MBH was.

    The problem with that is I don’t think his old code actually works. McIntyre has an emulation of it, but Mann will never accept that as an accurate representation of his own program.

    Still, the bigger issue you can’t make silk purses from sows ears, and most of these proxies aren’t even good sows ears. The biggest issue is GIGO.

  78. Carrick, my issue with it wasn’t the methodology. It was their choice of data. They didn’t even attempt to get a representative sample. They just trusted the choices other people made about what data to use. That’s a problem as no methodology can produce good results if the data put into it is cherry-picked. Put simply, their results were skewed by their use of an unrepresentative sample.

    As a first step, you can go through their data and see the primary proxies which contributed to their shape for their long reconstructions. If you do, you’ll find many of them are ones which people have long criticized as inappropriate for use in temperature reconstructions. As a second step, you can go through their data and see there are plenty of other proxies they could have used. They didn’t use those ones because other people haven’t used them, because those ones don’t give answers those other people like.

    Michael Mann’s methodology cherry-picked data, but he also cherry-picked parts of his data. Ljungqvist’s methodology doesn’t cherry-pick data, but a larger portion of his data was already cherry-picked. I’d approve of testing various methodologies on different data sets and seeing how the results compare. It could tell us a fair deal about those methodologies.

    I just want to be clear it can’t tell us much about reality.

  79. SteveF,
    re:(Comment #130992)
    That is probably the best single critique of the climate crisis I have read in quite a long time.

  80. Carrick, your list of issues is taken from our 2003 paper which was written without the benefit of Mann’s actual data as used. The 2005 papers (GRL is commonly cited, but the EE 2005 paper covers a great deal more ground) contain much additional analysis.

    As to our emulation, it reconciled to six nines accuracy to the Wahl and Ammann emulation after one small tweak. Wegman ironically observed at the time that Wahl and Ammann emulated Mc-Mc rather than Mann, though you’d never know it from their text.

    Mann’s code didn’t work, but, like all of Mann’s code, lacks mathematical insight and contains vast and repetitively irrelevant calculations. Mann et al 2008 is even worse. I ultimately reduced MBH calculations to a few lines. One of many things that would have been worth writing up formally.

    Ultimately the question for MBH98 is whether Graybill’s bristlecone chronologies are magic thermometers. The constribution of all the other proxies is nondescript noise; the entire shape is imparted by the bristlecones.

    While much blog discussion of Mann 2008 has been about Tiljander, it also used the bristlecones again. Amusingly, Mann’s bristlecones are weighted heavily in both his NH and SH reconstructions.

    I’ve spent far more time recently looking at proxies through the Holocene where there is a “louder” signal. This guards against the practice of orienting series by ex post correlation rather than ex ante knowledge of the proxy. I have a great deal of work in inventory, but have not figures out how I want to go about writing it up.

  81. “So they go against science and Mosher who said it is a fundamental axiom that close things are more similar than distant things and actually include this flaw in scientific thinking in their algorithms”

    actually its true.

    for example, simply compare the AIRS measurements of Arctic temps with the fields generated using sparse data and the fundamental axoim.

    dont like that, use reanalysis.

    dont like that, use MODIS LST that will give you some idea

    dont like that, use the data held by researchers but not released to the public… yet. opps you cant, but some can

    dont like that? there is a whole new big pile of data from the north that I hope will be available soon.

    in any case, you always need to make assumptions. Assume that the temperature 1 km away from you is more likely to be the same as opposed to temps 2000km away

    Of course you can prove it wrong. go find arctic data that is not sparse and demonstrate it.

  82. Steven,
    What you said is true or the flaw is true?
    Sorry but your comment was a little ambiguous.
    Taking it that you are now defending Cowtan/Way against yourself
    I.e. saying that areas further apart can be more similar than areas close together when you only have sparse records in the first place is a bit Yogi Berrish.
    Or am I wrong and you are actually agreeing with yourself and me?
    If you only have 3 or 4 data points and the two furthest apart are the same it is not right to fill in hundreds of points between using the rational that two distant points are the same. Heck you just said it yourself. Either leave the data alone or infill as you do for everything else (the old trusted and true method) whereby, as you said, temperatures 1 Km away are more likely to be close than 2000 k away.
    Or use satellite data where available which may be able to give the whole blooming lot.
    Not unjustified, by this flawed method, Kriging.
    Again sorry if you are agreeing with your past comment

  83. SteveF (Comment #130992)

    What bothers me is that almost all, if not all, temperature reconstructions that I have read and analyzed use a very flawed basic concept and method of selecting proxies. Those doing the reconstructions cannot select which proxies to use after the fact of measuring the correlation with the instrumental record without biasing the reconstruction towards a modern warming, i.e. a hockey stick. Extracting a temperature signal from very noisy proxy data would require a prior criteria for selection of proxies and then using all the data in attempts to “average out” the noise in order to find a temperature signal. Even that process fails if the noise changes over time, but unless one makes these efforts one will never know if these proxies can be used as approximate thermometers.

    We can argue over the details of how the data obtained by this flawed method are manipulated – as perhaps an exercise in determining how well those doing the reconstructions understand the statistics involved – but those exercises deflect the attention away from where these reconstructions go wrong from the start.

    If the IPCC and defenders of these reconstruction works can overlook this basic flaw, I cannot reach any other conclusion than that these people will accept uncritically any result that agrees with their point of view on AGW.

  84. Kenneth,
    Certainly, if this was lab data and could be obtained easily everyone would agree that you first discover whether certain ‘types’ of trees can be used as ‘proxies’ using one set of proxies. You find out all your ‘fitting’ parameters based on the trees you use to ‘figure out’ how this ‘type’ of tree response to temperature. You can use the historical data for this– as there is no alternative.

    Afterwards, you get new, fresh trees whose characteristics you think make them the ‘same type’ of tree and use those to create your reconstruction. After you create it, you compare the reconstruction to the historical data.

    This would solve many of the arguments about proxies. While it is true that the errors in ‘sieving’ proxies against the historical record may be small, it also may not be, and whether it is or is not depends on many contentious issues. So, if getting proxies was inexpensive, no one would leave things like that if it were easy to obtain proxies. Both defenders and detractors of the reconstructions done with ‘sifted’ proxies would both be motivated to just get fresh proxies.

  85. “So, if getting proxies was inexpensive, no one would leave things like that if it were easy to obtain proxies.” But can’t you do this now with cross-validation? Start with all the proxies there are, randomly leave out half the proxies, develop your sorting criteria on the other half, then check them on the half you left out.

  86. Mike R–

    How do you pick which proxies represent “all the proxies” from which you pick half? Are “all the proxies” filtered in the first place? Or are they randomly core samples (or whatever) from all over the world based on … well.. no criterion at all?

    It seems to me there is always going to be a selection process of some sort. There has to be a selection process. No one is going to grab anything and everything including stuff no one thinks responds to temperature. If the initial selection is “sieving” to creat a big pool one refers to as “ALL” the proxies (but really mean “all the things that passed the first sieve”, testing the effect of a subsequent selection from a set of pre-sieved data cannot resolve the question of whether sieving causes an issue.

    Your still stuck with the question of whether the initial sieving caused a problem.

    I’ve known people who thought they could “calibrate” instruments using the data later reported to also be ‘test’ data. It’s extremely difficult to get these people to understand why the ‘calibration data’ and the “test data” need to be entirely separate data. But they really do need to be entirely separate data if one is going to try to interpret what the ‘test’ data tell us.

  87. “testing the effect of a subsequent selection from a set of pre-sieved data cannot resolve the question of whether sieving causes an issue.” Is the question of whether sieving causes problems an issue which is unresolved? For every sieving criterion? If there is, then of course the first sieving should not include criteria that are controversial. But if there are indeed things that “no one thinks correspond to temperature”, that means that there are criteria that are not controversial, and pre-sieving by those criteria should be acceptable to all.
    In short, I’m not sure what your question is.

    “I’ve known people who thought they could “calibrate” instruments using the data later reported to also be ‘test’ data.” This seems to be a different question, which is that people don’t follow best practices. I know that McIntyre continues to document that many core-samplers and the like just haven’t archived all their data, even after many years. Their results are based on whatever undocumented pre-screening they used. I would think that the right response to this is, “Archive all your data, with full meta-data included. Once you have done that, we will consider accepting analyses based on it. Until you have done that, nothing you produce can possibly be useful or acceptable. We just have no way of judging it or trying to work with it.”

  88. MikeR,
    I guess if you don’t understand my question, I guess I don’t understand the solution you were suggesting here:

    But can’t you do this now with cross-validation? Start with all the proxies there are, randomly leave out half the proxies, develop your sorting criteria on the other half, then check them on the half you left out.

    Pretend you haven’t read my previous comment. No explain: How do you “start with all the proxies there are”? How do you know something “is” a proxy in order to put it inside the group you consider to be “all the proxies there are”?

    For example: Hypothetically, archeologists dig a 100 core in the middle of Nebraska. It turns out due to the fact that formerly living things decayed, one can do radio carbon dating and date the strata. Is that cores some sort of “proxy”? Or is it ‘not a proxy’?
    This is not rhetorical. We can come up with all sorts of things that can be dated. How do we know if they are “proxies” that we include in “all the proxies”?

  89. lucia (Comment #131003)

    Lucia, if we could do lab experiments under controlled and repeatable conditions with proxies none of what I say above about selecting proxies after the fact being wrong would apply. I think some of the more forgivable problems with the failure of people who should know better to properly criticize the basic flaw in reconstructions stems from the failure of the hard science inclined to appreciate how different reconstructions and lab experiments are and the different statistical considerations required.

    Lucia, in your answer to Mike R you hit upon the problem of using a proxy data in cross validation. Obviously one can use part of the data to calibrate and the other to validate and some reconstructions do this, but that does not preclude the original after-the-fact selection based on either the calibration and/or validation results to bias the reconstruction. Like you say no one is going to select data that shows no correlation to the instrumental record. However, if the one thinks that the temperature signal is buried in a noisy signal and if one did the selection a prior and based on some hard empirical/theoretical considerations then one would not have qualms about using proxy data that shows a poor correlation and in fact would know that the averaging out the noise process would require using all the data (beyond that data that could be legitimately considered as outliers for known reasons).

    My observation that these considerations for using proxies that we discuss here are either waved at or not considered at all in the reconstruction literature leads me to believe in a strong research/experimenter bias prevailing in this community. I see for tree ring reconstruction few alternatives to getting it right other than laying out a hard a prior selection criteria. Other reconstructions that use proxies that have a more direct physical relationship to temperatures, such as isotopic fractionation, I would judge to be more promising than tree rings, but even with those proxies there are confounding factors like where precipitated water originated and getting the calibration and measurement right when attempting to proxy small temperature changes.

    Beyond what we are discussing here, a bias is even more evident from what SteveM noted in that a single proxy can strongly influence the results of a reconstruction when used in combination with many others that appear as white/red noise over time. What is most revealing to an uninitiated observer is seeing many of the proxy series as individual series for the first time and how much those series appear as noise. Now if one can claim that these very noisy proxies all have temperature signals buried within then one is obliged to use all the proxy data and even data that might appear as the anti-Bristle Cone.

  90. “What you said is true or the flaw is true?
    Sorry but your comment was a little ambiguous.
    Taking it that you are now defending Cowtan/Way against yourself
    I.e. saying that areas further apart can be more similar than areas close together when you only have sparse records in the first place is a bit Yogi Berrish.

    You dont get it. You simulate having sparse stations to test the method. Further in some places where we THOUGHT we had sparse stations and applied the method, we have now found MORE data. This data is used to show that you are wrong.

    ###############
    Or am I wrong and you are actually agreeing with yourself and me?
    If you only have 3 or 4 data points and the two furthest apart are the same it is not right to fill in hundreds of points between using the rational that two distant points are the same.

    huh,

    Heck you just said it yourself. Either leave the data alone or infill as you do for everything else (the old trusted and true method)

    the old trusted method? yes kriging.

    it is simple to test. and you havent.

    go away and do some work. talk to mwgrant, he will help you

  91. “For example: Hypothetically, archeologists dig a 100 core in the middle of Nebraska. It turns out due to the fact that formerly living things decayed, one can do radio carbon dating and date the strata. Is that cores some sort of “proxy”? Or is it ‘not a proxy’?”

    Not following too well, because of my ignorance. I understood that there are theoretical grounds for thinking that certain features of cores serve as (possible) proxies for temperature: tree-ring thickness, isotopes in ice cores,… Did you have one of those in mind? Are you going to hand me a piece of cardboard you found and ask me to find the local temperature in 1260 CE?
    I guess I would take all the cores of various type, along with the various theories on why and how they might represent temperature. Your core would not be among them unless you suggested a way to measure temperature with it. If you had such a way, and a lot of such cores, we could try it as I suggested.

  92. MikeR,

    Did you have one of those in mind?

    I’m asking you which attributes the core has to have for you to consider it one of the proxies that fall in the category of “all proxies”. Do theoretical grounds need to exist? Does the specific theory need to be validated using some other set of proxies (not the just the ones used in the reconstruction?) How broadly does the theoretical validation need to go? So for example: if this is a bunch of larches grown at some elevation and some “special” conditions, has the theory been tested on all larches? Another set that matches the condition? And so on.

    Or was this set selected because someone merely “speculated” that this should be a treenometer, then tested and confirmed by finding a positive correlation.

    Your core would not be among them unless you suggested a way to measure temperature with it.

    What if I just “suggested” you measure date by distance measured “down” from a line I marked “top” and temperature based on the density of paper at that level and I claimed that was a theory. What if you tested and found the correlation in the “instrument” period was good.

    Obviously my theory is bunk– but how much external validation of the theory do you need. Can it just be speculation?

    This question is for you, not me. Because you are the one who suggested one could “cross validate” by starting with “all” the proxies. But I think you just bumped the question to what are “all” the proxies.

  93. Just a thought on the wrongs or otherwise of outing Way’s opinion on Mann’s hockey stick papers.

    It seems to be accepted that the simple fact of his derogatory comments about the hockey stick work in the SKS forum would mean Way’s comments would have got back to Mann.

    Form everything I have read about the great thin-skinned one, I find it implausible that he would not have been looking for opportunities for revenge on Way from about an hour after the comments were originally posted. Depending on time zones.

    The fact that the comments were outed publically and that he put in his whine about the potential hurt to his career have greatly lessened the scope for Mann to damage him. At least until the dust settles and everyone moves on.

    Any revenge Mann is planning on Way, (or keeping an eye out for opportunities in future) was going to happen anyway, the posting in the forum was the catalyst, not outing it to the general public. Steve Mc and Way’s reaction just bought him some time.

  94. MikeR , I am trying to understand your using 1/2 of the proxies to establish a criteria for selecting from the other 1/2. You would have to be much more clear on what this method requires. Are you saying that the criteria from the first half would be established empirically after the fact of correlation with the instrumental temperature record? In that case you surely are aware that a certain portion of a large sample of proxy responses will have a significant correlation and that that correlation well could be spurious. That is the case particularly when you have auto correlated series that will have more frequently an ending trend (up and down) with which the upwardly trending instrumental will correlate reasonably well.

    I think what you might be attempting to say is that you want to test some physically based prior criteria on proxy data. If you select the criteria before doing the temperature correlations to the proxy response and then do the correlations you probably have a valid way of testing the prior criteria. If the test showed that the prior selection process used does not provide correlations with temperature and you were to continue to change that prior criteria until you found it worked on 1/2 the proxies you are tempting the fates of obtaining a spurious and over fitted model.

    Now if you very honest and scrupulous in reserving 1/2 the proxy data to use for testing your selection criteria (you did not peak by applying the criteria to both halves) and that criteria continued to be based on a reasonable physical basis you may have a valid prior selection criteria.

    Even this process could fail if the non temperature conditions during the instrumental period are different than those in historical times and that difference could affect the proxy response. That is something that a truly skeptical scientist would want to keep in mind and test if possible.

  95. Lucia, part of the answer to part of the question of use of proxies has already been summed up by the first Gergis et al rejected paper.
    paraphrasing my brain ie without being accurate on the numbers.
    Her group originally examined all the proxies for the Southern Hemisphere temperature reconstruction for up to the past 2000 years, some longer, some shorter but with I guess at least a few 100 years of data up to the present.
    Yes, most of these would already have had some selection bias in them,but some were started well before there was a concept of selection bias.
    Some proxies would have been excluded on paucity of data or shortness of time from the pool of all recorded proxies.
    The total number of proxies available was surprisingly small in number and types. say 62
    Here is the rub as MikeR points out. Only 27 were used as the rest did not show a thermometer correlation in the overlapping part of their data.
    Hence one could conclude that the old 1/3 high, a 1/3 low and a 1/3 in the middle might apply.
    Meaning temperature reconstruction from proxies that do not measure temperature have virtually no relevance as there are too many other factors at work.

    “”Our temperature proxy network was drawn from a broader Australasian domain (90E–140W, 10N–80S) containing 62 monthly–annually resolved climate proxies from approximately 50 sitesNeukom and Gergis, 2011)…
    Only records that were significantly (p<0.05) correlated with the detrended instrumental target over the 1921–1990 period were selected for analysis. This process identified 27 temperature-sensitive predictors referred to as R27.

  96. Steven Mosher (Comment #131011)
    July 16th, 2014 at 12:02 pm
    “What you said is true or the flaw is true?
    Taking it that you are now defending Cowtan/Way against yourself
    I.e. saying that areas further apart can be more similar than areas close together when you only have sparse records in the first place
    You don’t get it. You simulate having sparse stations to test the method.”

    Steven, I’m not the one who simulates stations, you are the simulator.
    Cowtan and Way are commentating on the sparse stations that exist, ie are real, all [3 ?] of them,

    “” Further in some places where we THOUGHT we had sparse stations and applied the method, we have now found MORE data. This data is used to show that you are wrong.
    ###############”

    Did you THOUGHT you found more data or did you THUNK up more data in the sparse stations?

    “Or am I wrong and you are actually agreeing with yourself and me?
    If you only have 3 or 4 data points and the two furthest apart are the same it is not right to fill in hundreds of points between using the rational that two distant points are the same.
    huh, Heck you just said it yourself. Either leave the data alone or infill as you do for everything else (the old trusted and true method) the old trusted method? yes kriging.
    it is simple to test. and you haven’t.”

    No, it is not simple to test,otherwise people here would have done it and reproduced Cowtan/Way.Everyone takes the Kriging on their say so without understanding the assumptions they use
    Yes, I have not done it because it is not simple.

    go away and do some work. talk to mwgrant, he will help you””

    I would rather you help by explaining how when you agree a fundamental principle is close objects are more likely to be similar than distant, and you use it in all other algorithms for infilling; You turn around and say, wait, we have on this one occasion found data that contradicts this rule and we will use that data instead.
    As a scientist you should say the fundamental principle is right, now why is this data “different.”
    Or you could chuck the fundamental principle out the window and decommission all your GISS, GHCN etc as they are based on a flaw.

  97. @Lucia, @Kenneth. I continue not to follow this discussion well. As far as I know, I am suggesting that a perfectly standard part of data analysis be followed: develop a hypothesis on training data, then check the hypothesis on test data. Lucia, you want me to suggest a hypothesis; I can only repeat that I don’t know enough about the subject to do it. I am only asking that the scientists involved follow correct practice for analysis of their data. Of course they may not “peek” at their test data; if they do, everyone knows that their results are going to be tuned and worthless.

    Am I to gather that this is not what has been done till now? If so, why would anyone take any of this field seriously and why don’t all competent scientists request their heads?

  98. “Am I to gather that this is not what has been done till now? If so, why would anyone take any of this field seriously and why don’t all competent scientists request their heads?”

    MikeR, I have been asking those questions for some time now as what I described what I thought you might have in mind from my own thoughts on the matter has not been done. Not even discussed as I recall.

    Unfortunately even the skeptics get off track by not first and always pointing to this most basic flaw in temperature reconstructions. Instead oft times we hear skeptic criticisms (and rightly so) about the methods used by those doing reconstructions in handling the data after the data is selected in the flawed manner. Such criticisms that fail to mention the basic flaw are taken by the less informed in the audience as though if one corrected those methods the entire process would be valid or worse that a different more correct method will make only minor changes in the results/conclusions.

  99. Steven, I’m not the one who simulates stations, you are the simulator.

    Huh. the only time you simulate stations is when you are testing a method with synthetic data. That’s called testing. You dont like testing I get that.

    #########################

    Cowtan and Way are commentating on the sparse stations that exist, ie are real, all [3 ?] of them,
    “” Further in some places where we THOUGHT we had sparse stations and applied the method, we have now found MORE data. This data is used to show that you are wrong.
    ###############”
    Did you THOUGHT you found more data or did you THUNK up more data in the sparse stations?

    1. First you look at public records. They look sparse. you use that.
    2. Then you discover that individual researchers have data that
    no one has considered before.
    3. Then you find out that there is a whole program to recover
    this data as well as other historical records that are paper form
    only.

    ##############################

    “Or am I wrong and you are actually agreeing with yourself and me?
    If you only have 3 or 4 data points and the two furthest apart are the same it is not right to fill in hundreds of points between using the rational that two distant points are the same.
    huh, Heck you just said it yourself. Either leave the data alone or infill as you do for everything else (the old trusted and true method) the old trusted method? yes kriging.
    it is simple to test. and you haven’t.”

    No, it is not simple to test,otherwise people here would have done it and reproduced Cowtan/Way.

    It is simple to test. kriging has been used by industry for quite some time. Next we did test our method. it passsed the tests.
    Reproducing C&W is easy. been there done that.
    Next

    #######
    Everyone takes the Kriging on their say so without understanding the assumptions they use
    Yes, I have not done it because it is not simple.

    You havent done it because you are lazy.

    #############################
    go away and do some work. talk to mwgrant, he will help you””
    I would rather you help by explaining how when you agree a fundamental principle is close objects are more likely to be similar than distant, and you use it in all other algorithms for infilling; You turn around and say, wait, we have on this one occasion found data that contradicts this rule and we will use that data instead.

    Huh, you simply do not understand. Go talk to mw. he has time to educate you. I dont.

    ###############
    As a scientist you should say the fundamental principle is right, now why is this data “different.”

    The principle is still correct. thats what you dont get. You will find anomalies. They actually prove the principle. Figure out why for yourself. do some reading.

  100. This works for me…

    Steven Mosher (Comment #131000) July 15th, 2014 at 4:11 pm
    wrote

    “in any case, you always need to make assumptions”

    all the way down the line…all the way. The ‘you’ is generic.

  101. Mosher
    Thank you for your responses.
    “Reproducing Cowtan and away is easy , been there done that.”
    Sorry, my comment was other people reproducing Cowtan and Way, not you pressing a button on their computer and reproducing the results. I meant some one actually, really reproducing it, de novo, not copying it.
    Semantic tricks just doesn’t do it , I’m afraid.
    Though you are very good!
    The only time you simulate stations is when you are testing a method with synthetic data.
    No, you simulate stations all the time with the USHCN data and that is not for testing a method.
    That is for putting up a misleading, unlabelled graph purporting to be a real record of historical temperatures of the US where none of the past temperatures are what was really recorded at the time. That is true simulation by a simulator.
    Sparse stations, a true comment.
    More data you say, note the semantics.
    Not more stations, more data.
    The stations are still sparse, Steven but you know that, you are still just playing tricks with words.
    Kriging, we did test our method, it passed our tests.
    Our tests?
    That’s what you said. How very , shall we say , scientific, Steven.
    Everyone will be so impressed.
    You have not done it because you are lazy.
    That hurt, the truth does hurt.
    Have you considered that Mr Fritsch, Nick, Lucia, Judy, Steve MacIntyre, Tamino et al have not done it as well? At least I am in good though lazy company.
    Look, you and Zeke are good scientists, the study of climate needs assessments done, Zeke is doing the best he can with the methods you have at hand and the method is a valid way of comparing the past. Very useful.
    The problems are you are altering past temperatures because you take the starting point from the present and while you both know it is a reconstruction the general public DO NOT KNOW and you do not believe that you are indebted to tell them due to the hubris of intelligence.
    It needs a road to Damascus moment for both of you and when, in that Southpark sense, you get it I will be very happy and will not even require a thank you, if you get

  102. The principle is a rule-of-thumb. People observed, people inferred, people tried, and it worked–most of the time. There is Tobler’s First Law of Geography circa 1970 which is pretty much the same although the concept certainly was ‘applied’ before that formality. The principle in a lot of common-sense reasoning we take for granted. It does not have the ‘fundamental’ status Newton’s Laws and exceptions to the assumption are common, e.g., the forests in Knoxville and Asheville are more similar than the forest along the Appalachian crest which lies between the two cities. In the context of interpolation schemes use of the principle is certainly not unique to kriging, e.g., IDW and TIN. Reject the principle and you reject a lot of practical stuff.

    With kriging because you must have a correlation function or semi-variogram before you krige you even know beforehand whether that and other problems might be occurring. Another good rule is that time spent on the correlation function/semi-variogram is not wasted.

    The early roots of kriging go back to the Soviets in the 1940’s but the practice roughly dates to the 1960’s. It is well established is widely used in the private, academic, and government sectors. It is credentialed but significant ‘new’ approaches continue to be developed. Expect to see things like transition matrix kriging (screams land/sea?) and multi-point geostatistics (physiographic regions). They are already being applied in other disciplines.

  103. Geostatistics has been used in a practical manner for several decades –private sector, academia, government.

    However, one key to the acceptance or rejection by third parties of BEST, a custom implementation, is documented code validation and verification. BEST is still a work in progress….

  104. One of the problems with kriging is that it was developed for problems that are temporally and spatial invariant. However, it’s being applied to a problem where this isn’t true.

    And when you are extending a method to use in a new area, the people creating the new work have the responsibility to show that the method works as well as other existing techniques in that field.

    I don’t think the assumption of spatial isotropy is a valid one, and I really don’t think you should assume azimuthal symmetry either in the kriging function.

    In any case, I’d try multiple methods for interpolation. So not an EOF implementation as well as kriging? NCDC has used that highly successfully.

    There is accumulating evidence that BEST over-smooths the spatial variations in trend. For me, there is some cause for concern here, especially if one of BEST’s goals is to produce a higher spatial resolution than the other methods.

  105. Carrick,

    Just responding to your good comment and making no assumptions for any parties. Maybe Steve will also respond to your comment.

    ===================
    “One of the problems with kriging is that it was developed for problems that are temporally and spatially invariant.”

    For example universal kriging and regression kriging, long in use, specifically address data with spatial trend. BEST handles the detrending exercise for two of the spatial dimensions (latitude and elevation) internally with augmented kriging equations. See eqns 25 and 26 in the current methodology appendix at the BEST site and in particular note eqn 26 where the polynomials for latitude and elevation appear explicitly. Also notice before eqn 25:

    “These 16 free parameters [for the azimuthal polynomial-mwg], as well as the free parameters related to elevation are determined empirically as part of a Kriging process used in the construction of G(x[i]).” {Here x[i] is a subscripted vector quantity.}
    — — — —
    More recently there has been a type of kriging called spatio-temporal kriging developed to handle spatio-temporal data. For example here is an R package:

    http://cran.at.r-project.org/web/packages/gstat/vignettes/st.pdf .
    — — — —
    One of the authors, Pebesma, is a co-author of the 2014 paper, “Spatio-temporal interpolation of daily temperatures for global land areas at 1 km resolution”, for which Steve Mosher provided at link in a comment at CE.

    The main point here is as I earlier noted, “…significant new approaches continue to be developed.” This constitutes neither advice to BEST, nor kibbutzing on what BEST has done, nor prediction about BEST’s direction. They seem to have enough help.

    ===================
    “And when you are extending a method to use in a new area, the people creating the new work have the responsibility to show that the method works as well as other existing techniques in that field.”

    Absolutely and that is a primary role of documented QA’ed validation and verification or V&V.

    ===================
    “I don’t think the assumption of spatial isotropy is a valid one, and I really don’t think you should assume azimuthal symmetry either in the kriging function.”

    I would agree with that. Those are assumptions which would be addressed in V&V. (This applies also for the residuals left after spatial detrending.) I’d throw incorporation considerations on physiographic regions into that category too.

    “In any case, I’d try multiple methods for interpolation. So not an EOF implementation as well as kriging? NCDC has used that highly successfully.”

    Same here. I would also look techniques other than interpolation and it would not hurt to have some serious hard rethinking on alternatives to a ‘global temperature’ as a practical measure at this time.

    ===================
    “There is accumulating evidence that BEST over-smooths the spatial variations in trend…”

    V&V and a discussion of errors and uncertainty…

  106. I’ve decided what BEST does is not spatial detrending. What BEST does is calculate spatial parameters for one particular period then subtract those parameters out for all periods. That is akin to estimating the effect of aerosols from 1980-2010 then removing that single value from all years back to 1750 even though you know the effect from aerosols has changed dramatically from 1750 – 2010.

    BEST calculates a temperature field for one (thirty year?) period. This period is one in which the true temperature field is known to be undergoing changes due to anthopogenic influences. It’s known the true temperature field of any other period would be different. That means if the temperature field were estimated for any other period, it would be different than the one BEST calculated.

    Here’s an example. A basic point about global warming is the poles will warm more than the equator. That means the differential between temperatures at the poles and at the equator will change in a systematic manner over time. The differential in 1980 will necessarily be different than the differential in 1880, much less the differential in 1780. Removing the 1980 differential from the 1780 data set would not remove spatial trends from 1780. In fact, it is possible the 1980 differential would be so different from the 1780 differential that removing it from the 1780 data would increase the magnitude of spatial trends in the 1780 data.

    Calling what BEST did spatial detrending is like saying someone performed a linear fit on the 1750-2010 data when what they really did was perform a fit for 1960-1990 data then extrapolate it back to 1750. It’s not entirely wrong, but it’s not a remotely accurate description.

  107. “I’ve decided what BEST does is not spatial detrending. ”

    OK. We do not agree and having that point of disagreement works for me. Ultimately everyone has to work it out for themselves if they so desire and the result of that process is what will matter with them. No problem for me on that either.

  108. mwgrant, one minor followup—I wasn’t claiming what could be done with kriging, I was commenting on what kriging originally developed to do, and more importantly how it is being implemented by Best.

    I believe this is true:

    They have a single correlation function, which they assume to be time-indepedent and azimuthally symmetric.

    It is easy enough to confirm in the study of correlation functions that they are stretched along bands of constant latitude. I don’t know whether the assumption of time independence is valid or not, but it’s something that should be checked.

    As to detrending versus other—I think we’d have to see the output of more intermediate steps to determine what it’s in fact doing:

    The difference between language of documentation and the reality of implementation is relevant here.

  109. mwgrant, I can’t say I get the point of commenting just to say, “I don’t agree,” but… okay.

  110. Carrick,
    Understood. Thanks for the clarification. I agree on all your points. BEST in it present configuration in incomplete and there outstanding questions. As you and others have noted before it is on ongoing project. Perhaps a very real problem is the origin of BEST was as a study and there were/are poorly defined or perceived goals and expectations. It is just there evolving.

    Re documentation: for the record let me clarify that IMO the papers (journal articles) currently found at the site serve a different legitmate purpose than documentation and are a far cry from being QAed V&V and perhaps user documentation that are needed for serious external peer review. [You’re lucky. I just cut a bunch of stuff.]

  111. Carrick,

    “I’ve also ordered a copy of this book…”

    I think you are in fat city for a while. Enjoy.

  112. mw.

    you should join the spatial list in R.
    that way you can do more than simply read Tomas’s book,
    you can actually work with him on stuff.
    the gstat guys, the modis and raster team, rastervis, gdal guys.
    all there.

  113. “The main point here is as I earlier noted, “…significant new approaches continue to be developed.” This constitutes neither advice to BEST, nor kibbutzing on what BEST has done, nor prediction about BEST’s direction. They seem to have enough help.”

    There are bunch of different directions. I was somewhat inspired by Tomas’ work so my main focus is on increasing the spatial resolution
    and on answering the question of “over smoothing”
    so if you look at some of Tomas’ papers you will see a list of factors that one would use to estimate the climate.. for the past 30 years of data.. a good portion of that exists. It is a brutal data processing problem

    Right now for the US PRISM is sub 1km. It looks non physical
    Robert Hijimans (raster) has done a global 1km using splines.
    hmm I think he is looking at driving that with our data.

    Rohde is more keen to work in the daily data problem.. Although he does have some tweaks WRT distance from coast and some better lapse rate stuff in the pipe.

  114. Carrick (Comment #131148)
    “I don’t think the assumption of spatial isotropy is a valid one, and I really don’t think you should assume azimuthal symmetry either in the kriging function.”

    What they say in the appendix to the methodology paper is:

    “Though not shown, we also find that the East-West correlation length is about 18% greater than the North-South correlation length. This is consistent with the fact that weather patterns primarily propagate along East-West bands. The variations discussed above, though non-trivial, are relatively modest for most regions (except perhaps at the equator). As previously noted, when considering large-scale averages the Kriging process described here is largely insensitive to the details of the correlation function, so it is expected that small changes in the correlation structure with location or orientation can be safely ignored. Hence, the current construction applies only to the simple correlation function given by equation (14). However, developing an improved correlation model that incorporates additional spatial variations is a likely topic for future research.”

  115. Nick I have yet to see a suggestion from the peanut gallery that we havent already tested. It’s getting old

  116. Nick, that result is averaged over all sites and I believe only looks at 30-day data (they are using one day and even hourly data now too).

    Steven Mosher, yes I saw the results of that testing. If I had thought it was done right, I would have accepted it.

    You averaged over the entire globe, instead of breaking it down by latitude band.

    What both Brandon and I’ve been looking at (the accumulating evidence) is that there doesn’t appear to be a large bias associated with it, there is a spatial smearing that results in a reduced spatial resolution for BEST.

    This does seem to be driven by the belief that you can take a station 1000-km away and use it to adjust a local station.

    My suggestion: You guys have produced a very nice on-line data quality analysis tool. People are going through it and finding anomalies.

    You should be taking notes about what they are finding instead of getting offended that they are looking. Free quality assurance help.

  117. Good discussion here on the latter part of this thread that informs me on some of my recent analyses I have done for my personal edification on uncertainty of temperature series from spatial and statistical considerations. Through Monte Carlo simulations based on ARMA modeling of station data I have pretty much convinced myself that the stated uncertainties supplied by the data set owners are valid estimates. My interest in these adjustment algorithms deals exclusively with the estimated uncertainties and there primarily with large regional and global estimates.

    The uncertainties introduced by the algorithm method are the least estimated and understood part of the overall uncertainty of these data sets and is the part that I judge can be addressed by using benchmarking with a known truth from a simulated temperature series and adding in non climate effects. A recently proposed benchmarking test is in the process of being made with the aid of online suggestion. http://www.geosci-instrum-method-data-syst-discuss.net/4/235/2014/gid-4-235-2014.pdf . Such a test can go far in better evaluating these different temperature adjustment algorithms. I am hoping that the tests will include potential non climate conditions for which the algorithms would have trouble accounting – even if it is a separate test.

    The question of the BEST kriging method appears to me to stem from BEST uniquely among the data sets differing in the warming in the southeastern US. I am wondering if the difference is more due to BEST using more and thus different station data than it has to do with the kriging method used. I would think that a simulation could answer this question. I recall Cowtan and Way changing their minds about the cause of the differences in Arctic warming between their data set and others and are now attributing the difference to differences in the station data used. I am also guessing that controlling for more parameters with a kriging method would change/improve the results on the margins only, and particularly when looking at large regional or global areas.

    As an aside I noted previously at this blog the claims that BEST makes for improved spatial and statistical uncertainty for their method that is based on the greater amount of station data they use. That would be true if the added data had the quality of the data to which it was added. I am aware that BEST weights the data and that perhaps that weighting would account for at least some differences in station data quality. My point here is that these adjustment algorithms used by the data set owners are rather intricate and can be compared best, in my view, by using the benchmarking tests as I outlined above.

  118. Kenneth, the smearing does not seem limited to the US SE. Brandon has some results on his blog.

    I think it’s important to verify what Brandon’s done before drawing any strong conclusions, but this result for 1880-2010 is strongly suggestive that the variance in trend for BEST is sharply reduced compared to GISTEMP, which in turn suggests that BEST has a lower spatial resolution than GISTEMP, even though the spatial sampling frequency is higher.

  119. For what it’s worth, I agree with Carrick other people should verify what I’ve said/done. Along that line, if anyone would like code to replicate my results, I’m happy to share it. I haven’t been posting much code because this is a work in progress, and I’m making changes as I go.

  120. Brandon’s data are revealing. What do the BEST people say about these calculations – they must be aware of this? Did Brandon use BEST calculated data? Do the large region and global trends show near the same warming as the other major temperature data sets? Was not it the BEST people who noted that 1/3 of the stations (not necessarily the ones it used) show cooling over some extended period of time?

    Brandon I understand the BEST code is MetLab. Do you have it in R?

    How difficult would it be to take GHCN data and apply the BEST algorithm to it? All these revelations cry for a proper benchmarking test and letting the chips fall were they may.

  121. Kenneth Fritsch, BEST calculates a temperature field for the globe. I used BEST’s gridded version of it. It’s more or less a direct translation.

    As for regional/global trends, there are several regions where BEST diverges from the other data sets like it does for the southeastern United States. It’s not clear what, if any, effect that has on the global average. It’s difficult to see how we can be expected to trust global results if we can’t trust regional results though. Inaccurate results over an area approximately as large as a continent may be acceptable, but biased results over that large an area should not be.

    As for BEST’s code, I haven’t even attempted to replicate it in another language. That would be too large a project. I’ve worked with their code in its native language to better understand their methodology, but so far, I’ve only discussed things which can be taken directly from their data and published results.

    On the topic of cooling stations, the 1/3rd you refer to tended to be stations with shorter lengths. They also had some other differences (especially spatial distributions). Still, they give lie to the idea BEST’s spatial resolution is what they portray it as. If BEST truly did have a one degree resolution, we would see spots of cooling in it, even if they were intermixed with a larger amount of warming locations.

    If you want to see the extent of BEST’s spatial smearing, I wouldn’t recommend looking at the histograms or trend maps I posted. I’d recommend looking at the graphs I posted comparing BEST/GISS gridcells instead. I posted graphs of every BEST gridcell (16 of them) in a four by four area along with the corresponding GISS gridcells (four of them). The GISS gridcells had a fair amount of variation, but the BEST gridcells were virtually identical. That is about as strong of evidence of spatial smearing you could hope for.

  122. Kenneth Fritsch (Comment #131170)
    “Do the large region and global trends show near the same warming as the other major temperature data sets?”

    There’s a WebGL gadget here that lets you compare BEST and GISS.

  123. Interesting. I have so far in my personal studies ignored the question of the legitimacy of the global temperature series, generally accepting Mosher’s/Zeke’s argument that data and methodological errors were small enough to be irrelevant.
    Now having looked at Brandon’s recharacterization of the BEST data and Carrick’s comments on anisotropy with respect to definition of variograms, and the strange difference in recent trend between BEST and the RSS and UAH satellite trends, I have decided to ignore the BEST reconstruction as irrelevant unless someone can explain the discrepancies.

  124. Perhaps there is an element of filamentation in the atmospheric distribution of temperatures that effects or relates to temperatures at different sites. Comparison of layers of temperatures from RSS and UAH above and between non contiguous land sites may offer insights into temperature changes at separated sites. Triple your computer size, Steven. From the peanut gallery.

  125. I don’t see the point of the argument about gridded data. On a 2° grid, each cell is an estimate based on partly exrernal data, as it has to be. Many cells will have no stations at all within. So it is smoothed; GISS and BEST both. It’s likely that BEST used a variant of their Kriging kernel for the estimation, and that is broader than GISS.

    None of this is relevant to their actual temperature recon. They say in their FAQ
    “Our algorithms aim to:

    Avoid gridding. All three major research groups currently rely on spatial gridding in their averaging algorithms. As a result, the effective averages may be dependent on the choice of grid pattern and may be sensitive to effects such as the change in grid cell area with latitude. Our algorithms eliminate explicit gridding entirely.”

    I think they probably exaggerate the difference it makes. But they don’t use gridding in computing global averages. Not that the smoothing would matter then anyway.

  126. I have downloaded the BEST station data that has been back calculated to give the adjusted station data. Lots of data in a table with the dimensions 15,717,007 by 7. I’ll be subsetting the stations with data from some extended period to look at portion with cooling and warming and histogram of trends.

  127. Paul_K, your comment amuses me. Before BEST, I never really cared about the modern temperature record. I followed the issue because it was a popular topic, but I never expected much to come from of it.

    BEST changed that because as soon as I started looking into it, I noticed issues. Mildly curious, I started discussing those issues. When I tried discussing them with BEST members, I was repeatedly rebuffed. The more I looked, and the more I got rebuffed, the more curious I got.

    Quite frankly, I still don’t care about the modern temperature record. I’m just tired of hearing BEST praised as some ideal product which answers all sorts of questions when, as far as I can tell, it is worse than what we had before.

    For what it’s worth, I think their self-serving guide to converting skeptics was probably the breaking point for me. Its primary tool for converting skeptics was to shove BEST’s linear CO2 + volcano fit in people’s faces over and over. That was so ridiculous I finally had motivation to spend the time dealing with BEST.

  128. Paul_K wrote:

    “I have decided to ignore the BEST reconstruction as irrelevant unless someone can explain the discrepancies.”

    If BEST is irrelevant (to something) wouldn’t any reconstruction similarly be irrelevant? I expect different reconstructions to mostly share the same relevance but have differences in quality. To be clear, I’m not quibbling over Paul’s choice of word; instead the thought occurs to me that in practice relevance determines the levels of quality needed for things. How relevant is any temperature reconstruction to climate policy? To me, honestly, not much. Yet it is so easy to fall into the trap of allowing the complexity of the task rationalize my view of a greater need for quality. That does not make sense. Good I’ll just enjoy it for what it is and what it isn’t.
    ——-
    [Funny I was writing the above while Brandon and Nick posted.]

    Brandon: ” I’m just tired of hearing BEST praised as some ideal product which answers all sorts of questions…”

    Yeah, that’s galling for sure but we can not fix it.

    Nick: “I don’t see the point of the argument about gridded data. ”

    Indeed.

    Nick: “So it is smoothed; GISS and BEST both. It’s likely that BEST used a variant of their Kriging kernel for the estimation, and that is broader than GISS.”

    Or maybe block kriging was done. All kinds of little possibilities…

    [Brandon and Nick] Finally there is another little wrinkle. As far as filled contour plots go, that likely involves splines, maybe more interpolation, likely use of an method that does not honor the input data… Oh, it goes on and on.

    [Here’s hoping…cursor, jumpy editor…]

  129. Nick:

    I don’t see the point of the argument about gridded data.

    It’s a method for reducing noise (gridding is just a poor quality 2-d wavenumber filter… there’s nothing stopping you from overlapping the center point of the grid to produce a high spatially sampled product) and for going from discretely and irregularly sampled points to an estimate of the underlying field that they were sampled from.

    On a 2° grid, each cell is an estimate based on partly exrernal data, as it has to be. Many cells will have no stations at all within.

    How many is “many”?

    So it is smoothed; GISS and BEST both. It’s likely that BEST used a variant of their Kriging kernel for the estimation, and that is broader than GISS.

    The question is the resolution of the product for regional scale variations, not whether it’s smoothed. Of course it’s smoothed.

    But they don’t use gridding in computing global averages.

    Good luck performing a numerical integral without something akin to gridding (discretization).

    In any case, if you are going to compute the global average of the temperature field at any time, you’re going to have to somehow go from discretely and irregularly sampled points to an estimate of the underlying field that they were sampled from.

    Of course, the main point here is how do the different packages compare to each other with respect to regional scale temperature field reconstruction?

    Not that the smoothing would matter then anyway.

    You know that it would not matter… how?

    We generally find out whether it matters by testing it, not by speculation.

  130. mwgrant, there is no reason a person concluding BEST is irrelevant due to methodological issues would mean they ought to conclude all modern temperature constructions are irrelevant. As for the filled contour plots I made, it is trivially easy to check to see if they apply any smoothing to the input. They don’t.

    Nick Stokes, the reason we’ve been covering gridded data is BEST’s estimated temperature fields, i.e. its results, are pretty much directly translated into that gridded data. That means we’re just examining BEST’s results.

  131. Nick, hand waving and hoping the content is not noticed when you say the grid cell area changes with latitude?
    Why would a grid cell change area with latitude , should it not be a constant?
    Also why would Kriging be dependent on altitude in the majority of the Arctic as it is all basically the same elevation?

  132. By the way, here’s a comparison of trends for 1900-2010, for the region 82.5-100W, 30-35N:

    berkeley 0.045
    giss (1200km) 0.004
    giss (250km) -0.013
    hadcrut4 -0.016
    ncdc -0.007

    Berkeley looks to be a real outlier.

    What makes me convinced BEST is over-smoothing is the comparison of GISS (1200-km) to GISS (250-km) data. Nobody can argue that GISS 1200-km is more strongly smoothed than GISS 250-km.

    GISS (1200-km) is more similar to BEST.
    GISS (250-km) is more similar to HadCRUT.

    I think given the uncertainties, we could say “NCDC agrees with “HadCRUT”, but that’s an judgement on my part (needs more verification).

    I’ve thought of another technique involving wavenumber transforms. It’s a bit too technical to discuss here, but looking at the spatial frequency spectrum would directly yield the effective spatial of the different gridded temperature series.

    As a side note, you can use this link:
    You can get gridded averages from the Climate Explorer:

    http://climexp.knmi.nl/selectfield_obs2.cgi?id=someone@somewhere

    to obtain gridded averages for the various products without too much hassle. In the off chance anybody needs more help I wrote up more detailed instructions here on how to download the various gridded series.

  133. Carrick,
    Yes, I’m using gistemp1200_ERSST.nc

    “You know that it would not matter… how?”
    Because smoothing usually just shifts the integrand from one location to another, conserving the global integral. As with diffusion.

    “Good luck performing a numerical integral without something akin to gridding “
    You can integrate an interpolation function directly, finite element style, or you can explicitly interpolate onto a set that is easier to integrate (grid). As I said, I think BEST overrates the difference.

  134. Here’s a comparison 1924-2008 from Nick’s link.

    Figure.

    I believe this is GISTEMP (1200km) versus BEST.

    For South America, I can’t legitimately say which is correct. There’s probably not a lot of data to look at is my guess, so a lot of uncertainty.

    Still it’s interesting how different these series are. My expectation is that BEST is over-smoothing, but I’m interested in other possible explanations too.

  135. Carrick (Comment #131184)
    “By the way, here’s a comparison of trends for 1900-2010, for the region 82.5-100W, 30-35N:”

    Carrick, check the map I posted. That region is near the centre of a cool trend spot, so smoothing will give a higher trend If you look on the same map at about El Salvador, you’ll see GISS has a warm spot that BEST has smoothed out. That’s the problem with your cell by cell approach.

  136. Nick:

    Because smoothing usually just shifts the integrand from one location to another, conserving the global integral. As with diffusion.

    I noticed you cleverly stuck in the word “usually”. 😉

    I agree any bias introduced by smoothing is likely small, possibly negligibly so. It’s just something that I think should be checked directly, e.g. by using simulated data.

    You can integrate an interpolation function directly

    I agree. In fact, my 1-d cubic spline routine includes the integral of the interpolation function.

    But it would be incredibly messy, hence “good luck,” if you are trying to analytically integrate a particular optimal smoothing function associated with an irregularly sampled two-dimensional field.

    On the other hand, it’s relatively trivial to map that optimal smoothing function onto a grid and use that to compute the global average.

  137. Nick:

    That region is near the centre of a cool trend spot, so smoothing will give a higher trend If you look on the same map at about El Salvador, you’ll see GISS has a warm spot that BEST has smoothed out. That’s the problem with your cell by cell approach.

    I’m afraid I don’t understand your argument.

    The issue that I have been raising is that BEST appears to be spatially smoothing compared to even GISTEMP 1200-km. I found that to be a really surprising result. I would have expected them to be similar.

    You’ve just given yet another example of the same issue.

    How is this a “problem” for cell by cell comparison, if it just confirms that there is a problem?

  138. “How is this a “problem” for cell by cell comparison”
    It’s a problem if you deduce that BEST is a warming outlier. It can’t distinguish between that and smoothing, as a cause.

    “But it would be incredibly messy”
    It’s done in FEM. Not necessarily optimal interpolation, but adequate. With linear interpolation, it’s one of the options in TempLS.

  139. Nick:

    It’s a problem if you deduce that BEST is a warming outlier. It can’t distinguish between that and smoothing, as a cause.

    Okay. Understood:

    I’m not using cell-by-cell comparison to test for whether there is a warming bias, just trying to understand the spatial resolution of the different methods.

    Again, I think the global bias introduced by over-smoothing is probably very small, possibly negligibly so.

    It’s done in FEM. Not necessarily optimal interpolation, but adequate.

    Oh, yeah. That’s true.

  140. Brandon

    I never state that BEST is irrelevant due to methodological issues or in particular that “a person concluding BEST is irrelevant due to methodological issues would mean they ought to conclude all modern temperature constructions are irrelevant.” That is far from what I was thinking. I’m suggesting that the relevance or irrelevance of a temperature reconstruction (to something, e.g., policy-making) would more likely be because it is a reconstruction and not because of the details (methodology, quality, etc) of its implementation. Also that the degree of relevance determines the level of quality needed in the effort rather than quality determining the relevance–quality does of course help determine applicability.

    “As for the filled contour plots I made, it is trivially easy to check to see if they apply any smoothing to the input. They don’t.”

    Yeah, I crossed wires–definitely no spline because no contours and you indicate you did not do any smoothing. Thanks.

  141. mwgrant, Paul_K said he intends to ignore BEST as irrelevant because of its apparent methodological problems. You responded by suggesting BEST is irrelevant, all temperature constructions are irrelevant. I pointed out that’s a non-sequitur. One is free to dismiss BEST as irrelevant because of its apparent problems without dismissing other reconstructions, which lack those apparent problems, as irrelevant. In other words, I am answering your question:

    If BEST is irrelevant (to something) wouldn’t any reconstruction similarly be irrelevant?

    By saying no. There are many reasons a temperature construction may be irrelevant. Only if a reason is shared by BEST and others would it apply to BEST and others.

  142. “Again, I think the global bias introduced by over-smoothing is probably very small, possibly negligibly so.”

    over smoothing assume you know the truth.

    next tests against synthetic data show that GISS and CRU are worse methods.
    next tests with different correlation lengths show no change in the global answer

    yes, Giss and CRU have a cool bias.

  143. Steven Mosher:

    over smoothing assume you know the truth.

    Actually there are statistical methods for distinguishing over-smoothing from “less noisy”. It goes under the moniker “wavenumber analysis”. E.g., if the BEST spectrum tracks that of the other methods, but rolls off at a lower wavenumber, then it’s oversmoothed.

    Using GISTEMP 1250-km vs 250-km is a crude method for looking at this. GISTEMP 1200-km is unarguably a smoothed version of GISTEMP 250-km.

    The trends with GISTEMP 1250-km are closer to BEST.

    But the trends with GISTEMP 250-km are closer to NCDC, which uses EOFs.

    A EOF-based method should be a more exact method for interpolation than kriging, and appears to resolve finer detail than BEST.

    next tests with different correlation lengths show no change in the global answer

    This is not a true statement for GISTEMP, and see my comment about GISTEMP 250-km more closely matching NCDC.

    yes, Giss and CRU have a cool bias.

    In the global mean, yes. But that’s thought to be due to missing area in the Arctic. It’s possible to compute the integral in a way that reduces the error.

    Keep in mind, we’re discussing regional scale variation in trends, not global variations.

    Naively one would expect 1°x1° scales to show more variability, because that’s generally how nature works: Smaller scales have typically more variability. More accurate measurements shouldn’t remove that variability.

    The result I linked above, where South America is showing nearly the same trend is not plausible, IMO. Comparing the wavenumber spectra for that region would

  144. If you really want to compare BEST with GISS with respect to smoothing they both need to be using the SAME underlying data

    One of the reasons GISS has hot spots and cool spots is they are limited to GHCN-M.. they dont have all the data. This means that grid cell for them will be represented by in some cases by one or two stations. When you have more data you see that the real field is actually smoother than GISS depict

    That is why when you want to compare the methods you do what we did. Same input data, synthetic data where you know the ground truth.

    if you compare BEST which has 40K stations with GISS which has
    7K and you find that one is smoother than the other
    you have two explinations.

    1. A difference in method causes this.
    2. a difference in input data causes this.

    The only effective way to decide between the two is to test both with the same data.

    otherwise you are left with deciding that GISS shows gradiants in trend that are unphysical,

  145. Steven Mosher:

    If you really want to compare BEST with GISS with respect to smoothing they both need to be using the SAME underlying data

    No they don’t have to have the same exact data. They just have to converge to the same answer within measurement uncertainty.

    That’s why I started by looking at gridded data, and only comparing them where there were plenty of stations. Adding more stations to the SE US shouldn’t have dramatic effects on the measurement noise, if the arguments by the BEST team about 1250-km correlations is correct.

    You can also test for measurement noise versus over-smoothing by looking at wavenumber coherence between the two data sets.

    I agree it would be interesting to run both data sets, but not essential, especially if we focus on areas where there is good coverage for both data sets.

    This means that grid cell for them will be represented by in some cases by one or two stations. When you have more data you see that the real field is actually smoother than GISS depict

    Again, that’s why I advocate looking at the trends averaged over large enough areas that the measurement error is reduced.

  146. For the US, one way to look at whether it is the data set or the method is to look at the adjusted temperature trends. Here is 1900-2010 for the US, looking at stations with data over the entire period and meeting certain quality control requirements.

    figure

    It’s my impression that the cooling of the US SE over this period (especially in the wintertime) is not a particularly contentious point:

    For example, we see things like the southward migration of orange trees and peach trees. Southern Georgia used to boast orange groves. Now you have to go as far south as Gainesville, FL before you find large commercially viable groves.

    There are also plenty of good quality stations for this region, and most of us agree there is regional scale correlation, so adding more, and certainly less-well quality-controlled stations, shouldn’t give a very different result.

  147. Carrick, there’s also the fact we can look at the individual stations in an area to examine the possible role BEST’s additional data could have

    Or for that matter, we can just look at the data BEST uses. That’s the simplest test, and it’s the one I started with. My very first post looked at the BEST data for an area to see how what it said compares to what BEST said for that area. The two clearly disagreed.

    I’ve repeated the process for other areas, including ones which bordered the original one. In area after area, including fairly large ones, BEST finds trends not present in the data for that area. That shows the difference in results for those areas is not caused by the additional data BEST uses.

    Steven Mosher’s claim we have to test results by testing methodologies on the same data is just wrong.

  148. Brandon Shollenberger (Comment #131193)

    —————————-
    “You responded by suggesting [if?] BEST is irrelevant, [then?] all temperature constructions are irrelevant.” [Brandon]

    [This is a bit an ambiguous statement. Response assumes ‘if-then’ applies]

    No. I simply ask the rhetorical question:

    “If BEST is irrelevant (to something) wouldn’t any reconstruction similarly be irrelevant?” [from original mwg comment]

    which I then work in the comment.

    —————————-
    [Brandon] “You responded by suggesting BEST is irrelevant, all temperature constructions are irrelevant.” [Brandon]

    No, far from that. But, you can conclude that I consider BEST irrelevant to climate policy because it is a temperature series.

    —————————-
    “I pointed out that’s a non-sequitur.”

    Well then that has to be in the only sentence you wrote on the that topic in your comment and that statement is one of your creation. I did not go there.

    “mwgrant, there is no reason a person concluding BEST is irrelevant due to methodological issues would mean they ought to conclude all modern temperature constructions are irrelevant. “
    —————————-
    As for the rest getting down to “By saying no.”:

    In working my way through the question I arrived at yes in regard to the specific case of climate policy. This is because I have the expectation that a large majority of reasons that BEST or any reconstruction would or would not be relevant (to something) stem directly from its being a reconstruction and not other attributes.

    —————————-
    “One is free to dismiss BEST as irrelevant because of its apparent problems without dismissing other reconstructions, which lack those apparent problems, as irrelevant.”

    Dismissal is an action completely separate from attribution of relevance or any other characteristic. I would not dismiss a reconstruction would not subsequently be dismissed as irrelevant say to climate policy because of the problems, but would dismiss it as inadequate or incorrect because of the problems. Paul did, and you apparently would but for me conceptual relevance remains. Big deal, our ways of organizing thoughts are different.

    —————————-
    —————————-
    I lot of my comments are reflective. I do that to try to gain perspective. Not everything thing has to be shown to be right or wrong.

  149. mwgrant, the if-then nature of my remark should have been obvious despite the difficulties I had typing that on my phone. Similarly, it should have been obvious when I said:

    You responded by suggesting BEST is irrelevant, all temperature constructions are irrelevant.

    There should have been an “if” in the sentence. Once you add that in, what I said is certainly true.

    In working my way through the question I arrived at yes in regard to the specific case of climate policy. This is because I have the expectation that a large majority of reasons that BEST or any reconstruction would or would not be relevant (to something) stem directly from its being a reconstruction and not other attributes.

    Again, that is a non-sequitur. All temperature constructions may be irrelevant to whatever because of whatever reasons. That doesn’t make your suggestion correct. You suggested if BEST is irrelevant, all temperature constructions are irrelevant. That’s false. One temperature construction can be irrelevant while others are irrelevant.

    Dismissal is an action completely separate from attribution of relevance or any other characteristic. I would not dismiss a reconstruction would not subsequently be dismissed as irrelevant say to climate policy because of the problems, but would dismiss it as inadequate or incorrect because of the problems. Paul did, and you apparently would but for me conceptual relevance remains. Big deal, our ways of organizing thoughts are different.

    No. Dismissal is not “separate from attribution of relevance or any other characteristic.” One can dismiss work in any number of ways for any number of reasons. Despite what you say, Paul_K did not “dismiss [BEST] as inadequate or incorrect because of the problems.” He dismissed BEST as irrelevant. That’s why he said he “decided to ignore the BEST reconstruction as irrelevant.”

    This is simple. Paul_K decided BEST is irrelevant to him because of its apparent problems. You responded by discussing an entirely separate issue, the relevance of temperature constructions in general. That topic is worth discussing, but it is not pertinent to the view Paul_K discussed.

    Paul_K is free to dismiss BEST as irrelevant because of the quality of its results. You are free to dismiss BEST as irrelevant or accept BEST as relevant for a different set of reasons. What you are not free to do is replace Paul_K’s reasons with your own.

  150. My R code continues to grind and has ground through approximately 1/4 of the 9600 plus Best stations with at least 50 years of data and I have found 693 stations with trends from 1960-2012. I have found one station with a negative trend for this period out of these 693 stations. It was -0.0007 degrees C per decade. From the Berkley website these stations are those with temperatures that have been corrected (adjusted) back to what they would be with the breakpoint adjustment.

  151. Brandon

    I am have extreme difficulties editing (both locally and in Lucia edit boxes.)
    One relevant casualty:
    Dismissal is an action completely separate from attribution of relevance or any other characteristic. I would not dismiss a reconstruction would not subsequently be dismissed as irrelevant say to climate policy because of the problems, but would dismiss it as inadequate or incorrect because of the problems. Paul did, and you apparently would but for me conceptual relevance remains. Big deal, our ways of organizing thoughts are different.

    should be
    Dismissal is an action completely separate from attribution of relevance or any other characteristic. I would not dismiss a reconstruction as irrelevant and would not subsequently dismiss as irrelevant say to climate policy because of the problems, but would dismiss it as inadequate or incorrect because of the problems. Paul did [dismiss it as irrelevant] and you apparently would but for me conceptual relevance remains. Big deal, our ways of organizing thoughts are different.

    So indeed I agree that both you and Paul can parse it that way. I generally try to avoid telling people how to go about things, “Big deal, our ways of organizing thoughts are different.” In addition my transition to the rhetorical question was clear and legitimate.

    ————————-
    “There should have been an “if” in the sentence. Once you add that in, what I said is certainly true.”

    No. There are qualifications out the wazoo and you generalize.
    ————————-
    “Again, that is a non-sequitur. ”
    Wrong again. Remember it is a rhetorical question…a starting point. You have recognized that in the original context and I have subsequently pointed that out explicitly. It is important that I added the qualification of expectation.
    ————————-
    Now I’ve got to address my computer issues. I just wish the hell you would try to quit telling me what to do and would satisfy your creative writing needs elsewhere.

  152. mwgrant:

    No. There are qualifications out the wazoo and you generalize.

    What?! You suggested an absolute:

    If BEST is irrelevant (to something) wouldn’t any reconstruction similarly be irrelevant?

    I responded by pointing out that is a gross over-generalization which is only true if we apply certain qualifiers. How do you respond to that by saying I generalize and ignore qualifiers? Seriously. I want to know. You’re accusing me of doing exactly what you did.

    Wrong again. Remember it is a rhetorical question…a starting point. You have recognized that in the original context and I have subsequently pointed that out explicitly. It is important that I added the qualification of expectation.

    No! It was not “a starting point.” The “starting point” was Paul_K’s comment:

    I have decided to ignore the BEST reconstruction as irrelevant

    To which you responded. You can’t respond to a person’s comment then claim people should ignore what you were responding to when interpreting your response. You certainly can’t claim I’ve “recognized” it was the starting point when I’ve constantly referred back to what you were responding to.

    But even if it were “a starting point,” It would still be wrong. Whether or not BEST is irrelevant does not tell us whether or not other temperature constructions are irrelevant. One set of results can be irrelevant while other sets of results are not. That is the point I have made all along, and it is a point you’ve consistently failed to address.

  153. Yes, those people criticizing BEST are terrible because they may not watch the video Steven Mosher posts which has nothing to do with anything they say. /sarc

    On a more serious note, that video is terrible. I don’t think people should be criticized for failing to watch a presentation from a guy who is terrible at giving presentations.

  154. I watched.

    I’m struck by the “100 weddings” analogy. (Because I _understand_ it, mostly.) The artist browsed the web. Presumably thousands of images of weddings were found. If the couple were ethnically white, the groom on the bride’s left, the wedding formal with tux and gown, in other words if the sample of the browse is thoroughly cherry picked, THEN the samples are “registered” or adjusted so that the images are roughly the same size and centered. FINALLY the images “sampled” and “registered” are averaged. The average tells us that a wedding consists of a Caucasian couple, with a larger, apparently male, person in a tuxedo next to and standing to the left of a smaller, maybe female, person in a white gown.

    The artist gets what he expects, selected for, and adjusted to.

    This analogy has NOTHING TO DO with the way climate data is selected, homogenized, or adjusted, and it is PURELY COINCIDENCE that a climate scientists admires the work and includes in into a presentation of how data is analyzed.

  155. one note for Nick stokes.

    to compare 1×1 best to GISS dont average Berkeley up.
    disaggregate GISS down to 1×1.

  156. Steven Mosher: I’m not sure why you’d say that.

    As is easy to confirm, adjacent 1×1 Berkeley cells are nearly identical.

    The only time it would make sense to do what you discuss is when the 1°x1° series actually shows more spatial structure than the 2°x2° series.

    That’s not the case here.

  157. Steven Mosher (Comment #131211)

    Interesting, but as he says, fairly elementary for anyone with a background in geospacial analysis.

  158. Steven Mosher

    Good ‘help yourself’ link offering.

    An easy 40 minute armchair overview and specific to climate. Seems aimed at the level of people who are aware of the topic and want to check it out….just what you might expect at a conference. LatticeKrig catches the eye. I haven’t looked at the other presentation yet.

  159. Steven Mosher (Comment #131217)
    I was thinking like Carrick. The gradations are fairly smooth; it’s not clear the finer resolution would have much visual effect. And it’s four times as much to download.

    I’ve implemented Carrick’s suggestions about making the colors matchable and marking levels.

  160. Yes Carrick, disaggregate. Nevermind doing so would make no sense here. Nevermind Steven Mosher did nothing to address the point you made. You have no choice. You must disaggregate.

    You must. Mosher commands it!

  161. Mosher: You get one more chance to recommend a video before you will be totally ignored as another Springer wannabe. The first one with Mr. Noodle blaming CO2 for plate tectonics was pathetic. Doug N is a backward step.

    The 100 wedding example is garbage. The mean looks exactly like all of the individuals… to an earf scientist. This is the problem with climate science. Nimrod bookeepers with mild aspy spectrum overwhelmed by deterministic non-periodic flow: Monk doing auto repair.

    You realize that all of the energy and material that has constructed our modern economy has been gobbled up by guys who have been eyeballing kriging and non-linear partial differential equations since like forever.

    Wake me up when you send Carrick and Brandon your data and code. Isn’t that the LukeWarmer Mantra you invented?? I’ll give you this, your group settled for Best rather than A+ for your marketing tripe… one step above snake oil.

  162. Mosh or Zeke (or anyone),

    I have just finished going through for the first time the appendix to the BEST methods paper (http://www.scitechnol.com/2327-4581/2327-4581-1-103a.pdf), and am really puzzled. I have found what appears to be a gross error – so gross, in fact, that I cannot believe it could have gone unnoticed up to now. I therefore suspect that I am misunderstanding the description. Maybe you could help straighten me out.

    The weights that are used in kriging must of necessity relate to the spatial structure of the kriged element – the thing being mapped. This is basic. There are two kriged elements involved in the process described. The first is the “weather”, given by Equation (13), and the second is the G(x) term, which represents the “geographical anomalies in the mean temperature field”, other than the variability explained by latitude or elevation.

    The only spatial correlation structure actually discussed in the Appendix is obtained by fitting a spherical functional form to “a reference data set created by randomly selecting 500,000 pairs of stations that have at least ten years of overlapping data, and measuring the correlation of their non-seasonal temperature fluctuations as a function of distance.” This is included in Figure 1 as a “Mean correlation versus distance curve constructed from 500,000 pair-wise comparisons of station temperature records.”

    On the face of it then, the spatial correlation structure is derived from monthly temperature measurements which have been de-seasonalised, but not otherwise adjusted, and which show a correlation structure which is substantial out to a distance of ~1000 km and non-trivial up to ~1800 km from each site.

    This level of spatial correlation does not seem unreasonable for deseasonalised temperature data. This correlation is seeing, inter alia, a large “geographic component” in the correlation – i.e. the natural maintenance at large distance scales of a relatively invariant temperature functional relationship between latitude zones. (In fact, we are told that 95% of the variance of the annual mean surface temperature is explained by appropriate choice of model for latitude and elevation.)

    The text gives the impression that this same spatial correlation structure was used to populate the coefficient matrices for both of the kriging problems i.e. for each of the two kriged elements above. (Is this correct??)

    The two kriged elements are both residual temperature terms. They both exclude by definition the (latitudinal) geographical component of temperature. But this is one reason (and probably the main reason) for the derived spatial correlation structure being substantial at large distances.

    I can see no reason why the spatial correlation derived from the apples of the deseasonalised temperature data can be applied sensibly to the bananas of these two residual series. It makes no sense to me at all.

    I could for example generate a synthetic dataset which preserves latitudinal structure imposed onto a mean secular temperature series, and then add a white noise term to the datapoints. It should be obvious that I would end up with a spatial correlation from the synthetic temperature data which retains high correlation with distance. The BEST methodology would assign the same spatial characteristics to the residual white noise function despite the fact that in this hypothetical example it would have no spatial correlation at all.

    The effect of this error, if error it is, would be to smear data over large distances, a problem exacerbated by the existence of a substantial “nugget” in the applied spatial correlation. This would, for me at least, explain some of Brandon’s findings as well as Carrick’s observations re variance reduction.

    Its impact on the mean series would be more limited because of the constraints imposed (the areal-weighted series are forced to integrate to zero).

    So, am I missing something here?

  163. Paul, happy to hear if there is a reasonable explanation and not two high school students pulling the wool over every ones eyes.
    I have discussed other people reproducing the results and been told to do it myself which I cannot do.
    So pleased to finally see people trying to put the work in.
    The problems with the Kriging as explained by Cowtan and Way are due the unbelievably perfect fit it gives to all those data sets it has been used on and checked against.
    This suggests that the past results of the rest of the world are built into its data base already instead of it being an algorithm to work on the data. No wonder it fits perfectly.
    Secondly there are no areas of cold anomaly anywhere in their arctic Kriging reproduction, none.
    This unbelievable result cannot be due to chance and should alert everyone to a problem with this formula.
    I ask Steven and this time Zeke as well. If you cannot show some negative measurements over the whole of the Arctic for years how can anyone take this seriously?
    It is too perfect!
    Lucia, anyone out there who wishes to comment on the chance of a perfect graph when discussing the weather?

  164. Paul_K (Comment #131228)
    “This correlation is seeing, inter alia, a large “geographic component” in the correlation “

    I think what you are missing is the word “fluctuations”:
    “measuring the correlation of their non-seasonal temperature fluctuations as a function of distance”
    ie they are dealing with anomalies relative to monthly climatology. The geographic component has been largely removed in the station means.

  165. Nick Stokes reports: “they are dealing with anomalies relative to monthly climatology. The geographic component has been largely removed in the station means.”

    They remove the obvious errors. They remove the subtle artifacts of station changes. They remove the seasons. They remove the altitudes. They remove the distances. Like the apocryphal story of Michaelangelo sculpting a statue of an elephant, they chip away at the block everything that doesn’t look like an elephant — in this case, everything that doesn’t look like what they expect climate to be. And lo and behold, they find a climate/temperature time series that steadily increases, pretty much confirming what they pretty much expected, (subject to certain reservations, stipulations, and caveats, which need never be spotlit.) Again, if a wedding photo is defined by a couple in formalwear with the groom on the bride’s left, we get a fairly standard model photo. If the photo assumes gay weddings, punk weddings, couples seated, kissing, surrounded by step-children, or otherwise in circumstances and postures other than standing-couple-with-groom-to-the-bride’s-left, we get another picture entirely.

    If the assumptions and expectations are reasonable the process is reasonable as well. But begin with another set of assumptions — random walk, solar notch, cosmic rays, alignment with planetary orbital convergences, whatever — and conduct similar series of ‘necessary’ adjustments — and, LOOK THERE, we see the elephant, and now he’s even wiggling his trunk! The open question is how well the “greenhouse” model compares to the “cosmic ray” or other model, by how many pre-processing adjustments are necessary — and how well those pre-processing steps are documented. The BEST team is highly commendable for detailing what sorts of wedding photos they’re looking for and what kinds of adjustments they make before “averaging”. Yay BEST! Absolutely the best in several senses. Still not good enough to justify the proposed “remedy”.

  166. Angech,

    I have not looked at Cowtan and Way at all. As I mentioned earlier – perfectly truthfully – I haven’t invested any personal time in modern temperature reconstructions.

    The issue I am raising above is not a problem with kriging per se. It is a potential problem with the choice of spatial correlation data used to apply the kriging algorithm in BEST, or actually algorithms plural in this instance. If the authors have done what I think they have done, then they are establishing a correlation-distance relationship from one type of data and applying it to another type of data for which it is uninformative.

    The effect in kriging of assuming too large a distance before data become independent is to over-smooth or smear the data in the predicted surface. The BEST algorithm uses data from a (huge) moving circle with diameter over 3000 kms. There is very little local “protection” of data records because of a significant nugget effect in the correlation-distance function; this promotes the weighting of more distant datapoints. Hence, this data choice could be a major contributor to the obliteration of small-scale variation in the final mapping.

    I am still not sufficiently excited by the modern temp series to overview Cowtan and Way, but I may get round to it some day.

  167. Paul_K,
    I guess it is important to know if they are talking about station anomalies from local climatology or not (Nick’s comment above 131231).

  168. Paul_K,
    For what its worth, I can’t get too worked up about the details of reconstruction, since I have nowhere seen anything which looks likely to have biased the reported trends very much; the uncertainty is too low to make much difference in the net warming since the mid 19th century. There is too much ‘trying to pick fly droppings out of ground pepper’ for my taste. Too many wild-eyed conspiracy claims as well.
    .
    Cowtan and Way seem to have used a reasonably sensible method to improve the accuracy of a reconstruction with missing regions (like the Arctic); but once again the net effect on the global trend is modest.

  169. I have compared the trends from 1960-2012 for the mean station temperatures from the GHCN v3 (adjusted) and BEST (back calculated by BEST for adjustment) temperature data sets. I found 1245 series for GHCN and an impressive 4476 for BEST. As an aside here I was surprised to see so many additional stations with long series in the BEST data set since the limiting factor for using data in the GHCN data set has been length of series for better constructing anomalies. The histograms for comparison are in the link below. The number of GHCN stations for this period with negative trends was 19 (1.53%) and for BEST was 8 (0.18%). One can see the spread of trends appears larger for GHCN but the scale difference on the Y axis can be deceiving. My next step is to match all the GHCN stations with one from the BEST data set and redo this comparison.

    I used station data for comparison in an attempt to determine at what point in the process the difference in width of trend distributions between GHCN and BEST becomes apparent. I am not familiar with the procedure used by BEST to calculate an adjusted station temperature using final results of their algorithm and how much that result would be affected by any smearing of the data from kriging. I am speculating here that if anything a smeared final result would make the corrected station data distribution wider.

    http://imageshack.com/a/img536/7034/c492c0.png

  170. @Nick Stokes (Comment #131231)

    No, Nick. You cannot remove the geographical component in the temperature (fluctuations) unless you know what it is, and you don’t “know” that until after you have completed the BEST process.

    Unless there is a major processing step which is not described anywhere in the methods paper, the correlation data in Figure 1 come from the input temperature records. No attempt is described to define the base change in temperature as a function of time around which the “weather” term must fluctuate, nor to quantify (or eliminate) the G(x) contribution. And subtractino of monthly mean values does not eliminate covariance in the geographical component.
    More detail would help, but if the temperature values are converted to anomalies relative to a monthly mean calculated over some fixed period X, (which needs to be the same for any collation of pairwise comparisons), you are then still left with anomaly values which have temperature contributions from inseparable multiple sources; this includes the geographical component, particularly the latitudinal correlation, which we do expect to be high over large distances.

    You will generally see larger fluctuations at higher latitudes relative to any constant baseline subtraction. Ironically, subtraction of a mean baseline is likely to also increase the retained covariance in the N-S direction by reducing the amplitude of high latitude changes relative to lower latitude.

  171. SteveF (Comment #131236)

    Steve, I agree that imputing motivation to those putting together temperature data sets is not only silly it is counterproductive to looking at the more important issues of (1) the uncertainty of temperature series due to the method (algorithm used) and (2) the need for proper benchmark testing whereby the various available algorithms for temperature adjustment are tested and compared against simulated station data with the truth being known and with known non climate effects added to the station data. It is also important in my view to test the algorithm adjustments to determine the limitations of those methods such that one could propose hypothetical non climate effects for which the process would fail to or poorly adjust.

    If you look at the uncertainties going back in time for temperature data sets those uncertainties which I judge may not include much of the method uncertainty make it difficult to conclude how well observed temperatures versus climate model generated ones can be compared or even how well calibrating a temperature reconstruction can be performed (given that the reconstruction methodology is proper).

    Benchmark testing has been performed on various data sets including BEST and GHCN and as I recall the test indicated some problems for BEST. I also recall Zeke Hausfather doing a thread on it or perhaps replying in thread about changes for the BEST algorithm that appeared to bring it better but not completely in line with the GHCN benchmark performance. I asked in that thread about if and when those changes were going to be applied to BEST, but did not receive a clear answer. Now there are interested parties who are proposing and asking for suggestions online about a new benchmarking test for the various data sets. The link is:

    http://www.geosci-instrum-method-data-syst-discuss.net/4/235/2014/gid-4-235-2014.html

    Also the work that BEST and Cowtan and Way have done recently on temperature data sets should bring these issues more quickly to the fore than in the past where interested parties and those using the data sets in their analyses and published papers appeared to blithely accept the data as more less the last word on the instrumental record and surely not a work in process. The Cowtan and Way generated data sets with the faster warming Arctic and paused warming in the lower latitudes if shown to more correct than other data sets would have to have a large impact on the thinking about the Arctic polar amplification. Interesting aside on Cowtan and Way is that they are relatively new to the field as I believe the BEST people are.

  172. Kenneth, one issue to think about is that when you compare the trends of individual stations, the variance will be much larger than for regionally averaged values (e.g., 2°x2° gridded data).

    This is going to make it difficult to look for the effects of spatial smearing, because the trends of individual stations are in general much noisier than the trends of geographically proximate stations averaged together.

    What I would suggest doing is computing the average over each 2°x2° cell for both GISTEMP and for BEST.

    I’d compute this average for all stations in GISTEMP & BEST, and separately only for stations that are present in both series (Steven Mosher’s “same data” criterion).

  173. Kenneth Fritsch

    Did you happen to make the the corresponding probability plot or qqnorm plots? That might be nice complementary information.

  174. Kenneth Fritsch, spatial smearing will decrease variance, not increase it. You can see this by thinking about what happens when you smooth a time series. It’s effectively the same thing. The difference is just the number of dimensions involved in the smoothing.

    As you know, when dealing with a time series, smoothing decreases the degrees of freedom. Those degrees of freedom help determine the variance in your data, in the form of a term like (degrees of freedom) / (number of data points). The larger that value, the larger your variance. Smoothing reduces the numerator more than the denominator (proportionally) so it must reduce variance.

    Another similarity to consider is endpoint issues. Smoothed time series often have bias at their ends due to the lack of data on one side. This can bias the follow-up calculations, such as if you were to take an average of the series. The same can happen with spatial smearing when you don’t have data for regions (particularly common with coastal areas). If biases are introduced by that endpoint handling, the global average (akin to a simple average of a smoothed time series) can wind up biased.

  175. Paul_K, the biggest issue I have with reading their documentation/publications is I’m not sure that the description given in that appendix is an accurate rendition of how BEST works.

    Is the person who wrote that appendix the same person who implemented the code, for example?

    I also can’t shake the feeling that they’ve managed to cock-up the computation of the correlation function, by assuming that it is spatially and temporally invariant. Even within those assumptions, it’s possible they have made other errors that led to gross errors in the estimate of the correlation function.

  176. @Carrick (Comment #131244)
    Thanks Carrick. I have “logged” several of your comments on the subject, and agree.
    It seems to me that the assumptions regarding isotropy and stationarity are important, but perhaps secondary to the issue of whether the spatial characteristics are being derived from the right dataset.

  177. @Brandon Shollenberger (Comment #131243)
    Hi Brandon.
    You have to define which variance you are talking about.
    The smearing will tend to reduce the variance of sampled trends from the predicted surface (as I think we may be seeing).

    Normally however, the variance of the surface itself can be calculated from the statistics, and typically increasing the correlation length scale will increase the calculated variance of the surface.

    In this instance, BEST is using a combination of theory and a jack-knife approach to estimate the error variance. I know that Jeff Id complained a while back that the jackknife calculation was in error, but I have no idea where that one landed.

  178. Carrick, I know BEST’s website had incorrect descriptions of its methodology for quite a while, and as best I can tell, BEST hasn’t released any documentation for a number of changes made since that appendix was published. The only reliable way to tell what BEST actually does is to work with its code directly. (And even then, it’s difficult, if not impossible, to establish what code goes with what publications/results.)

    Paul_K, I don’t see how increasing the amount of spatial smoothing applied to data could increase its variability. Could you explain what you have in mind?

  179. @Brandon Shollenberger (Comment #131243)
    Hi again, Brandon. Please ignore my previous post.
    I was just flat wrong when I wrote:-“…typically increasing the correlation length scale will increase the calculated variance of the surface.” Your original comment was correct, and I was thinking about the effect of including a nugget vs no-nugget. Sorry for any confusion.

  180. Paul_K, I agree with your comments that the biggest problem may be the wrong data set.

    I’ve been downloading the noaa data set located here:

    http://www1.ncdc.noaa.gov/pub/data/noaa/

    slowly over the period of weeks (not trying to wreck their server).

    These are hourly observations that include full meteorological data, so I think these are the “right” data set to use for correlational studies.

  181. Paul_K, no problem. I’m just glad to hear I wasn’t missing something obvious.

  182. Kenneth Fritsch (#131237) –
    Your histograms suggest that the global trend is higher in GHCN than in BEST. (Applying equal weight to each station by eye, GHCN ~= 0.23 K/dec, BEST ~= 0.20 K/dec.) The contention elsewhere was that BEST has a larger trend than others. Do you have any idea why the subset of stations which you selected, should behave differently from the whole? Or perhaps the equal-weight assumption is sufficiently far from the kriged/gridded weight that such an approach is inherently invalid.

  183. HaroldW, one issue here is the BEST data set are far from uniformly sampled, and there is is a strong relationship between latitude and magnitude of trend.

    Looking at the ensemble of stations, especially if the stations are different in the two data sets, won’t help you very much in sorting out whether there is a net bias in the data.

  184. HaroldW (Comment #131253)

    I noticed the trend mean or median being higher for GHCN, but as Carrick noted that means little because the extra stations that were used for the BEST histogram could have been cooler on average.

    Carrick (Comment #131241)

    My using station data was an attempt to better isolate where the added smearing occurs in the BEST process. One could obtain smearing depending on how the station data are adjusted which in GHCN case involves using near neighbor results in a direct adjustment while in the BEST case no direct adjustments are made but rather the data is segmented and weighted with the adjustment coming after the fact in an exact manner I am not aware.

    mwgrant (Comment #131242)

    I can certainly do that but am not sure how the results can be used.

    Brandon Shollenberger (Comment #131243)

    “Kenneth Fritsch, spatial smearing will decrease variance, not increase it.”

    I agree and stated that the BEST distribution is narrower. I think you are referring to my later comment where I said, “I am speculating here that if anything a smeared final result would make the corrected station data distribution wider.”

    What I meant by that statement is that if the end result BEST data are smeared by let us say the BEST application of kriging and if the station data for GHCN and BEST were comparable and BEST back calculates corrected station data using smoothed data the corrections for the stations might appear larger than it should. In other words back calculation of station data might reduce the variability differences between GHCN and BEST for station data versus gridded data. However since I do not exactly how the back calculations are made this is all speculation.

  185. Kenneth Fritsch

    qqnorm plots the data quantiles against normal quantiles. It of course is normally a visual look at normality but deviation from the straight line gives info on the tails. Among things can get a view sense of peakiness relative to normal distribution. Afterwards though I thought why not just calculate the kurtosis. A Q-Q plot using qqplot lets you look at one distribution versus another, e.g., GHCN and BEST.

  186. Kenneth Fritsch:

    My using station data was an attempt to better isolate where the added smearing occurs in the BEST process.

    Yes, I was aware of this. My point was only that individual stations are much more noisy than their local average.

    So you looked at the average which you compute over geographically proximate stations, you can still test for biases and differences in the variance of trends, without having to deal with noise issues.

    Matching up stations and working directly from that is a method that works too.

  187. Paul_K (Comment #131239)
    “No, Nick. You cannot remove the geographical component in the temperature (fluctuations) unless you know what it is, and you don’t “know” that until after you have completed the BEST process.
    Unless there is a major processing step which is not described anywhere in the methods paper, the correlation data in Figure 1 come from the input temperature records.”

    They say quite explicitly in Eq 13 what they are forming the correlation structure of. It is W, which are deviations from the station baseline etc. And yes, you have to have a correlation matrix to get these, so they have to iterate.

  188. I’m still not sure how much weight to put on the appendix that Paul_K linked. They explicitly state:

    The Kriging formulation is most efficient at capturing fluctuations that have a scale length comparable to the correlation length; however, it also permits the user to find finer structure if more densely positioned data is provided. In particular, the Kriging estimate of the field will necessarily approach the underlying field exactly as the density of data increases. This feature of Kriging contrasts with the NASA GISS and Hadley/CRU averaging approaches which smooth over fine structure.

    But in fact we see what appears to be a heavily smoothed temperature field instead. I’m seeing lots of claims but nothing provided to validate those claims.

  189. Carrick, I don’t know what you’re going on so much about. Steven Mosher has clearly addressed your conspiratorial rants. I mean, you simply dont understand what the BEST results mean.

    you and other simply dont understand what the climate field represents.

    The methods paper and the results paper address the conspiratorial rants of you folks.

    *eyerolls*

    Maybe Steven Mosher is Stephan Lewandowsky in disguise.

  190. I would hope there was another point to the paper than addressing conspiratorial rants from e.g. phi.

    Maybe I was supposed to read the paper with my eyes closed. Somebody throw me a bone.

    I do find it amusing that Stephen has decided that anybody who finds a potential flaw in the paper is one of “you folks” now.

    Us vs them mentality at work. Not good.

  191. Carrick (Comment #131260)

    There is nothing wrong about the quoted characterization of kriging. It is an “in theory…” sort of remark about archetypal ‘textbook’ formulations, but BEST is not typical in a number of aspects. In an ideal world this would put some significant validation and verification responsibilities on the development team if they wished the current implementation to be treated as something beyond research. However, a safe bet is that current BEST, C&W will have finite shelf-lives. Look at the paper Steven provided the link to (spatio-temporal)–if that isn’t writing on the wall…

  192. mwgrant, what I object to in this statement “This feature of Kriging contrasts with the NASA GISS and Hadley/CRU averaging approaches which smooth over fine structure”

    is the explicit identification of the other products, and the making of a statement that has clearly never been vetted.

    I find it curious that they avoid discussing the relative merits of Kriging (which I’m still not sold on for atmospheric processes) compared to say empirical orthogonal functions.

    There I said it! NCDC got it right and Muller got it wrong.

  193. Carrick

    I didn’t give it (the bold) a second thought but I can see why you did. I took it that they were doing nothing more underscoring the fact that kriging is an exact estimator–it honors the data whereas the others, of which I know zippo, do not.

    My problem is that BEST is inhouse, custom code and has some unique aspects and this puts some significant validation and verification responsibilities on the development team should they wish BEST to be treated as something beyond research.
    Also part-and-parcel with my shelf-life comment, I do not think the current BEST approach will survive.

    BWAG: BEST II will be an entirely different approach. It has to compete and being too different is a problem when people are going to ask questions. I’d lay down some quatloos on that.

  194. @Nick Stokes (Comment #131259)
    Nick,
    They say quite explicitly in Eq 13 what they are forming the correlation structure of. It is W, which are deviations from the station baseline etc. And yes, you have to have a correlation matrix to get these, so they have to iterate.

    Quite so. I agree that Eq (13) includes a clear statement of intent, a definition of what spatial structure needs to be defined:- “Thus the “weather” field is constructed as a spatially-weighted linear combination of the fluctuations in the data 𝑑𝑖 𝑡𝑗 relative to the global trend 𝜃 𝑡𝑗 and the station’s baseline 𝑏 𝑖.” Exactly. So the Sai(x,t) weights are unequivocally supposed to reflect the spatial correlation structure of the temperature fluctuations of the weather turn around the local baseline behaviour i.e. after the “gegraphical component” of temperature variation has been elminated.

    However, instead of using a spatial weighting appropriate to that structure, the paper also indicates after Figure 1 that “The black curve corresponds to the modeled correlation vs. distance reported in the text. This correlation versus distance model is used as the foundation of the Kriging process used in the Berkeley Average.”

    There is no mention of an iterative process to update these covariance data. This should not be confused with updates of the empirical weighting functions used to modify unreliable data and outliers. I can only go with what’s written, unless someone tells me it has been updated.

  195. @Nick Stokes (Comment #131259)
    Nick,
    “They say quite explicitly in Eq 13 what they are forming the correlation structure of. It is W, which are deviations from the station baseline etc. And yes, you have to have a correlation matrix to get these, so they have to iterate.”

    Quite so. Ignoring the empirical weighting to crop outliers and downgrade unreliable results (actually “inconsistent” would be a better term), then what is sought for Eq (13) is the spatial structure for just the weather term, as explicitly stated after Eq (13):-

    “Thus the “weather” field is constructed as a spatially-weighted linear combination of the fluctuations in the data [ ] relative to the global trend[] and the station’s baseline [].”

    What is actually used is the correlation structure from Figure 1, after which is written:-
    “This correlation versus distance model is used as the foundation of the Kriging process used in the Berkeley Average.”

    There is no description anywhere of any attempt to remove the geographical component from this correlation structure, and essential requirement for Eq 13 to make sense.

    On the face of it, this appears to be gross error, which would result in unwarranted smearing in the mapping because of the retention of unjustifiably high correlation distances for the weather mapping and the mapping of the local component of the geographical contribution to temperature structure. The affect on global average temperature should be limited because of the constraints applied to ensure that the weather fluctuation integrates to zero.

    As you hint, it is possible to define an iterative or multistep scheme which would update the spatial covariance structure of the weather function and of G(x) by generating synthetic data which eliminated the successive estimates of the local geographical component, but unless I have missed it, there is no reference to any such scheme in the text.

    This error is sufficiently gross that I cannot believe that it has not been picked up and discussed somewhere. I look forward to Mosh or Zeke pointing me at some explanation or discussion of the issue.

  196. Paul_K (Comment #131273)
    Paul,
    I would have assumed that they iterate, but I agree that the caption to Fig 1 suggests they use the pairwise correlation. But I don’t see your objection. Pairwise correlation is of fluctuation about the respective means, which incorporate the geographic effects that determine mean temperature. Those means are subtracted out. In your earlier terms, they would quantify (or eliminate) the G(x) contribution.

  197. Paul_K

    You raise an extremely interesting and important point…one for me that is the straw that broke the camel’s back regardless of resolution. I am grateful.

  198. Paul_K, I think they compute the correlation function R(d) using by Equation (14) before performing any minimizations:

    [for some reason it’s not parsing]

    As described in the text:

    The free parameters 𝛼, 𝑑𝑚𝑎𝑥 , and 𝜇 are determined by fitting this functional form to a reference data set created by randomly selecting 500,000 pairs of stations that have at least ten years of overlapping data, and measuring the correlation of their non-seasonal temperature fluctuations as a function of distance.

  199. Paul_K (sorry I got sided tracked by not being able to get the equation to parse).

    Beyond that, I agree with you: I do not think what is described in this text will work.

    I think what they actually did was just approximated the correlation function with Eq. (14) and I don’t think they did any particular spatial detrending in computing this correlation function, even though it was suggested in the text that this was done.

    My suspicion is they computed the parameters 𝛼, 𝑑𝑚𝑎𝑥 , and 𝜇 as part of a previous study and this model is used in the code.

    I really think that stage of the calculation needs to be a separate paper, since it seems to be performed prior to the rest of the analysis. In fact, there originally was a draft paper on this. Does anybody know if there is still a copy online somewhere?

  200. Carrick wrote:

    “I really think that stage of the calculation needs to be a separate paper…”

    Amen…almost. It doesn’t need to be a paper in a journal. They are limited in space and completeness of review. It really needs to be in comprehensive internal QA V&V documentation…nothing less.

    ——
    “…since it seems to be performed prior to the rest of the analysis.”

    Yes*

    ——

    “In fact, there originally was a draft paper on this. Does anybody know if there is still a copy online somewhere?”
    Funny you should write that. I’ve a vague sense that I have seen values for coefficients but could not find anything….
    —————-
    * Clearing the book: For the record this means my response to you at ( http://rankexploits.com/musings/2014/sks-tcp-front/#comment-131150 ) was not accurate in regard to the latitude and elevation trends being handled internal to the kriging. (I was working on that correction but you have simplified that task.)

  201. I have finally captured the BEST and GHCN station data as described above for the same station (within 0.1 degrees latitude and longitude). I found 1105 corresponding stations with which BEST designated as coming from GHCN V3 and GHCN V3 designated as having at least 50 years of data from 1960-2012.

    As an aside here the number of GHCN V3 monthly stations qualifying in BEST (2968) is much greater than the 1275 qualifying stations from GHCN, i.e over 50 years of data between 1960-2012. Why this should be is a puzzle to me currently. I also see rather large differences in trends for a number of the 1105 corresponding stations between the GHCN V3 adjusted and the BEST breakpoint corrected temperature series. I want to address these 2 issues later.

    In the link below are graphs showing histograms of the trends and QQ plots for GHCN and BEST. The BEST results give a narrower and more peaked (leptokurtic) distribution around the mean. The mean, sd and kurtosis for GHCN is 0.242, 0.098 and 3.40, respectively and for BEST is 0.211, 0.076, and 4.49. These differences are evident but not large in my view. I am more interested in the lack of agreement between trends for the same station between BEST and GHCN and how BEST has so many more qualifying long series stations from GHCN V3 data that I do not obtain from GHCN. Of the 4476 qualifying long series stations from BEST 2968 came from GHCN Monthly V3 designations.

    http://imagizer.imageshack.us/v2/1600x1200q90/537/7e20d9.png

  202. Kenneth—any possibility of providing your trends in a CSV file… e.g., station number, BEST trend, GISTEMP trend?

    Regarding this statement “The mean, sd and kurtosis for GHCN is 0.242, 0.098 and 3.40, respectively and for BEST is 0.211, 0.076, and 4.49. These differences are evident but not large in my view.”

    This is not a surprising conclusion, because when you are comparing individual stations, there is a substantial amount of “self-noise”. In my view, this should be seen as measurement noise and should not be seen as a characteristic of the underlying temperature field that people are trying to characterize. And not averaging over multiple geographically proximate stations means you are burying the signal of interest in a component of variation that is not informative to climate science.

  203. Kenneth Fritsch

    If you are not already working on it, post-plot of the two sets of trends keyed to value might be informative. Maybe something will jump out with regard to the spatial distributions. Anyway after all that crunching you have the numbers to explore.

    Thanks for the Normal QQ plots and the basics statistics. They help characterize and quantify the initial ‘raw’ distributions. (HaroldW’s eyes did well.) [Carrick I had asked Kenneth about the peakiness so I may be to ‘blame’ by bring up kurtosis to begin with. However, it was nice to see the other statistics because having the statistics in hand even if they are expected results precludes having to approximate and speculate in downstream discussion. Guess I have a touch of EDA mentality.]

  204. Carrick (Comment #131284)

    “This is not a surprising conclusion, because when you are comparing individual stations, there is a substantial amount of “self-noise”. In my view, this should be seen as measurement noise and should not be seen as a characteristic of the underlying temperature field that people are trying to characterize.”

    Carrick, BEST and GHCN are using the same raw unadjusted data and adjusting that data using their different algorithms. GHCN adjusts directly and BEST recalculated to obtain corrected station data. The differences in trends I see has to be due to the use of different algorithms.

    I can send what you want in a csv file if you tell me the best way to proceed. I can also link to a scatter plot of the trend differences between BEST and GHCN from the same stations.

    When BEST was initially benchmarked against GHCN it was handling breakpoint adjustments differently than GHCN and that is what Zeke was addressing in a thread sometime back. It appeared he made or had someone make adjustments in BEST that reduced the difference but did not eliminate it. I was never clear from my query about the adjustments to BEST being made permanent.

  205. mwgrant, Regarding the computation of the correlation function, IMO, at the least it should have been an appendix. Possibly not enough meet for a real paper, though I’ve seen central results like this published as notes, so they get peer reviewed and are fully accessible.

    I was able to confirm that the BEST group cache the pairs of correlations between stations in “.mat” (Matlab binary) files. Also, the “.mat” files that I just downloaded are identical to the versions I had from 2011. They only stored the distances between points and not the station ids, so there’s no obvious way validate their computations.

    Also in the homogenization code, there is a place to perform a least squares fit on the correlation function, using the analytic form of the correlation function described in the appendix. I haven’t looked through it to see if they cache that fitted value or not.

    To make things more interesting, code that isn’t used and doesn’t even work appears to have been included in the various directories. They discuss this in the documentation as if it were a feature.

    Note added: There does not appear to be any software feature for iterating on the correlation function, once you’ve estimated the temperature field. So the discussion of detrending as discussed in the appendix is wrong or misleading.

  206. Kenneth, if you just have station identification, best trend, gistemp trend, I could back out everything else I needed from there.

  207. Carrick

    “Regarding the computation of the correlation function”
    One way or another it merits separate attention.
    —————
    “They only stored the distances between points and not the station ids, so there’s no obvious way validate their computations.”
    Validation is their responsibility and would be part of internal V&V.
    —————
    “To make things more interesting, code that isn’t used and doesn’t even work appears to have been included in the various directories. They discuss this in the documentation as if it were a feature.”
    Yes, that does makes it more interesting, but is not a surprise.
    —————
    “So the discussion of detrending as discussed in the appendix is wrong or misleading.”
    Or perhaps incomplete and muddled. (That is my vote.)
    —————
    On documentation and V&V: I know I continue to bring up V&V, so no more after this. Here is a link to a groundwater code (FEHM) in current usage for regulatory work that has good documentation [see links in middle of page].
    ( https://fehm.lanl.gov/ )
    This is the norm—not peer review. Papers come after. Compare with what BEST offers and laugh. I do not think it unreasonable to hold BEST or any other reconstruction code to a standard at this level before its results are accepted if it is applied in a prominent way out of research, e.g., policy.

  208. mwgrant, thanks for the V&V link. It will be put to use.

    Regarding validation, I personally think it’s the responsibility of the vendor to provide enough detail that their system can practicably be externally validated.

    To bad Steven Mosher isn’t a bit more open about this. Not good transparency in science there.

    It would be interesting to know who primarily wrote that appendix and who primarily wrote the actual code that appendix is supposed to describe. At the moment, I am assuming they are not the same person.

  209. Carrick

    “Regarding validation, I personally think it’s the responsibility of the vendor to provide enough detail that their system can practicably be externally validated.”

    Absolut.
    ——
    “To bad Steven Mosher isn’t a bit more open about this.”
    Well, when you are on a team in the light there are boundaries of who and what. Also if you do not control something then you minimize what you say about it to others. I assume one or both of those and move on.
    ——
    “It would be interesting to know who primarily wrote that appendix and who primarily wrote the actual code that appendix is supposed to describe. At the moment, I am assuming they are not the same person.”
    Twenty quatloos. I say the same person is the primary author of the code and of the paper. (Do you have a quatloos account with Lucia?)

  210. Here I have linked some graphs showing the GHCN – BEST trend differences at 1105 stations for the period of 1960-2012. I did it for the all the stations (globe) and for zonal regions. I also show the distribution for trend all differences in a histogram and density plot. The differences appear to be very random and not related to latitude. The differences are not insignificant compared to trend for this period. The differences might be used in part to estimate the method uncertainties providing that the data I used is legitimate.

    http://imagizer.imageshack.us/v2/1600x1200q90/538/b8d429.png

    Carrick, I will attempt to dropbox the csv file you requested from a link in my next post.

  211. Thanks Kenneth,

    Just to make sure I’m interpreting it right–the first trend is BEST and the second is GHCN, right?

    Carrick

  212. Kenneth Fritsch (#131296) –
    The scatter plot doesn’t show a great correlation between BEST & GHCN trends.

    The correlation coefficient is r=0.57, r^2=0.33. I expected higher.

  213. HaroldW, the correlation isn’t as good as you’d like, but it’s the slope I’m interested in.

  214. Kenneth, ixnay that question…figured it out (opened it up in Excel, instead of reading it with “more” and it was immediately obvious).

    I agree with HaroldW’s numbers:

    the slope (scaling factor) is 0.45 for Best vs GHCN. This is the “deflation in trend”.

    And the offset is 0.10 °C/decade, which is the offset bias in trend for Best vs GHCN. Mind that is not the same thing as the offset bias in global mean temperature.

  215. Carrick, thanks. I asked because I believe the code you’re looking at doesn’t have a couple changes that could matter a fair bit. BEST has a SVN set up so people can see more up to date code. It may be worth downloading the code from it and comparing.

    Strangely, I was never able to get the SVN to work. I tried it with a couple different programs, and none could make use of it. I can use the web page they’ve set up to mirror the contents though. It’s a bit of a pain to try to collect everything from a browser instead, but at least it works.

    Also, I should point out I’m being quiet about BEST right now, even failing to produce a couple maps that were requested. That’s because BEST has actually managed to make me mad. They’ve reached the point of mind-boggling stupidity. I want to talk about it, but I hate writing things while angry.

    As a hint, people will probably say what’s got me upset doesn’t matter as it doesn’t affect BEST’s results. My response at the moment would be a string of epithets. It might even involve a bit of cursing.

    And I can’t remember the last time I cursed.

  216. @Carrick (Comment #131281)
    .
    Hi Carrick,
    Yes. Equation 14 is a just a convenient functional form. The parameters are selected to fit the data from Figure 1. It is intended to capture the spatial characteristics of the “average behaviour” from Figure 1 based on an assumptions of isotropy and stationarity – no change in the correlation function as one moves from one region to another. It does not affect the main issue (for me) which is that Figure 1 is the wrong data to start with to define the spatial correlation for the weather term.

  217. Carrick, interchanging the axes (BEST on the x-axis) gives a slope of 0.74 for GHCN as a function of BEST. Intercept similar at 0.08. Since both variables have error (and how!) a reduced major axis (RMA) regression would provide the actual relation (takes into account the relative error, which for GHCN is slightly greater:

    GHCN BEST
    mean 0.242 0.211
    SD 0.098 0.076
    RSD 0.405 0.359
    SE 0.003 0.002
    RSE 0.012 0.011

  218. well the peanut gallery has raised what appear to be valid criticisms of the data manipulation in the latter half of the comments ,steven mosher urges those capable to go and look at the data for themselves,they do ,then come back with questions and lo and behold steven mosher and zeke are nowhere to be seen or heard.

    the silence is deafening.

  219. Lance Wallace (#131305) –
    It’s interesting that regressing GHCN vs. BEST also gives a positive intercept. One could thereby argue that GHCN is biased high relative to BEST, or use the original regression to argue the opposite!

    But I think you’re correct that it’s not justified to perform an OLS regression of one dataset vs. the other, which assumes that one series is “correct” and the other “noisy”. Assuming that both series are independently “noisy” with relative variance given by the ratio of sample variances, the formula for the Deming regression gives a best fit for the trends of

    BEST = 0.024 + 0.775*GHCN

    suggesting that BEST trends are slightly higher and have a smaller spread by about a quarter.

    Edit: for the benefit of those who aren’t looking at Kenneth’s spreadsheet, the trends are denominated in K/decade. So that’s the unit of the intercept (0.024) in the above equation.

  220. Paul_K

    I do like your comment(s) and those that they have elicited.

    Assume for the moment a mistake was in the caption/description of Figure 1 and that the correlation function was indeed based on the pair variances of fluctuations (after seasonality, latitude, and elevation are removed) instead of temperature. That is, (I think) the figure caption is modified to be consistent with Equation 14. Would that fix the problem from your perspective or are there some other wrinkles?

    [I am no suggesting the such a fix would make BEST whole. If the need for significantly better documentation for BEST (current and future) is not now clear to the team it likely never will be. So it goes.]

  221. Brandon—I really am not a fan of SVN because it doesn’t preserve timestamp information. I started with the tar ball because it’s simpler to download.

    Lance Wallace and HaroldW — thanks! I knew better, but I was just too tired to do attempt to do it right last night.

    I hadn’t seen reversing the axis as a test of the effect of noise. Good concept, thanks.

    Total least squares gives 0.8536 for the regression coefficient, and 1.1715 = 1/0.8536 when you reverse the axes.

    I expected this number to be smaller than regionally averaged trend, because individual stations have a lot of self-noise. So it would be interesting to go back and average all values in the same 1°x1° block and repeat this exercise.

    mwgrant…the description in Fig. 1 matches the code implementation, at least in version of the code I have.

  222. Not enough overlap for 1°x1° cells, without implementing some form of spatial smoothing.

    For 2°x2° cells, with three or more stations in the same cell, I found 82 cells meeting that criterion.

    The TLS regression coefficient for that case was 0.858, so at least in the context of TLS, that value seems to be a robust result.

  223. Carrick

    Thanks! So Eqn 14 and associated text is off? (More reaction than question) Brazzlfrack! Thanks again.

  224. mwgrant, I believe what was actually done is described correctly in Fig 1, so yes, their explanation for Eqn 14 seems to be in error.

    Paul_K, what do you think would be the “right data”?

  225. Carrick (#131312) –
    Are you sure about the TLS slope of 0.8536 for the station list? From a geometric interpretation, I expect that TLS minimization should yield the same result as the Deming regression formula with delta=1 (that is, assuming that the variance for the two variables is equal). But my calculation for that case yields
    BEST = 0.055 + 0.645*GHCN

  226. HaroldW, TLS doesn’t assume equal variances. I don’t know if it’s closer to the true value or not. I interpret the range as indicative of the uncertainty in the scaling factor.

    But this is actually an interesting theoretical problem to look at, so it’s worth trying some other approaches too.

    In my own work, I have the freedom to take measurements with three instruments simultaneously, so I prefer Sleeman (2006)’s three channel correlational method, which produces unbiased estimates under very few assumptions. Thus I’ve less experience with these methods than I otherwise might have.

  227. HaroldW (Comment #131316)
    July 23rd, 2014 at 10:26 am

    The different types of regression have a confusing set of names. Fromthe writeup on Deming regression it looks much like what I have been calling Reduced Major Axis (RMA) regression. With the variances equal (delta = 1) it looks like what I have been calling Orthogonal Regression. Since the variances of the two data sets are not equal, it seems that perhaps RMA is preferred. The two results are
    BEST = 1.291 GHCN – 0.031
    GHCN = 0.7746 BEST + 0.024

    Note that in both cases the intercept is closer to zero than for the OLS approach, which is known for giving a lower slope and higher intercept than other regression approaches. The difference in slopes seems quite striking (1.28 vs 0.74, and 0.77 vs 0.45).

    Of course, we are dealing with data taken by untrained volunteers using obsolete instrumentation located in contaminated airport locations not meeting siting criteria and subject to continuous ever-changing “adjustment” by biased organizations and rentseekers. Particularly clever is the way the original 1/3 of stations with negative trends has been reduced to 12 (out of 1100) stations by GHCN and a single one by BEST. (How did they miss that one? I bet its halflife has been sharply reduced when this is noticed.)

  228. Whoops–I reversed the x- and y- variables. Should read
    GHCN = 1.291 BEST – 0.031
    BEST = 0.7746 GHCN + 0.024

    Kenneth Fritsch (Comment #131318)
    July 23rd, 2014 at 2:19 pm

    I get 1.291 (SE 0.044) and 0.775 (0.031). However, I have been having a hard time validating these standard error estimates, so there may be some question here, but they are very much in line with estimates of standard error for the same databases using 5 other regression methods.

    HaroldW (Comment #131316)
    July 23rd, 2014 at 10:26 am
    I get almost identical slope (0.647 vs your 0.645) and identical intercept of 0.055 using what I call Orthogonal Regression, which thus appears to be another name for Deming Regression.

  229. Lance Wallace (#131319 & 20) –
    Apologies if the nomenclature I used was confusing, but you have it just right. [I picked up the nomenclature from Wiki; this is not something that I use often enough to know by what name(s) it’s properly called. I re-derive the formulas when I need it.] The regression in #131310 matches your RMA calculation. And that of #131316 is (or should be!) the same as Orthogonal Regression. I expected that would match Carrick’s Total Least Squares approach, as the metric minimized is dx^2/delta + dy^2; when delta is set to 1, the quantity minimized is dx^2 + dy^2 which I thought was the TLS goal.
    .
    As for the CIs on the slope, I’m too lazy to derive the formula for that. However, I found a Matlab routine online which (with a small correction in a comment) seems to provide CIs. [I used the default value which I think provides 95% CIs.] For the RMA approach (weighting based on sample variances), the regression is
    BEST = 0.7746*GHCN + 0.0241
    slope CI: 0.7746 +/- 0.0728
    intercept CI: .0241 +/- 0.0179
    .
    For the orthogonal regression (equal weighting), the regression is
    BEST = 0.6474*GHCN + 0.0549 [so you’re right, Lance!]
    slope CI: 0.6474 +/- 0.0634
    intercept CI: 0.0549 +/- .0153

  230. @Nick Stokes (Comment #131274)
    July 22nd, 2014 at 4:41 am

    “Pairwise correlation is of fluctuation about the respective means, which incorporate the geographic effects that determine mean temperature. Those means are subtracted out. In your earlier terms, they would quantify (or eliminate) the G(x) contribution.”

    OK, Nick. I agree that this seems mathematically correct for the structural model assumption as stated. But that is because the model assumption does not recognise any time-dependence in the geographical component other than the time-dependent variation in the global average temperature. Under this model assumption, the expected value of temperature at each location on the planet moves in lockstep with the global average temperature. In simplistic terms, the model structure recognises that the poles are colder than the tropics, but it does not recognise in any structural sense that the amplitude of temperature variation in the high northern latitudes is observably greater than the amplitude in the high southern latitudes which is greater than the amplitude in the tropics. The model has no structural recognition of the “geographical component” of temperature variation which is associated with amplification of temperature change as one moves towards the higher latitudes. (I will call this ‘polar amplification’ for short, even though it is not quite the context the term is normally used in.) Yet this is undoubtedly a realworld phenomenon, and one which manifests itself in the data input (weather stations).

    Hence when a variogram or correlation-distance analysis is done on the real world datapoints under this model assumption, the only element remaining to be mapped is apparently the weather plus error terms. In the real data, it is the weather plus the temperature variation associated with polar amplification (plus error terms). There is then a very high correlation retention associated with polar amplification, and this results in a very large distance parameter being applied (with radius over 3100 kms) to the weather term, now an obvious misnomer, which results in an unjustified smoothing of all smaller-scale features.

    There is another way to look at this, which is to say that since the weather term W is the only fluctuation which is left in the mathematical model, then any variation (like polar amplification) which is not explained by the stable geographical component already accounted for MUST be included as a fluctuation in the weather term. Therefore, you need the large correlation distances to correctly account for polar amplification. This is actually highlighting a structural problem in the model – which results in trying to get the same data to do two incompatible things.
    I don’t think the problem is insuperable – see my follow-up response to Carrick and MWgrant – but I do think that it is a real problem.

  231. @MWgrant
    Thanks for the kind words. I will try to respond to your question by responding to Carrick’s below.
    @Carrick
    “What do you think is the right data?”
    I have come to the conclusion that the problem is better described as a structural deficiency rather than a choice of data problem. At the moment, the BEST methodology is trying to get one spatial characterisation to do two incompatible things – map relatively small-scale weather features and map large-scale temperature variation associated with amplitude dependence on latitude. (See my response to Nick Stokes above for some further clarification of this point.) They need to be separated.

    What BEST does at the moment does not seem like a bad approach to generating an average global temperature series, and this should be relatively insensitive to choice of maximum correlation length. Its main problem is in the mapping (and the associated error calculation).
    One pragmatic solution to the problem therefore is to use a two-step approach. The first step as is. For the second step, it is necessary to define new expected values of T(t) by latitude band, which will consist of the old expected values (Tav plus the stable geographic component) plus a trend correction from the results of step 1, such that the areal integral of the sum is equal to the areal integral of the final mapped results for this latitude band from step1. The trend correction in the appropriate latitude band is then applied to the datainputs to calculate a new synthetic dataset of (zero-mean) fluctuations about the new expected values. Their spatial correlation – which should have a much smaller distance scale – is assessed, and the synthetic data is then kriged and added on to the new expected values of T(t) to yield the final mapping.

  232. Re: bit chilly (Jul 24 05:03),

    but radiated back out into space.

    I think you and the linked blog have it backwards. An increase in the magnitude of the TOA radiative imbalance means that even less energy is being radiated to space than is absorbed from the sun. This is, in fact, exactly what you would expect from a slowing of the rate of increase of the surface temperature while ghg concentration continues to increase.

  233. Before I leave my analysis of the BEST temperature data set, and in particular the mean temperature series adjusted for non climate effects and linked below, I want to point to a discrepancy that I have observed between the number of years of data that GHCN Version 3 shows for stations with data for the years 1960-2012 and what BEST shows for these same stations (or least stations located at the same latitude and longitude within 0.1 degrees).

    Using the link below I used the 7218 stations for average temperatures breakpoint corrected from BEST that BEST designated as being sourced from GHCN monthly Version 3 data. After eliminating approximately 100 duplicated stations I extracted those stations with a start and end date from 1960-2012 that had at least 50 years worth of data. Using the latitude and longitude of these stations I determined the corresponding stations from the GHCN V3 data set. (Note for the spreadsheet of GHCN versus BEST trends I linked above I used only those stations that had at least 50 years of data from 1960-2012).

    http://berkeleyearth.lbl.gov/downloads/TAVG/LATEST%20-%20Breakpoint%20Corrected.zip

    I found 1573 stations where the BEST data set showed more years of data than the GHCN set and in some cases very substantial differences that I’ll attempt to show in table directly below. I would greatly appreciate Steve Mosher or Zeke Hausfather replying on this matter before I contact the Berkeley Earth people.

    Yrs Diff Freq
    2 11
    3 48
    4 117
    5 92
    6 71
    7 57
    8 69
    9 47
    10 38
    11 60
    12 60
    13 53
    14 42
    15 27
    16 18
    17 11
    18 21
    19 24
    20 35
    21 57
    22 258
    23 116
    24 39
    25 15
    26 14
    27 6
    28 5
    29 3
    30 9
    31 16
    32 18
    33 7
    34 2
    35 5
    36 6
    37 2
    38 3
    39 7
    40 6
    41 8
    42 25
    43 11
    44 14
    45 5
    46 3
    47 2
    48 1
    50 4
    51 2
    52 2
    53 1

  234. Carrick, I’m not a fan of SVN either. I just want to try to keep abreast of what the BEST code does. The code you’re looking at is outdated, not used to create the set of results they currently have published on their site. I’d rather not try to judge results based upon code that may have had any number of changes, all made without any announcement.

  235. Brandon, I believe that tar ball is meant to correspond to the code at the time of their write-ups. So I thought that was a good starting point.

    I did download the svn repository using these commands:

    svn –username installer –password temperature co http://berkeleyearth.lbl.gov/svn/code
    svn –username installer –password temperature co http://berkeleyearth.lbl.gov/svn/data
    svn –username installer –password temperature co http://berkeleyearth.lbl.gov/svn/documents

    and I verified that the “.mat” files are still unchanged, and that they still load these files. I’d have to actually get their code to work before I went any further.

    They’re up to 20,000 lines of matlab code and 20,000 lines of python. I wonder if anybody is keeping track of this….

  236. Paul_K, thanks for the comments. What you are saying makes sense to me.

    HaroldW, I think the issue is nomenclature. I am referring to TLS as a type of EIV not TLS as a type of orthogonal regression.

  237. “I would greatly appreciate Steve Mosher or Zeke Hausfather replying on this matter before I contact the Berkeley Earth people.”

    write to steve @ berkeley earth . org

    I’ll put your request in my stack of things to do.

    since i am paid to help users who write to me there, that the best way to get a response. for now, until I finish my own projects that will probably be the only way to get a response.

    questions about the data sent to others get routed to me.

  238. Steven Mosher (Comment #131347)

    Steven, I guess this means that you will not be answering my query here. Fair enough. I will put it to the paid Steven at Berkeley Earth and await it to come to the top of that stack.

  239. Carrick, as I recall, BEST changed their code between papers. I believe that tar ball is supposed to correspond to the last paper (and however many others), but not the first paper, the appendix you’ve discussed or the results currently displayed on their website. I would think the .mat files would have changed too since they’ve updated their data.

    I could be wrong about that though. Goodness knows it is difficult enough to keep track of these things when BEST doesn’t say anything about changes to their code or methodology, much less provide documentation.

    In happier news, I managed to get an SVN connection working. That makes downloading their code much easier. Now I just need to refamiliarize myself with their code. Am I remembering right that they calculate correlation of the derivative of station records, not the station records themselves?

  240. Brandon:

    Carrick, as I recall, BEST changed their code between papers. I believe that tar ball is supposed to correspond to the last paper (and however many others), but not the first paper, the appendix you’ve discussed or the results currently displayed on their website.

    I also have a code distribution from circa 2011-10-20. So I have three total: an early release tarball distribution, the 2013-11-07 tarball distribution, and an SVN from a couple of days ago.

    The appendix people have been discussing is timestamped 2013-10-23.

    $latex ~$

    Brandon:

    I would think the .mat files would have changed too since they’ve updated their data.

    I have confirmed these “.mat” files, download by SVN and associated with the correlational analysis, are unchanged:


    ./SVN/code/trunk/Analysis/Averaging/mask16000.mat 28731 48
    ./SVN/code/trunk/Analysis/Averaging/monthly_covariance_info.mat 10771 4090
    ./SVN/code/trunk/Analysis/Averaging/new_monthly_covariance_info.mat 46615 17884

    Here are the versions from the oldest code I have:

    ./AnalysisCode/Old/Export/Code/Analysis/Averaging/mask16000.mat 28731 48
    ./AnalysisCode/Old/Export/Code/Analysis/Averaging/monthly_covariance_info.mat 10771 4090
    ./AnalysisCode/Old/Export/Code/Analysis/Averaging/new_monthly_covariance_info.mat 46615 17884

    The numbers at the end are the check sums.

    For completeness, here are the file sizes in bytes, timestamp and file name for the three versions:


    48845 2011-10-20 07:33 ./AnalysisCode/Export/Code/Analysis/Averaging/mask16000.mat
    4188041 2011-10-20 07:33 ./AnalysisCode/Export/Code/Analysis/Averaging/monthly_covariance_info.mat
    18312478 2011-10-20 07:33 ./AnalysisCode/Export/Code/Analysis/Averaging/new_monthly_covariance_info.mat
    48845 2011-10-20 05:33 ./AnalysisCode/Old/Export/Code/Analysis/Averaging/mask16000.mat
    4188041 2011-10-20 05:33 ./AnalysisCode/Old/Export/Code/Analysis/Averaging/monthly_covariance_info.mat
    18312478 2011-10-20 05:33 ./AnalysisCode/Old/Export/Code/Analysis/Averaging/new_monthly_covariance_info.mat
    48845 2014-07-23 16:11 ./SVN/code/trunk/Analysis/Averaging/mask16000.mat
    4188041 2014-07-23 16:11 ./SVN/code/trunk/Analysis/Averaging/monthly_covariance_info.mat
    18312478 2014-07-23 16:11 ./SVN/code/trunk/Analysis/Averaging/new_monthly_covariance_info.mat

    The SVN version is the date and time it was downloaded, since that idiotic program doesn’t know how to preserve timestamp info.

    $latex ~$

    Brandon:

    Am I remembering right that they calculate correlation of the derivative of station records, not the station records themselves?

    If that’s the case, it’s not what’s documented. What they say is:

    Mean correlation versus distance curve constructed from 500,000 pair-wise comparisons of station temperature records. Each station pair was selected at random, and the measured correlation was calculated after removing seasonality and with the requirement that they have at least 10 years of overlapping data.

    Without going through the code exhaustively, I believe this description is accurate.

  241. Carrick, it looks like it was the results paper, not the methods paper, which was published before they released new code. I mixed the two up because I assumed the methodology was described along with the results.

    What actually happened is BEST published preprints of several papers, including one which described their methodology. They then published a results paper with no methods paper while directing people to the preprint of their methods description. They later published a new methods description, one which was different from the previous. That’s when the appendix being discussed was published (actually on the 26th, three days later than the paper).

    Anyway, given the tarball you downloaded was uploaded in-between all this, something like a month after the results paper was published, I’m guessing what happened is BEST changed its methodology between its preprints and published papers. That is, all the paper used the tarball you downloaded. BEST just failed to document any of the changes in its methodology, and because it didn’t upload its new code or new methodological description when it published its results paper, there was plenty of room for confusion.

    If that’s right, what it’d mean is for a month or so after BEST published its results, there was no documentation available for any changes. For the three months or so after that, the only correct documentation was the newly published code itself, code which didn’t match the methodological descriptions available at the time. After those four months, things finally matched up, but nobody bothered to say so.

    On the issue of correlation calculations, I realized what had me mixed up. They use correlation calculations for more than one thing. One time is when calculating empirical breakpoints. For it, I believe BEST does pairwise comparisons of differenced stations.

  242. Kenneth Fritsch (Comment #131330)
    July 24th, 2014 at 10:32 am

    Kenneth–

    It appears you have finally found the long-missing solar effect! There is a clear peak in the BEST-GHCN disagreement at 22 years!

  243. Kenneth Fritsch (Comment #131357)

    I want to conclude here by saying that Steven Mosher has resolved my concerns about where BEST obtained the years of data in the GHCN monthly sourced station data that are missing from the GHCN monthly station data. In my analysis above I noted that the 1100 or so station comparison between BEST and GHCN monthly data could have been closer to 2600-2700 stations if GHCN had as complete of data coverage as BEST showed. That was for the extended period from 1960-2012 where I required at least 50 years of data.

    The extra data comes from BEST through their use of GHCN daily data. Mosher has talked about this resource (the daily network) numerous times at these blogs. As I explained to Mosher what threw me off here was my not understanding why GHCN would not use their own available data to extend their adjusted monthly station data. If BEST’s efforts motivates GHCN to better utilize their daily data that says something perhaps for some friendly competition between temperature data set producers.

    Take a look at the GHCN Daily at the link below and notice the huge files that are available there. Look at the by_year folder and note the data going back to 1763. Working with that much data has to be a daunting task and I therefore have to give BEST and Mosher a lot of credit for making use of these data.

    ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/

    I think having the efforts of BEST and Cowtan and Way publicized will, in the end, help provide a better understanding of the overall instrumental temperature record and importantly the adjustment processes and better still establishing overall CIs. That, of course, does not mean that any of the data set processes are above criticism – and even from amateurs. I continue to advocate for a proper benchmarking test whereby these data set processes can be compared and eventually be improved.

  244. Kenneth I have a R package that gets daily data.
    Im extending it to cover other variables .

    Over time GHCN -M will be reconciled with GHCN D.

    ideally all monthly products would just go away

  245. Steven Mosher (Comment #131372)

    But summarized somewhere as monthly data I would hope.

  246. But summarized somewhere as monthly data I would hope.

    Put out daily, users can do the sums themselves..

    fewer charges of fraud that way

  247. My comments on data presentation go to obfuscation and ommision,Steve.
    When certain facts are never answered or given clever answers that completely sidestep the answer then it looks like people are trying to hide things.
    Like telling someone all the data is there but not summarising it so it remains hidden from the public.
    Or telling people the answer is on a certain site when it is not present clearly on that site.
    Hiding things is rarely ever a good idea.
    There is also a difference between presenting algorithms and fraud.
    Cowtan and Way and Best can do algorithms.
    If they omit a measurement by error or ignorance it is sad but it is not fraud.
    Fraud would be taking data, applying rules to it that would only present an outcome that was known to be wrong but desired and then knowingly publishing it.
    No one has said anything about fraud here.

  248. Regarding the delay between GHCN daily and monthly—my impression there’s some quality control that is being done on the monthly that can lead to delayed reporting. That said, it sure would help if NCDC decided to release their full source code, and document as well as provide an explanation for why there’s so many differences in data coverage between GHCN daily and monthly.

  249. “NOAA and NASA (which uses data gathered by NOAA climate center in Asheville) has been commissioned to participate in special climate assessments to support the idealogical and political agenda of the government. From Fiscal Year (FY) 1993 to FY 2013 total US expenditures on climate change amounted to more than $165 Billion. More than $35 Billion is identified as climate science. The White House reported that in FY 2013 the US spent $22.5 Billion on climate change. About $2 Billion went to US Global Change Research Program (USGCRP). The principal function of the USGCRP is to provide to Congress a National Climate Assessment (NCA). The latest report uses global climate models, which are not validated, therefor speculative, to speculate about regional influences from global warming.

    The National Climate Data Center and NASA climate group also control the data that is used to verify these models which is like putting the fox in charge of the hen house. At the very least, their decisions and adjustments may be because they really believe in their models and work to find the warming they show – a form of confirmation bias.

    Please note: This is not an indictment of all of NOAA where NWS forecasters do a yeoman’s job providing forecasts and warnings for public safety.”

  250. “No one has said anything about fraud here.”

    really?

    you need to spend some time on skeptical sites swatting down the charges of fraud before I take your dedication to the truth seriously. That goes for a bunch of folks here. We are all too quick to demand that people in climate science call out the bad behavior of climategate, but slow to call out the irresponsible fraudulant charges of fraud leveled at NOAA.

    I should tell you folks the messages I got from prominent skeptics demanding that I find something on NOAA in the climategate files WRT the temperature record. witch hunt crap

    D’Aleo charged NOAA with Fraud over the great thermometer drop out. The charge itself was a fraud.
    Consequently NOAA employees were subject to investigation because of the fraudulant fraud charge. and you said nothing.

  251. The quote was “Nobody has said anything about fraud here“. I don’t doubt that people have made charges of fraud.

    But I’m not sure how fruitful it is fighting people who make scurrilous charges. The quote about wrestling pigs comes to mind…

  252. Carrick
    Correct.
    Can you tell me how many real USHCN stations out of 1129 in 1987 there are left?
    Is it important?
    Do you care?
    Zeke will not.
    Paul K ‘s and Kenneth Fritsch ‘s concerns are also left lying on the table.
    Obfuscation .
    and Omission are my concerns.

  253. angech, I’d rather hear a pitch from you on why you think the USHCN is important for monitoring climate.

  254. The USHCN is supposed to be the Historical record of the contiguous US land temperatures for the last 100 years . The records are impORTANT AS THE USA has one of theis one of the best continuous records with the best instrumentation available. And is one of the main Northern Hemisphere

  255. Carrick,
    The USHCN is supposed to be the Historical record of the contiguous US land temperatures for the last 100 years.
    It is important for monitoring the climate as it purports to be one of the biggest, oldest, most complete and most extensive historical record of a Northern Hemisphere landmass and helps give a good overview of both American and World temperatures.

    Unfortunately it would better be called the USMCN or US Model Climate network as there is no correct historical data in it.

    Zeke makes this clear when explaining the many changes that are made to historical data to give the scientists a working model that excludes possible past bias in recording. The product can be presented in 2 ways , he says, by keeping the past records as correct but altering the current records as they come in, the problem being the current temperatures would always have to be rated upwards which would clash;Or the way he agrees with which is to keep the current records as accurate and rate past temperatures downwards which makes them non Historical.
    My objection to this is that the USHCN must be labelled as a model, not used as a de facto real data set but Zeke says this is unimportant, as he knows this, and the general public do not need to be informed if they are not smart enough to work it out themselves.

    The second unspoken problem is the data itself. Set up in 1987 with 1129 select stations running for 100 years it is dependent on all the stations continuing to exist and supply real data.
    But stations keep dropping out. They are replaced with modelled data with severe deficiencies from the surrounding but unused large set of known stations. Some were also completely substituted .AT any one time up to 100 stations do not give complete records for a month and have to be adjusted.
    If it becomes common knowledge that less than 50$
    5 of the original stations are real the use of the USHCN becomes severely problematical.
    Who would want to trust a model of a data set when people, with the best intentions are filling in over 50 % of the modified data with more modelled data.
    How do we know this may be happening?
    I have asked Zeke, and Mosher for confirmation of the number of real stations and have been nlocked at every turn.
    When one wants to preach openess and transparency not answeing a simple question is the worst thing one can do.
    So Carrick, do you know the number of real,original USHCN stations in use? A no
    Do you care? A probably not.
    Do you think the USHCN is important for monitoring climate.
    A Yes or why ask the question, and No as you do not know the answer

  256. angech, this is my understanding, so hopefully if somebody who knows more sees errors or can add information, they can correct this:

    The USHCN is just a database of existing stations with long records. That is, it isn’t a real entity in the sense of having an office that coordinates US temperature stations. It’s a group of people in an office in North Carolina who are generating a data product, and it’s not even the data product that gets used by the major temperature series (e.g., NCDC, GISTEMP or CRUTEM/HadCRUT).

    USHCN takes existing stations, and attempts to amalgamate them into a high-density network of stations. So the stations aren’t technically associated with the USHCN other than they are selected from existing weather stations that had long records and were still operating in 1987. (I believe that is when USHCN revision 2 was created. Version 1 was created circa 1983.)

    The stations are not funded by USHCN, USHCN is only a data user of the information provided by the stations and funded by other tasks. The density of stations is likely higher than is needed for climate monitoring, many of the sites are run by volunteers, and there’s not been a lot of effort to recruit new volunteers as the old ones retire.

    Recently there’s been the influx of data from low-cost automated weather stations, so if anything, the density of stations with available hourly (and sub-hourly) has increased dramatically in recent years.

    The closing of stations has nothing to do with USHCN because it never funded them to start with. Why there isn’t more funding is getting into politics, and discussing politics with the expectation that it will achieve anything of value is completely unrealistic.

  257. I didn’t see your longer comment until after I already replied to the short one.

    First a correction, Review 2 started in

    I followed this link (which was supposedly for Revision 1 and 2):

    http://cdiac.esd.ornl.gov/ndps/ushcn/abstract_r1.html

    which states:

    This document describes a database containing monthly temperature and precipitation data for 1221 stations in the contiguous United States. This network of stations, known as the United States Historical Climatology Network (U.S. HCN), and the resulting database were compiled by the National Climatic Data Center, Asheville, North Carolina. These data represent the best available data from the United States for analyzing long-term climate trends on a regional scale. The data for most stations extend through December 31, 1994, and a majority of the station records are serially complete for at least 80 years. Unlike many data sets that have been used in past climate studies, these data have been adjusted to remove biases introduced by station moves, instrument changes, time-of-observation differences, and urbanization effects.
    These monthly data are available free of charge as a numeric data package (NDP) from the Carbon Dioxide Information Analysis Center. This NDP includes documentation and 27 machine readable data files consisting of supporting data files, a descriptive file, and computer access codes. The entire database takes 355 MB of disk space, with the largest file being 16 MB. The published documentation for this NDP describes how the stations in the U.S. HCN were selected and how the data were processed, defines limitations and restrictions of the data, describes the format and contents of the magnetic media, and provides reprints of literature that discuss the editing and adjustment techniques used in the U.S. HCN.

    Looking at the url more closely I noticed it says “r1”.

    The actual link for revision 2 is:

    http://www.ncdc.noaa.gov/oa/climate/research/ushcn/

    And states specifically:

    In 2007, USHCN version 2 serial monthly temperature data were released and updates to the version 1 datasets were discontinued. Relative to the version 1 releases, version 2 monthly temperature data were produced using an expanded database of raw temperature values from COOP stations, a new set of quality control checks, and a more comprehensive homogenization algorithm. The version 2 temperature dataset and processing steps are described in detail in Menne et al. (2009) and more briefly below.

    This document has a much better historical description of the USHCN dataset than I was able to find previously:

    The first USHCN datasets were developed at NOAA’s NCDC in collaboration with the Department of Energy’s Carbon Dioxide Information Analysis Center (CDIAC) in a project that dates to the mid-1980s (Quinlan et al. 1987). At that time, in response to the need for an accurate, unbiased, modern historical climate record for the United States, personnel at the Global Change Research Program of the U.S. Department of Energy and at NCDC defined a network of 1219 stations in the contiguous United States whose observation would comprise a key baseline dataset for monitoring U.S. climate.

  258. Responding to your longer comment, I don’t see USHCN as important or even necessary for the study of global climate change. Where I taking over as funding director, they’d have to make a good pitch why keeping it in existence is even useful.

    I think the papers written by the authors of this product are “must reads” for anybody doing research using the US surface temperature record, but the USHCN is not a primary source, but a highly derived one. As I mentioned, neither NCDC, GISTEMP nor CRUTEM/HadCRUT (nor BEST) use the USHCN data product, they start from more primary data sources, such as found here:

    http://www1.ncdc.noaa.gov/pub/data/noaa/

    As you probably know, GISTEMP now uses the GHCN monthly adjusted data. Prior to that, they state:

    December 14, 2011: GHCN v2 and USHCN data were replaced by the adjusted GHCN v3 data. This simplified the combination procedure since some steps became redundant (combining different station records for the same location, adjusting for the station move in the St. Helena record, etc). See related figures.

    So ironically some of the historical problems GISTEMP had might be related to having originally used USHCN.

    But of course Steve Goddard is going to slam GISTEMP for “adjusting the past” and other moronic comments of that sort because GISTEMP has gone to a more reliable source product. But you can’t fix stupid.

  259. Carrick (Comment #131428)
    Carrick, there is a close correspondence between GHCN and USHCN, as there should be. The 1218 USHCN stations are exactly the ConUS stations in GHCN, and the unadjusted temperatures seem exactly the same (maybe some QC differences). Adjusted is a little more complicated, since GHCN does not use Filnet, which is USHCN’s effort to infill, and I think GHCN is less inclined to provide an adjusted value, when in doubt. But I think they use the same homogenisation.

  260. Nick is about right with his description, but GHCN does do quite a bit of adjustment on USHCN data. No infilling though.

    Here is a couple years of data from GHCN’s version of USHCN.
    Q=12 means exactly 12 months data qualifies for this analysis. DNQ means did not qualify. Mean is just the average of all months with data. In this case there is no missing data. Nothing fancy. .qcu is unadjusted and .qca is adjusted.
    Source:ghcnm.tavg.v3.2.2.20140720.qcu.dat
    Temperature: ID#=4250*: = 12 months data:
    Year Mean Data Lines Q=12 DNQ
    2012 12.94 9636 1008 803 205
    2013 11.23 9408 964 784 180

    Source:ghcnm.tavg.v3.2.2.20140720.qca.dat
    Temperature: ID#=4250*: = 12 months data:
    Year Mean Data Lines Q=12 DNQ
    2012 12.81 8352 943 696 247
    2013 10.9 7632 834 636 198

    I took a close look at 2013 and of those 784 unadjusted qualifying sites, 85 don’t even merit a line in the adjusted file. They are just gone. Not even a tombstone left behind to mark their passing.

    Nick might be interested to know GHCN has no temperature data for Australia for the time period Sep-Nov 2011. Looks like a coding screw-up to me. Emailed them yesterday about it. Guy wasn’t in his office though. Don’t know when it will get fixed. I think the idea they hadn’t noticed an error of continental magnitude for almost three years says something about their concern for accuracy and sanity checking against previous files.

    A few weeks ago I notified them about a sourcing error in their unadjusted file. They had rolled back over 16000 months of data in 2013 to a lower priority data source. USHCN wasn’t affected. This created 100s of months of additional missing data along with a few data changes and loss of six sites for 2013. That error was there for more than a month until I pointed it out to them. All I did was run some quick tallies of flags and data and it was obvious something was wrong. They corrected it the next day and put an update in their status file. Of course they didn’t describe it the way I just did.

    New database is coming next year. Just downloaded a 400+ megabyte beta file. Haven’t had a chance to look at it yet. Has Tavg, Tmax, and Tmin all in the one file. First version has over 32000 stations. Was told it will be the basis for the GHCNMv4. Here is a link.
    http://www.surfacetemperatures.org/databank

  261. Nick, I estimate there are about 1780 stations in GHCN V3 for the contiguous US.

    Do you just mean that all of the USHCN stations are in the GHCN network? That’s not surprising because USHCN is just stations with 80-years of data as of 1987 and I think GHCN is created by most of the same people that did USHCN originally. (Some bit of history here.)

    Anyway, I can’t escape the notion that USHCN is an older product and just shouldn’t be relied on.

  262. Bob Koss:

    I took a close look at 2013 and of those 784 unadjusted qualifying sites, 85 don’t even merit a line in the adjusted file. They are just gone. Not even a tombstone left behind to mark their passing.

    So did you look at the missing stations? They may have a flag on those stations to not include them in the adjusted data set.

    Improved documentation on what goes on (or should be going on) in adjusting the dataset would be useful here.

    A few weeks ago I notified them about a sourcing error in their unadjusted file. They had rolled back over 16000 months of data in 2013 to a lower priority data source. […] They corrected it the next day and put an update in their status file. Of course they didn’t describe it the way I just did.

    It’s of course possible that their description is more accurate than yours. Though IMO they should acknowledge people who send in bug reports that result in corrections to the code or data product.

  263. “I have asked Zeke, and Mosher for confirmation of the number of real stations and have been nlocked at every turn.”

    1. I stopped looking at USHCN long ago when it was clear that it was largely if not completely a derivative product.
    2. I stopped looking at GHCN-M once I figured out GHCN-D

    USHCN is only important because GISS continues to use it and I suppose that Anthony might use it. They should stop.

    once you get into the guts of the problem you will go to primary sources.. That is,, IF you are interested in producing the best answer you can, then you will go to primary sources. If, on the other hand, you’re interested in throwing spitballs or goddard style hit jobs, then you’ll focus on USHCN.

  264. Carrick (Comment #131439)

    I didn’t put up the flag data, but I always tally it. There were only 13 months flagged for removal out of those 784 with 12 months of data. They don’t remove entire years due to one monthly datapoint being flagged at a site. They just remove that month of data. Statistically, a few of those removed lines probably involved a month of flagged data. That is small potatoes compared to the 85 lines which went missing.

    The really annoying part is when they apparently do remove all data from a line while adjusting, they leave no indication it ever existed. The only flags you can find in the adjusted file are those related to data removed by PHA. There are three flags for the adjusted file. Two are never used. A&M
    A = alternative method of adjustment used.
    M = values with a non-blank quality control flag in the “qcu”
    dataset are set to missing the adjusted dataset and given
    an “M” quality control flag.
    X = pairwise algorithm removed the value because of too many
    inhomogeneities.
    Only the X flag show up, but never for all 12 months. I’ve seen lines with 11 -9999s with X flags and one lonely unflagged -9999 which haven’t been removed. From that I assume the lines removed all had 12 X flags, but I can’t know because they don’t exist anymore. Seems to me that is an extraordinary number of sites with 12 months of data to remove.

    To my way of thinking, all flagged data should remain so data provenance is maintained. It’s not difficult or excessively time consuming to ignore flagged data while running the algorithms. It seems they just don’t want to do it that way. Maybe because not doing it creates extra work for someone trying to follow their reasoning in reverse.

    When you see some sites have 12 months data for long stretches of years in the unadjusted file and then see 3-5 years in succession during that period when they don’t get a line in the adjusted file, you have to wonder what is going on. Is it just that people don’t do maintenance anymore?

  265. “Nick might be interested to know GHCN has no temperature data for Australia for the time period Sep-Nov 2011. Looks like a coding screw-up to me. Emailed them yesterday about it. Guy wasn’t in his office though. Don’t know when it will get fixed. I think the idea they hadn’t noticed an error of continental magnitude for almost three years says something about their concern for accuracy and sanity checking against previous files.”

    on a month to month basis literally thousands of stations go missing and then come back. It appears there are major clean ups/ deprecations going on.

    Of course if you just looked at the meta data youd find hundreds of thousands “stations” many duplicate.. some ‘sorta’ duplicate.

    One hopes that with ISTI that some order will be brough to this.

    Ideally they’d set that up as an independent project for just organizing the existing data and getting consistent documentation.

  266. Bob Koss:

    he really annoying part is when they apparently do remove all data from a line while adjusting, they leave no indication it ever existed.

    Yes, I agree that algorithms which adjust data, and leave no record of what they did, are problematic. My own method is to save unadjusted values, then flag changes in new file names that include the specific adjustments, and I like to store each step in its own intermediate file.

    One thing this does is document what was actually done with a particular file, given that the algorithms do change a bit over time that do the adjustments.

  267. Per the discussion about getting rid of USHCN.

    As of 2013 the US had 1131 stations in the GHCN database report at least one month of unadjusted data. Some of those stations were off shore. 964/1131 of those were USHCN. Get rid of the USHCN and coverage of the US will be very much reduced. A system would have to be set up where GHCN takes responsibility for over-seeing and collecting the data from those stations, and some of the volunteers might not be too happy with the change.

    In 1921 the USHCN shows 1153 stations reporting in the unadjusted file. The center of the network was located at 39.55N 95.64W. In 2013 the network shows 964 stations reporting with the center located at 39.69N 95.79W.

    By center I mean each valid monthly datapoint also has a location attached to it. It is the average of all those monthly locations. The center has moved 21 km NW in the last 93 years. The years between, it has mostly stayed within those parameters and rarely outside them. That’s a pretty darn stable distribution.

    Why get rid of it? No one has to pay attention to their homogenized product. But the raw data is useful. As long as the network stays reasonably stable I think it should remain as one.

  268. Bob. Ushcn is a subset. Getting rid of it
    Does nothing to coverage.
    Ghcnd- – >ghcnm–>ushcn

    There are about 20k ghcnd stations in the US
    Some end up in ghcnm
    Some of those end up in ushcn.

    Untangling the mess is best done by just using
    Ghcnd. Daily data.

  269. To follow up on Steven Mosher, USHCN is just a data product. It is a “virtual network” of previously existing stations. So getting rid of USHCN wouldn’t affect whether those stations exist or not.

    I think there’s nothing special about the spatial distribution of this array because there is nothing challenging about mapping a distribution of measurements into the estimate of a field. And there isn’t any useful value of averaging the stations temperatures, without some sort of spatial weighting first.

    There’s not much of particular use about 80 “consecutive years of station data, when you still have to break the station up into segments (to account for station moves etc.)

    Back in the 80’s when they were designing the USHCN array, I think these nuances weren’t as well understood. So what seemed like a good idea on paper became less of a good idea in practice.

  270. Carrick (Comment #131438)
    “Nick, I estimate there are about 1780 stations in GHCN V3 for the contiguous US.
    Do you just mean that all of the USHCN stations are in the GHCN network?”

    Well, I meant that. But I thought too there were no others (from ConUS). Seems I was wrong.

  271. USHCN awill also do freaky things like combining stations that are kilometers apart.

    Long ago I looked into the ‘duplicate station” problem. Really down in the weeds. Steve Mc wrote a little bit about it as did RomanM

    In some ways we still have some problems that result from decisions
    made upstream..Our central park record was a mess because of upstream merges that people had done.

    It’s mind numbing work and not limited to the US.

    take Env Canada as an example. Robert Way and I worked on Labrador for about 3 months trying to “hand” reconcile various records. we had

    1. Labrador data according to Env canada
    2. Labrador data according to GHCND and GHCNM
    3. Labrador data according to CRU
    CRU uses homogenized data from EnvCanada.

    Finding matches is easy

    A: using metadata from all the sources calculate a ‘distance’
    1. Physical distance
    2. Edit distance for the names

    B: using the time series find the closest station ( data is most similar)
    1. look at the physical distance
    2. look at the “name” distance

    In your first pass you will find a bunch of identical or near identical matches.

    These go into a pile of “uniques”

    the next step is brutal.. if two stations are “close” are they the “same” or different? what do you do.

    ISTI has a similiar approach. and this disambiguation is handled by an algorithm you can tune.. they end up with three piles, one pile of stations is ‘cant tell’

    basically you have two approaches.

    A Algorithmically decide when stations are unique and when they are the “same”
    B Do it all by hand and historical research.

    Both will have errors. When you do ‘A’ and you make mistakes
    then somebody will find the error and assume that it is pervasive.
    They will recommend “B”. the problem with “B” is the sheer volume of work. For example Tim Ball did his phd on the history of a hand ful of stations in canada.

    Think of the work done on CET or other series. now multiply by 10000 or more.

    What are the error rates on Approach A? hard to say.
    But I know if you throw out questionable stations.. nothing changes in the global average. What changes is local detail.

    if you are interested in local detail ( say labrador) then what you do is start with the datasets.. apply an algorithm and then spend months hand checking stuff. Once you have the local detail figured out you can throw the data back in the meat grinder and see the
    changes.

    So, when Robert was interested in Labrador he didnt just use CRU or BEST gridded product or GISS gridded. We went
    back to the sources, understanding that the algorithms used are aimed at GLOBAL minimization of error, and we worked the local area from scratch. In the end maybe a couple things change here or there. mouse nuts.. a few extra months of data here.. improved coverage there..

    it all depends on what question you are trying to answer.

  272. Thanks Mosher. I was curious about the effects and model logic of the assumptions.

  273. Discombobulation is a better word. It describes the sense of unreality that exists when parsing some of the statements above.
    When I have asked Mosher he says there are the data sets do it yourself.
    Then he has the gall to say “It’s mind numbing work”
    “the problem with “B” is the sheer volume of work”
    and “1. I stopped looking at USHCN long ago when it was clear that it was largely if not completely a derivative product.”
    and Carrick “To follow up on Steven Mosher, USHCN is just a data product. It is a “virtual network” of previously existing stations.”
    Come on boys, homogenise your non answers.

    To repeat there were 1129 putative USHCN stations Carrick
    Mosher says There are about 20k ghcnd stations in the US
    Some end up in ghcnm Some of those end up in ushcn.

    I care about USHCN , Carrick because it is purported to be the true Historical network for the US with the best, most extensive, for the longest time coverage of one of the largest landmasses in the Northern Hemisphere and is used and quoted extensively by everyone with an interest in US Historical Temperature and weather data.
    However we all know it is not historical data at all. It has less than 50% of the real, original stations from 1987 reporting at any one time. This fact is obvious from the fact that Zeke and Mosher and Nick et al will not give a figure for the real number of stations.

    It is obvious from the fact that they infill all stations that have closed, and all stations that fail to report for a month [admittedly bu using data from nearby non USHCN stations, some of the 20 K Steven mentioned.
    It is obvious because no one will deny this fact with a number.
    Because we all know that if it is admitted that less than half the USHCN stations are real original stations [Hence Bob Koss’s comment about the epicenter moving that you dissed]
    That the general public will have no confidence in a data base that purports to be Historical but changes all Historical data.
    Blow Steve McIntyre.
    These are the facts that I have been arguing and getting a complete run around on.
    Why not try to ask the question yourself or think about it.
    It is not data and should never be called data or historical data once it has been modified
    It is useful fiction.
    It is needed for the scientists to generate models of how the temperature “Should have been measured”
    nothing more or less.

  274. angech, USHCN is a derived data product and not a physical network of stations. This is a factual statement. They could stop producing USHCN and it would have no operational impact on data collection.

    The actual data for those stations are available in GHCN-Daily and GHCN-Monthly Adjusted is a better derived product that contains more stations than the older USHCN product and the data quality control and homogenization process is much better documented for this newer product. These are also easily verified facts.

    As to the general public having no confidence in that data, well that does seem to be your goal and desire regardless of what the facts about the surface data actually are. Perhaps you need to stick to Goddard’s site where that mentality is not only accepted but encouraged.

  275. Andrew_KY, it’s unclear why you juxtaposed my comment with that from USHCN. The two statements don’t contradict each other. (But I wonder how many laypeople are getting hung up over the geophysics term “network”.)

    Regardless, I don’t think it is true that USHCN is a consistent network:

    Even in the 80-year period where all stations had real data, there were station moves, instrument changes etc. I don’t see how that can be described as “consistent” other than in the geometric sense described by Bob Koss.

    The value of having 80-year duration records is overstated unless those stations are truly “pristine” and require no quality control and homogenization processing.

    And as soon as you allow that you need quality control and homogenization, then I place more value in the presence of documentation that explicates both of these than I do in just having 80-years of uninterrupted data.

    I think the Berkeley Earth approach is much better here. Divide the record of a “station” into segments that don’t require homogenization. Look for neighboring stations with overlapping records. Use these to construct an estimate of the surface temperature field.

    As I said above, USHCN was a very early product. There were probably a lot of things that probably looked necessary on paper that weren’t necessary, and “desirable features,” like the 80-years of continuous data, that don’t buy you anything.

    We’ve learned a lot since then. There maybe places where USHCN still has value (I can’t identify any, but then I’ve never used this product in preference to GHCN), but if you find problems with it for what you want to do, just move on to a less derived product.

  276. Carrick,

    NOAA implies with this- “The USHCN is a high-quality network of COOP stations”- that it is a physical network. And a really good one on top of that.

    Now I’m not saying your comments aren’t valid or that NOAA is wrong. I’m way beyond needing to pick a winner here.

    Current Climate Science doesn’t have any winners, anyway. Which is more of my point. It’s a giant pile of imprecise nonsense.

    Andrew

  277. “Discombobulation is a better word. It describes the sense of unreality that exists when parsing some of the statements above.
    When I have asked Mosher he says there are the data sets do it yourself.
    Then he has the gall to say “It’s mind numbing work”
    “the problem with “B” is the sheer volume of work”
    #######################

    its actually worse than that. Its the volume of work and the fact that you cannot recover the work that was done. You have to reverse engineer what they did and you dont ave all the data about the decisions they made. McIntyre and I have both made this point.

    start here
    http://climateaudit.org/station-data/

    then progress to here

    http://statpad.wordpress.com/2010/07/19/ghcn-twins/

    thats just a start.

    ##################

    and “1. I stopped looking at USHCN long ago when it was clear that it was largely if not completely a derivative product.”
    and Carrick “To follow up on Steven Mosher, USHCN is just a data product. It is a “virtual network” of previously existing stations.”
    Come on boys, homogenise your non answers.

    Its pretty simple. USHCN is just a data product. how version were created is shown here

    http://www.ncdc.noaa.gov/oa/climate/research/ushcn/

    version 2

    “Monthly mean maximum and minimum temperatures (and total precipitation) were calculated using three daily datasets archived at NCDC (DSI-3200, DSI-3206 and DSI-3210). The daily values were first subject to the quality control checks described in Menne et al. 2009 and only those values that passed the evaluation checks were used to compute monthly average temperatures. Monthly averages were computed only when no more than 9 daily values were missing or flagged by the quality checks. Monthly values calculated from the three daily data sources then were merged with two additional sources of monthly data (DSI 3220 and the USHCN version 1) to form a more comprehensive dataset of serial monthly temperature and precipitation values for each HCN station. Duplicate records between data sources were eliminated and values from the daily sources were used in favor of values from the two monthly sources. DSI 3200 was used in favor of the USHCN v1 database. . Monthly values were subject to a separate suite of checks as described in Menne et al. 2009”

    version 2.5
    Monthly mean maximum and minimum temperatures (and total precipitation) were calculated using GHCN-Daily (Menne et al. 2012). The daily values are first subject to the quality control checks described in Durre et al. (2010). Only those values that pass the GHCN-Daily QC checks are used to compute the monthly values. Further, a monthly mean is calculated only when nine or fewer daily values are missing or flagged.

    Monthly values calculated from GHCN-Daily are merged with the USHCN version 1 monthly data to form a more comprehensive dataset of serial monthly temperature and precipitation values for each HCN station. Duplicate records between data sources were eliminated and values from GHCN-Daily are used in favor of values from the USHCN version 1 raw database. USHCN version 1 data comprise about 5% of station months, generally in the earliest years of the station records.

    Pushing back further to the DSI data

    Abstract: U.S. Daily Surface Data consists of several closely related data sets: DSI-3200, DSI-3202, DSI-3206, and DSI-3210. These are archived at the National Climatic Data Center (NCDC). U.S. Daily Surface Data is sometimes called cooperative data or COOP, named after the cooperative observers that recorded the data. In any one year there are about 8,000 stations operating. Most cooperative observers are state universities, state or federal agencies, or private individuals whose stations are managed and maintained by the National Weather Service. Each cooperative observer station may record as little as one parameter (precipitation), or several parameters. U.S. Daily Surface Data is also called Summary of the Day data. The original data was manuscript records, the earliest of which are from the 1800s. Records for approximately 23,000 stations have been archived from the beginning of record through the present. Major parameters are daily records of maximum, minimum, and average temperatures (Deg. F), precipitation, snowfall, snow depth (in.), and either river stage (height in feet above or below zero gauge), or wind movement (miles), evaporation (in.) plus maximum and minimum temperature (Deg. F) of evaporation pan water. It must be noted that NCDC has the observations from the time the station opened, but the NWS has the current data. Official surface weather observation standards can be found in the Federal Meteorological Handbook. Selected Summary of the Day data from related file DSI-3210 for National Weather Service “first order” or principal climatological stations and “second order” stations have been merged into this file. Therefore, users must be aware that if an element in DSI-3210 was flagged as suspicious or in error and an estimated value is included, the estimated value is entered into DSI-3200 as an “original” value. If a user needs the true original value, it must be retrieved from DSI-3210.It must also be noted that observations from DSI-3210 contains surface observational data from the Automated Surface Observing System (DSI-3211). Users must be aware that DSI-3210 contains flagged data that have been quality controlled by personnel at the NCDC. DSI-3211 contains only automated quality

    Carrick: USHCN is just a data product.
    Mosher: Yes, I prefer to work with the upstream sources.

    Why? simple. because it is vary hard to reconstruct everything that NOAA did to construct UHSCN, especially in the area of duplicate stations.

    I really dont care if somebody wants to call it the “historical record” or best data. That’s marketing. i dont care to look at, find problems in it, correct it, count it, whatever.

    It matters only in so far as the study of it revealed primary source to me. Once I found the primary data, why bother with it any more.

    I have no desire to reverse engineer its construction. there is no scientific interest in reverse engineering it. I spent months on it.
    it was mind numbing and in the end it made more sense to work from the primary data.

    So, do the work yourself. In the end you’ll find nothing of interest.

  278. “Why not try to ask the question yourself or think about it.
    It is not data and should never be called data or historical data once it has been modified
    It is useful fiction.
    It is needed for the scientists to generate models of how the temperature “Should have been measured”
    nothing more or less.”

    The “data” is modified as the very first human operation on it.
    an observer see 55.4F and he manually rounds it down to 55.

    As I have repeated argued there is no “raw” data. There are records. written and electronic records. These records are the best evidence we have of an historical fact. That fact can never be checked. you cant go back in time. So in a deep sense there are no historical facts, there are records, subject to error. we cant go back and re measure 1909.

    USHCN takes the historical records, which have already been changed by the human recorder, and then apply rules to create a derivative record. It you want to study that derivative record, go ahead. I prefer to spent my time going back to the source records
    and work with those records that havent been put through a processing mill.

    If you have a problem with NOAA or others calling these records
    “historical facts” then go write your congressman. For me they are records made by humans. subject to error. they are not data but rather records of data. This is a challenge in all observational science. We try to take the first records where we can find them and construct a expectation subject to the assumption that the records are accurate.

  279. Andrew_KY:

    NOAA implies with this- “The USHCN is a high-quality network of COOP stations”- that it is a physical network. And a really good one on top of that.

    You need to define what you think a “physical network” is, if you think there is any conflict with my language.

    The USHCN documentation explicitly states that they took existing stations with long records and formed a geophysical network of stations from them. In the language of science and engineering this is a “data product” in the same way that GISTEMP and HadCRUT are data products.

    It is probably not a network in the sense that a layperson would envisage a network to be. One might imagine for example stations that were installed by some organization that managed the USHCN, station operators and that the data from the stations were being delivered under administrative control of that same organization to some central data repository.

    USHCN is none of those things.

  280. Carrick,

    Physical –

    “having material existence”

    Network –

    “an interconnected or interrelated chain, group, or system”

    If NOAA can’t describe what they do/provide accurately, then obviously changes need to be made.

    Andrew

  281. Steven Mosher:

    As I have repeated argued there is no “raw” data. There are records. written and electronic records. These records are the best evidence we have of an historical fact. That fact can never be checked. you cant go back in time. So in a deep sense there are no historical facts, there are records, subject to error. we cant go back and re measure 1909.

    Yes this is a good approximation of the measurement process.

    When we collect data, measurement errors happen. Changes in the measurement process happen. Equipment has systematics. You should always correct for those (e.g.., buoyancy effect for weight measurements in the atmosphere).

    Changes in equipment happen. For digital sensors, there are software bugs and software fixes and corrections that need to be applied.

    All observations of a time varying quantity that is not under experimental control are, by any useful definition, quantitative snap-shots of the actual underlying quantity being measured. And as we have more information, our estimate of the value of the actual underlying quantity should improve.

    For experimental (as opposed to observational) work, we can re-run the same identical experimental conditions till we get really good at taking the measurements, which we then publish (commonly a years measurements end up with the last two weeks or so of data being actually used in the publication).

    For observational measurements, we don’t have that luxury, so we need to look for other ways to control for error. An important one of these is anomalization, which in the case of temperature records means removing seasonal effects and subtracting an arbitrary offset. Homogenized, quality controlled data, even for values collected in the past, are always going to be dependent to some degree on future data and improved understanding of systematics in equipment and observer practices.

    The “raw” record for observations is the record that was provided to us and stored, so I wouldn’t go as far as saying there wasn’t a “raw record”. Rather I’d say that “raw record” isn’t going to be the closest approximation of that true underlying value.

    If we see that duplicate reports got sent for consecutive days (easier to do with hourly reports of course), there’s no reason to assume that this is “better” than removing the duplicate record and flagging the missing data. If we see that equipment got changed, there’s no reason to assume they have the same systematic bias and measurement errors. Etc.

    Only a complete munchkin would argue that we should use the “raw” data in deference to quality controlled data. Steve Goddard is that munchkin, and it’s not surprising to see his surrogates here making the same stupid arguments here.

  282. Andrew_Ky:

    If NOAA can’t describe what they do/provide accurately, then obviously changes need to be made.

    I agree “network” as it was used here could be misleading to laypeople. I wasn’t mislead and I knew what they meant from context, but I’m more used to parsing technical documentation than you are.

    I think in the context it was used, “network” has a precise meaning, means something different than you appear to think it means, and is an accurate phrasing of what the USHCN actually is.

    I agree it could be confusing and the language of the document could use clarification. I don’t expect anything to change though. Perhaps Goddard could write an actually useful article, that’d be a change.

    There is a perception in my community that government labs often overstate the value of their contributions. The confusion could even be seen as benefiting them when it came time for renewal of funding for example (the administrators usually can’t make a differentiation between network in the geophysical data processing sense and network as used by e.g. NBC). It is even possible that the language choice used in this document suffered from a similar inflation of the actual value of their product.

  283. “The “raw” record for observations is the record that was provided to us and stored, so I wouldn’t go as far as saying there wasn’t a “raw record”. Rather I’d say that “raw record” isn’t going to be the closest approximation of that true underlying value.”

    ##############

    I dont call it raw record. I call it first extant report.

    This comes from my work with texts. So you have a manuscript and you assume its the ‘first’ version. Of course later you find out its not. A while back anthony showed up a couple first versions of some reports. Looking at it, it seemed plausible that there was a UR report.

    none of these issues arise in the experimental world because as you note we can just run the experiment again.

    One thing I liked about the berkeley method is that we didnt try to build up error sources from bottoms up.

    The error at a site falls out of the variation explained at correlation length 0. Jones had a cow when he saw how big that error was. but we could reasonable argue that the total error
    ( observer error, instrument error, site change error, etc etc) was all encapsulated by the residual at correlation length 0.

    To be sure that error was probably on the high side.. but thats a feature not a bug

  284. The error at a site falls out of the variation explained at correlation length 0. Jones had a cow when he saw how big that error was. but we could reasonable argue that the total error
    —————————
    Nuggets of ‘truth’?

    Verdammt! I’ve always suspected that there were etc etc errors.

    “but thats a feature not a bug” … if you’ve got a good fit for the correlation function, etc etc.

  285. “Nuggets of ‘truth’?”

    yup

    if you’ve got a good fit for the correlation function.

    There is a whole bunch of work to do if one wants to answer different questions. We set out to answer some very high level questions raised by skeptics about.
    1. station drop out
    2. Fudged adjustments
    3. UHI
    4. Microsite.

    The first two should be non controversial, but you still have goddarians out there.

    the last two and all small smaller spatial scale questions — it this site “right” is that region “right”– i would regard as open areas of research despite the fact that we published stuff on 3 and 4.
    I would not call either of those definitive nor would I say the spatial roughness/smoothness issue is put to rest.

    For google we pushed the resolution down to 1km and smiled.
    of course one can “do” this, the question is are you really seeing something significant. The 1/4 degree product is more interesting
    when compared to other 1/4 degree stuff ( like prism ) because they show a variability in trend fields thatmost think is “non physical” Block X has steep heating and Block X+1 has steep cooling. So the “how smooth” question is out there. What seems most wrong is a gridded approach where you get trends changing
    from cell to cell only because of an arbitrary gridding system imposed on the data first. as if trend fields know about lat and lon.

  286. There are ways to re measure the past: find a new method which works better or complements what is commonly used. I would get very hard nosed into borehole thermometry.

  287. Steven Mosher:

    I dont call it raw record. I call it first extant report.

    Good point.

    I can”t put my hands on any examples ATM but I do know there are stations where they’ve recorded hourly data onto log sheets that were then scanned in (hourly METARS I suppose), but if you look on the web (at least when I was looking) all you’d find is the global summary of day METAR.

    It’s my understanding that, funding willing, these will eventually all get digitized.

    What we are often calling now “raw data” is of course the “global summary of day” (GSOD) METAR. When these more primitive measurements become available, those GSOD measurement will get supplanted by the “even more raw” hourly data.

    Now as you know, most people are interested in the daily summary of day. But looking at (tmin + tmax)/2 imparts a bias relative to the daily mean….which is what you really want, though even more ideally, using a anti-aliasing filter designed to deal smoothly with missing data points is better.

    Just averaging the daily data can be an issue when you systematically have a chunk of the day that’s not getting reported. Antialiasing filters will remove more of the diurnal noise in any case and so are superior to simple averaging even when no data points are missing. (I have a reference to how to implement this method if anybody is interested.)

    So more primitive is still better, even when you aren’t interested in “diurnal noise”.

  288. but you still have goddarians out there.
    Only a complete munchkin would argue that we should use the “raw” data in deference to quality controlled data. Steve Goddard is that munchkin, and it’s not surprising to see his surrogates here making the same stupid arguments here.

    I repeat, I am not interested in Goddard or waiting for him.
    Calling him a munchkin is a really good argument, Carrick, Thanks for including that brilliant riposte.Better than Tamino when on a losing argument.
    Using raw data, real data, in deference to quality controlled model inferred “data”? [lets call it gloop] is better?
    Note there are no tree rings or proxies in real data.
    Lets note that all data is adjusted [rounded] at some level by a person or program.
    There is nothing wrong with this and rounding can go up as well as down Mosher, it evens out.
    Real data by a thermometer is pretty accurate. It was taken on the day, it was written down, it is still amazingly extant in “the true original value, it must be retrieved from DSI-3210”.

    Now to correct some minor subterfuge.
    The first USHCN datasets .
    defined a network of 1219 stations in the contiguous United States.
    24 of the 1,218 stations (about 2 percent) have complete data from the time they were established.
    The initial USHCN daily data set contained a 138-station subset of the USHCN.
    Even though there is supposed to be a network of 1218 stations from which the model is derived for most of its life since 1987 USHCN has used variations on a smaller critical subset to issue its temperature model, roughly the 138/1218 or 10 % of the stations [do not get picky on my maths].
    Steven said “USHCN version 1 data comprise about 5% of station months, generally in the earliest years of the station records.”
    This is not correct, if referring to USHCN which he seems to be though he may mean USHCN compared to all US CONUS.
    Furthermore
    “Monthly values calculated from GHCN-Daily are merged with the USHCN version 1 monthly data to form a more comprehensive dataset of serial monthly temperature and precipitation values for each HCN station”
    Err no , USHCN is supposed to be worked out from its 1218 stations, infilled by surrounding non recognized stations when data is missing and then this is incorporated into GHCN, smaller to larger, not using the world data to fool the American data, surely, please.
    I understand the reams of data , Steve, so when you josh around telling the less able people like myself to go and do the work that a highly trained person like yourself found almost too hard it is not even comedic, just sad and not helpful.
    Lets have real history and explain we use models for science, but they are not real.

  289. “There is nothing wrong with this and rounding can go up as well as down Mosher, it evens out.”

    you realize that some people didnt believe that.

    as for the rest of your comment you need to read more carefully.

    Finally,

    “Steve, so when you josh around telling the less able people like myself to go and do the work that a highly trained person like yourself found almost too hard it is not even comedic, just sad and not helpful.”

    The work is not too hard. its mind numbing and MOST importantly its INCONSEQUENTIAL to the questions that interest me.

    It may interest you. Go do the work. But once I found that USHCN was derivative the work was not WORTH DOING.

    there is no point in reverse engineering their job. unless you want to criticize it. Im not interested in that. im interested in doing the whole thing over.. from first sources.. why would I continue to do drudgery when its un necessary.

    write your congressman

  290. here an..

    this is just the start of things we have had to discuss here

    http://rankexploits.com/musings/2009/rounding-of-individual-measurements-in-an-average/

    every imaginable attack has been made on the records. As we answer them, they still dont go away.

    Now here is what I found.

    I made some of these attacks. I questioned GISS methods, CRu methods, adjustments, UHI, micro site, dropped stations, Essex’s paper, TOBS, SHAP, Filnet, questioned all of it.

    Then I worked through every issue and actually did work.

    guess what?

  291. I would like to trust you on this, and Zeke, but I just cannot see it.
    Sorry.
    You have both tried to communicate across the gulf and that is very
    appreciated.

  292. Steve Mosher,
    Some will never be convinced by data or logic; after you do your good faith best, all that remains is to ignore them. It’s not too difficult to tell who is not worth engaging.

  293. I think Mosher and Warmers in general can’t see the bigger picture. If everything climate science does to mutilate the temperature record is justified, you still just have a squiggly line that doesn’t reflect any reality anywhere.

    Ho hum.

    Andrew

  294. SteveF (Comment #131484)

    “Some will never be convinced by data or logic; after you do your good faith best, all that remains is to ignore them. It’s not too difficult to tell who is not worth engaging.”

    I think these exchanges are not only a big waste of time and effort at these blogs but take away from time that could be spent discussing some real issues dealing with the how the instrumental temperature records are adjusted. The change from GHCN V2 to V3 produced some statistically significant changes in trends over certain periods of time. The advent of the newer temperature data sets such as BEST and Cowtan and Way have changed in significant terms regional and global temperature trends. All these changes indicate that, unlike the seeming opinion of the recent past that the temperature record was settled, temperature adjustments are a work in progress.

    Areas that are fertile grounds for discussion are:

    (1) Benchmark testing – past results and building better tests.

    (2) Testing the limitations of currently used temperature adjustment algorithms to better acknowledge and understand what levels of non climate effects that could be missed of poorly handled by the algorithms.

    (3) Assurance that we know how to quantify the uncertainties in adjusted temperature series and particularly those stemming from the method used.

    (4) Exploiting all the available data for constructing temperature data sets – as noted in recent posts showing how the use of daily network data from GHCN by BEST can produce more complete long temperature series.

  295. Kenneth,
    I am all for discussing substantive issues, I just recognize that there are some people who are never going to be swayed by substantive discussion. Better they are just ignored.

  296. KenF
    These are excellent points for areas of discussion. For most of the skeptics, we can view the original unmodified data and then the corrected numbers. That sets the table for the discussions. It is understood that some corrections of the past are desirable but they need the unadjusted data such as the averages from the special meteoroligical summaries. Then the adjustments are in perspective. The Tmax for July 1950 is adjusted down because?

    SteveF,
    Best not to insult but just ignore and not say they should be ignored. If you want to ignore do so without editorial and both sides can ignore items not production.
    Scott

  297. I am all for discussing substantive issues, I just recognize that there are some people who are never going to be swayed by substantive discussion.
    The insight is astounding and concise.
    The problems are
    1. the degree of climate sensitivity,
    If the world has a built in system of back to inertia ie very low sensitivity to not only changes in CO2 , or sea basicity, or solar output but also soot SO4 etc then alarmism is barking up a very large wrong tree.
    This is the most substantive issue as acknowledged by Mosher and one Steve F should be engaging on instead of having fixed [ie never being swayed] views.
    2. The use of data that has been altered on a seemingly good scientific basis but is then being used where it should not be used ie Historically.
    The data can give an idea of what the temperature could have done. But not what it has actually done [ie in real life raw recordings.
    Being as how this is dogma to both sides I agree further argument is pointless but point out that obfuscation and hiding facts does not a good dogma make and I have asked multiple times for clarity on a particular issue and been given nothing substantive.
    Steve F what is the number of real stations in USHCN?
    Only people who are never going to be swayed by substantive discussion will say “I do not know and do not care”
    That is your position but due to the contradiction this imposes on your view of the world you cannot even see the logic that blinds you.

  298. Re: angech (Aug 5 21:32),

    Steve F what is the number of real stations in USHCN?
    Only people who are never going to be swayed by substantive discussion will say “I do not know and do not care”

    Not true if the number of real stations in the USHCN is not a substantive issue itself. Which it isn’t. The USHCN is no longer relevant.

  299. Then why did Zeke put a post up on July 7th at Climate etc that had over 2000 hits on USHCN in particular underneath the heading of “Understanding adjustments to temperature data” if it is so unimportant De Witt?
    With a promise of 2 more posts.
    After putting up 3 posts here first which also attracted masses of comment.
    The USHCN is,I’m sorry, extremely relevant.
    Exactly because it is quoted as the correct part US temperature history when it is only a modified, glorified model.
    It is NOAA ‘s NCDC centrepiece.
    It is extensively quoted across the US and globally when trying to compare changes in temperature.
    It is used in the global GCHN
    It has been updated and rebooted ( see Zeke re changes).
    If it was not relevant it would be thrown out with the bath water, instead you diss it then wash in it 24 hours a day.
    Quoting model temperatures instead of real temperatures and pretending the temperture has warmed up more than it has.

  300. The point for me is that there is no reasonable way to declare that there is a global warming crisis if the record must be massaged, processed, translated, purified to render a record which is at best ambiguous.
    As Steve has pointed out temperature records are largely irrelevant to the issue. I would add that accurate or not, what they are showing is dubious and certainly not a crisis.
    The climate obsessed, however, simply move from one failed metric or prediction to another, and cycle back to prior failed claims as memory fades. So we will be dealing with meaningless claims of temperature crisis for a long time.

  301. Zeke posted on adjustments to USHCN because goddarians see conspiracies

    “It is NOAA ‘s NCDC centrepiece.
    Not really, CRN is.

    It is extensively quoted across the US and globally when trying to compare changes in temperature.
    by NOAA, self serving.
    It is used in the global GCHN
    WRONG WRONG WRONG. GHCN comes in two varieties
    daily and monthly. USHCN is not a source for either of these.
    USHCN is NOT USED in GHCN-D or GHCN-M.
    It has been updated and rebooted ( see Zeke re changes).
    Yes, note that when spenser and christy change the past in UAH you dont have a cow
    If it was not relevant it would be thrown out with the bath water, instead you diss it then wash in it 24 hours a day.

    only GISS uses it. they should stop

  302. angech, it’s hard to take you at all seriously when you serially commit the same errors and refuse to carry your own water to the point of even reading publicly available documents, like the descriptions of USHCN or GHCN.

    And yet even when other people hold your hand and do it for you, your only response is “sorry but I don’t trust you”.

    This is seriously munchkinville behavior on your part.

  303. Steven Mosher,

    I don’t understand how you can say GHCN-D & GHCN-M don’t use stations which are part of the USHCN network. They just don’t apply USHCN adjustments. It seems those stations are important enough to get daily updates even when no monthly data has been received on a particular day. GHCN-M then produces a new monthly file for that day with just USHCN updated.

    There was a three day period just this past July(13th-15th?) when GHCN-M received no monthly data from anyone. USHCN past monthly values were still being updated by adjustments to data made by GHCN-D and GHCN-M dutifully released new databases.

    The last week or so of the month, GHCN-M was already publishing July monthly data for some USHCN stations. They modified those each day for the rest of the month as the data from the automated portion of the network stations came in. I estimate it’s about 70% automated now. It appears once a station can no longer exceed 9 missing days for the current month, GHCN-M includes monthly data for the month which has not yet ended.

    Without USHCN stations(GHCN-M has 964 reporting 2013), the GHCN-M would only have 167 US stations(including islands) reporting and only 1547 for the world.

    The USHCN exists because it was selected to be a balanced group of stations useful for reporting the contiguous US temperatures and hopefully useful as talking point. Using the adjustments with infilling provided by USHCN is an abortion. The GHCN way of using those stations appears to be more rational.

    If they dropped the name do you think they wouldn’t use reports from the same stations for US temperature and simply find another way to spin it? Maybe it is just that Goddard is making hay due to their manipulations. I think Goddard pointing out how bad their final product is, is a good thing. He is reversing the talking point value of the network adjustments used by USHCN. Showing those people to be incompetent or worse. That leads people to think more skeptically about similar products produced by others. Sounds like a good thing to me.

  304. ‘”WRONG WRONG WRONG. GHCN comes in two varieties
    daily and monthly. USHCN is not a source for either of these.
    USHCN is NOT USED in GHCN-D or GHCN-M. Mosher

    Ahem. from CDIAC

    the USHCN Daily database contains the same variables as NCDC’s Global Historical Climatology Network-Daily (GHCN-Daily) database. \
    USHCN-Daily is in fact a subset of GHCN-Daily, which serves as the official archive for daily data from the Global Climate Observing System (GCOS) Surface Network (GSN).

    As part of the GHCN-Daily quality control procedures, USHCN-Daily data have been subjected to a variety of internal consistency, frequent-value, outlier, and spatial consistency checks.

    Bias Adjustments” YES THEY DO MAKE THEM!

    At present, the USHCN daily data contain no adjustments for biases resulting from historical changes in instrumentation and observing practices.[Zeke says this is not true]

    This is true of the GHCN-Daily database as a whole, which includes the USHCN daily data. However, there is ongoing work at NCDC to develop adjustments that can be applied to daily maximum and minimum temperatures, and a GHCN daily derived product containing adjusted daily temperatures may become available in the future. [Already there unfortunately]

    Also from the USHCN site themselves
    “”We recommend using USHCN whenever possible for long-term climate analyses. The careful selection of each station and the series of adjustments applied to the data make the USHCN database the best long-term monthly temperature and precipitation data set available for the contiguous United States. It provides an accurate, serially complete, modern historical climate record that is suitable for detecting and monitoring long-term climatic changes. Other data sets, such as the Climate Division Dataset, may produce misleading trends due to artificial station changes.””

    No citing that his is modelled data not the true raw records is there? ANYWHERE.

  305. Carrick, I have read them, That is why I can see the incosostencies you deny. Are the Oompa Lumpa’s related to the Munchkins as they live in Wonky land where some of these clains come from

  306. DeWitt, the statement about government labs over stating the value of their products unfortunately applies in droves to USHCN.

    Bob Koss, the wikipedia says accurately “The GHCN is one of the primary reference compilations of temperature data used for climatology […] ” It is a database. So is USHCN.

    Both are data clients of the physical network of stations managed by NOAA. Both are written by the same group of people and have somewhat inflated language describing the value and primacy of their products.

    You can get the raw NOAA data from any of these stations without going through either USHCN or GHCN. USHCN and GHCN is are just convenient places to “shop” for good METAR data streams, but the actually METAR data are available from NOAA directly.

    While GHCN applies quality control and homogenization, these changes are based on simple and well documented algorithms. USHCN applies quality control and homogenization, but it’s buggy and not well documented. There’s really no reason to go through GHCN or USHCN except convenience.

    I actually prefer going through weather underground because I can search across a larger swath of data sources for a given area that I need met data for and I can get the hourly metar data directly from there.

    I think some of the complaints about USHCN are valid. They haven’t fully documented all of the changes they make, and there are obviously errors in their code. I think Steven Mosher is correct that this product shouldn’t be used for serious climate research.

    It is true they attempt to flag station data that are being infilled. But, in responding to Goddard’s bombastic rhetoric, Zeke discovered that as much as 10% of the USHCN data aren’t correctly flagged as infilled. Anthony has a pretty good post on it here. I think he gives Goddard credit for more than Goddard deserves, but whatever.

  307. angech

    “No citing that his is modelled data not the true raw records is there? ANYWHERE.”

    wrong.
    “The careful selection of each station and the series of adjustments applied to the data make the USHCN database the best long-term monthly temperature and precipitation data set available for the contiguous United States.”

    NOTE, if I tell you data is ADJUSTED, I’ve told you its not RAW.

    Next,

    You refuse to contact NOAA with your complaints. Instead you complain to me and zeke. Guess what? we aint NOAA.

    Here is a test. Spenser’s UAH makes as many if not more adjustments to raw data to produced their model of tropospheric temps. Roy runs a blog. Ask him why he doesnt clearly market his data as NOT RAW, but adjusted. The reason I suggest Roy is
    A) he actually is in charge
    B) you can interact with him on his blog.

    See how he accepts your notions.

    WRT to USHCN.. write your congressman

  308. Re: Steven Mosher (Aug 7 11:55),

    Speaking of Roy, wasn’t version 6 supposed to be out by now? They’ve been at version 5.6 since June, 2013. When versions changed in the past, the entire data set was revised. The big trend change, though, happened when they started correcting for orbital drift.

  309. “You refuse to contact NOAA with your complaints. Instead you complain to me and zeke. Guess what? we ain’t NOAA.”
    Thank god, you are real people.
    But you are here and you do listen and if any of the comments are valid you will probably pass them on to fix up.
    Further NOAA staff probably monitor web sites of interest like Lucia’s to see how quiescent the audience is. When they see reasoned arguments like mine get put forward a few times the penny may drop that they need to do better.

    “No citing that this is modeled data, not the true raw records is there? ANYWHERE.”
    You got me, it is wrong as a statement.
    As we both know I was referring to the published USHCN charts which Zeke has said do not need labeling to say it is modeled data.
    This was the context I was arguing on but if you wish to be nit picky I will just write it all out in longhand in future.

  310. “You refuse to contact NOAA with your complaints. Instead you complain to me and zeke. Guess what? we ain’t NOAA.”
    Thank god, you are real people.
    But you are here and you do listen and if any of the comments are valid you will probably pass them on to fix up.
    1. I won’t, Zeke might.
    2. you havent made a valid complaint.

    “Further NOAA staff probably monitor web sites of interest like Lucia’s to see how quiescent the audience is. When they see reasoned arguments like mine get put forward a few times the penny may drop that they need to do better.”

    1. They don’t
    2. You havent made a reasoned argument. You havent
    even looked at the data they provide. Here is how I
    work. We supply code and data. If I see that somebody
    hasnt looked at the data or code, I disregard what they
    have to say. If they have looked at it and refuse to send
    me mail, I disregard what they have to say. If they send
    me mail and and need help, they go into my que.
    Top of my que now are government agencies, industry,
    other researchers, open source developers, and grad students. I imagine that Matt and Claude have similarity priorities.

    “No citing that this is modeled data, not the true raw records is there? ANYWHERE.”
    You got me, it is wrong as a statement.
    As we both know I was referring to the published USHCN charts which Zeke has said do not need labeling to say it is modeled data.

    1. I dont know you were refering to charts
    2. Even if you were you are wrong. I would follow the
    practice of Roy Spenser. You show the chart, you provide
    links to the methods.
    3. Here is another test. Anthony published a paper using USHCN
    data. Ask him and John NG why they dont label their charts
    as “modelled” data.

    This was the context I was arguing on but if you wish to be nit picky I will just write it all out in longhand in future.

    Write it in longhand if you like it doesnt change the argument

  311. Speaking of Roy, wasn’t version 6 supposed to be out by now? They’ve been at version 5.6 since June, 2013. When versions changed in the past, the entire data set was revised. The big trend change, though, happened when they started correcting for orbital drift.

    ##########

    I think it would be interesting to see how the “past” changes with various verisons of UAH and RSS. I don’t know where he is in the process of updating.

    I will say that working with Sat data dwarfs anything I’ve done before.

    For example, I’m currently working on MODIS LST. For a test
    I’m looking at daily data for the Yukon. 1km resolution.

    The HDF tiles for the area are downloaded and then stitched and re projected into WGS84. started in 2000… a week later i am up to october 2002. 12 years to go.

    doing it for the full globe? I think I need a bigger disk and a better tool path as I broke all the tools trying to do the full globe.

    even then constructing a global daily LST series since 2002 from this data is huge task .. there are maybe 3 or 4 approaches used by various researchers.

    At AGU this year the AIRS guys and JPL will be doing an official version of some of the stuff I showed here ( trends over the whole series)

    Bottom line there is a ton of un used Sat data…

Comments are closed.