Mar25

Comparing IPCC Projections to Individual Measurement Systems.

Recently, the subject of using only one set of measurements to perform a hypothesis test arose. As many are aware, I prefer to average over instruments. But, I’m willing to consider each set individually. So, today I did that.

My main results are: Looking at the data 12 possible ways, I get 9 results that say “reject the IPCC best estimate” to a confidence of 95%.

So, in today’s post, I’ll explain my results. But first, I will explain why I prefer to use merged data when comparing IPCC projections to data.

Why I use an “ensemble average” of multiple GMST data sets.

When testing data, it is obviously possible to take two approaches: a) Select a data set and use only that data set or b) Use a collection of all data thought reliable by practitioners. I prefer the second approach. As I see it, the advantages and disadvantages of each approach are as follows:

  1. Pick one GMST data set, ignore evidence from other data sets.
    Advantages: Least effort. Readers will notice that back in January, when I first began using data, I would use only one data set: GISSTemp. This choice was dictated by pure laziness. I was interested in getting up to speed and gaining familiarity with available data sets and the literature.

    However, while some analytical laziness is excusable in a blog post, I always planned to include more data sets as they became available.

    Disadvantages: There are two main disadvantages of using only one data set. These are: one may be suspected of cherry picking and one increases β error. The difficulties with the first can be largely minimized by explaining one’s data choice prior to performing an analysis.

    The second disadvantage cannot be eliminated. Selecting one data set when 5 are available increases β error. Period. There are some valid reasons for discounting available data. For example, if some data are known to be of poor quality or in error, one can justify leaving them out of an analysis. So, for example, had NASA GISS failed to correct the Y2K error recently discovered in their data, this might be good reason to leave it out of the analysis. However, if the reasons are valid, they can be stated directly, and should be.

  2. Use all the standard data sets thought reliable.
    Disadvantages: Slightly more work.
    Advantages: a) appears more trustworthy, b) reduces β error, c) reduces uncertainty in the mean results.

I may be wrong, but in my opinion, the averaging over multiple data sets is better than relying on only one.

However, since this topic has been discussed in blog comments, I will now take the liberty to elaborate further on these two issues, as both are important in the context of the “blog climate wars” we all enjoy. :)

What is the problem with raising suspicions of cherry picking? Of course no one cherry picks. :)

Nevertheless, should a blogger with a particular point of view accidentally select a temperature record that happens to be the outlier giving the result that blogger is known to prefer, using that one data set fosters suspicions of cherry picking.

I believe AGW to be true, but since I am willing to pro-actively test projections against data during what appears to be a “stall” in warming, much of my audience consists of skeptics. Clearly, they are not going to be convinced this stall is meaningless if I restrict my analysis to using GISSTemp, the data set that shows the least recent cooling. Rather, what will happen is this: They will decide I simply pick data to suit my pre-conceived notions.

I know that trust and distrust are feelings that last. So, for this reason, I prefer to include a variety of respected data sets in my analysis and report on as many results as possible. That way, when the temperatures do warm, and my updated plots and trends show the renewed warming, I think my audience will trust my plots are not simply attempts to present a tendentious argument in favor of a theory I believe to be true.

What is the problem with elevated β error? Oddly enough, the possibility that I might be accused of cherry picking is minor compared to the real difficulty, which is that using one data set inherently introduces more scatter due to instrument variability. This elevates β error, without reducing α error. I discussed β error previously and explained that if a null hypothesis is actually wrong, it can take many, many years of data to disprove even a false hypothesis to a chosen, high level of statistical confidence.

I know many of my readers are aware of β error. Many are also aware that using tests with high β error is a well-known trick for claiming something is proven when, in fact, all one has done is fail to disprove it using very little data. Since my readers know this trick, know that I know it, and know that any competent statistician is aware of this issue, I prefer to use methods that minimize β error, while holding the α error at a specified value.

This results in hypothesis tests that, on average, do not increase the rate of rejecting IPCC projections when they are correct (i.e. α error), but still have a reasonable chance of rejecting them when they are, in fact, false (i.e. they keep β error down).

(By convention, a “failed to falsify” result should be accompanied by an estimate of the β error, or statistical power. I haven’t seen these discussed in the ‘climate blog wars’, but I do intend to extend my spreadsheet to include these at some point. Reporting both α and β errors is important if people are to draw inferences about statistical results.)

Current comparison between IPCC projections and five data sets.

My readers already know I computed the trend in Global Mean Surface Temperature (GMST) using four data sets, with data from Jan. 2001 through Jan. 2008. I compared that result to the IPCC projections, and found the IPCC projections…. erhmmm… not so good? (That is: a hypothesis test using Cochrane-Orcutt, with confidence intervals computed using a “t” distribution, indicated that the IPCC projections should be rejected at the 95% confidence level (α=5%) when compared to the data.)

But now it’s March! So, February data are in. Also, due to interest in this exercise, other bloggers are now performing variations on this analysis. So, naturally, I am extending my analysis. I think the variety of results will give various people more information to consider when forming their opinions about the predictive ability of IPCC projections.

To extend the analysis, I have decided to show results of hypothesis tests to determine whether the IPCC best estimate for the trend in GMST during the next three century, published in the AR4, is consistent with observations of the GMST measured after the projections were made.

I will use two basic analytical methods to test the hypothesis, both using two-sided 95% confidence intervals (i.e. α=5%). The two methods are:

a) Cochrane-Orcutt (CO), with two-sided confidence intervals calculated assuming the uncertainty in the mean is t-distributed, and
b) Ordinary Least Squares (OLS), with the number of degrees of freedom adjusted using Neff/N = (1-ρ)/(1+ρ), where ρ is the correlation of the residuals for the OLS fit.
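
For readers who want to see the mechanics of method (b), here is a minimal Python sketch of an OLS trend fit whose uncertainty is widened using Neff/N = (1-ρ)/(1+ρ). It is one plausible reading of that method, not the spreadsheet used for this post. The function names, the use of the lag-1 autocorrelation for ρ, the choice of Neff - 2 degrees of freedom for both the residual variance and the t critical value, the series approximation to the t quantile, and the synthetic demo data are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def t_quantile(p, dof):
    """Approximate Student-t quantile (series expansion in 1/dof; an assumption
    used here to avoid depending on a statistics package)."""
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4.0
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96.0
    return z + g1 / dof + g2 / dof**2


def ols_trend_neff(t, y, alpha=0.05):
    """Return (slope, half_width, rho) for OLS with Neff-adjusted uncertainty."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / sxx
    intercept = y_mean - slope * t_mean
    resid = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]

    # Lag-1 autocorrelation of the OLS residuals (assumed meaning of rho).
    rss = sum(r * r for r in resid)
    rho = sum(a * b for a, b in zip(resid[1:], resid[:-1])) / rss

    # Effective sample size: Neff/N = (1 - rho) / (1 + rho).
    n_eff = n * (1 - rho) / (1 + rho)
    dof = n_eff - 2
    if dof <= 0:
        raise ValueError("too few effective samples for a confidence interval")

    std_err = math.sqrt(rss / dof / sxx)
    return slope, t_quantile(1 - alpha / 2, dof) * std_err, rho


# Hypothetical demo data: 86 monthly values, a 0.01 C/year trend plus a slow wiggle.
months = list(range(86))
years = [m / 12.0 for m in months]
temps = [0.01 * yr + 0.1 * math.sin(0.7 * m) for yr, m in zip(years, months)]

slope, half_width, rho = ols_trend_neff(years, temps)
print(f"trend = {100 * slope:.2f} ± {100 * half_width:.2f} C/century, rho = {rho:.2f}")
```

The larger ρ is, the fewer effective samples remain, and the wider the interval around the fitted trend becomes.
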
In addition, I will test:

  1. The “average” temperature for each month, computed by averaging the temperature from each of 5 reliable data sources (see footnote 1). This gives one trend based on an average. Done this way, the uncertainty intervals on the mean trend include the uncertainty due to weather noise; however, uncertainty due to measurement error, which arises from the lack of precision of each data source, is minimized.
  2. The temperatures from each source individually. This results in 5 trends. Because of the lack of precision of each instrument, these will have the largest uncertainty intervals. Drawing conclusions based on these maximizes β error. That is: we increase our risk of failing to reject the IPCC projections when they are wrong.
  3. The average of the trends for each source, with the uncertainty intervals calculated as if the residuals for each instrument at a given time are uncorrelated with each other. This is incorrect, but this uncertainty band would enclose the uncertainties in the slope computed using the five sources. It is illustrative for this reason.

Methods 1 & 2 have identical α (alpha) errors. So, I consider the method with the minimum β error superior, as it gives the fewest errors overall. This is why I average over all instruments. Method 3 is deficient as a method to test the IPCC hypothesis, and merely gives some sense of the uncertainty due to measurement noise without regard to ‘weather noise’.
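
To make the comparison between Methods 1 and 2 concrete, here is a small Python sketch using made-up numbers. It builds five synthetic “instruments” that share the same weather noise but carry independent measurement noise, then compares the scatter about the OLS trend for each instrument with the scatter for the month-by-month average. The noise levels, the trend, the record length, the seed, and all names are assumptions for illustration; nothing here uses the real GMST data.

```python
import math
import random


def ols_fit(t, y):
    """Return (slope, residual standard deviation) of a straight-line OLS fit."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / sxx
    intercept = y_mean - slope * t_mean
    rss = sum((yi - intercept - slope * ti) ** 2 for ti, yi in zip(t, y))
    return slope, math.sqrt(rss / (n - 2))


rng = random.Random(0)          # fixed seed so the sketch is repeatable
n_months, n_sources = 86, 5     # assumed record length; five data sources
years = [m / 12.0 for m in range(n_months)]

# Assumed noise levels (C): weather is shared, measurement noise is per source.
weather = [rng.gauss(0.0, 0.10) for _ in range(n_months)]
sources = [
    [0.01 * yr + w + rng.gauss(0.0, 0.10) for yr, w in zip(years, weather)]
    for _ in range(n_sources)
]

# Method 1: average the sources month by month, then fit one trend.
average = [sum(vals) / n_sources for vals in zip(*sources)]
avg_slope, avg_scatter = ols_fit(years, average)

# Method 2: fit a trend to each source separately.
fits = [ols_fit(years, series) for series in sources]
mean_slope = sum(s for s, _ in fits) / n_sources
mean_scatter = sum(sd for _, sd in fits) / n_sources

print(f"scatter about trend, averaged series:   {avg_scatter:.3f} C")
print(f"scatter about trend, mean over sources: {mean_scatter:.3f} C")
print(f"OLS slope of average vs average of OLS slopes: {avg_slope:.5f} vs {mean_slope:.5f}")
```

Two things are worth noticing. The averaged series can never show more scatter about its trend than the mean scatter of the individual sources, and under these assumed noise levels it shows noticeably less, which is the reduction in measurement noise described in item 1. Also, because OLS is linear in the data, the slope of the averaged series equals the average of the individual OLS slopes; Cochrane-Orcutt is not linear in the same way, so its two versions can differ.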

Results

After applying this test, I find that using the method I prefer (averaging first, then fitting the trend), the best estimate by the IPCC is rejected to a confidence of 95%. It is too high to be consistent with the weather data we have experienced.

Results of Hypothesis Test For IPCC Best Estimate Projection of 2C/century.

                                  Best Fit Trend (C/century)   Reject 2.0 C/century to confidence of 95%? (α=5%)
Method                            C-O           OLS            C-O                        OLS
Average all, then fit trend.      -1.1 ± 2.2    -0.3 ± 2.2     IPCC Projection Rejected   IPCC Projection Rejected
Fit trend to each, then average.  -0.9 ± 1.6    -0.3 ± 1.4     See note 1.                See note 1.
Individual Instruments
GISS                              -0.4 ± 2.2    0.2 ± 2.3      IPCC Projection Rejected   Fail to reject
HadCrut                           -1.6 ± 1.8    -1.0 ± 1.9     IPCC Projection Rejected   IPCC Projection Rejected
NOAA                              -0.3 ± 1.7    0.0 ± 1.7      IPCC Projection Rejected   IPCC Projection Rejected
RSS                               -1.4 ± 2.1    -0.6 ± 2.2     IPCC Projection Rejected   IPCC Projection Rejected
UAH                               -0.8 ± 2.9    0.0 ± 2.9      Fail to reject             Fail to reject

Note 1: ‘Method 3’, that is, taking the average of the 5 individual trends, results in ‘reject/reject’ for the IPCC 2C/century trend. However, as I noted, that is meaningless, as the uncertainty intervals only include the variation due to measurement uncertainty and fail to properly include weather.
Note 2: Estimates using OLS are given for comparison only. When data exhibit ‘red noise’, the C-O results are more accurate than OLS.

Examining the table, we see that the IPCC projections of 2C/century are “rejected” to the 95% confidence level using most of the methods I tested. If we average the data, and then test, the trend is rejected to the 95% confidence level using both C-O and OLS. (Note, however, that when the two methods disagree, C-O is more accurate.) Using each individual instrument, it is rejected in 7 out of 10 possible tests. The ambiguous result “fail to reject” arises in 3 out of 10. However, because the sample period is short, we know that β error is large; so, “fail to reject” is best interpreted as “not enough data to tell for sure”, rather than “IPCC projections are likely correct”.
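
As a quick cross-check of the counts quoted above, the following Python sketch re-derives each reject / fail-to-reject decision from the rounded table entries. It assumes that “reject” simply means the 2.0 C/century projection falls outside the best-fit trend ± its stated uncertainty; the dictionary layout and function name are illustrative, and because the inputs are rounded to one decimal place, a borderline case could in principle differ from a calculation done at full precision.

```python
# Best-fit trends and ± half-widths (C/century) as printed in the table above.
TABLE = {
    "Average all, then fit trend": {"C-O": (-1.1, 2.2), "OLS": (-0.3, 2.2)},
    "GISS": {"C-O": (-0.4, 2.2), "OLS": (0.2, 2.3)},
    "HadCrut": {"C-O": (-1.6, 1.8), "OLS": (-1.0, 1.9)},
    "NOAA": {"C-O": (-0.3, 1.7), "OLS": (0.0, 1.7)},
    "RSS": {"C-O": (-1.4, 2.1), "OLS": (-0.6, 2.2)},
    "UAH": {"C-O": (-0.8, 2.9), "OLS": (0.0, 2.9)},
}
PROJECTION = 2.0  # IPCC best estimate, C/century


def rejects(best, half_width, projection=PROJECTION):
    """True when the projection falls outside best ± half_width."""
    return abs(projection - best) > half_width


decisions = {
    (name, method): rejects(best, hw)
    for name, row in TABLE.items()
    for method, (best, hw) in row.items()
}
n_reject = sum(decisions.values())

for (name, method), rejected in decisions.items():
    print(f"{name:28s} {method:4s} {'reject' if rejected else 'fail to reject'}")
print(f"{n_reject} of {len(decisions)} tests reject {PROJECTION} C/century")
```

Counting the averaged series and the five individual instruments under both C-O and OLS gives the 12 tests mentioned at the top of the post.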

I have illustrated the main result graphically below.


[Figure: GMST vs Time, March 25, 2008]

“Average” results are those obtained by applying Cochrane-Orcutt to the “averaged” temperature, which is my standard for determining the trend. The ±95% uncertainty intervals are also calculated using Cochrane-Orcutt; I assume the uncertainty in the mean is t-distributed. (These give very slightly larger uncertainty intervals than the corrected OLS. So, this choice reduces the rate at which I reject the IPCC trends.)
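
For readers unfamiliar with Cochrane-Orcutt, here is a minimal Python sketch of the general technique: fit by OLS, estimate ρ from the residuals, quasi-difference the data, refit, and repeat until ρ settles. This is a textbook-style illustration, not the spreadsheet behind the figure. The post does not say whether the spreadsheet iterates or makes a single pass, so the iteration, the tolerance, the function names, the series approximation to the t quantile, and the demo data are all assumptions.

```python
import math
from statistics import NormalDist


def t_quantile(p, dof):
    """Approximate Student-t quantile (series expansion in 1/dof)."""
    z = NormalDist().inv_cdf(p)
    return z + (z**3 + z) / (4 * dof) + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * dof**2)


def ols(x, y):
    """Straight-line OLS; returns (intercept, slope, rss, sxx)."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    slope = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sxx
    intercept = ym - slope * xm
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, rss, sxx


def cochrane_orcutt(t, y, alpha=0.05, tol=1e-6, max_iter=50):
    """Return (slope, half_width, rho) from an iterated Cochrane-Orcutt fit."""
    intercept, slope, _, _ = ols(t, y)
    rho = 0.0
    for _ in range(max_iter):
        # Residuals on the original scale; regress e[i] on e[i-1] to get rho.
        e = [yi - intercept - slope * ti for ti, yi in zip(t, y)]
        new_rho = sum(a * b for a, b in zip(e[1:], e[:-1])) / sum(b * b for b in e[:-1])
        # Quasi-difference both variables and refit by OLS.
        t_star = [t[i] - new_rho * t[i - 1] for i in range(1, len(t))]
        y_star = [y[i] - new_rho * y[i - 1] for i in range(1, len(y))]
        a_star, slope, rss, sxx = ols(t_star, y_star)
        intercept = a_star / (1 - new_rho)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    dof = len(t_star) - 2  # one point is lost to differencing, two to the fit
    std_err = math.sqrt(rss / dof / sxx)
    return slope, t_quantile(1 - alpha / 2, dof) * std_err, rho


# Hypothetical demo data: 86 monthly values, a 0.01 C/year trend plus a slow wiggle.
years = [m / 12.0 for m in range(86)]
temps = [0.01 * yr + 0.1 * math.sin(0.7 * m) for m, yr in enumerate(years)]
slope, half_width, rho = cochrane_orcutt(years, temps)
print(f"trend = {100 * slope:.2f} ± {100 * half_width:.2f} C/century, rho = {rho:.2f}")
```

The slope of the quasi-differenced regression is the trend estimate, and its standard error comes from residuals that are closer to uncorrelated than the raw OLS residuals, which is why the interval is computed from the transformed fit.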

The best fit for each instrument is illustrated, as are all the data. Currently, GISS gives the least negative trend; HadCrut gives the most negative trend. Other instruments provide intermediate results, with UAH MSU giving results closest to the mean of the individual instruments.

The IPCC central tendency is illustrated: it lies outside the uncertainty intervals, which corresponds to rejecting the hypothesis that the IPCC projections are correct.

So, can this change?

I suspect that the current trend will break, as all trends do. Warming is hardly excluded by the current data. As I have said repeatedly, warming is not rejected by this data. In fact, pre-existing 30 year trends are not excluded by the current data.

So, given the past trend, and the strength of the theory underlying AGW, warming is likely to resume. When this occurs, the central tendencies for all data sets should turn positive.

But what this data indicates is that if and when warming resumes, it will likely occur at a rate that is lower than projected by the IPCC. So, while the trends will turn up they are unlikely to reach the 2C/century of warming.

I’d also like to note another feature of these tests. Let us suppose the “true” underlying tendency turns out to be 1 C/century. How will these hypothesis tests “look” over time?

Oddly enough, due to β error, we are quite likely to see the number of “failed to reject” results increase and decrease over time. The reality is that, though I have not calculated β error, we are in the period of time when β error is anticipated to be large. So, until there is sufficient time for β error to drop below 50%, we will tend to see more periods of “failed to reject” than periods of “reject” even if the IPCC projections are wrong.
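
A rough way to see how β error shrinks as the record lengthens is a small Monte Carlo experiment. The Python sketch below simulates monthly temperatures with a true trend of 1 C/century plus AR(1) “weather” noise, tests the 2 C/century projection with an OLS fit whose standard error is widened by sqrt((1+ρ)/(1-ρ)), and reports how often the projection is rejected. It is not a calculation of the β error for the real data: the noise level, the AR(1) coefficient, the number of trials, the clamping of ρ, and the use of a normal critical value in place of the t value are all assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # normal stand-in for the t critical value


def rejects_projection(years, temps, projection):
    """OLS trend test with the standard error widened by sqrt((1+rho)/(1-rho))."""
    n = len(years)
    xm, ym = sum(years) / n, sum(temps) / n
    sxx = sum((x - xm) ** 2 for x in years)
    slope = sum((x - xm) * (y - ym) for x, y in zip(years, temps)) / sxx
    resid = [y - ym - slope * (x - xm) for x, y in zip(years, temps)]
    rss = sum(r * r for r in resid)
    rho = sum(a * b for a, b in zip(resid[1:], resid[:-1])) / rss
    rho = max(min(rho, 0.95), 0.0)  # keep the correction finite; never shrink the SE
    std_err = math.sqrt(rss / (n - 2) / sxx * (1 + rho) / (1 - rho))
    return abs(slope - projection) > Z_CRIT * std_err


def estimated_power(n_months, true_trend=0.01, projection=0.02,
                    rho=0.5, sigma=0.1, trials=400, seed=1):
    """Fraction of simulated records in which the false projection is rejected.

    Trends are in C/year, so 0.01 is 1 C/century and 0.02 is 2 C/century.
    """
    rng = random.Random(seed)
    years = [m / 12.0 for m in range(n_months)]
    hits = 0
    for _ in range(trials):
        noise, temps = 0.0, []  # AR(1) noise starts at zero (no burn-in)
        for yr in years:
            noise = rho * noise + rng.gauss(0.0, sigma)
            temps.append(true_trend * yr + noise)
        hits += rejects_projection(years, temps, projection)
    return hits / trials


for n_months in (86, 120, 240):
    power = estimated_power(n_months)
    print(f"{n_months:3d} months: power ~ {power:.2f}, beta ~ {1 - power:.2f}")
```

Under these assumed noise settings, the rejection rate should climb as the record gets longer, which is the sense in which a short record leaves β error large even when the tested projection is wrong.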

Because of the effect of high β (beta) error, careful scientists rarely interpret “failed to reject” as confirming a hypothesis that has not been supported by very large amounts of historical data and sound theory with very few approximations or assumptions. While the theory of AGW is well supported, it is not clear to me that the specific quantitative predictions by the IPCC are, by extension, supported with equal strength.

My understanding is: The consensus states that AGW is proven. But the magnitude is still being debated. One of the ways to test the various hypotheses regarding the magnitude is to do data comparisons. This comparison to observation suggests the IPCC’s estimates are high.

Footnotes:

1. Data from GISS Land Ocean, UAH MSU, NOAA, RSS, and HadCrut.

Updates
3/27/2008: I inserted a link to a relevant post comparing IPCC projections to data.

3/27/2008: I uploaded a figure about beta error. The figure is supplied by reader Martin Ringo.
[Figure: Illustration of Power of a Test]

Comments

  1. comment 1345

    Lucia,

    Saying you believe in AGW but think the IPCC is wrong is like saying you believe in Catholicism but don’t believe the Pope is infallible. It may be a perfectly reasonable stand to take but many Catholics will still reject you as a heretic. I also think that the majority of sceptics would agree that ~1 deg/century is a reasonable estimate of the effect of CO2 (e.g. Pat Michaels).

    On another note:

    Why are the uncertainties for UAH higher than the rest?

  2. comment 1346

    Raven– My parents baptized me Catholic, but I don’t believe the Pope is infallible. (Anyway, the Pope himself is fallible. There is some specific way in which he is infallible– in certain writings or something. :) )

    The greater uncertainties, when computed statistically, come from greater scatter around the mean trend. This is usually due to lack of precision. Precision is different from accuracy. Oddly enough, statistical treatments I know of can’t give us an estimate of bias, which is related to inaccuracy. (That’s actually one of the reasons I don’t like to pick one instrument. By picking all of them, I can get some sort of idea about the likely range of bias, even if I can’t say much statistically about it.)

  3. comment 1348

    Very nice and illuminating post, thank you. There is as you say no reason why one should not accept your conclusions, still think the IPCC is right about the main outlines, still think AGW is real, still think we should act immediately to lower CO2 emissions. But this seems not to be how it works socially or psychologically with the proponents of AGW.

    One important characteristic of the movement is that its followers feel compelled to defend a l’outrance even propositions which are both plainly wrong, and not central to the thesis. It is as if in earlier periods when defending the merits of Newtonian physics, people should have felt compelled to defend the great man’s views on alchemy. In my own field, we have seen it from Apple advocates, who defended the one button mouse for years after it became clear that this was both an idiotic user interface, and not a core element of the Apple UI. We have seen similar approaches in religion and politics. We see similar efforts to defend the indefensible in regard to MBH98’s statistics.

    It is, as Pielke suggests, potentially an enormous intellectual tragedy in the making. AGW has become so entwined in the public mind with environmentalism, and so entwined by its proponents with these peripheral issues, that it risks crashing itself should it fail on some of them, and it also risks crashing environmentalism as a whole.

    When analyses as reasonable as this one evoke a chorus of abuse, which it is doing and will, it is only human to stop listening to anything the chorus says. And yet, the central thesis may be correct, and is of fundamental importance. It is just that, sadly, the extreme advocates are, as so often, its own worst enemies, though they cannot see this.

  4. comment 1351

    On the question of data sets, I think there is a good non statistical reason to reject at least two of them for calculations over relatively short time frames: the potential for bias. On the one hand the GISS data set does not have a hands off relationship with Jim Hansen, one of the major alarmists, and for whatever reason, it shows the highest trends over the past decade or so. On the other hand, the UAH does not have an arms length relationship with Roy Spencer, one of the prime skeptics, and lo and behold, the UAH data set shows the least warming since inception in 1979. These may be perfectly coincidental, but confidence in the integrity of the data sets would be enhanced if there were steps taken to ensure separation. Mind you I am not saying that there is biased intervention, rather there is the potential for small unconscious manipulations in the assembly and averaging of the data to conform to the prevailing view.

    Fred’s point about risking credibility is spot on. The AGW hypothesis is subject to rejection if there is an extended cooling period, and with it would likely come a discredit of the movement. However I want to point out that advocates would scream louder than ever. The social psychologist Leon Festinger, who developed the concept of cognitive dissonance, did studies showing even stronger belief and louder proselytizing after disconfirmations. In his book Failed Prophecy he reports on a California doomsday cult whose successive predictions of specific days for the end of the world passed with rescue by UFOs. The group became more shrill and more convinced that the next time it would happen. Look for advocates to become ever louder and more opaque in their explanations.

    On the technical front, I took a shot at seeing if correlations could be substantially reduced by linking the residuals of OLS to the ENSO index. I did a cross correlation between the residuals for the OLS fit to Hadley data and the MEI (multivariate ENSO index). The best fit is for a three month lag between the temperature signal and the MEI, with ENSO leading as it should. However the r^2 is only 0.27, so only 27% of the variance of the residuals is explained by ENSO. This gives a little over 20% reduction in the 1-month and 2-month autocorrelations. So C-O or some other treatment is still warranted.

  5. comment 1352

    Fred

    “There is as you say no reason why one should not accept your conclusions, still think the IPCC is right about the main outlines, still think AGW is real, still think we should act immediately to lower CO2 emissions.”

    In fact, I think all these things! I’ve been for alternative energy sources since…. the oil embargo in the ’70s! At the time, my thoughts were not related to CO2 accumulation, but this is now an additional important factor.

    In terms of getting action, I can’t help but believe that admitting uncertainty in our ability to predict average surface temperature strengthens rather than weakens the case for action on developing alternative energy sources. After all, there are many reasons to diversify the source of energy for electricity generation (going Nuclear), to increase the amount of solar or wind power, or to have sewage treatment plants run co-generation and use less natural gas.

    If the only reason ever advanced for these things is AGW, then how are we to convince those who simply don’t believe to make the monetary investments to act?

    Roger:
    I agree that ideally, the GMST data should be supervised by people who don’t have strong POV’s with regard to the theories we test using the GMST data. Yet, it’s probably unrealistic to expect to achieve this ideal. After all, simply working on a project causes one to develop strong POV’s.

    If more data were available, I might consider the POV of the supervisors and reject both GISSTemp and UAH for those reasons. But, otherwise…. well, there are only 5. And, as far as I am aware, most practitioners seem to believe these data are sound.

    Both groups correct data when errors in algorithms are uncovered. (Though, my impression is Spencer found his own errors, announced and corrected. In contrast, Hansen’s were found by outsiders. Still, the fact that outsiders now have access to the NASA methods is useful from a statistics/bias perspective.)

  6. comment 1353

    Roger,
    “On the one hand the GISS data set does not have a hands off relationship with Jim Hansen, one of the major alarmists, and for whatever reason, it shows the highest trends over the past decade or so.”

    HadCRUT is controlled by Phil Jones - Hansen’s alarmist twin from the UK.
    RSS is controlled by a warmer group as well.

    I tend to trust the satellite measurements more because there are two groups with opposite biases using the exact same datasources and their algorithms are public knowledge. This competition keeps them both ‘honest’.

  7. comment 1354

    The point by Raven is well taken, and his/her observations suggest that an unbiased approach would be the average of the two satellite data sets.

    I have personal experience in how subtle the effects of bias can work. More than 10 years ago my company was involved in a patent law suit with another company. The details aren’t important, but the issue resolved down to the molecular weight of a polymer backbone for a particular product. If the MW was about 1200, we would win; if it was around 1500, the opposite side would win. The judge ordered each side to pick three outside experts to do the measurements and determine the right answer. Polymer MW measurements can be tricky because of entangling and cross-linking effects, etc. The selected experts were top analytical chemists of unquestioned expertise and integrity. Our experts came in with an answer of 1200; their experts got 1500. How could this be? The difference was that the opposite side told their experts beforehand what they thought the right answer was. And sure enough you could see in their experts’ lab books where they had made decisions that took them to 1500. It was not conscious bias, just a mental “tilt” that influenced their work and ultimately the answer. Since then I have had a great respect for the influence of POVs in so-called “hard” science.

    I would also like to point out that it matters a great deal how large AGW is, because it goes to the policy for dealing with it. And the differences are not nuances. Even taking the IPCC mid-range case as a given, integrated assessment models developed by the prize winning environmental economist William Nordhaus (The Challenge of Global Warming: Economic Models and Environmental Policy, April 2007) show that a 50-year wait before enacting restraints is very close to optimum policy in terms of benefit/cost ratio. He further finds that Kyoto-like cap-and-trade policies are “inefficient and ineffective.” If AGW effects are indeed smaller than IPCC projections, as is suggested by empirical results, optimum economic policy tilts yet more in favor of developing more economical emissionless technology rather than reducing economic growth, especially in the developing world, through restrictions and deploying current uneconomical technology.

  8. comment 1355

    “Both groups correct data when errors in algorithms are uncovered. (Though, my impression is Spencer found his own errors, announced and corrected. In contrast, Hansen’s were found by outsiders. Still, the fact that outsiders now have access to the NASA methods is useful from a statistics/bias perspective.)”

    Actually Lucia this isn’t quite true, the errors in the satellite method used by UAH were first pointed out by others, Fu et al., Mears et al., Prabakhara et al., Wentz et al. among others. Spencer & Christy somewhat reluctantly accepted them and implemented changes which had the effect of giving a warming trend rather than the original cooling.

  9. comment 1356

    Fred,

    The whole idea of combating an environmental threat with less consumption is a bit odd, as the underlying assumption is that we can stop and go back to something that was sustainable. However I find little support for the idea that our society ever was sustainable in the industrial age. Instead we have managed so far only because of rapid development (and sometimes not rapid enough, e.g. sulphur dioxide emissions were cut by a factor of 1000 in less than 40 years thanks to technological developments, but some forests still died).

    If we instead realize that our current state of affairs is not sustainable and that going back is not an option, the conclusion is instead that we need more development of technology, agriculture methods and produce, along with more specialization/trade etc. That takes wealth and freedom for a lot of players to pursue the optimal solution. Hence neither crashed economies nor heavy handed regulation is likely to lead to better environmental stewardship. Wealth, obviously, is needed for development to take place. Regulations are often counterproductive (as an example, I recently developed a technology that has huge potential to reduce energy consumption, but struggle to implement it as I cannot use the material I need because of the REACH directive in Europe; maybe I will find a way around it, but if not, a significant reduction in environmental load will not be realized because I cannot use a few kg of a not very harmful substance).

    The environmental movement, which for a long time felt like home to me, has abandoned many of its likable features and has instead entered a mindset where enterprises are enemies, those who disagree are evil, and hope for deliverance has been replaced with a fervour of pessimism in spite of all the progress both the world and the movement have achieved historically. It is in this context that Kyoto and the scare stories of global warming were born.

  10. comment 1358

    Phil!
    Thanks for clarifying! I was misinformed.

    So, in both cases, the errors were found by outsiders. Well, one of the strengths of science is that outsiders can and do look at methods and data. Then, if the criticisms are warranted, the community does eventually come around.

  11. comment 1359

    I would trust Spencer and Christy 110%. Would not be surprised if their original data ends up back up there.

  12. comment 1360

    Vincent, Part of the problem we see in over politicized science is this:

    1. a refusal to admit small errors.
    2. banishment for small errors.

  13. comment 1364

    [...] in temperatures in the last 12 months here a decline in temperatures in the last 7 years reported here, decline in the last 10 years here, and now, indications of atmospheric stability back to 1995, or [...]

  14. comment 1365

    Phil,

    It seems your recollection is a little different to mine - I’m mainly going from memory here, so I could be wrong.

    Fu et al made a number of criticisms of the UAH MSU data set. As far as I am aware, all of these were found to be without merit and have not been included in the current data set. I’m unfamiliar with the Prabakhara work so can’t comment on that.

    Mears and Wentz of Remote Sensing Systems generated their own analysis of the MSU data, and found some discrepancies. Spencer and Christy provided their code for the diurnal drift calculation, and Mears and Wentz identified an error in the software. This was duly recognised by Spencer and Christy and fixed, with due credit given to the RSS team. (Why do you refer to these as “Mears et al” and “Wentz et al”? It’s like arguing MBH98 was authored by “Mann et al”, “Bradley et al” and “Hughes et al”)

    Far from changing a “cooling” to a “warming”, it increased the warming trend from around 0.09 deg C/decade to around 0.12 deg C/decade - the change was actually within the stated error bars for the trend calculation (stated at 0.05 deg C / decade at the time). Of course since then the trend has further increased to 0.14 deg C / decade (due to new measurements, not more software errors).

    It should be noted that Spencer and Christy returned the favour, helping identify a software error that was causing the RSS data to become too cold in recent months. Mears and Wentz also promptly updated their code and acknowledged Spencer and Christy in doing so.

    The reasons for the divergence of the UAH and RSS data have not been fully assessed as far as I am aware, although there are two obvious notes: UAH includes more of the South Polar region (which has not warmed to the same extent as the rest of the world, so introduces a warming bias to the RSS data), plus there have been some questions raised regarding the diurnal correction (I believe RSS uses a climate model to estimate the correction required, whereas UAH uses surface station data).

    I’m only aware of one significant correction (the diurnal drift) being identified by anyone other than Spencer and Christy themselves, so I’m not sure why you are pushing the idea that there were lots of problems spotted by lots of people.

    So far, the UAH and RSS teams seem to be getting on with good science, helping each other to develop the best possible data sets. It would be nice to see this kind of approach elsewhere in climate science, rather than people digging their heels in and refusing to give any ground even when errors are blindingly obvious.

  15. comment 1366

    Spence_UK
    Thanks for the elaboration. On the “et al.” terminology, Phil is an academic. It is quite likely he is referring to some specific papers, that were written by more than two people. Academics get in the habit of referring to papers the way they would type the words in a manuscript. (I used to do the same thing all the time.)

    Quite honestly, I think all the measurement groups are doing their best to provide data products they think best reflect the real earth temperature. UAH and RSS seem to collaborate well, and that is to be applauded.

  16. comment 1367

    Following Roger’s account of bias, I offer the story of the psychology professor delivering a lecture on ‘suggestion’. His class demonstrated their mastery of the subject by prearranging that each time the eminent gentleman moved to the right side of his lectern, they would pay rapt attention, whereas any movement to the left would be greeted by coughs, yawns, nose-picking, and heads in hands. Needless to say, the poor man wound up almost in the corridor.
    Now, some here (me too) may say psychology is about as “soft” as science can get, even supposing it’s science at all, but I don’t think that’s quite the point. I would say most psychology professors would be at least as concerned about not appearing to be a douchebag, manipulated by their own class, as any “hard” scientist would be about leaning on his data to get a ‘favourable’ result. Some may also say psychology professors are not ‘poor’ men, but deserve all they get except their salaries, but that would be lacking in human compassion, so shame on you.

    Anyhow, kudos to all here (from a math free sceptic) for open minds and good manners. Let’s get to the bottom of this thing. Go, Lucia! You’ll have to explain your results at doggy level for me to understand, but I believe you can do that.
    Advanced doggy level, then…

    Woof.

  17. comment 1368

    Lucia,

    Thanks, I am familiar with the et al terminology :) But it is unusual to refer to a single paper, written by Mears and Wentz (link below), as both “Mears et al” and “Wentz et al” in the same sentence, especially without clarifying that you are referring to just one paper. Given that Phil was trying to highlight the number of different people who had found errors in the UAH code, that seems a peculiar way to make a point.

    http://www.sciencemag.org/cgi/...../5740/1548

    Mears and Wentz have published various other documents regarding satellite measurements (including one, quite recently, which was quite critical of model predictions, here), but the one above is the only one I’m aware of that resulted in a correction to the UAH code.

    PS. Nice work on the data sets, by the way. The IPCC method of estimating confidence bars for model outputs is, IMHO, seriously flawed, and your analysis helps to illustrate that. Ironically, the bigger error bars that sceptics would like to see on model outputs would make the AGW hypothesis more difficult to falsify. Who would have expected that?

  18. comment 1369

    Spence_UK March 26th, 2008 at 4:22 pm says:
    “Ironically, the bigger error bars that sceptics would like to see on model outputs would make the AGW hypothesis more difficult to falsify.”

    Large error bars would have to be propagated over time and would ultimately undermine the credibility of the model - especially if the error bars make it look like cooling is a possible outcome in 100 years. Think about it. How much weight would you put on a prediction of 3 degC +/- 25 degC?

  19. comment 1373

    Spence UK, I was actually referring to different papers as Lucia supposed.
    There were in fact several corrections dating from around ‘98. Fu et al. (U of W) correctly identified stratospheric cooling as a source of error; whether they overcorrect isn’t the point.
    Wentz & Schabel identified the decay of orbits as a major source of error, particularly for LT; Christy suggested that other diurnal corrections etc. would counteract this error. However, Prabhakara et al. did a reanalysis limited to near nadir data which also gave a warming trend, contrary to S & C, which reinforced W & S. Mears et al. also performed a reanalysis which agreed with P et al. and identified differences in the treatment of variations in the temperature of the hot calibration target as a source of error.
    As I recall, the reason that RSS don’t go beyond 70S for the LT product is related to interference with the microwave signal by the ice (they also don’t take data for areas above 3000m for the same reason). Christy doesn’t accept that; I don’t recall the reference.
    At the time of the W&S correction S&C was showing a cooling of 0.05ºC.

  20. comment 1376

    Spence_UK.
    Oddly enough, large error bars both do and do not make falsification more difficult.

    One can falsify a central tendency against weather data. That means 2C/century is not consistent with the weather data, within their uncertainty. You can draw large or small error bars around the 2C/century; that specific number is still falsified.

    What matters with regard to this test is the uncertainty in the trend that is consistent with the data.

    However, if the IPCC had large error bars, then those regions within their error bars that are consistent with data would not be falsified. So if, for example, their error bars included 1.0C/century, that would not be falsified. So, they wouldn’t be “wrong”.

    But 2C/century would remain just as wrong as if they had provided no error bars.
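
    The distinction can be sketched in a few lines. This is only an illustration: it assumes a normally distributed trend estimate, the observed trend and its standard error are made-up numbers rather than results from any data set, and the function names are hypothetical. Note that the projection’s own error bars never enter the test of a specific value.

```python
from statistics import NormalDist

def consistent_range(obs_trend, obs_se, alpha=0.05):
    """Trends (C/century) that a two-sided z-test at level alpha would NOT reject."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return obs_trend - z_crit * obs_se, obs_trend + z_crit * obs_se

def is_falsified(value, obs_trend, obs_se, alpha=0.05):
    """True if a specific projected trend lies outside the range consistent with data."""
    low, high = consistent_range(obs_trend, obs_se, alpha)
    return not (low <= value <= high)

# MADE-UP observation, for illustration only (not from any measurement system).
OBS_TREND = 0.0   # C/century
OBS_SE = 0.9      # C/century

print(is_falsified(2.0, OBS_TREND, OBS_SE))  # the central value of the projection
print(is_falsified(1.0, OBS_TREND, OBS_SE))  # a value inside hypothetical wide error bars
```

    With these assumed numbers the central value is rejected while 1.0C/century is not, whatever width of error bars is drawn around 2C/century.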

    Looking forward what one would hope is that a group making predictions will publish realistic uncertainty intervals. Policy makers and the public need these to make realistic decisions.

  21. comment 1377

    Agreed that your reconstruction of the process is a reasonable interpretation of the rather opaque prose. But it’s quite an extraordinary situation if it is correct. We’ve the error due to observation from the stations. Then there is the error due to the first level models failing to match the observations precisely. Then there’s the error from the second level models failing to match the first level ones precisely. Then we are invited to become seriously alarmed at what these second level models show. If anyone proposed developing an engineering package to be used in construction or naval architecture like this, they would be thought mad. But basically what we are talking about here is something which should be a sort of Prolines or Maxsurf for the planet. Extraordinary.

  22. comment 1383

    Martin Ringo read my discussion of β error, that is, the likelihood that if the IPCC is wrong we would fail to falsify. Though many unfamiliar with statistics assume the difficulty with small amounts of data is that one is more likely to reject the IPCC projections when they are correct, that is untrue. That likelihood is dictated by the “α” selected for the test. (I have chosen α=5%.)

    In my discussion, I pointed out that the major disadvantage of small amounts of data is that we can’t falsify. This error is β error. The power of a test is defined as 1-β.

    I said I hadn’t calculated β, but I would assume it was high. One of my statistician readers was curious about this and calculated the value. In his test, he calculated the power of a hypothesis test applied to the low end of the IPCC range, 1.5C/century, and provides results for the power as a function of both α and an assumed “real value” for the trend. (Power tests need this to be assumed.)

    Here are the results:
    [Figure: Illustration of Power of a Test]

    This example may help those unfamiliar with these tests understand the graph:

    Suppose the IPCC predicts the underlying trend, stripped of weather noise, is 1.5C/century, and people will accept their projections as true until shown inconsistent with data.

    Of course, we can’t know the real underlying trend, stripped of weather noise.

    Suppose the real value for the trend is 0C/century. That is: the IPCC is wrong, and high by 1.5C/century.

    Suppose that, to “falsify” 1.5C/century, we pick a confidence level of 95%, that is, α=1-0.95 = 5%. That is, we say: “We won’t lose confidence in the IPCC unless you show the prediction is inconsistent with actual weather, and the weather we get would have happened less than 1 time in 20 by pure random chance.” (That’s about the rate of flipping a coin heads between 4 and 5 times in a row, starting with flip “1”, not cherry picking from a string of 100 flips.)

    In this hypothetical case, run to do statistics, the IPCC is wrong. So, you would think that most of the time, you would find that, right? No.

    To find the power of the test, find “0.00 C/century” on the horizontal axis.
    Now, trace up to the 5% line. Now trace to the left, and read the power. It’s roughly 10%.

    This means that, given the amount of data you have, if the IPCC were wrong and the “real” value of the trend is 0.0C/century, we would get the correct answer only 10% of the time. That correct answer is “The IPCC is falsified”.

    What happens the other 90% of the time? We get the incorrect result: “We failed to falsify”.

    Basically: in this hypothetical, the IPCC is wrong. Because we have very little data we find:

    The likelihood of proving them wrong is only 10%. So, 90% of the time, we don’t get the right answer, but the error is on the side of assuming the IPCC is right. This means β error is 90%.
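
    The reading from the graph can be approximated with a short calculation. This is only a sketch: it assumes the trend estimate is normally distributed with a known standard error, and the value `SE = 2.3` C/century is a made-up number picked so the result lands near the graph, not a value taken from the reader’s calculation. The function name `power_of_test` is likewise hypothetical.

```python
from statistics import NormalDist

def power_of_test(true_trend, null_trend, se, alpha):
    """Probability that a two-sided z-test at level alpha rejects null_trend
    when the real trend is true_trend (trends and se in C/century)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)    # about 1.96 for alpha = 0.05
    shift = (true_trend - null_trend) / se   # distance of the truth from the null, in se units
    # The test rejects when the standardized estimate lands in either tail.
    return z.cdf(-z_crit + shift) + z.cdf(-z_crit - shift)

SE = 2.3  # ASSUMED standard error of the fitted trend, C/century (illustrative only)

power = power_of_test(true_trend=0.0, null_trend=1.5, se=SE, alpha=0.05)
beta = 1.0 - power
print(f"power = {power:.2f}, beta = {beta:.2f}")
```

    With these assumed inputs the power comes out near 10%, so β is near 90%. Setting `true_trend` equal to `null_trend` returns α itself, which is the false positive rate.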

    Meanwhile, on the flip side, we have the other hypothetical: What if the IPCC is right? Well, in that case, we set up the experiment so that we would mistakenly conclude they are wrong 5% of the time. (This is α error and it’s called a “false positive”.)

    How do we decrease α error? We just pick a lower α. The analyst picks this. Obviously, if we set α to 0.0000001% we will practically never falsify anything. Oddly enough, the amount of data we take doesn’t change the rate of false positives.

    How do we decrease β error? There are two ways. The most common one is to hold α constant and take more data. The next most common way is to increase α.
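
    Both levers can be seen in a small sketch. It assumes independent (white) monthly noise with a made-up standard deviation `SIGMA`, evenly spaced monthly data, and a z-test with known standard error; real temperature series are autocorrelated, which this ignores. The sample sizes and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def trend_se(n_months, sigma):
    """Standard error (C/century) of a least-squares trend through n_months
    evenly spaced monthly values with independent noise of std. dev. sigma (C)."""
    # For t = 0, 1, ..., n-1 the sum of squared deviations from the mean is n(n^2 - 1)/12.
    sxx = n_months * (n_months ** 2 - 1) / 12.0
    return 1200.0 * sigma / sqrt(sxx)  # 1200 months per century

def power_of_test(true_trend, null_trend, se, alpha):
    """Rejection probability of a two-sided z-test at level alpha."""
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = (true_trend - null_trend) / se
    return z.cdf(-z_crit + shift) + z.cdf(-z_crit - shift)

SIGMA = 0.4  # ASSUMED monthly noise level in C (illustrative only)

for n_months in (84, 240):          # hypothetical record lengths
    se = trend_se(n_months, SIGMA)
    for alpha in (0.05, 0.50):
        false_pos = power_of_test(1.5, 1.5, se, alpha)   # IPCC right: rate of wrongly rejecting
        beta = 1.0 - power_of_test(0.0, 1.5, se, alpha)  # IPCC wrong: rate of failing to reject
        print(f"n={n_months:3d} alpha={alpha:.2f} false positives={false_pos:.2f} beta={beta:.2f}")
```

    Under these assumptions the false positive rate equals α for either record length, while β falls both when more months are used and when α is raised.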

    Increasing α to as high as 50% is commonly done by “normal people”. Suppose, for example, your boss had a hypothesis that you thought was false. You suggest that he’s wrong, but don’t have any data. Your boss, having some confidence in you, might be willing to consider that his hypothesis is wrong.

    So, he might say: Ok. Take a few samples and compare to my hypothesis. Come back, and if you can show me there is an α=50% chance I am wrong, I will give you more funds to investigate further.

    So, this sounds like a reasonable boss, right? So, what about science?

    Considering the possibility that a new untested projection might be wrong when it fails at a confidence of α=50% is rather common in science. After all, this means the projection is more likely wrong than right. Stubbornly insisting that it cannot possibly be wrong because it falls inside wide uncertainty intervals is rather novel.

  23. comment 1385

    Phil,

    I’ve referenced the only paper I’m aware of that resulted in a required change to the processing due to an error being found. The fact that you can dredge up a whole bunch of spitballs that were hurled at Spencer and Christy, and that most were found to be without merit, is rather unimpressive.

    As I made clear, I am aware that Fu et al highlighted what they perceived as an error, but their analysis, as you note, actually makes the error worse rather than better; this was already discussed 8 years PRIOR to the Fu paper in the following paper:

    “Precise monitoring of global temperature trends from satellites”, Spencer and Christy 1990, Science

    So Fu basically raised a point that Spencer and Christy had already investigated in depth and had rejected as increasing the error term in the output data. Hardly someone else discovering Spencer and Christy’s errors, as you suggest, but someone else doing a reanalysis of something Spencer and Christy had already addressed and that person failing to understand the valid reasons as to why it was rejected - despite that reason already being a part of the peer reviewed record. So your categorisation of this as someone else finding Spencer and Christy’s error is a particularly peculiar form of revisionism.

    I looked at the Wentz and Schabel paper. As you note, they take a guess at what might be the difference between their analysis and Spencer and Christy, and get it wrong. How is this “someone else finding Spencer and Christy’s errors”? Based on what, the notion that two wrongs make a right? Rather than finding and fixing the actual problem, the diurnal correction, let’s put in an additional - let’s call it - “adjustment” for orbital decay that counteracts the software error in the diurnal correction. I guess “hey, it’s climate science” applies here.

    As for turning a cooling to a warming, again you don’t understand the consequence of adding new data. In their 1998 paper, W&S perform analysis on the set 1979 to 1995, giving a -0.05K / decade trend - barely different from zero, given the error bars, so questionably a “cooling” in the first place. Their 1998 paper failed to find the error in the diurnal correction. By the time the error was found, some seven years later, the trend (1979-2005) was +0.09K / decade. So your claim that the fixes have turned “cooling” to “warming” is another odd claim. Additional data turned the “cooling” into a “warming”, irrespective of the fixes. The fixes do change the data, admittedly, but since the magnitude of the error was slightly less than the stated trend error bars, the most change possible would be from “marginal, probably insignificant cooling” into “no significant trend”.

    As noted, I’m not familiar with “P et al” although your perception of the events above does not make me particularly inclined to follow it up. The only paper from Prabhakara I could find on the topic claimed the MSU data were too contaminated by clouds and rain to be used to detect global warming trends - clearly not a generally held view today. Can you link to the original paper, and to the change in processing that Spencer and Christy made as a direct result of that paper?

    Finally, yes I am well aware of the reasons why RSS stop at 70S, which have some merit. However, this does not change my stated position that Antarctica has not warmed as much as the rest of the globe over the satellite era, and the lack of these data introduces a warming bias with respect to global mean temperature, irrespective of the reasons for the lack of data.

    I reiterate my position that the UAH and RSS teams have worked together well, fixing each other’s errors and generating high quality climatic data sets. Quite why the climate science community feels the need to continually denigrate and misrepresent the good work of Spencer and Christy is beyond me.

    Raven: indeed, I think that may be an important driver!

    Lucia: sorry, you’re probably not aware of my own view - that weather noise exhibits self-similarity on increasing scales - which has rather nasty consequences and means the method being used to test for 2C/century trends would be inappropriate. However this is an assumption I make, not the IPCC; as such, your test is completely valid in terms of a test of the IPCC claims. Such a test would not be valid under my assumptions, and in fact a 2C/century trend would be surprisingly difficult to falsify. But then, you’re not testing my assumptions :) I hope that all makes more sense to you than it did to me when I typed it.

  24. comment 1387

    Hey Spence, if you want to be a propagandist for the S&C position, fine; however, the facts are that far from being ‘spitballs’, orbital decay was a problem, particularly for the LT, stratospheric cooling can’t be ignored, and calibration was an issue. Agreed that UAH and RSS seem to work together well; this wasn’t the case in 98, as S&C had clear ‘ownership’ issues.
    In 98 Spencer said: “The temperatures we measure from space are actually on a very slight downward trend since 1979 … the trend is about 0.05 C per decade cooling.”
    Wentz & Schabel and Prabhakara et al. produced results in 98 showing that this was in error. The P et al. reference and abstract are given below:
    The work of S&C is not being denigrated in the scientific community; quite the opposite. As Hansen said:
    “In crediting Wentz and Schabel for discovering the satellite altitude effect, we should not forget the credit that Christy and Spencer deserve for pioneering MSU analysis and bringing it to the point that a correction of 0.1ºC has such a large effect on interpretations of climate change.”
    That their initial calculations included errors which were identified by themselves and others and corrected is part of the normal progress of science.

    GEOPHYSICAL RESEARCH LETTERS, VOL. 25, NO. 11, PAGES 1927–1930, 1998

    Global Warming Deduced from MSU

    C. Prabhakara, R. Iacovazzi Jr., J. -M. Yoo, G. Dalu

    Abstract

    Microwave Sounding Unit (MSU) radiometer observations in Channel 2 (53.74 GHz) made from sequential, sun-synchronous, polar-orbiting NOAA operational satellites have been used to derive global temperature trend for the period 1980 to 1996. Christy et al. (1998) emphasize that they find a tropospheric cooling trend (−0.046 K decade−1) from 1979 to 1997 with these MSU data, although their analysis of near nadir measurements yields a near zero trend (0.003 K decade−1). Using an independent method to analyze the MSU Ch 2 nadir data separately over global ocean and land, we infer that the temperature trends over both these regions are about 0.11 K decade−1, during the period 1980 to 1996. This result is in better agreement with trend analyses based on conventional surface data.

  25. comment 1391

    Phil, you’re just too funny - but “propagandist” is no insult when it comes from someone so far from a neutral point of view as you are.

    Hansen, of course, does not a climate community make, and he is hardly being graceful even in the quote you give there. I still remember Raypierre on RealClimate denigrating Spencer and Christy’s “serial egregious errors” - note plural errors, repeated in your post above where you say “… errors were pointed out”. Yet I’ve asked you over and over again to show that more than one error has been identified, and so far the only actual error found is the diurnal correction fix - just one error, which I pointed out and linked to.

    Thanks for the abstract of the Prabhakara paper. Alarm bells ring the minute Prabhakara et al. decide to pick a different date range to analyse (probably makes no difference, but why would they do that?). The UAH trend between 1980 and 1996 is +0.032K / decade AFTER the diurnal correction fix. So most of the discrepancy between the UAH and Prabhakara findings still exists. Hmm, let’s check RSS. Oh, they find a trend of +0.104 K / decade for the same period, almost identical to Prabhakara. Hmm, maybe this analysis is not so independent. So it seems this discrepancy still isn’t resolved, even after the diurnal correction fix. So, tell us all Phil, what error have Spencer and Christy made to cause this remaining discrepancy? If you don’t know what is presently causing this discrepancy, then isn’t it just possible that the error is down to an error on the part of Prabhakara, Mears or Wentz, just as much as it could be Spencer and Christy’s analysis? Or that someone has made an incorrect, or just different, assumption? Or someone is comparing apples and oranges?

    So come on, Phil. You used “errors” plural as did Raypierre, but so far we have just one error, the diurnal correction fix, which had a net effect of less than the error bars on the data. Where, specifically, are all these other errors, plural, that you were talking about earlier?

  26. comment 1393

    Lucia, since most people have confirmation bias wired into their brains they may never get beta. NEVER EVER.
