What Are The IPCC Projections? And How Not to Cherry Pick.

After I described a possible falsification test for IPCC projections and explained what future data might falsify IPCC predictions, two notable things happened:

  1. Some of my readers asked me to start sifting back through historical data to discover whether there is any data anywhere that falsifies any IPCC projection ever made. I’m not going to do that: that would be cherry picking.
  2. Some of my readers mentioned what I said on other blogs. Commenters at those blogs immediately said my start date is cherry picking. Often, these commenters don’t even look at the plots and assume my start date is 1998, and that the results are due to the record high temperatures in 1998. That’s not my start date.

I think cherry picking is a grave sin in data analysis. For this reason, I’m going to explain how and why I picked Jan 2001 as the start date for validating or falsifying IPCC projections, and why, with regard to testing IPCC projections, selecting any other start date would be cherry picking. While this post is long, and rather boring, I think it will be useful to those who may wish to support their arguments in blog comments with my results, as I will have explained precisely how I picked my start date of 2001, which is not cherry picked.

The organization of this post will be as follows:

  1. What are the minimum requirements to validate or falsify a predictive model?
  2. What are the IPCC Projections? A trend of 2C/century.
  3. What time frame should we select for validation? Start with data from 2001.
  4. What do the results look like today? Deferred until later!

Requirements to validate or falsify

The minimum requirements to validate or falsify a predictive model are based on logic.

  1. To validate predictions, we must see how well a model predicted.

    This means that positive validation of predictive power must rely on data obtained after the predictions were made. This requirement is the same for psychics, GCMs, econometricians or what have you. Many can predict the past; it’s predicting the future that’s difficult.

  2. Falsification of predictive models can include past data. We can identify bad models on the basis of inaccurate hindcasts.

    However, generally speaking, competent people don’t claim to predict the future using models that were clearly wrong in the past. So, although we can falsify using past data, to rein in the possibility of cherry picking when falsifying other people’s predictions, it’s best to limit falsification to data collected after the predictions were made.

  3. Both validation and falsification require the projections be sufficiently specific to compare to empirical data. This means: the projections must generally be quantitative. They should use numbers, not just compare colored pictures and say “Looks good to me!”
  4. Both validation and falsification should be based on fairly standard hypothesis tests that provide quantifiable results.

We can, of course, all argue about other specific questions. For example: Is a specific hypothesis test reliable? What confidence intervals do we use? What do we select as a null hypothesis? Must we absolutely, positively decree a hypothesis false if it fails only one test? The answers may vary, but they do not affect the general principles above.

What are the IPCC Projections?

My current goal is to validate or falsify IPCC projections. I have no particular axe to grind. I just want to see whether the IPCC projected correctly or not.

I narrowed my focus to validating or falsifying IPCC projections after Roger Pielke Jr. asked a broader question about what relatively short-term weather could hypothetically falsify climate model predictions. My goal was to make the question more specific.

I needed to identify some relatively short-term IPCC projections, and it turns out these exist.

Some of you will recall that Roger Pielke Jr. caused a minor blog kerfuffle with his posts on this question.
[Figure: IPCC Projections]

In the figure above, I overlaid a straight line showing the IPCC’s near-term projection of 2C/century for the first three decades. This near-term projection is based on computations from a variety of climate models, using SRES marker scenarios. The IPCC reports that we should expect warming of 0.1C per decade if atmospheric concentrations are held at 2000 levels, and

“About twice as much warming (0.2C per decade) would be expected if emissions were to fall within the range of the SRES marker scenarios. This result is insensitive to the choice among SRES initiatives. “

(Ref: “Understanding Near-Term Climate,” Technical Summary, TS 5.1, page 62.)

So, if we examine the figure above, we see temperatures are expected to follow a trend of m = 0.2C per decade, where “m” is the trend, with some scatter in temperatures about that trend.

What is not clear to me is whether the scatter about the trend represents the uncertainty in the predicted underlying trend, m, only, or also the uncertainty in the actual temperatures. (The difference matters. Because weather adds variability beyond the uncertainty in the trend, the uncertainty in the temperature anomaly for any given year is larger than the uncertainty in the trend. For now, I will simply use the 2C/century trend. Later, if it becomes important, I’ll delve into the error bands. Their existence in the IPCC projections can be dealt with should they assume make-or-break importance in any particular hypothesis test.)

SRES basis

Some of you might wonder if the 2.0 C/century depends on the SRES scenario. It turns out it does not. Page 69 of the technical summary suggests the variability in the trend, m, is small in the short term; the report itself says the prediction is fairly independent of the SRES scenario.

So, in short: the IPCC projections are for m = 2.0 C/century over the first few decades after 2000.
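To make the comparison concrete, here is a minimal sketch of the kind of test this implies: fit an ordinary least-squares trend to temperature anomalies beginning in January 2001 and ask whether the fitted slope is statistically consistent with 0.2C per decade. The anomaly series below is synthetic and merely stands in for real GMST data, the 95% confidence level is my choice, and plain OLS (which ignores serial correlation in monthly data) is used only for illustration; none of this is presented as the exact procedure I’ll use in later posts.

```python
import numpy as np
from scipy import stats

# Synthetic monthly anomalies from Jan 2001 onward, standing in for real GMST data.
rng = np.random.default_rng(0)
years = 2001 + np.arange(84) / 12.0            # seven years of monthly time steps
true_trend = 0.02                              # 0.2 C/decade, built into the fake data
anomalies = 0.4 + true_trend * (years - 2001) + rng.normal(0.0, 0.1, years.size)

# Ordinary least-squares fit of anomaly vs. time.
# (Real monthly data are serially correlated, which inflates the true uncertainty;
#  that complication is ignored in this sketch.)
fit = stats.linregress(years, anomalies)
slope_per_decade = fit.slope * 10.0
se_per_decade = fit.stderr * 10.0

# Two-sided test of H0: trend = 0.2 C/decade, at the 95% confidence level.
t_stat = (slope_per_decade - 0.2) / se_per_decade
dof = years.size - 2
p_value = 2.0 * stats.t.sf(abs(t_stat), dof)

print(f"fitted trend: {slope_per_decade:+.3f} C/decade (s.e. {se_per_decade:.3f})")
verdict = "reject" if p_value < 0.05 else "cannot reject"
print(f"t = {t_stat:+.2f}, p = {p_value:.3f} -> {verdict} the 0.2 C/decade projection")
```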

Why start validation/falsification in 2001?

The correct date to begin validation/falsification of the recent IPCC projections truly is Jan 1, 2001. The principle I apply is to begin validation using data acquired during the first full calendar year after a document containing specific predictions is published.

When I first did the analysis to explain how one might hypothetically falsify IPCC projections, I suggested that one must look at the projections in the report published in 2007, and so begin the comparison in 2008. I thought this a reasonable principle even though the most recent IPCC report, published in 2007, contained “predictions” starting in 2001.

However, a reader, Ian Castles, set me straight about the publication dates. He wrote:

With respect, I don’t think that this is so. The IPCC “prediction” in question is based on simulations which make use of the projections of future emissions specified in the Panel’s Special Report on Emissions Scenarios (SRES). This Report was approved and published by the IPCC in 2000. As the Panel decided at its Plenary Meeting at Vienna in November 2003 that ‘the SRES scenarios provide a credible and sound set of projections, appropriate for use in the AR4’, NO post-2000 data have been incorporated into the ‘prediction’ that you are seeking to test. It is therefore valid to use post-2000 data to test the ‘prediction’ for 2000-2010 in AR4.

So using my rule, I would begin validation with data collected in 2001. I also begin falsification efforts on that date.

Now, some may point out that failure to hindcast does suggest a model is invalid. Despite that, when testing other people’s predictions, I prefer to select a common start date both for validating and for falsifying.

I do so to avoid cherry picking, even accidentally.

Why cherry picking is bad and how I avoid it

Suppose, instead of setting a rule for myself, I permit myself to pick any start date whatsoever for validating or falsifying the recent IPCC projections. What date do I select? 1978? 1998? 1494? And how do I select? I know if I pick 1978, I’ll find a statistically significant upward trend. If I pick the 1998 peak, I maximize my chance of finding a downward trend.

Some will quickly realize that if I wanted to cherry pick, I could just write a program and data mine, arbitrarily picking start and stop dates; doing that, I could likely prove whatever I want.

Here is the problem with this, from a statistics point of view. Straightforward, undergraduate-level analysis lets me set a criterion such that, if I define one particular experiment in advance and then collect the data, the chance of an α (false positive) error is 5%.

But, say I do this: After doing the analysis, I look at one experiment, don’t get the answer I want and then pretend I never did that experiment. Then, I run another experiment, and another and another. I keep going until I get the result I want.

What do you think the chance is that, somewhere in those 20 experiments, I get a result I should expect to see only 5% of the time by pure random chance? It’s fairly high!

The difference between getting a statistically unusual result in one pre-specified experiment and getting one somewhere in a run of 20 is the heart of the problem with cherry picking.
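The arithmetic behind that “fairly high” is simple. Assuming, for simplicity, that the 20 tests are independent, the chance that at least one comes up “significant” by luck alone is 1 - 0.95^20, or roughly 64%. A quick check:

```python
# Chance of at least one false positive across n independent tests at significance alpha
alpha, n = 0.05, 20
print(1 - (1 - alpha) ** n)   # ~0.64: about a 64% chance of a spurious "hit"
```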

So, with respect to the projections in the most recent (2007) report: I will be validating or falsifying using data starting in 2001 and no sooner, and I will always include whatever data are most current and reliable.

Because cherry picking does happen (especially in politically charged blogs), I may sometimes note whether or not the conclusion is strongly influenced by start date. So, for example, if I find a controversial result by starting an analysis in 2001, I may see whether the same result is obtained if I include 2000 data, or start in 1999.

This sort of test is useful because I know that doubters are never convinced by conclusions that depend on a very specific start date, even if that date is not cherry picked. If my results break down when I pick a different start date, I should admit this to doubters. It’s also important for those who agree with my results to know whether the results are very sensitive to the choice of start date.

As for analysis methods: I may change data analysis methods, as I teach myself new ones, so as to deal with data problems I had not foreseen.

I intend to always report what I find with the initial analysis I proposed. That analysis was to test the IPCC projections against annually averaged data for GMST (global mean surface temperature), which I think presents the most robust test for validating IPCC projections of warming. Since no data source seems any more reliable than the others, I will generally compare to the major data sources: GISS, HadCrut, RSS, and UAH, or sometimes averages of the four. (In any case, if I don’t, others will!)
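For what it’s worth, here is a rough sketch of what comparing to the major sources can look like in practice. The file names and column layout are hypothetical (each group publishes its anomalies in a different format and against a different baseline, so real files would need their own parsing and re-baselining); the point is only that the composite is a simple average of the aligned monthly series, reduced to annual means from 2001 onward.

```python
import pandas as pd

# Hypothetical local copies of the monthly anomaly series (GISS, HadCRUT, RSS, UAH).
# Real files differ in format and baseline period, so each would need its own
# parser and, ideally, re-baselining to a common reference period before averaging.
sources = ["giss.csv", "hadcrut.csv", "rss.csv", "uah.csv"]

series = []
for name in sources:
    df = pd.read_csv(name, parse_dates=["date"], index_col="date")
    series.append(df["anomaly"].rename(name))

monthly = pd.concat(series, axis=1)
monthly["composite"] = monthly[sources].mean(axis=1)   # simple average of the four

# Annual averages from January 2001 onward, the validation/falsification window
recent = monthly.loc[monthly.index.year >= 2001, "composite"]
annual = recent.groupby(recent.index.year).mean()
print(annual)
```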

I think this policy should allay fears of cherry picking and head off accusations of it.

What do the GMST results say right now?

I know you are wondering what the results say now. You are wondering why I would bother posting just to allay fears of cherry picking! Skeptics all know January was cold. Maybe I’m sitting on exciting results? 🙂

Well, I have new results, but they aren’t that exciting. I have a post written and published under password protection. It contains a new result, but the validation/falsification conclusion is based on a statistical method I have not previously used. Someone with a statistics background is reviewing what I have done to find out whether or not I used the method correctly.

That is: I am trying to make sure I didn’t just totally misunderstand how to apply a technique and get wrong answers for that reason.

Back in early January, I would have just posted. I used to blog about knitting. That was a lot more peaceful. I didn’t worry about posting mistakes. You make a mistake. Sometimes you catch it; sometimes a reader does. You admit it, you correct it. Not a big problem.

But having read the tone at some climate blogs this week, I now know for sure: climate blogging is not like knitting. I accept that I will post results people disagree with and argue about. I’ll probably also make mistakes. But hopefully, I won’t forget to divide by 2 or subtract 1 when doing a Taylor series, if you know what I mean. 🙂

Updates:

March 10: Paul M. asked for links to the specific IPCC projections I am discussing. Here they are:

  1. The page for the IPCC AR4 WG1
  2. The technical summary.
  3. Chapter 10.

Precise projections can vary somewhat from report to report. That is why I like to state the specific hypothesis I am testing. I’ll try to remember to add links to the specific reports more regularly, but if I don’t, do ask. It will help with clarity.

11 thoughts on “What Are The IPCC Projections? And How Not to Cherry Pick.”

  1. The problem I see with your argument is that the IPCC keeps publishing new reports. Once you have collected enough data, a new report with new estimates will exist. The argument will be that, even if you invalidate H0, there is a new value and your result is irrelevant. Starting with a previous report (TAR) and its projections, and using data that was collected after the report was completed, doesn’t seem like cherry picking to me. You’ve specified a start date (1970), a data set (GISS), and a test set (end of report to ten years out). All are preset criteria before you do your test. At least you can verify the ability of the IPCC to prognosticate. It doesn’t prove that any new predictions will be wrong, but it may bring into question their modeling ability or, on the other hand, give some confidence to the following report’s projections.

  2. BarryW–
    There is no end date for testing and validating. There is a start date. For the current report it’s 2001 and on. So, I can keep testing forever. People may stop caring, but I can keep re-evaluating, and that’s not cherry picking.

    With respect to the TAR, I could test the TAR projections and say something about the TAR.

    In that case, I need to find out the precise dates when the TAR came out, and if there is any odd date for their basis, I need to use that.

    Is the TAR online? And if yes, were their computations pinned to any particular date? Since the most recent report included projections from a 2000 report, I’m assuming the TAR was based on projections published in 1990. So, I could use 1991, and that would be fair. But picking, say, 1998 to test the TAR would be unfair unless the TAR was published in 1997 and the projections were purely based on documents not published previously.

    But I don’t have the TAR, or specific knowledge of the TAR. And I can only test a finite number of things at a time.

    Soo….. the reason I’m not checking the TAR projections is not related to cherry picking. It’s that, right now, I’m spending my blogging time checking the most recent predictions. But if someone else repeated the exercise for the TAR projections, I’d have no objections, provided they explain why they picked a certain start date.

  3. Sorry if I appear to be nagging. I only suggested the TAR because it dates from 2001, much of the political argument has been based on it, and it uses data from 1961-1990 as a baseline with the same sensitivity range as the 1990 assessment (1.5 to 4.5C). So as of 2001 they were sticking with the projections from 10 years earlier, if I understand them correctly. Thought it might provide a useful test case, but if not then I withdraw the comment.

    Yes it’s online. Here’s the reference I have:

    link

    The volume 4 F.3 Projection of Future Changes in Temperature has the information on the projections that I think someone would need.

  4. Thanks BarryW. I will be looking at it. You don’t need to withdraw your comment. The issue is that I have finite time, and I am looking at something quite specific right now. My tentative results (if not screwed up) look interesting. And I want to finish this particular analysis and post it before moving to a different hypothesis test.

    One thing that’s nice about this being my blog is I know where to find all the links later. That’s why I started it. People bring me information once they figure out I’m ready, willing, and able to learn new statistical techniques, explain them, and apply them. 🙂

  5. Lucia, The increases in temperature projected in the TAR (2001), derived from emissions projections in the SRES (2000), were actually much higher than those in the SAR (1995) which were based on the IS92 scenarios. The TAR projections were for a temperature increase of 1.4-5.8 C between 1990 and 2100, compared with the SAR range of 1.0-3.5 C. According to the TAR SPM, the increase in the range was due “primarily to the lower projected sulphur dioxide emissions in the SRES scenarios relative to the IS92 scenarios” (p. 13).

    If you are able in due course to take up BarryW’s suggestion to look at the TAR projections, I’d advise you to follow his link to vol. 1 (“The Scientific Basis”) and then to Appendix II (“SRES Tables”). These Tables show, for each of the six illustrative SRES scenarios, projections by decade from 1990 to 2100 for (a) emissions of main greenhouse gases; (b) atmospheric concentrations of each of these gases; (c) the concomitant forcing for each of the gases; and (d) the resulting projected rises in temperature. There is not a great difference between scenarios in the projected warming between 2000 and 2010: all scenarios project a rise in mean temperature of around 0.2 C for this decade.

    In an article in “The Age” (Melbourne) on 21 February (“Dire new warning on climate”), Professor Ross Garnaut, an eminent economist who is advising Australia’s Federal, State and Territory Governments on emissions targets, is reported to have told the newspaper that “Recent rises in global temperatures … were at the upper end of what was predicted in 2001”, and that “The rate of change is at the bad end of what was identified as the range of possibilities.” These statements appear to be inconsistent with the following sentences from a news release by the UK Met Office on 3 January (“Global temperature 2008: Another top-ten year”):

    “The Met Office, in collaboration with the University of East Anglia, maintains a global temperature record which is used in the reports of the IPCC. The forecast value for 2008 mean temperature is considered indistinguishable from any of the years 2001-7, given the uncertainties in the data.”

    The temperature data are noisy, but none of the leading measures seem to show ANY warming since 2001 – let alone a warming “at the upper end of what was predicted.” Contrary to statements by the IPCC and Professor Garnaut, recent data do not at this point show CONTINUING warming since 2001. Does anyone disagree with this?

  6. Lucia,
    Please make it clear in your posts exactly what IPCC document you are talking about, since there are confusingly many different versions (not to mention the quiet updates to correct their blunders, like the sea-level table where the numbers didn’t add up). Is it AR4? And is it the Synthesis Report (SYR) or the WG1 report? And are you referring to the SPM? Etc., etc. Please do this to avoid confusion.

    There is a marvellous example of cherry picking in AR4 SYR SPM page 1. They say the 100-year trend from 1906 is greater than the 100-year trend from 1901. Why did they choose to go back 100 years? Just look at their graph on the next page. You can see there was a sharp drop in temperature from about 1900 to 1905, hence the higher trend starting from 1906. How stupid do they think their readers are? Did they think no one would notice?

  7. Hi Paul–

    I agree. The IPCC is constantly coming out with new documents.

    The links to the documents I am discussing are:

    http://ipcc-wg1.ucar.edu/wg1/Report/AR4WG1_Print_TS.pdf for the technical summary. The title says

    “A report accepted by Working Group I of the Intergovernmental Panel on Climate Change but not approved in detail.”

    and

    http://ipcc-wg1.ucar.edu/wg1/Report/AR4WG1_Print_Ch10.pdf for Chapter 10, “Global Climate Projections.”

    Obviously, if the final document approved in detail differs, that could change various conclusions. But since this is the one I found online, it’s the one I’m using!

  8. Paul– On the question about “stupidity”… I’d actually suspect some of these changes in years just have to do with different authors using different years. It’s the sort of thing that happens when large groups work on things together.

    But yes, the magnitude of trends does change noticeably depending on the years selected. Obviously, any analysis must use a particular set of years. Equally obviously, when conclusions change as a result of small shifts in years, or of indefensible choices in start or end years, I always lean toward not making a conclusion.

    In my most recent post, evaluating the recent IPCC projections, some concrete conclusions would have been different had the IPCC made their projections in 2001, making the appropriate start year for analysis 2002. Earlier start dates change things too. So, of course, that makes me a bit hesitant to make strong conclusions. Best to wait and see what the weather does! 🙂

    (Obviously, uncertainty doesn’t prevent me from testing out an analytical method. It can always be repeated as more data comes in!)

  9. Pingback: Niche Modeling » Example of Simple Linear Regression - global warming trends
