Rasmus posted a brief comment on On the statistical significance of surface air temperature trends in the Eurasian Arctic region, by C. Franzke, suggesting the analysis in the paper suffers from confusing ‘signal’ with ‘noise’. If I understand Rasmus correctly, he is suggesting that Franzke’s autocorrelations for “noise” in the time series are computed based on the time series either without detrending or with improper detrending. Either mistake can result in estimates of the autocorrelation of the noise that show long term persistence when none exists. I discussed this in un-naturally trendy.
After reading Rasmus’s post — and posting two comments — I read Franzke’s paper. Having read Rasmus’s post, I expected to discover that
[Franzke] examined the autocorrelation functions (ACF) of fitted noise models with the autocorrelation of the temperature records, and found that the ACF for phase scrambling was the same as for the original data. This similarity is expected, however, due to the fact that the phase scrambling preserves the spectral power.
Update Jan 3: Rasmus’s description appears to be entirely accurate. Franzke did not detrend in any way.
However, it seems to me that this is not precisely what Franzke did. The paper itself states
[14] Thus, for the significance tests, I will focus on the cubic regression trends. The magnitude of a trend is defined as the range between the minimum and maximum value of the trend line which in most cases corresponds to the start and end of the time series. This is a robust definition because it is a very smooth function and variability on interannual and decadal time scales has thus been removed. The cubic regression is very similar to the EEMD trend and EEMD has been shown to be able to extract climate variability on interannual and decadal time scales [Wu et al., 2007; Franzke, 2009; Franzke and Woollings, 2011] and meaningful trends. Furthermore, defining the magnitude of the trend as the range between the start and end point gives similar results.
[15] After identifying the trends I have to assess their statistical significance. This has been done by examining how often they are outside the trend ranges of the ensembles of surrogate time series generated by the three null models representing the background climate variability of the respective stations. To create ensembles of surrogate time series I use a first order autoregressive model (AR(1) [Franzke, 2010, 2012]) as a SRD model, and an autoregressive fractionally integrated moving average model (ARFIMA(0,d,0)) [Robinson, 2003; Franzke, 2010, 2012] as a LRD model, where d denotes the LRD parameter. As a non-parametric way of computing surrogate data with exactly the same autocorrelation function I use the phase scrambling method by Theiler et al. [1992]. This method computes the power spectrum of a time series and then randomises the phase spectrum. Because the power spectrum is the Fourier transform of the autocorrelation function (Wiener-Khinchin theorem) randomising the phase spectrum does not affect the autocorrelation function (see Figure 1).
I think this means that he separated the signal and noise and found the autocorrelation function (acf) for his estimate of the noise. That noise would have been the detrended time series. Moreover, his estimate of the signal was a cubic. This means that he will have attributed some (and I would guess most) of the non-linearity in the true deterministic signal to “signal”– thereby keeping it out of his estimate of the noise.
While it is true that any deviation of his estimate of the signal from the “true” signal could affect his estimate of the acf, it seems to me Franzke took quite a bit of care to remove this signal from the noise.
If so, Rasmus’s criticism – which hypothetically could be valid – appears to be off the mark. Rasmus’s criticism would be valid if Franzke’s acf is computed from time series data that (a) are not detrended at all or (b) are detrended using a straight line (when we anticipate warming is non-linear), and it might be valid if (c) detrending with a cubic is insufficient. But it seems Franzke him(her?)self has not confused signal for noise. Moreover, detrending with a cubic – as I think s/he did – appears sufficient to prevent, or at least reduce, the degree to which non-linearities in the deterministic signal would likely inflate estimates of power in the low frequency components of the noise.
I admit I’m not entirely certain of whether Franzke detrended. If s/he did not then Rasmus’s criticism is valid. I’ve emailed Franzke asking for clarification and if possible the processing code to see whether I have misinterpreted what s/he might have done. But the text suggests efforts to separate the signal from noise.
Why would Rasmus not write a comment to GRL?
.
Maybe because Franzke could then reply? We’ve seen this sort of thing before in the UEA emails about ‘objectionable’ papers.
.
Actually, I am surprised Lucia’s comment passed muster. I guess that is progress at RC.
My comments are generally approved over there. One once got lost– but maybe I screwed up. They’ve had various methods of entry– maybe way back when I previewed and thought I submitted.
The real issue is: is what Franzke did ok? Reading the description of what s/he did that appears in the paper, it looks ok. OTOH, if he’d done what I think Rasmus says was done, then the method would have ‘issues’, I think.
The thing is, I don’t think “Franzke just fitted noise models with the autocorrelation of the temperature records” is an accurate description of what was done. That description fails to capture an important step, which is that Franzke’s method involves a step to separate noise from signal– and Rasmus’s description doesn’t seem to give credit for that.
I am always surprised that 1) descriptions are written so badly you don’t actually have any idea what specifically has been done and 2) why the hell they don’t run a simulation that matches the actual, based on whatever deconvoluted constants they arrive at.
What I don’t know is the point: what EXACTLY is the hypothesis that is being tested?
Doc–
I don’t know what you mean by “a simulation that matches the actual, based on whatever deconvoluted constants they arrive at.”
I think the null hypothesis being tested is “The difference between the maximum and minimum for the polynomial fit during the time intervals corresponding to time series = 0”
I suppose what he did for the AR1 case was:
1) Fit the cubic to the time series data. That polynomial has a max and min in the time series interval. Record that range as dT_obs for later use.
2) Find the residuals between the data and this fit.
3) Find the AR1 model that best corresponds to the residuals.
Then start the Monte Carlo, which involves running “N” cases, doing the following for each case:
i) Generate a time series using a random generator that creates AR1 noise with the properties from (3).
ii) Fit the cubic to the time series.
iii) Find the difference between the maximum and minimum for this cubic: dT_i.
iv) See if this difference is greater or less than dT_obs found in (1) above.
After running the N Monte Carlo cases, find the fraction of cases where dT_i > dT_obs. If that happened in less than 2.5% of cases, then you decree warming was statistically significant.
He did this for each station.
Then he did something similar for the other noise models.
Of course, this doesn’t entirely tell me every specific computer module he used– but I think that’s what he did.
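For what it’s worth, steps (1)–(3) and (i)–(iv) can be sketched in code on made-up data. This is only my reconstruction of the procedure, not Franzke’s code; the series length, trend slope, and noise level are all invented:

```python
import numpy as np

rng = np.random.default_rng(0)

def cubic_range(y):
    """Fit a cubic by least squares; return max - min of the fitted curve."""
    t = np.arange(len(y))
    fit = np.polyval(np.polyfit(t, y, 3), t)
    return fit.max() - fit.min()

def ar1_surrogate(n, phi, sigma, rng):
    """Generate AR(1) noise: x[t] = phi*x[t-1] + eps[t]."""
    x = np.zeros(n)
    eps = rng.normal(0.0, sigma, n)
    for i in range(1, n):
        x[i] = phi * x[i - 1] + eps[i]
    return x

def ar1_test(y, n_sim=1000, rng=rng):
    """Steps (1)-(3) plus the Monte Carlo loop (i)-(iv)."""
    t = np.arange(len(y))
    resid = y - np.polyval(np.polyfit(t, y, 3), t)   # steps (1)-(2)
    dT_obs = cubic_range(y)
    phi = np.corrcoef(resid[:-1], resid[1:])[0, 1]   # step (3): lag-1 acf
    sigma = resid.std() * np.sqrt(1.0 - phi ** 2)    # matching innovation sd
    dT_sim = [cubic_range(ar1_surrogate(len(y), phi, sigma, rng))
              for _ in range(n_sim)]                 # steps (i)-(iii)
    return np.mean(np.array(dT_sim) >= dT_obs)       # step (iv)

# Made-up "station": a real trend of 0.05/step buried in white noise
y = 0.05 * np.arange(100) + rng.normal(0, 1, 100)
p = ar1_test(y)   # fraction of surrogates beating the observed range
```

If p comes out below 0.025, warming would be decreed statistically significant under the AR(1) null.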
Mind you, I think Rasmus is suggesting Franzke fit the AR1 model to the data itself rather than to the residuals obtained by subtracting the best fit cubic. But I don’t think that’s what Franzke’s paper says he did.
Note: Using R, it’s possible that a package was used that permits one to find the best fit cubic under the assumption the residuals to the fit are AR1. So, the method could be somewhat similar to what I describe–but done in one fell swoop. But in that case, the paper would still be using residuals to estimate the acf.
lucia, may I ask a dumb question here. If I have 100 data points can I make an estimate of the s/n ratio by looking at 100 points, 50 pairs averaged, 33 triples averaged, 25 quads averaged?
Doc–
I’m not sure. I don’t know what you are asking me. It depends what “signal” is anyway.
I read the Christian Franzke paper (GRL); I think it is pretty clear that he subtracts the best fit cubic function before determining the constants for several noise models. He then goes back and tries to determine if the non-detrended station data can be reasonably explained by the various noise models. I think he could have been a little more clear in his description of the process.
.
Anyhow, I don’t think it is a very good paper because he considers only if individual station trends are significant based on the noise for each of the individual stations…. that strikes me as a very weak test. A stronger test of statistically significant warming might be the pooled trend for a much smaller number of geographically distant stations (say more than 500 Km apart). If the pooled data trend is significant against the different noise models (based on the detrended pooled data), then that would (I think) be a stronger test of a real trend.
.
Lucia,
Any email reply from Franzke?
No reply. That said, (a) it’s the holidays and (b) for all we know he is in Antarctica. Seems to me either can cause delays in answering emails.
I’m limiting my comments on other issues until this one is clarified. In principle, there are lots of things one might try to do to improve power or say more. But in practice– I’m not sure. But yes– pooling is one of the things one might do. The tricky thing is spatial autocorrelation.
Obviously, before doing that, one wants to know if he made the mistake Rasmus suggested he made. You and I agree the text suggests Franzke did not make that mistake.
Happy New Year to all from the Florida Keys.
Happy New Year from the Chicago Burbs!
If we really knew how much the signal is and how much the noise is there would be a lot more certainty. So hats off to them for giving it a try.
SteveF Ditto 🙂
Best New Year Wishes for 2013 from Houston.
Thank you, Lucia for this blog site and the work you do to run it so well.
May 2013 be your best yet.
Happy lukewarm year to all!
SteveF, dallas,
ditto once more,
john
Best wishes from Oxford MS for the new year!
Regarding Franzke’s letter…. Is he actually computing the trend for each time series over the available range of data for that series? That strikes me as problematic at best. (Also, computing the trends for individual stations not surprisingly leads to bloated uncertainty ranges compared to e.g. a regional average.)
Anyway, this paper could serve as an exemplar of everything that is wrong with climate science. He references a data set that now appears to be “stale” (the original link appears inaccessible). Methods are poorly explained and no code is provided. A table of results including station IDs and numerical values of trends for each method would be helpful as an SI, but is missing. Without that, this study is essentially unverifiable and unreproducible.
You just shouldn’t need the good will of the author in order to replicate his work.
I have scanned the C. Franzke paper, and while I would agree some essential details are missing, the approach used sounds promising to me and touches on issues that have been discussed at the Blackboard. The failure to find the expected plethora of significant deterministic warming trends at that global zone might have more to do with the station data used than the methodologies that Franzke applied.
Franzke talks about the relatively short term ARIMA model and the longer term ARFIMA model as paradigmatic null models and these models’ capabilities to generate relatively lengthy stochastic (as opposed to deterministic) trends. The author apparently makes use of the acf to determine a fit to the expected pattern of the model. The relatively short temperature records that would be available in the global zone north of 60 degrees make testing an ARFIMA model from the observed data with a d value >0 and <0.4 nigh onto impossible, and the same goes even for an ARIMA model.
I would doubt that Franzke would not have used the regressed residuals for modeling, but that very calculation seems somewhat contradictory to me. If you assume a deterministic trend, detrend the data on that assumption, and then find that the model of the residuals shows that trend to be insignificant, what is the next logical step, or what is the meaning of that finding? It would appear that you have detrended a stochastic trend that should actually have been included in the residuals for determining the best model.
I have not read in sufficient detail but has the author used the acf properties of the observed data to develop/characterize the models used for simulations?
Been outta the loop for a while but when do we expect to see any annual anomalies posted for any of the major indices (obviously, not for a few days yet if not quite a bit longer?)
So I spent a bit of time digging into this. The data appear to be based on the Klein Tank data set, the updated link for this data set is located here:
European Climate Assessment
You can still use KNMI interface to access the data, but you can also pull the data straight from this site (though you have to register to do this, which is free).
It does appear to be set up somewhat better for research usage than the KNMI site; for example, you can define a custom ensemble to download, and the data are provided in ZIP format, so the file size is much smaller for downloads.
(One can write scripts to pull data sets off it, but it appears that what you end up doing is pulling data to the KNMI site from the ECAD site in this case and then downloading them individually to your computer from there…of course this is much more time consuming to do).
Kenneth Fritsche:
I’m sure one problem is individual stations have a huge amount of noise. The detection threshold for that is a much larger trend than for an ensemble mean. There may be other issues too.. for example, coastal sites are generally going to have a smaller trend than inland ones.
If he isn’t using a consistent measurement period for the trend (e.g., 1980-2010)—as appears to be the case—then it becomes even more problematic to interpret what the absence of a trend means for a given site.
For example there are sites in the United States that were as warm during the 1940s as now, and, if the data were available from that period through say 2010, you might not find a statistically significant warming for that site. A second site that only “came on line” in 1980 on the other hand might well show a net warming.
That’s just a bit odd if I’ve interpreted the tea leaves (er, his article) correctly.

SteveF (Comment #107939)
December 31st, 2012 at 1:13 pm
“I read the Christian Franzke paper (GRL); I think it is pretty clear that he subtracts the best fit cubic function before determining the constants for several noise models. He then goes back and tries to determine if the non-detrended station data can be reasonably explained by the various noise models. I think he could have been a little more clear in his description of the process.”
Steve, please bear with me here as it is, after all, the day after New Year’s Eve. Franzke uses a model to do 1000 simulations and then calculates trends from that model to determine whether the observed trend falls outside the 2.5%/97.5% range. If it does, he considers the observed trend to be deterministic. My question is: how does he arrive at the models he uses for the simulations? If he uses detrended data and models the residuals, but later finds the trend was not significant, then is not Franzke required to go back with at least another iteration and model without detrending?
Kenneth–
No. The standard method for testing the deterministic signal is to estimate noise using residuals from the best fit deterministic signal. The alternative is circular.
Think about this: suppose you really do have a linear trend with very little noise. Now estimate the “noise” *including* the linear trend. You will never reject the null of no linear trend. Estimating noise using the residuals gives the correct frequentist outcome: false positives at whatever rate you select– and rejection of the null when it is wrong.
You can do a bunch of monte-carlo simulations to test this.
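A two-line illustration of the circularity, using made-up numbers (a linear trend of 0.02 per step plus white noise of sd 1):

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.arange(200)
y = 0.02 * t + rng.normal(0, 1, 200)   # true noise sd is 1

# "Noise" estimated *including* the trend: inflated by the trend itself
sd_raw = y.std(ddof=1)

# Noise estimated from residuals to the best-fit line: near the true sd
resid = y - np.polyval(np.polyfit(t, y, 1), t)
sd_resid = resid.std(ddof=2)           # n-2: a slope and intercept were fit
```

Here sd_raw comes out around 1.5 while sd_resid stays near 1; any test that uses sd_raw as its noise level will have a hard time rejecting the null even when the trend is real.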
An example of what I was talking about is Greenville, SC.
If you go to the NCDC “US Climate at a Glance” then click on Greenville, you’ll see a negative trend for that data series. Imagine the series started in 1980.. in that case you’d see a positive trend. There are plenty of other temperature series where the trend is small enough compared to the noise that it wouldn’t be statistically significant.
I think you pretty much have to select the same time interval when comparing trends in climate, otherwise the comparison is meaningless.
Carrick’s missing link to European Climate Assessment in #107968 is this.
“You can do a bunch of monte-carlo simulations to test this.”
I may have to. Perhaps a residual from New Year’s Eve is interfering with my ability to see why, if I truly had an ARIMA model or an ARFIMA model with a stochastic trend, I would obtain the same model if I removed that trend. Now I understand that if that trend was shown not to be statistically significant it might not be large enough to interfere much with obtaining the “correct” model if it were removed, but it would change the modeling data and change it to something it is not.
I do know that I can simulate very long ARFIMA series that contain trends over an extended period of time that appear as deterministic trends (when, of course, we know they are stochastic). If you have a short series like the instrumental temperature record, it would be difficult to impossible (and probably impossible) to obtain a meaningful ARFIMA model for the observed data.
I truly think that without an independent method of determining a model for the observed data, one has to make some assumptions when modeling it.
Does he generate his surrogate series Di using daily station data D
1. from D using “the phase scrambling method”
2. from de-trended D by some method or another.
It seems to me that he says (1) (see para 16). On RC he is criticised for this by Rasmus and Bo Christiansen. But if that is a mistake it seems too elementary. So maybe 2 is correct, or maybe it is not a mistake at all.
He has a paper (2012) referenced but paywalled (he is a ‘he’). Without that I don’t see where you go next.
Kenneth–
It’s always true that you have to make assumptions when modeling data.
It’s also true that you can get apparent trends in synthetically simulated data that has no true trend, and that if the noise is ARFIMA, those trends can be ‘large’ even over long time periods.
But I think the question here is: What’s the proper way to set up your model to *test* for the trend? Three minimum requirements are:
1) If you picked the “correct” statistical model to estimate the noise, and you apply your method to “trendless” data, you will get false positives at the rate you claim. (Often this is 5% false positives.)
2) If you pick the “correct” statistical model *and a trend exists* then you *will* reject the null of “trendless” at some rate that is higher than your false positive rate. (That is: your *correct positive rate* will exceed your false positive rate.)
3) As you get more and more data that contains a trend, your “correct positive rate” will increase.
I’m pretty sure that if your method of estimating the “noise” assumes that *everything* is noise, your method will fail on points (2) and (3). In fact, your “true positive” rate might be *lower* than your false positive rate!! That is not a useful method *for testing whether a trend is statistically significant*. Because whatever method you use, it needs to be at least hypothetically possible to reject the null of “no trend” when (a) a trend exists and (b) you did correctly specify your statistical model for the noise.
This has to be the case even though it is difficult to correctly identify the statistical model and even though in a real application you want to test, the true trend is not known and so on.
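Requirement (1) can be checked directly by simulation. Here is a rough sketch on my own toy setup (trendless white noise, a cubic-range statistic, residual-based noise estimation; none of these choices come from the paper). The false positive rate should land near the nominal 2.5%:

```python
import numpy as np

rng = np.random.default_rng(3)

def cubic_range(y):
    t = np.arange(len(y))
    fit = np.polyval(np.polyfit(t, y, 3), t)
    return fit.max() - fit.min()

def significant(y, n_sim=200, alpha=0.025, rng=rng):
    """Reject if the observed cubic range beats all but a fraction
    alpha of white-noise surrogates matched to the residual sd."""
    t = np.arange(len(y))
    resid = y - np.polyval(np.polyfit(t, y, 3), t)
    sigma = resid.std(ddof=4)          # 4 fitted cubic coefficients
    sims = [cubic_range(rng.normal(0, sigma, len(y))) for _ in range(n_sim)]
    return np.mean(np.array(sims) >= cubic_range(y)) < alpha

# Feed the test trendless data; count how often it (falsely) rejects
trials = 200
false_pos = sum(significant(rng.normal(0, 1, 100)) for _ in range(trials)) / trials
```

With trendless input the rejection rate hovers near alpha, which is what a calibrated test should do.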
Franzke posted here:
http://www.realclimate.org/index.php/archives/2012/12/what-is-signal-and-what-is-noise/comment-page-1/#comment-312405
(And sent me an email.)
I remain uncertain whether Franzke’s acfs are based on the full data series or whether he tried to tease out some of the noise. I’m mostly just puzzled.
Lucia,
From my brief reading of the Franzke paper (and the subsequent comment at RC), the time series has not been detrended prior to taking Fourier transforms. The null hypothesis is that all the variability is noise, so that the observed trend is simply what you get from a cubic regression on the noise. The random phase “realizations” are then used to establish how likely it is for a time series with this particular ACF to exhibit such an apparent trend. If the true trend is “extremely unlikely,” then the hypothesis is rejected, and the author concludes that there is a “statistically significant” trend which is not caused by noise.
Oliver–
If you are correct, then I think Rasmus is correct.
I *think* doing it that way will suffer from violating (2) and (3) above. When one does a garden variety test for a linear trend, you estimate the noise as the residuals from the trend in the data. Then you test whether that trend would be statistically significant relative to what you can get from noise measured by the residuals. Then, if you reject the linear trend on this basis, for subsequent analyses you use the full series as “noise” because… well… you just diagnosed that apparent trend as not being deterministic.
You can– of course– compute residuals to more complex fits (e.g. quadratics, cubics and so on) and test whether those higher order terms are statistically significant. But in all cases, when testing the terms for the part you suspect might be deterministic, your noise estimate is based on residuals obtained under the alternate hypothesis.
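To put some numbers behind the power argument, here is a sketch on synthetic data with a known trend, comparing phase-scrambled surrogates built from the raw series against surrogates built from the cubic-fit residuals. The phase scrambling follows the Theiler et al. recipe; the trend, noise level, and series length are my inventions:

```python
import numpy as np

rng = np.random.default_rng(7)

def cubic_range(y):
    t = np.arange(len(y))
    fit = np.polyval(np.polyfit(t, y, 3), t)
    return fit.max() - fit.min()

def phase_scramble(x, rng):
    """Randomize Fourier phases, preserving the power spectrum
    (and hence the autocorrelation function)."""
    X = np.fft.rfft(x)
    ph = rng.uniform(0, 2 * np.pi, len(X))
    ph[0] = 0.0                        # keep the mean
    if len(x) % 2 == 0:
        ph[-1] = 0.0                   # Nyquist bin must stay real
    return np.fft.irfft(np.abs(X) * np.exp(1j * ph), n=len(x))

def rejects(y, base, n_sim=200, rng=rng):
    """Compare y's cubic range against surrogates scrambled from base."""
    sims = [cubic_range(phase_scramble(base, rng)) for _ in range(n_sim)]
    return np.mean(np.array(sims) >= cubic_range(y)) < 0.025

n, trials = 200, 50
t = np.arange(n)
hit_raw = hit_resid = 0
for _ in range(trials):
    y = 0.02 * t + rng.normal(0, 1, n)             # a real trend is present
    resid = y - np.polyval(np.polyfit(t, y, 3), t)
    hit_raw += rejects(y, y)                       # "noise" = whole series
    hit_resid += rejects(y, resid)                 # "noise" = residuals
```

On these toy numbers the residual-based test finds the trend essentially every time, while scrambling the raw series, whose spectrum already contains the trend’s low-frequency power, rejects far less often. That is the power loss in miniature.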
lucia,
You are right, and if it had been just the paper, I might have assumed the residuals are what made it into “the three null models representing the background climate variability of the respective stations” (based on [7]–[8], [15]). The comment at RC seemed to indicate the other possibility, however. Did Franzke provide any different details in the email, relative to the RC post?
Addendum:
On the other hand, from playing with some cubic polynomials + random noise series, it seems really hard to generate an ACF that looks anything like Franzke’s Fig. 1 (the x-axis is labeled “Time in days”!) unless the cubic regression is removed from the time series first!
(Sample test setup used a synthetic time series of the form f+e, e.g., e = Gaussian noise with SD=3 and trend f = 2*(t/100)^3, t in years, i.e., a cubic trend of 2 degrees rise in 100 years.)
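Something like the following reproduces that experiment (my own coding of Oliver’s described setup: daily points, Gaussian noise with SD=3, a cubic trend of 2 degrees rise over 100 years):

```python
import numpy as np

rng = np.random.default_rng(5)

def acf(x, lags):
    """Sample autocorrelation of x at the given lags (in samples)."""
    x = x - x.mean()
    v = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / v for k in lags])

days = 365 * 100
t = np.arange(days) / 365.0            # time in years
y = 2.0 * (t / 100.0) ** 3 + rng.normal(0, 3, days)   # f + e

# Residuals after removing the best-fit cubic (fit against years rather
# than days just to keep the polynomial fit well conditioned)
resid = y - np.polyval(np.polyfit(t, y, 3), t)

lags = [30, 365, 3650]                 # one month, one year, ten years
acf_raw = acf(y, lags)
acf_resid = acf(resid, lags)
```

The raw series keeps a small but persistent positive ACF out to multi-year lags (the trend leaking into the “noise”), while the detrended residuals drop to zero almost immediately, consistent with Oliver’s point about Fig. 1.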
Oliver–
No. He wrote this:
I had asked for the code– if it was in R. But… of course, I realized that even if it’s in Matlab which I don’t have, I can still read it and that would be enough to be certain.
But you are correct that what he wrote makes it seem he did not base his noise on the residuals.
I just can’t wrap my head around someone testing any feature of the alternate hypothesis using ‘noise’ estimated under the null. I think that’s never done– though I could be wrong.
Here’s the thing to ponder: Not even a trend. Simplest possible problem.
Suppose we have a population of X which is just a gaussian and white, and the true mean really is m=1. Moreover, the standard deviation about the mean is 1/2.
But if we were to test a null of m=0, and estimated the noise around m=0 as if that is the known mean, we would get a standard deviation of sqrt(1^2 + 0.5^2) = 1.12. This is happening because the mean is contributing to that.
But all the frequentist statistic tables are set up to work if we estimate the sample standard deviation by computing about the sample mean and then using “n-1” degrees of freedom to correct for the fact that we computed about the sample mean– which is not the true mean. That will tend to give us the correct value 0.5.
If we use 1.12 as our estimate of the standard deviation (aka ‘noise’) we aren’t going to get the proper false positive rates.
I guess if we have enough data, we will eventually reject m=0 consistently. But it seems to me we end up having more false positives and lower power than we ought to have.
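The numbers are easy to check (a quick simulation of the m=1, sd=0.5 example above):

```python
import numpy as np

rng = np.random.default_rng(11)
x = rng.normal(1.0, 0.5, 100_000)      # true mean 1, true sd 0.5

# "Noise" measured about the null value m=0: inflated by the mean
sd_about_zero = np.sqrt(np.mean(x ** 2))

# Usual sample sd about the sample mean, with n-1 degrees of freedom
sd_about_mean = x.std(ddof=1)
```

The first estimate comes out near sqrt(1^2 + 0.5^2) = 1.12 and the second near the true 0.5, which is the whole point: estimating the spread about the null value folds the signal into the “noise”.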
I do not think English is Franzke’s native language. In spite of reading the paper and his comment at RC, it remains less than 100% clear to me if he did or did not remove a fitted “deterministic trend” and then work with residuals in determining his noise models. I still think he did, but I am not sure. He REALLY needs to clarify the method used.
SteveF–
It’s a shame the reviewers didn’t force him to clarify.
Lucia,
A meaningful review takes a fair amount of time, especially if a paper has problems. An excellent review of a paper with problems will point out specific changes which will improve the content and clarity (effectively tell the author(s) how to fix the paper), and can represent a substantial intellectual investment, for which there is no compensation nor credit…. so getting people to do them is maybe not so easy.
SteveF (Comment #107939)
December 31st, 2012 at 1:13 pm
“Anyhow, I don’t think it is a very good paper because he considers only if individual station trends are significant based on the noise for each of the individual stations…. that strikes me as a very weak test.”
———————————————
You and Tamino would seem to agree (http://www.realclimate.org/index.php/archives/2012/12/what-is-signal-and-what-is-noise/comment-page-1/#comment-312422)
An alternative method of testing the hypothesis that a trend is deterministic is to assume a model for the temperature series or find an independent method to at least characterize the general type of model that might be appropriate for the data. The ACF can be used in that characterization, or some other spectral analysis, as I think Franzke suggests in his paper. At RC Alexander Harvey talks about the fractal character of an ARFIMA model. I have read some papers and articles pertaining to these characterizations, but I totally lack the experience to determine whether this is what Franzke did or if it is really appropriate.
You then simulate the assumed or independently derived model many times to determine how frequently a trend like that determined from the observed data would occur in the simulations. The Cohn and Lins paper, as I recall, was criticized by Lucia (rightfully so in my estimation) for assuming a deterministic trend would be linear, amongst other things like a spectral analysis not fitting an ARFIMA model. We talked about using a polynomial model to fit that trend in that thread, as I recall. I went back to that thread and I am not at all certain that Cohn and Lins did not assume models for a temperature series and do what I suggested here. I need to go back to their paper to be sure.
The weakness of the use of a “detrended” series is where you could be detrending a stochastic trend and not a deterministic one. Alternatively, we can assume a trend will always be wholly deterministic and then also assume its shape. We then feel good about detrending and modeling the residuals. As Franzke notes in his reply at RC, the intermediate case is where you have an overall trend that can be partly deterministic and partly stochastic. Those trends could even be in opposite directions.
The problem of nonlinear “deterministic trends” in most (non-periodic) geophysical time-series is largely one of intellectual hubris in the face of a severe paucity of data. With sufficiently long time series, naive notions of a forced signal hidden by random noise evaporate upon the realization that the “trends” are merely available segments of very low-frequency components of processes with variously structured continuous power densities.
Owen (Comment #108002),
Tamino and I agree? Well there is a first for most everything. I am waiting for him to agree that climate sensitivity has to be lower than the canonical 3+C per doubling…. but not holding my breath. 😉
SteveF
I have a feeling we are all going to have to go see if a blue moon is scheduled for this month. Because… I too don’t think you should test whether the trend is statistically significant by estimating the noise w/o detrending.
I thought about the precise problem and realized that I may have explained things incorrectly to Kenneth F. The problem is: not detrending reduces the statistical power of the test. This can be shown. I’m not going to go out on a limb for the Fourier transform method as I haven’t specifically done it. But I know that analytical choice reduces the statistical power of the test for no good reason if you are using AR1, ARFIMA or any similar model. In the limit of loads of data you will get the right answer (I think)– mostly. But you will also reduce the power– and for no good reason.
Next blue moon is not until August 2013…. http://en.wikipedia.org/wiki/Blue_moon
Kenneth
No. This is not a weakness when you are testing for the trend. If someone is testing a least squares linear fit, the fact that the residuals to the straight line were estimated by detrending is accounted for with the reduction in degrees of freedom. Consider the classic least squares regression where the residuals to the best fit line are white noise. In that case, if you have ‘n’ residuals, you estimate the standard deviation of the residuals by using “n-2” in the denominator. The “2” takes care of the fact that the best fit involved computation of a slope and intercept, each of which might actually be zero. This computation gets you the correct estimate of the noise whether or not the trend and intercept end up being zero.
There is no “weakness” associated with detrending. It’s what one should do.
Moreover, you would find by detrending you create a method with higher statistical power— which is what you want.
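A quick check of the n-2 bookkeeping, on my own toy numbers: detrend many trendless white-noise series and confirm that summed squared residuals divided by n-2 recovers the true variance on average:

```python
import numpy as np

rng = np.random.default_rng(13)
n, trials = 30, 5000
t = np.arange(n)

var_est = []
for _ in range(trials):
    y = rng.normal(0, 1, n)                       # trendless, true variance 1
    resid = y - np.polyval(np.polyfit(t, y, 1), t)
    var_est.append(np.sum(resid ** 2) / (n - 2))  # n-2: slope and intercept

mean_var = np.mean(var_est)                       # should be close to 1
```

Dividing by n instead would bias the estimate low by a factor of (n-2)/n; the “2” is exactly the slope-and-intercept correction described above.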
lucia,
Matlab code should run on GNU Octave, which is free.
OTOH, it’s yet another compiler/interpreter to deal with and ‘easily portable’ may be an overstatement for a newb.
Dewitt–
I have the code. But now, Christian clarified. His point three says:
He is mistaken. You can use detrended data. Using the residuals based on the alternate hypothesis to test the null is not only permissible, it is standard. The reason it is standard is that using the residuals from the alternate hypothesis results in a test with higher statistical power. In this particular discussion residuals from the alternate hypothesis involve some sort of detrending. The best alternate hypothesis and the best method to detrend could be argued. But not detrending will certainly result in a test with lower statistical power than the alternative, and you don’t want that.
There is no disadvantage to detrending.
It is a shame the paper didn’t report the results with detrending and simply omit the ones that are in fact reported. Had the results based on detrended data been reported, we could move on to other issues.
As it stands: Rasmus is correct. The “noise” is estimated in a way that will contain signal if it exists. This could have been avoided — or mitigated– by detrending.
DeWitt:
It is. Unless you test your code on both interpreters, there’s no guarantee that code written on one will run on the other. And if you aren’t a MATLAB guru, you won’t be able to get it to run either; well, that’s been my experience anyways.
The other common problem is that MATLAB is licensed piece-meal, so a package in MATLAB may depend on a module that you don’t have a license for. On the other hand, OCTAVE is a much more limited environment, so a substantial fraction of available MATLAB features are missing outright if you choose to develop on OCTAVE.
When text is unclear, code can be useful even if one can’t run it. You can read the code and notice whether a step like “detrending” is in the code. My goal wasn’t to replicate or audit, but to resolve what appeared to be an ambiguity.
But it appears Franzke did not detrend. At. All. Rasmus was correct. And I agree with Rasmus that not detrending is sub-standard. Specifically: it reduces the statistical power of the test. That’s a very bad thing.
I’m willing to believe Christian found similar results when he detrended– though I would be surprised if he didn’t get a slightly higher level of “rejects” with detrending. Not necessarily much higher– but a little. It’s unfortunate his paper doesn’t discuss the detrended results. We’d still have to speculate on what finding lots of “fail to rejects” means— possibly not all that much. But at least those “fail to rejects” would be based on detrended data.
Re: Carrick (Jan 3 09:05),
That’s always been one of MATLAB’s most unattractive features to me.
Lucia, I think Franzke is basically trying to retain low-frequency information in his null hypothesis, and if you simply detrend the data, you won’t be able to test for that.
If you compare individual stations, you’ll find that the trend of a given station can be substantially different than another nearby one. This does suggest there is a large uncorrelated, low-frequency component present in the station data.
Detrending each station individually will certainly remove that stochastic low frequency component so Franzke is right about that.
(It is certainly the case that if you allow a large potential noise source into your fit, that the power of the test will be reduced. But that is “as it should be.” You can’t make Nature better than it really is.)
DeWitt:
Agreed. Site licenses only run for one year and expire at the end of it, so if your institute drops its license you end up with code you must purchase an individual license to run, or with code that is simply unusable.
I personally use it to prototype with (along with Mathematica, which frankly I use more often). However, I have collaborators who collect and analyze data using MATLAB, so having some fluency with MATLAB is a requirement for me. (It’s pretty common for me to have to translate analysis code into MATLAB so they can run it.)
I’ve also been forced to learn the differences between it and OCTAVE. None of this is pleasant stuff to have to deal with.
Re: Carrick (Jan 3 09:29),
Sort of like buying a car and just getting a frame. Oh, you wanted something you could actually drive? That will cost extra.
Carrick–
Maybe. But with regard to AR1 that’s certainly not necessary. You already have a spectral shape chosen once you’ve specified AR1.
But in this case, the power is reduced for no good reason. While I understand that you are limited in detecting low frequencies– well, you just are. I’m pretty sure you can just run the Monte Carlo. You need to run both sets, to look at false positives and false negatives. The results will be:
1) With respect to false positives (i.e. “reject the null when it’s true”), detrending will make no (little?) difference. You’ll get exactly the same rate of false positives. (At least I think so… But maybe we could test that.)
2) With respect to false negatives: When you don’t detrend, you do get more false negatives than with detrending.
The claimed reason for not detrending is that you might get too many false positives– more, specifically, than the rate you claim, since you specify the rate. But…well… maybe that’s something that needs to be shown? And of course, even if detrending results in more false positives, to motivate not detrending one would need to demonstrate both that not detrending fixes the problem and that there is no other simple fix that does not cost power.
(After all: We can ‘fix’ the problem that the estimate for lag-1 autocorrelation is biased when we assume noise is AR1 while still detrending. So, we don’t need to sacrifice power the amount we would by skipping detrending– we just fix the problem that exists.)
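The lag-1 bias, and an iterative fix for it, can be sketched like this (Python, illustrative values; the bias relation is Kendall’s well-known small-sample approximation, which I’m bringing in here, not something from the paper):

```python
import random

def ar1_series(phi, n, rng):
    """Generate an AR(1) series with unit innovation variance."""
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

def lag1(x):
    n = len(x)
    m = sum(x) / n
    return sum((x[i] - m) * (x[i + 1] - m) for i in range(n - 1)) / sum((v - m) ** 2 for v in x)

rng = random.Random(42)
phi, n, reps = 0.5, 30, 5000
mean_r1 = sum(lag1(ar1_series(phi, n, rng)) for _ in range(reps)) / reps
print(mean_r1)  # noticeably below the true 0.5: the short-record bias

# Iterate the approximate bias relation E[r1] ~ phi - (1 + 3*phi)/n
# (Kendall's formula) to pull the estimate back toward the truth.
phi_hat = mean_r1
for _ in range(20):
    phi_hat = mean_r1 + (1 + 3 * phi_hat) / n
print(phi_hat)  # much closer to 0.5
```

The point being: the known downward bias in the lag-1 estimate from short records can be corrected after detrending, so the bias isn’t a reason to skip detrending.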
This is why I like Mathematica better. You can start with a single-pole system like this:
$latex R_1 q'(t) + q(t)/C_1 = V_0 + V_1 e^{-t/\tau}$
and solve this with Mathematica with a one line command:
Solve[{q'[t]*R1 + q[t]/C1 == v0 + v1 Exp[-t/tau], q[0] == 0}, q[t], t]
and end up with this in just a few steps:
$latex \frac{{C_1} \left(\tau \left({V_1} \left(e^{-\frac{t}{{C_1} {R_1}}}-e^{-\frac{t}{\tau }}\right)+{V_0} \left(e^{-\frac{t}{{C_1} {R_1}}}-1\right)\right)+{C_1} {R_1} {V_0} \left(1-e^{-\frac{t}{{C_1} {R_1}}}\right)\right)}{{C_1} {R_1}-\tau }$
Of course this can be cleaned up from here, but you can get a lot of insight from seeing closed form relatively well simplified results that you can’t get e.g. from MATLAB code.
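As a cross-check, the closed form can be verified numerically (a Python sketch with illustrative parameter values I made up, integrating the same single-pole ODE with Runge–Kutta and comparing against a hand transcription of the Mathematica result):

```python
import math

# Illustrative parameter values (not from the thread)
R1, C1, tau, V0, V1 = 1.0, 1.0, 0.5, 1.0, 0.5

def qdot(t, q):
    """R1*q'(t) + q(t)/C1 = V0 + V1*exp(-t/tau), solved for q'."""
    return (V0 + V1 * math.exp(-t / tau) - q / C1) / R1

def q_closed(t):
    """Hand transcription of the closed-form solution above."""
    a = math.exp(-t / (C1 * R1))
    b = math.exp(-t / tau)
    return C1 * (tau * (V1 * (a - b) + V0 * (a - 1)) + C1 * R1 * V0 * (1 - a)) / (C1 * R1 - tau)

# Fourth-order Runge-Kutta from q(0) = 0 up to t = 1
t, q, dt = 0.0, 0.0, 1e-3
while t < 1.0 - 1e-12:
    k1 = qdot(t, q)
    k2 = qdot(t + dt / 2, q + dt * k1 / 2)
    k3 = qdot(t + dt / 2, q + dt * k2 / 2)
    k4 = qdot(t + dt, q + dt * k3)
    q += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    t += dt

print(q, q_closed(1.0))  # agree to many decimal places
```

The two values agree to numerical precision, which is a quick sanity check on the symbolic result.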
Carrick,
I think the simple solution is to average stations which are not so close as to suffer too much regional correlation and look at the average trend. This will drastically reduce both the high and low frequency stochastic trends and make the analysis more powerful. Looking at the trends for individual station data is the kind of thing some at WUWT are fond of doing (for which they are rightly criticized); it can only cause confusion.
SteveF–
Yes. If one wants to detect signals that are present, it’s best to pool results.
Carrick–
I’m pondering the issue of losing the low frequency components. Clearly, this is always a problem. I just don’t think that pulling in the mean trend when it is there is ‘the solution’. I know it isn’t if one has already posited a functional form for the spectrum– and specifying that the autocorrelation is AR1 does that. Even specifying ARFIMA does that (though in a more complicated way.) On the one hand, we know that if we run Monte Carlo cases for AR1 noise and analyse short data sets, we underestimate the autocorrelation. But we can also fix that up using an iterative method where we find the error in the lag 1 autocorrelation given the value we got. That would permit us to ‘fix’ the excess false positive rate without the undue loss of power that arises if we don’t detrend. We can do a similar thing with ARFIMA and– I think– any situation where we have made an assumption about the functional form of the autocorrelation (and so the functional form of the spectrum.)
There is a bit of an issue if one is computing the spectrum and then only using the finite number of Fourier components one actually obtains. I can see how– in this case– one might think it’s important not to lose the lowest frequency modes.
It seems to me that rather than sucking the mean trend into the estimate for the spectrum, it is better to posit a generic, parametric functional form for the spectrum, fit it to your residuals, and then use that fitted form. That’s analogous to specifying something like AR1– but you specify it in spectral space. This way you would essentially be filling in the low frequency modes based on the modeled form of the spectrum. But the parameters for that model would come from the residuals to the alternative hypothesis.
SteveF, you’ve got to the heart of the matter, which is “does what Franzke is trying to test for even sensible?”. I happen to agree that it probably isn’t. If you want to worry about loss of power, using individual stations itself is the biggest “infraction” here.
Of course that doesn’t stop people from contemplating the right way to do such a test. 😉
Lucia, let me try putting it another way, I think this is the model that Franzke is trying to study. For station “k”,
$latex T_k = T_0 + (\alpha_0 + n_{\alpha,k}) t + n_{hf,k} (t)$.
Here $latex n_{hf,k}$ is the usual “high-frequency” stochastic noise process with zero mean we usually think about and $latex n_{\alpha,k}$ is e.g. a Gaussian random number (we can assume $latex n_{\alpha,k}$ is uncorrelated between neighboring sites).
If you detrend individual series $latex T_k$ you are clearly going to be unable to test for the influence of a nonzero $latex n_{\alpha,k}$ on the statistical significance of $latex \alpha_0$.
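A minimal generator for this model makes the point concrete (a Python sketch; every parameter value below is invented for illustration): each station gets its own Gaussian trend perturbation on top of AR(1) high-frequency noise, so per-station OLS trends scatter widely around the common trend.

```python
import random

def simulate_station(n, alpha0, sigma_alpha, phi, sigma_hf, rng, T0=0.0):
    """One station: T(t) = T0 + (alpha0 + n_alpha)*t + n_hf(t),
    with n_alpha Gaussian (a per-station trend perturbation) and n_hf AR(1)."""
    n_alpha = rng.gauss(0.0, sigma_alpha)
    hf, series = 0.0, []
    for t in range(n):
        hf = phi * hf + rng.gauss(0.0, sigma_hf)
        series.append(T0 + (alpha0 + n_alpha) * t + hf)
    return series

def ols_slope(y):
    """OLS trend of a series against its index."""
    n = len(y)
    tbar = (n - 1) / 2
    ybar = sum(y) / n
    return sum((i - tbar) * (y[i] - ybar) for i in range(n)) / sum((i - tbar) ** 2 for i in range(n))

rng = random.Random(7)
# All parameter values below are invented for illustration.
stations = [simulate_station(120, alpha0=0.02, sigma_alpha=0.01,
                             phi=0.6, sigma_hf=0.5, rng=rng) for _ in range(50)]
slopes = [ols_slope(s) for s in stations]
print(min(slopes), sum(slopes) / len(slopes), max(slopes))
# Per-station trends scatter widely around alpha0 because of n_alpha.
```

Detrending each station individually absorbs its particular draw of the trend perturbation, which is exactly why one can no longer test for its influence afterward.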
I think we can assume it’s a “given” that both you and I disagree with the approach used by Franzke. The question I have is … how does one clean this up?
Carrick
I suspect we cross-posted because my comment at 10:05 is pondering this.
Clearly, we don’t want to just lose energy at a frequency if it’s there. And clearly, we can’t know whether the power at the lowest resolvable frequency appeared because it’s in the mean or because it’s part of the “noise”. So… what to do?
Well… let’s go back to un-naturally trendy. In that case, I’d created series that were cubic+AR1. Since I created them, I knew what was noise– and I could look at the spectrum:
The lowest frequency point I can resolve is on the left– but we can easily imagine the correct values at even lower frequencies. Because we know the model for the noise, we know we would be correct.
But if I detrended using a straight line, (not cubic) the spectrum looked like this:
In this case the spectrum flattened toward the left, but then shot up again. That “shooting up” is of a different character– and one might speculate, just by looking, that it’s the mean trend bleeding in. So, if we got something like that, then by “eyeball” we might propose an ad hoc spectral model that was flat to the left.
This isn’t entirely satisfying. But it falls along the lines of estimating the power at lower frequencies using a model. If we do that, the fact that we lose one frequency by detrending isn’t all that important, because we would recapture it by fitting to a model in the spectral domain. (In fact, we can recapture more, because we can even include energy at lower frequencies by extrapolating using the model. That’s essentially what we do when we use models like AR1, ARFIMA and so on.)
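Here’s a sketch of that idea (Python; the “residuals” are synthetic and all values are illustrative): fit an AR(1) in the time domain, then use its theoretical spectrum to fill in, and even extrapolate below, the lowest resolvable frequency.

```python
import math, random

def lag1(x):
    n = len(x)
    m = sum(x) / n
    return sum((x[i] - m) * (x[i + 1] - m) for i in range(n - 1)) / sum((v - m) ** 2 for v in x)

def ar1_spectrum(f, phi, sigma2_e):
    """Theoretical AR(1) spectral density at frequency f (cycles per step)."""
    return sigma2_e / (1.0 - 2.0 * phi * math.cos(2.0 * math.pi * f) + phi * phi)

# Synthetic AR(1) "residuals" (illustrative parameters)
rng = random.Random(3)
n, phi_true = 256, 0.7
x, prev = [], 0.0
for _ in range(n):
    prev = phi_true * prev + rng.gauss(0.0, 1.0)
    x.append(prev)

phi_hat = lag1(x)                              # fit the model in the time domain
m = sum(x) / n
sigma2_e = (sum((v - m) ** 2 for v in x) / n) * (1 - phi_hat ** 2)

# Evaluate the fitted model spectrum at, and below, the lowest resolvable frequency.
f_lowest = 1.0 / n
freqs = (f_lowest / 4, f_lowest / 2, f_lowest, 2 * f_lowest)
vals = [ar1_spectrum(f, phi_hat, sigma2_e) for f in freqs]
for f, s in zip(freqs, vals):
    print(f, s)
```

Once the parametric form is fitted to the residuals, the model supplies the low-frequency power that the finite record (or detrending) can’t resolve directly.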
If I wanted to generate random instances of $latex n_{\alpha,k}$, here’s the approach I’d start with: I’d compute trends of individual stations in the ensemble, then histogram them. For Franzke’s analysis that’s impossible because I don’t have a list of stations, but for reference here’s a histogram from GHCN V3.1:
figure.

If I were to Monte Carlo this data, I’d subtract off the mean temperature trend and convert this histogram into a temperature trend cumulative distribution, like this:
figure.

You can use this curve to produce $latex n_{\alpha,k}$ by generating a uniformly distributed random number “u” in the range 0 to 1, then mapping “u” onto the associated temperature trend. (You can get a continuous function trend(u) from the summed histogram by using e.g. cubic spline interpolation.)
Then I’d model the high-frequency noise process exactly as I have done it in the past, and use the noise model $latex n_{\alpha,k} t + n_{hf,k}(t)$ to do my hypothesis testing.
[You’d have to do something more sophisticated if $latex n_{\alpha,k}$ were correlated between neighboring sites.]
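The inverse-CDF mapping Carrick describes can be sketched like this (Python; the trend “samples” are synthetic stand-ins, since we don’t have his GHCN histogram, and linear interpolation stands in for his suggested cubic spline):

```python
import random

# Synthetic stand-ins for the histogrammed per-station trends (sorted).
rng = random.Random(11)
trend_samples = sorted(rng.gauss(0.0, 0.01) for _ in range(1000))

def sample_trend(u, samples):
    """Map a uniform u in [0,1) onto the empirical trend CDF by linear interpolation."""
    pos = u * (len(samples) - 1)
    i = int(pos)
    if i >= len(samples) - 1:
        return samples[-1]
    frac = pos - i
    return samples[i] * (1 - frac) + samples[i + 1] * frac

draws = [sample_trend(rng.random(), trend_samples) for _ in range(500)]
print(min(draws), max(draws))  # draws stay inside the observed trend range
```

Each uniform draw lands on the empirical trend distribution, which is exactly the Monte Carlo ingredient needed for the noise model.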
Lucia:
Yes I see this now. (*channeling lazar*) I was like a dog with a bone at that point.
Somehow, I’m reminded of a Tamino post from a while back about how a trend in the noise could obscure the “true” underlying trend. 😉
I find the topic of discussion of this thread very interesting, mainly because it touches on subjects and notions I have been pondering for a while. I agree mainly with Carrick here, and particularly on the notion that the noisy station data would be difficult to fit consistently with models of the observations. I like Franzke’s approach, but I am thinking that he is perhaps not sufficiently familiar with temperature data sets and local effects on those sets. Comparing short and long series is a problem, and so is the reliability of data going back past 1920, and to a lesser extent even later than that date. Long temperature series can also show heteroskedasticity, although that characteristic often arises from a collection of regional data that becomes spatially more sparse going back in time.
If nothing else this discussion shows the assumptions that are made and required in analyzing and modeling any temperature series.
Carrick–
I was like a dog with a bone at that point.
You were probably typing while I was. The page doesn’t update until you refresh. But yes– I think the question is how to deal with the issue of losing stuff at low frequencies. I think Franzke is thinking along the lines you and Kenneth are.
Mind you: I *see* the issue. I just don’t think the solution is “not detrending”. It has to involve modeling– or making an assumption– but assumptions are already made.
Kenneth,
Independent of methodology issues, I just don’t think Franzke is saying much of anything interesting. Everyone (almost everyone) already knows that single station variability easily hides a regional or global trend, even if that trend is very significant. So what new does Franzke contribute? I don’t see anything very interesting.
I cannot resist making a remark on this statement by SteveF
“…can represent a substantial intellectual investment, for which there is no compensation nor credit…. so getting people to do them is maybe not so easy.”
Yet, in the interest of saving all mankind from himself, one would think the investment would be worth it. At least commensurate with the investment I am being asked to make towards said salvation.
Kan,
Not just in climate science, in every field.
.
But you don’t have to worry about having to pay a carbon tax based on anti-sense protein research or nanotechnology research. The green/left/Malthusian nexus has made climate science too politically important for the public to ignore the quality of the work being done in the field.
Roy Spencer has posted the Dec. anomaly. 0.202C
Lucia:
I absolutely agree that not detrending solves nothing. You have to model it, but the question is “what is the right way”?
I’m worried now that what I call $latex n_{\alpha,k}$ probably has spatial structure in it. To get that right takes some thought.
To make the result meaningful would take more: I think you’re left with “individual stations have a larger noise than the mean of the ensemble, making it harder to detect a signal with a given amplitude… hm, so what?”
Carrick, “what is the right way?” Well, could there be no “right” way?
WUWT has another “bombshell” post, http://wattsupwiththat.com/2013/01/03/agw-bombshell-a-new-paper-shows-statistical-tests-for-global-warming-fails-to-find-statistically-significantly-anthropogenic-forcing/
From my perspective, double checking in as many ways as possible is the only “right” way.
Carrick–
As for the general conclusion of “hm, so what?” I agree. I think the issue is more interesting as a way to figure out what we think about how we would deal with it when applying it to another issue: The global temperature. It’s sometimes useful to discuss the technical issues as applied to a less momentous subject. After all: If individual stations are just too noisy…so? That doesn’t mean we reject AGW.
OTOH: If this method is legit, why not apply it to global temperature? I wouldn’t skip detrending there; I think you need to detrend that. But the potential for failing to detect the low frequency oscillations persists there. So– if you need to deal with that for station data, then one should deal with it for the global anomaly. So, thinking about how to do that on this more hypothetical problem, where none of us is especially invested in the specific outcome, can be useful.
dallas–
But some ways are wrong. And some ways are sufficiently suboptimal as to not be worth the time.
I’m not entirely sure that paper is a bombshell. I don’t entirely understand what is done in the technique called co-integration. I need to read through the mathematics in 2.4 of that paper to see if I can better understand it. (We’ve gone over this stuff before. My recollection is that in past cases the visitors discussing it tended not to want to explain the math– but here it appears the math might actually be laid out.)
But returning to my not being sure that’s a bombshell, I read this in the paper
What does that mean? And is there some reason the latter should not be cointegrated with temperature and solar irradiance? If there is no reason, why shouldn’t we conclude AGW is corroborated?
(Real questions btw.)
There also seems to be quite a bit of qualifying about steady state vs. transients and so on. So… I need to read more to know if this is or is not a bombshell.
Re: lucia (Jan 3 13:17),
The dreaded unit root rears its ugly head again. This is olds, not news. I have no interest in looking at the paper because in the end it will still amount to testing for unit roots, which tests will always fail if there is, in fact, a trend that isn’t removed before testing. And data with near unit roots will test as having a unit root, but will cointegrate. I would say it’s a dud, not a bombshell.
SteveF (Comment #108041)
“So what new does Franzke contribute? I don’t see anything very interesting.”
I was talking about my personal interest in the methodology and analysis that he applies to a time series. I am not versed at all in spectral analysis but have gained an appreciation for its use in modeling.
Franzke is doing nothing here that I have not seen applied to time series before. I have not read in detail what he has done, or whether he has applied it correctly; nor would I necessarily know if it were applied correctly.
Actually this kind of analysis might be more appropriate for the proxies making up a reconstruction. From the modeling I have done on proxies, I would conjecture that the proxy response to climate, or to whatever it is responding to, has LTP, and to an extent greater than the instrumental temperature record shows– or at least greater than my meager efforts could find, if it could be found at all with such a short series. I am interested in what a spectral analysis might reveal in these two cases.
Discussion of the Franzke paper also brings to the fore some of the point counter points that Lucia has initiated here regarding trend analysis, and, in my mind anyway, the weaknesses in the alternative approaches.
SteveF, maybe I am just more easily impressed/interested than you are.
Lucia, bombshell was in quotes because most are just bombs. The results though have a good chance of being right, because “global” average temperature is probably not all that great a metric for CO2 and Anthropogenic forcing and looking at Tmin only, the data is non-stationary. The same thing with the Franzke paper, the Tmin of Iceland and Scandinavia appears to be non-stationary, the little ice age recovery issue, and land use change would have added a lot of noise to the interior data series.
So I would think Tmin, Tmax and Tave would have to be the minimum data sets used to attempt to “prove” anything.
The Iceland Tmin is classic, BTW: it rose to ~0C from around 1910 to 1930ish, then stayed at near 0C until the mid 1990s.
https://lh6.googleusercontent.com/-DzjcvM5RR3A/UOGscyiFXuI/AAAAAAAAGW4/lP1Gis1izuM/s817/iceland%2520Tmin%252011yma.png
That is only one station, but I would think that station should have a lot of weight versus Nanuk’s trading post in nowhere Siberia. There is a huge difference in the heat capacity that the two represent.
So if Tave is the metric, couldn’t more methods be simply wrong?
lucia (Comment #108048)
http://web.uconn.edu/cunningham/econ397/cointegration.pdf
“Consider two variables, one is I(1) and the other is I(2). They cannot be cointegrated. BUT if there is a linear combination of these two variables which is I(1), the linear combination CAN BE cointegrated with another I(1) variable.”
I thought that we had determined that an I(1) model was not appropriate for a temperature time series because it implies unboundedness. Doesn’t the confusion come when the series contains a deterministic trend (or even a stochastic trend that appears to be a deterministic one), and differencing can remove the effects of that trend, which makes the series appear to have d=1?
Kenneth–
I don’t know exactly what they did. If their I(1) merely means the deterministic trend exists, then…well.. those can exist provided there is a deterministic cause. In contrast, I think the ‘noise’ cannot be I(1).
I *think* they would argue the same. But, I’m not sure.
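For readers (like me) trying to pin down what cointegration even means operationally, here is a toy Engle–Granger-style sketch (Python, stdlib only; the series and thresholds are purely illustrative, and a real test would use proper critical values): two series are cointegrated when each looks I(1) but some linear combination of them is stationary.

```python
import random

def ols_fit(y, x):
    """OLS intercept and slope of y on a single regressor x."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = sum((x[i] - xbar) * (y[i] - ybar) for i in range(n)) / sum((xi - xbar) ** 2 for xi in x)
    return ybar - b * xbar, b

def df_rho(y):
    """Dickey-Fuller-style regression of Delta y_t on y_{t-1} (with intercept).
    rho near 0 looks like a unit root; rho near -1 looks stationary."""
    dy = [y[i + 1] - y[i] for i in range(len(y) - 1)]
    return ols_fit(dy, y[:-1])[1]

rng = random.Random(9)
n = 1000
x = [0.0]
for _ in range(n - 1):
    x.append(x[-1] + rng.gauss(0.0, 1.0))            # an I(1) random walk

y_coint = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]  # shares x's stochastic trend
z = [0.0]
for _ in range(n - 1):
    z.append(z[-1] + rng.gauss(0.0, 1.0))            # an independent random walk

results = {}
for name, series in (("cointegrated", y_coint), ("independent", z)):
    a, b = ols_fit(series, x)
    resid = [series[i] - (a + b * x[i]) for i in range(n)]
    results[name] = df_rho(resid)
    print(name, results[name])
# Residuals look stationary only in the cointegrated case.
```

The regression residuals look stationary only when the two series actually share a stochastic trend; for two independent random walks the residuals still wander, which is the spurious-regression trap.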
Re: dallas (Jan 3 14:38),
GHG atmospheric concentrations are not random variables. Period. We know where they’re coming from and how much. They cannot possibly be even I(1) for exactly the same reason that temperature isn’t I(1). The bucket has a hole in it. The only noise is the measurement error and small (very small) year to year variability in the exchange rate with the biosphere and oceans. I seriously doubt that the noise after removing the deterministic trend is I(1). Even if we believed that solar variability is high enough to influence the climate, that can’t be I(1) either or we probably wouldn’t be here. It’s really easy to confuse a real trend with non-stationarity.
Re: lucia (Jan 3 14:52),
If there is a deterministic trend that isn’t removed, the unit root tests will fail. Period.
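DeWitt’s point is easy to demonstrate with a bare-bones Dickey–Fuller-style regression (a Python sketch with made-up numbers, not a proper ADF test with critical values): feed it a trend-stationary series without removing the trend, and it “finds” a unit root that isn’t there.

```python
import random

def ols_slope(y, x):
    """OLS slope of y on x (with intercept)."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((x[i] - xbar) * (y[i] - ybar) for i in range(n)) / sum((xi - xbar) ** 2 for xi in x)

def df_rho(y):
    """Dickey-Fuller-style regression of Delta y_t on y_{t-1} (intercept, no trend term)."""
    dy = [y[i + 1] - y[i] for i in range(len(y) - 1)]
    return ols_slope(dy, y[:-1])

def detrend(y):
    n = len(y)
    t = list(range(n))
    b = ols_slope(y, t)
    tbar, ybar = (n - 1) / 2, sum(y) / n
    return [y[i] - (ybar + b * (i - tbar)) for i in range(n)]

rng = random.Random(5)
n = 500
# Trend-stationary series: a deterministic trend plus white noise, NO unit root.
y = [0.05 * t + rng.gauss(0.0, 1.0) for t in range(n)]

rho_raw = df_rho(y)
rho_detrended = df_rho(detrend(y))
print(rho_raw)        # near 0: the unremoved trend mimics a unit root
print(rho_detrended)  # near -1: the detrended noise is obviously stationary
```

With the trend left in, the regression coefficient sits near zero (unit-root-like); detrend first and it sits near minus one, as it should for white noise.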
And then there’s this:
DeWitt said, “It’s really easy to confuse a real trend with non-stationarity.” Without a doubt. With CO2 we know where most of the trend comes from. About half of the fossil fuel use CO2 seems to stay in the atmosphere. We know that CO2 has radiant properties and would have some impact on climate. That is close to it.
My saying that Tmin appears to be non-stationary over a period longer than the instrumental period doesn’t mean that there is a runaway effect in the works, just that it appears to be non-stationary for the instrumental period. You can look at most ocean based paleo data and see long term trends and recurrent patterns. In fact, having a wider control band improves the chances of us being here.
Since CO2 is not the only horse in the race, but we have pretty good background on that horse, it is all the other questions that create the problems, not CO2 IMO, I think it is a great tracer gas.
So what is not going along with theory? Along with the Iceland trend, there is also the shift in the diurnal temperature range trend starting in 1985ish. Is that due to CO2, land use, longer term ocean pseudo-cycles, faulty data, background noise or what possible combination? So I believe it is just as easy to fool yourself into seeing a trend in non-stationary data. What is the “right” method?
There’s something I’ve always been curious about that’s somewhat related to this topic. Would it be better to try to account for noise prior to combining station records? If stations have different amounts/types of noise, it doesn’t seem like simple averaging would be the best way to try to find common signals in them.
Re: dallas (Jan 3 15:35),
The Beenstock, et. al. paper is simply ludicrous. It ranks right up there with the premise of the TV show Revolution, that posits that you can mess about locally, i.e. planet wide, with one of the four fundamental forces of the universe, electromagnetism, such that electric power generation no longer works, but nerve transmission, for example, still does. I refuse to watch the show so I don’t know what they have done with other gross manifestations such as lightning or the Earth’s magnetic field, not to mention the structure of matter. But EM radiation in the form of sunlight is obviously still there. Gerlich and Tscheuschner’s paper is arguably worse because they, being actual working physicists, should know better. The only reason, IMO, why anyone is taking Beenstock, et.al. seriously is confirmation bias.
DeWitt Payne (Comment #108057)
I had to chuckle at “seminal” and “revolution” in scare quotes!
Boy the authors of the paper you linked must really hate ‘co-integration’.
OMG
So… cointegration is officially confusing? Or sufficiently confusing to merit someone saying so in a journal article? I feel redeemed!!! And much less stoooopid!
“Tells us nothing”– that’s amazing language in a journal article, where authors often have to put in caveats etc.
DeWitt
It could also be that some people don’t like to admit they don’t understand it well enough to comment on it. In the past, there were some isolated claims I could parse well enough to say that can’t be right (or must be right.) But with a number of other things it would take quite a bit of effort to get up to speed to know what a test actually claims to do, what assumptions were involved in the set up and so on.
The economics paper you linked is hilarious!
Re: lucia (Jan 3 16:27),
I found the link in the comments at WUWT, but apparently no one commenting afterward has read it.
See for example:
The reply to which is obvious, assuming that the poster wasn’t being sarcastic in the first place.
There are tests in R for a series being trend stationary or non-stationary. The Augmented Dickey-Fuller (ADF) test, if fed a series with a trend that is otherwise stationary, will reject for a unit root. If the ADF test is fed a series with a trend that is otherwise non-stationary, the test will not reject for a unit root. Obviously, if the test is fed a series without a trend that is otherwise non-stationary, it will not reject for a unit root, and vice versa for no trend and stationary.
I think the problem with this distinction is where there is a failure to recognize that a trend can exist in a series or perhaps where a trend or a non stationary series could exist but neither is known or assumed. In the case of a series like temperature you can assume that you can rule out d=1 and thus the series has a trend. That trend over a short part of a longer series could be a stochastic trend and therein lies the rub in distinguishing between those two.
Dewitt, “The Beenstock, et. al. paper is simply ludicrous.” Quite possibly. Selvam’s fascination with the golden ratio and SOC may also be. I can think of a number of concepts designed to predict unpredictable events all would be ludicrous. It is easy to fool oneself when there is more noise than signal.
Hansen for example states that it is “impossible” for internal variability to produce the temperature changes indicated in paleo climate data. Toggweiler et al. believe that changes in the Drake Passage flow can produce abrupt 4 to 5 C changes in “global” temperature, with the NH warming by ~3C at the expense of the SH cooling by ~3C. Location of source and sink can have a major impact on heat flow, and while radiant physics is sound, the ERL of Earth is not a fixed ideal radiant layer. Internal changes in the center of thermal mass, if you will, will produce changes in the efficiency of the atmospheric effect.
Since climate science has turned into a battle of statistics, what is the “right” method? It seems to no longer be a thermodynamics problem.
dallas–
Two non-rhetorical questions:
a) Do you understand the Beenstock paper?
b) Do you understand what “co-integration” means in the context of the Beenstock paper?
DeWitt Payne (Comment #108069)
“To all the negative commentators above, I will remind you that ALL of the top research departments of the world’s central banks use this methodology and result format.”
DeWitt, the poster has to be into irony or is a modern day Rip Van Winkle.
A) not completely,
B) it is a great way to lose money. Two stocks can make the same general moves for no real reason other than market sentiment– a correlation. Cointegration attempts to find a “reason” for the correlation by looking for all common trends between the two stocks– inventory, hiring, acquisitions, anything that has a real meaning.
As for climate science, two measures of CO2 impact could change with the same general trends due to confirmation bias. Steig et al 2009 would be a fine example. That would be like market sentiment or herd mentality. In some way it is like Wegman’s “network” analysis.
I have no idea if or how well it will work, but as James Annan noted recently, some data is too good or words to that effect.
So Beenstock et al believe they have unraveled the complexity of confirmation bias and developed a method to reduce it. I take their “bombshell” paper with a grain of salt.
Lucia:
This is the part that is interesting IMO. If you can characterize the noise associated with each station, that allows you to estimate the uncertainty of the reconstructed global signal associated with a particular algorithm & methodology (e.g., GISTEMP vs. NCDC vs. HADCRUT3 etc.)
Put another way, understanding the characteristics of the noise in a measurement has useful applications even if the way Franzke has applied it for this paper is limited in value.
Lucia – DeWitt,
Let’s say I have a theory that the impact of solar forcing is underestimated by a factor of two. Since the majority of the solar insolation is felt in the tropics, I use GISS LOTI regional data with the tropics as the x axis or abscissa and plot the individual regions on the y axis.
https://lh5.googleusercontent.com/-s31LYSIkX2U/UOYckc4BlbI/AAAAAAAAGbk/PzgE64Jr_-E/s912/the%2520sun%2520done%2520it.png
Then I say that the higher land fraction in the northern hemisphere amplifies the solar forcing, and that the more stable southern hemisphere, due to the thermally isolated southern pole, would lag solar variation by 27.5 years– but not consistently; there can be a longer lag. Volcanic impacts, CO2 and H2O forcing changes, all sorts of noise, make it hard for me to show that solar insolation changes are amplified by two over a variable time frame of roughly 27.5 to 37.5 years.
Stating that the NH warms at the expense of the SH until the average ocean heat content stabilizes, I throw out Toggweiler and Bjornsson as one of my references
http://sam.ucsd.edu/sio219/toggweiler_bjornsson.pdf
Then I throw in a quick and dirty static model of how the effective energy of the “average” oceans is 334 Wm-2, which deeper penetration of shortwave solar would impact more easily than 3.7 Wm-2 of CO2 forcing at a fictitious ERL located at some approximate altitude that is subject to chaotic internal weather oscillations, and stronger over higher altitude land masses than the sea level ocean, causing a bit of a quandary in determining exactly how efficient that route would be in warming the bulk of the oceans. Then I run to complete my grant proposal before the end of the fiscal year.
I would of course be laughed out of town, because Toggweiler is no Hansen and common sense requires a PhD. But what statistical method would you use to disprove my theorized correlation and lag times with solar?
Admittedly the plot I posted could be considered a “novel” method– I wouldn’t know, I am not a stats guy– but a CO2 doubling impact greater than 1.6C requires some pretty creative methods, and as Graeme Stephens noted, the models tend towards a “range of comfort” more than reality. So if Franzke’s and Beenstock’s methods are directed at reducing confirmation bias and bringing more thermodynamics back into the game, I am all for it.
Re: dallas (Jan 3 18:52),
Beenstock, et.al. and thermodynamics? Please cite where they even mention thermo.
Also, while I agree that there is currently a net transfer of energy from the SH to the NH with the AMO in its positive phase, I seriously doubt that one can say the southern oceans are insulated. The sea ice there melts almost completely in the summer. There’s only a little bit left near the coast.
The fundamental general distinction between “signal” and “noise” is that the two processes are stochastically orthogonal. Unless the signal is known a priori, assuming that the apparent “nonlinear trend” is all signal generally leads to illusory nonorthogonal decompositions of the data series.
The only geophysical examples of well-known signals are those that involve astronomical forcings, such as the tides. Thus one can justify removing the predicted tides from surface-elevation time series to reveal the random sea-swell and non-tidal long waves. In other cases, only an estimated linear trend can be removed from finite records without severely affecting the spectral structure of the stochastic components under study.
DeWitt, thermally isolated is a pretty common description. If you like you can compare the variance of 44-64S with any other latitudinal band.
I didn’t mean to suggest that Beenstock would introduce thermodynamics, just that by pointing out some of the confirmation bias, it could allow more thermodynamics to be discussed. Trenberth just recently made a “minor adjustment” to remove an error that bugged the heck out of me, and the Stephens et al. Earth energy budget comes close to actually representing what we do and do not know.
The Antarctic warming, not warming, warming, not warming, maybe warming, maybe not warming is fun to watch, but the Antarctic should really be out of phase with the ROW since it is outside that elusive ERL thanks to very low H2O. In other words, Toggweiler and team seem to be more correct than most.
Another 5 to 10 years will tell the tale, but the BEST “global” might speed that process up a touch.
dallas,
Has anyone other than you accepted the theory as plausible? If not, why would I bother to spend time disproving your ‘theory’?
Dallas, the 11-year cycle would be apparent in the rate of cooling and heating during the Antarctic 6-month day/night cycle. The daily Vostok data, over a year, would need to be averaged and then the average removed from the record. Years of higher solar flux would be positive and years of less than average flux negative.
You should see a 22/11 year cycle.
Re: sky (Jan 3 19:03),
So you choose to ignore the fact that CO2 emissions are non-linear with time? As I said to R.S. Courtney on WUWT, pull the other one.
And removing a calculated linear trend when the actual trend must be non-linear doesn’t severely affect the spectral structure of the stochastic component?
That, btw, is exactly why one finds unit roots where none exist. The test assumes any trend is linear. If it’s not, voila, a unit root or two.
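DeWitt’s point here is easy to demonstrate numerically. The sketch below is my own toy code (not from any of the papers under discussion): a bare-bones Dickey-Fuller regression that allows only a linear trend. Fed a quadratic deterministic trend plus small white noise, it typically fails to reject a unit root; fed genuinely trend-stationary data around a linear trend, it rejects decisively.

```python
import numpy as np

def df_tstat(y):
    """t-statistic on gamma in the Dickey-Fuller regression
    dy_t = a + b*t + gamma*y_{t-1} + e.
    The approximate 5% critical value with a trend term is -3.45;
    values above that fail to reject a unit root."""
    dy = np.diff(y)
    ylag = y[:-1]
    n = dy.size
    X = np.column_stack([np.ones(n), np.arange(n, dtype=float), ylag])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[-1] / np.sqrt(cov[-1, -1])

rng = np.random.default_rng(0)
t = np.arange(500, dtype=float)

# Quadratic deterministic trend + small noise: the regression can only
# remove a linear trend, so the leftover curvature masquerades as a
# stochastic trend and the test fails to reject a unit root.
quad = 1e-3 * t**2 + rng.normal(0.0, 1.0, t.size)

# Linear trend + the same noise: trend-stationary, and the unit root
# is rejected emphatically.
lin = 1e-2 * t + rng.normal(0.0, 1.0, t.size)
```

With these made-up constants, `df_tstat(quad)` comes out well above the critical value while `df_tstat(lin)` comes out far below it; the apparent “unit root” in the quadratic case is an artifact of assuming the wrong trend shape.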
Hi Lucia,
Michael Beenstock and Yaniv Reingewertz had a previous attempt with this:
“Polynomial Cointegration Tests of the Anthropogenic Theory of Global Warming”
as commented upon here:
http://wattsupwiththat.com/2010/02/14/new-paper-on/
It doesn’t seem to be cited by the current paper and I cannot find a published version of it.
Cointegration methodology may have its place in climate attribution; see the recent PhD thesis submission linked here:
https://eric.exeter.ac.uk/repository/handle/10036/4090
which may be a good general reference for the technique as applied to climate.
I think there is some sense in all this in so far as an apparent correspondence between two time series with genuinely different fractional dimensions cannot persist indefinitely.
In the case of admixtures, where series with differing dimensions are summed in one or both of the series, any apparent correspondence will eventually evaporate unless both share a component with the highest dimension value found in either, as given sufficient time it will dominate.
The problem with admixtures is that if that component is not yet dominant, the estimated value for the dimension can be too low to show its presence.
I think you and others have covered all this both here and elsewhere where you have argued that d>1 for the observed temperature time series is either unphysical or deterministically forced (by something other than a sustainable natural forcing).
It could be argued that of their estimates for d = 0.94 (NASA-GISS) and d = 1.05 (BEST), the second does at least tentatively imply an admixture of some component with d>1, e.g. that the temperature series contains an unnatural/unsustainable component.
Alex
Re: Alexander Harvey (Jan 3 20:54),
In fact, if you test non-linear functions containing white noise with much lower amplitude than the function values, you do indeed get apparent unit roots. A quadratic gives one unit root, a cubic, two, etc. Anyone want to bet that the AMO index, for example, will test as having at least one unit root? OTOH, after subtracting a sine wave fit from the index, the residuals won’t.
Alexander
The limit is d>0.5. If the temperature v. time series has d>0.5, there must be some deterministic forcing to drive that. One might not know it, but it must exist. And there are many possible forcings that cannot themselves have d>0.5, unless someone can come up with an awfully good reason why they can over the long term.
Maybe the “previous” paper you link is not cited because it’s really the same paper but finally published– possibly with revisions.
Yes. But it could persist for an awfully long time. But also, it could be that the apparent fractional dependence of some quantity is incorrect– and even due to measurement error and so on. (I have to digest the paper more to know for sure– and also to think of “toy” problems that might illustrate issues.)
Lucia, there is no need for you to actually do anything with the theory. Since you likely don’t think it is plausible, you would stick with the “plausible” theory warts and all.
Toggweiler et al., though, have spent some time developing an ocean model that indicates that variation in the ACC can cause 4 to 5 C of abrupt “unforced” climate change. In general, more of the “real” skeptics tend to point out that ocean models are more likely to produce reasonable results; the current trend in coupled atmosphere/ocean models is a touch humorous, and as the “sensitivity” drops, the significance of other factors increases. Then the “shifts” that Tsonis and others propose would become more relevant.
That is beside the point though, your first instinct is that it would be a waste of time. Since there is a theory in place, there is a “range of comfort” that any alternate theory needs to be close to be considered “plausible”.
Whether Beenstock or Franzke applied their methods correctly or not, they both are attempting to remove confirmation bias which could allow more things to be considered “plausible”. I think they are on the right path.
dallas–
I think we are having a communication problem. Really, I think you are having a conversation with yourself. Certainly, I have no idea what points or claims you are trying to make with much of the recent exchange.
My reaction to those parts of your comment :
So? Are you trying to make a point? Or claim? Please just make whatever point you are trying to make directly, instead of just dragging in random names like “Toggweiler” out of the blue.
Are they attempting to remove confirmation bias? I thought they were trying to test whether we can say greenhouse gases are the cause of recent warming.
I have no idea why you think they are on the right path for … well… anything. Heck, I don’t know if you think the right path is the “path of trying to remove confirmation bias” or the “path of trying to figure out whether ghg’s cause warming”. I also don’t know why you think their papers would represent a step down whichever path is the “right” one.
To be clear: I’m not saying I disagree with you. What I’m saying is I don’t have a clue what you are even trying to communicate.
I might be able to understand why you think whatever it is you think about the paper if your discussion stuck to actually discussing what Beenstock et al. actually discuss in their paper instead of heading off onto tangents about Hansen or Trenberth and so on.
Lucia,
Yes d>0.5 is unsustainable. Ooops
Which means I have no idea what their estimates for d (Table 1) imply. E.g., for solar they give d~0.8, which would be unsustainable, yet the sun seems to have been in the game for a long time.
I did say “genuinely different” to acknowledge that it only held for the omniscient.
Alex
Re: Alexander Harvey (Jan 3 22:52),
For one thing, they’re using the wrong solar data. They’re probably also picking up on a low frequency oscillation that is making the test invalid even if it were the correct data. The classic unit root tests aren’t very good, especially when used blindly by people who don’t understand the underlying physical principles of the processes that generate the data. A d > 0.5 should have been a red flag that something was wrong.
The linear combination thing leading to a temporary effect is unphysical.
Co-integration, at least from what I understand, is a method to determine the commonality of two time series. Common drift is one term. To use it properly you would find a common drift between two time series then use the method to find other common sources that may contribute to the drift. You wouldn’t just say temperature and CO2 have a common trend, but temperature and CO2 have a common trend that is stronger when solar is at a certain level. It should be a more comprehensive approach than just assuming one or two models then finding some R squared that feels good. So proper co-integration use would compare a lot of series in several ways.
One of the reasons it is used in stocks is to find out if there is a real reason tied to a business sector for common drift or if there is an artificial cause. Not that people would ever intentionally manipulate a stock or use a statistical test that produces invalid results, but caution is a good thing with lots of money on the line.
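As a rough illustration of the “common drift” idea (my own toy code, an informal Engle-Granger-style sketch, not Beenstock’s actual procedure): two series built from a shared random walk are cointegrated, so the residuals of the cointegrating regression stay bounded even though each series itself wanders.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# A shared stochastic trend ("common drift") plus independent
# stationary noise in each observed series.
w = np.cumsum(rng.normal(0.0, 1.0, n))   # common random walk
x = w + rng.normal(0.0, 0.5, n)
y = 2.0 * w + rng.normal(0.0, 0.5, n)    # cointegrated with x

# Engle-Granger step 1: estimate the cointegrating regression
# y = a + b*x by ordinary least squares.
A = np.column_stack([np.ones(n), x])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - (a + b * x)

# Step 2 (informally): if x and y are cointegrated, the residuals are
# stationary, i.e. small and bounded relative to the wandering of y.
ratio = np.std(resid) / np.std(y)
```

A real test would then apply a unit-root test to `resid` with adjusted critical values; the point of the sketch is only that “common drift” means a linear combination of the two wandering series that does not wander.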
When Graeme Stephens made his comment about the “range of comfort” I took it as a polite way of saying there is considerable confirmation bias involved. That is a good application for co-integration.
When James Annan said that his models tend to match the questionable data better than the supposedly “high quality” data, that indicates another good application for co-integration.
So Beenstock’s use doesn’t necessarily show that CO2 doesn’t have an impact on temperature, but that it doesn’t seem to have a significant correlation with the temperature data.
This may be a subtle point, but 70% of the “Surface Temperature” is not the “surface” that CO2 forcing would impact. It is meters below the “surface”. That surface would show a stronger correlation to solar forcing than CO2 forcing. Mosher noted on some blog that the new BEST “global” will use actual marine “air” temperatures which would be comparable with land surface “air” temperatures.
So I think it is kinda funny that models are overly tuned to match a surface they were never meant to match 🙂 In fact the “average” absolute surface temperature was decreased from 15C to 14C even though it is likely closer to 17.5 C (290.4K ~403Wm-2 +/- 17 Wm-2)
http://judithcurry.com/2012/11/05/uncertainty-in-observations-of-the-earths-energy-balance/
Is it any surprise that modeled absolute values tend to be worse than anomalies?
Now I can be wrong, but I expect “sensitivity” estimates to drop like a brick pretty soon.
DeWitt,
Thanks
What a pity it is when waters get muddied. I anticipate that we will get more papers that try to use these statistical techniques which might be making valid points and be published by those specialising in climate attribution, like the post graduate student whose thesis I linked to above. It doesn’t help if others have queered the pitch.
Alex
Re: dallas (Jan 3 23:30),
From Wikipedia on the Dickey-Fuller test:
The DF and ADF tests and other unit root tests assume the time series is a completely random variable, possibly with a linear trend. The ADF test attempts to remove autocorrelation in the time series before testing; the DF test doesn’t. Atmospheric CO2 is not a random variable and the trend isn’t linear. Applying any unit root test to raw atmospheric CO2 or other ghg data will wrongly fail to reject the presence of one or more unit roots. The same goes for the temperature. Cointegration theory is not about finding common drifts, it’s about finding reasons to reject common drifts as spurious. Once you have rejected the obvious correlation, then you can sometimes use arbitrary linear combinations of variables with no physical basis for the combination.
It can’t work for climate data because there are non-linear deterministic trends in the data, not random walk drifts (unit roots). Your speculation about confirmation bias is irrelevant.
DeWitt, “Cointegration theory is not about finding common drifts, it’s about finding reasons to reject common drifts as spurious.” That is kinda like half full versus half empty. What I thought I described was using cointegration to determine whether a common drift is spurious, and possibly why it is spurious: “confirmation bias?”.
As you say, it can’t work for non-linear deterministic trends, but it could work for spurious trends in what should be non-linear deterministic relationships. So if you are applying cointegration to a process where it should not work, but it indicates spurious trends, what caused the spurious trend?
Applying it wrongly is an issue with any “novel” approach, which is why I ask what the “right” approach is. In my opinion, cointegration is useful “only” for finding confirmation bias in climate; it is not in any way related to the thermodynamic process, only the human error that would impact the data.
So I think my “speculation about confirmation bias” is totally relevant, what else would cointegration “find” in climate data other than “convenient” correlations?
Mosher noted on some blog that the new BEST “global” will use actual marine “air” temperatures which would be comparable with land surface “air” temperatures.
#############
the first version will use SST to match existing indices. Still, discussing whether a MAT version would be interesting to people. I think so.. we will see. lots of stuff coming..
dallas
If an approach is wrong, it remains wrong even if no right approach exists, and even if a right approach exists but we don’t know it yet.
Huh? Look: if an approach is wrong, it can’t be useful for finding confirmation bias in climate. Are you suggesting DeWitt is incorrect and the approach is not wrong? Or what?
I agree with DeWitt. Your speculation about confirmation bias is not relevant. I think if anything, you are exhibiting confirmation bias. You at least seem to be supporting the ‘results’ of a method you admit you don’t understand and which you appear to concede is misapplied but are then trying to explain why the misapplication of this method might somehow teach us something useful. That doesn’t make any sense.
I don’t personally know if DeWitt is correct or incorrect in some of what he says. But I do know that if a method is utterly mis-applied then the results aren’t informative. Period. They should be ignored. This has nothing to do with “confirmation bias”.
Lucia said, “If an approach is wrong, it remains wrong even if no right approach exists and even if a right approach exists but we don’t know it yet.”
Actually, an approach that produces consistent results can be “useful”; all approaches can be “wrong”. They are models, after all.
“Huh? Look: if an approach is wrong, it can’t be useful for finding confirmation bias in climate. Are you suggesting Dewitt is incorrect and the approach is not wrong? Or what?” So not knowing if an approach will produce useful results makes it wrong? All I am suggesting is that the “intent” of the approach is to locate “spurious” trends. It is used in economics where “spurious” trends are known to exist and cost money. So since the possibility of reporting only “favorable” results exists, it can be useful.
“I agree with DeWitt. Your speculation about confirmation bias is not relevant. I think if anything, you are exhibiting confirmation bias.” I am very likely biased, I am aware that I am very likely biased. Everyone should admit that they can be very likely biased.
Now if there is no confirmation bias in any way shape or form in climate science, you and DeWitt are correct and I am wrong. But what other possible reason would there be for using cointegration?
In order to better understand the critical issues of trend analyses of temperature time series and producing proper models for that purpose, I read in detail the Cohn and Lins (C&L) and Franzke (Fz) papers referenced in this thread.
In C&L and Fz, the final step of the analysis was regression to calculate a trend and determine the p values for the null hypothesis testing. In C&L, linear regressions were carried out using OLS on the white noise model and maximum likelihood ratio testing on the ARIMA and ARFIMA models. The calculation of p values was performed parametrically, and simulations were not used here. The C&L models were not derived from observed data but rather determined by “educated” guesses. In that case we are not concerned with the removal of the trends (or not) from the observed data and using the residuals to derive noise models.
The main issue with C&L was using a linear regression when in fact the expected trend is not linear and further that linear regression makes rejecting the null hypothesis of no trend in the series more difficult. When models are assumed as was the case in C&L, obviously one can question the evidence or lack of it in selecting the particular models used.
Fz uses cubic regressions with AR1, ARFIMA and phase scrambling surrogates for noise. Fz states that the parameters used for the AR1 and ARFIMA models were estimated using the observed daily data. For phase scrambling, it appears to me that the surrogate data is derived by matching the ACF of the observed data. The potential problems with Fz lie very much in the details the author does not discuss in the paper.
The major issue with Fz was whether the observed data was detrended and the residuals used for estimating the model parameters. According to Franzke in personal communications with Lucia he did estimate model parameters with and without detrending and he claimed the differences were small. I would have thought that issue would have been discussed in great detail in his paper. I would agree that a proper analysis requires estimations with and without detrending.
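The detrending issue is easy to illustrate with a toy example (my own code, not Franzke’s data or procedure): estimating noise parameters without removing a deterministic trend makes plain white noise look strongly persistent.

```python
import numpy as np

def lag1_autocorr(y):
    """Sample lag-1 autocorrelation."""
    z = y - y.mean()
    return (z[:-1] @ z[1:]) / (z @ z)

rng = np.random.default_rng(2)
n = 1000
t = np.arange(n, dtype=float)

# White noise around a deterministic linear trend.
y = 0.02 * t + rng.normal(0.0, 1.0, n)

# "Noise" estimated WITHOUT detrending: the trend leaks into the
# autocorrelation and looks like strong long-term persistence.
r_raw = lag1_autocorr(y)

# Estimated AFTER removing an OLS linear fit: the residuals are the
# actual noise, with essentially no persistence.
X = np.column_stack([np.ones(n), t])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r_detrended = lag1_autocorr(y - X @ beta)
```

With these made-up numbers, `r_raw` is close to 1 while `r_detrended` is near 0, which is why the “with and without detrending” comparison Kenneth mentions matters so much.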
Another issue with Fz is the use of daily data to determine the model parameters and then apply it to the monthly data in doing the significance tests. Franzke talks about a relationship between the two periodicities, but it is not clear to me how it was performed in his analysis and the assumptions used in that relationship.
What I cannot get my head around is how Franzke can apply a cubic regression to station data that can apparently be much shorter than the entire time period covered by all the stations used in the analysis. I can visualize a long series, as determined by the average of all the station data, being represented by a cubic polynomial. My problem is that the shorter series cannot be characterized by a cubic polynomial. The shorter series could be represented by that part of the longer cubic series that covers the time period of the shorter series. That process, however, would seem to me to exclude the localized nature of station data. In C&L the temperature series analyzed was for the northern hemisphere, and thus the problems of using shorter series, as was the case with Fz, are avoided.
I would think that some combination of the C&L and Fz analyses, with the caveats of the Blackboard included, would provide a better basis for temperature trend analysis.
Dallas–
Define “consistent”. In context, I am using “wrong” to mean “produces results that are useless”. Snappy quotes don’t change the fact that some models are useless. Moreover, results from methods that might be useful when the method is applied properly can become less than useless if the method is mis-applied.
You have it backwards. Knowing that it can’t produce useful results in a particular application makes it wrong.
Did you read the criticism of the method in the econometrics journal? It points out that using co-integration results as a basis to diversify portfolios costs money relative to just using correlation!
What do you even mean by this? If you are concerned about the possibility that people cherry pick methods and report results they like, the existence of the co-integration method doesn’t fix that. If you mean that sometimes people who look into an issue and apply methods correctly find that a result they happened to believe at the outset was true, and report it (but would have reported the outcome they did not believe as well), well… it’s true that “favorable” results can exist, but that’s not really a problem. No one needs a new method to “solve” this.
Co-integration has a stated use. If it can achieve that goal, then it’s useful to that goal. If it can’t, it’s not. The stated reason has nothing to do with confirmation bias.
Sure. But that has nothing to do with whether (a) cointegration is useful for anything, (b) co-integration can usefully cure “confirmation bias”, or (c) co-integration as applied in this paper produces results that are any better than a coin flip.
The “intent” of the approach is not especially important if the method is misapplied to the point where the results of the method tells us nothing about spurious trends. I could say “heads= trend spurious; tails= trend real” and flip it with the intention of identifying whether the trend is spurious. My intention is not relevant to assessing whether flipping a coin is a useful method for detecting whether the trend is spurious.
Huh?
Presumably the claimed reason for using cointegration would fall into the collection of “possible” reasons. The claimed reason is to determine whether things are co-integrated. But, based on the works linked above, it’s not entirely clear it does that reliably. And DeWitt asserts that it cannot work in this case.
All this blather about “intent” to reduce “confirmation bias” is irrelevant to assessing whether or not the results of the Beenstock paper are informative with respect to figuring out whether the recent warming is due to GHG’s.
Kenneth–
I think the C&L paper would have been improved by extending the analysis with a second estimate of the “noise” based on residuals relative to some AOGCM mean trends (the mean trend from one model, or multi-model mean trends). That way, if they still found ARFIMA, and still found “failed to reject”, we would have something quite interesting. The difficulty is that they end up “failing to reject” relative to an alternative deterministic trend that no one believes. And this has consequences.
I think the Franzke paper would have been improved by simply reporting both sets of results.
I’m sure SteveF would think it still doesn’t matter– and in some sense he’s right. But it was published. It would have been useful if both sets of results were presented– and if there is any question about the issue of advantages of detrending, that should have been discussed!
In this case, it’s a shame the reviewers didn’t do that, because I’m pretty sure Christian would have been happy to accommodate that. He’d already done the work!
Lucia,
“So since the possibility of reporting only “favorable” results exists, it can be useful.
What do you even mean by this? If you are concerned about the possibility that people cherry pick methods and report results they like, the existence of the co-integration method doesn’t fix that.”
Then cointegration could be useless; that is the “only” reason I thought it could be useful. Graeme Stephens’ “range of comfort” comment does imply that once a range is established, it is difficult to make large moves from that “likely” range. That is just human nature, and not deliberate “cherry picking” as much as just being cautious. It would be nice to have some method that could find too-good correlations as well as spurious ones.
“Did you read the criticism of the method in the econometrics journal?” Skimmed it and I started off by saying cointegration is a good way to lose money.
Steven Mosher:
A file included with the code BEST released (downloaded forever ago):
Mosher a month ago:
Giving relatively specific deadlines that you fail to meet makes it hard for people to trust you much when you then switch to vague deadlines. At a certain point, BEST is going to have to start releasing stuff they’ve promised or give up.
In the meantime, could you at least answer a question I’ve had for a while? Where is the code for the seasonal adjustment of BEST series? I’ve downloaded the code from BEST’s site, but I can’t find what I’m looking for in it. I assume that’s just my fault (partially because I can’t run MATLAB code). If so, could you point me to a filename?
lucia,
Beenstock et al. does appear to be directly relevant to this thread, IMO, because it’s really all about whether to detrend and how. Beenstock et al. don’t detrend; they test raw data. If the Wikipedia article on Dickey-Fuller is correct, that’s a somewhat controversial choice based on no priors. I’m tempted to fit the raw atmospheric CO2 data to an ARIMA model after determining I with the ADF test and generate the envelope of possible CO2 time series. I’m pretty sure if I did that, everyone would see the ridiculousness of the assumption of no deterministic trend. If I assumed that the cointegration method works, the first thing I would then do would be to see if atmospheric CO2 and CO2 emissions cointegrate. I’m betting that they do. In which case I could model the atmospheric CO2 as a linear function of emissions and look at the residuals of that fit. If I model the noise of the residuals, I could generate a new envelope of possible CO2 time series. I’m betting that the range of that envelope would be a lot smaller. If so, I would have then proved(?) that there is a deterministic trend in the CO2 concentration. Any possible unit roots in the noise would be irrelevant because the noise is small compared to the trend. Then there would be no barrier to correlating atmospheric CO2 to temperature.
lucia (Comment #108106)
“I think the Franzke paper would have been improved by simply reporting both sets of results.”
I reread C&L and Fz and I feel confident that I understand what C&L did (agreeing is another matter), but with Fz, the more I look at it the more I feel I do not understand what he did. How did he do a cubic regression on short station series? How readily can you translate the AR1 and ARFIMA parameters determined from the daily data to the monthly data, and with what assumptions? I can see daily data giving you 30 times more data points, but those data are going to be highly correlated and surely lessen your degrees of freedom.
Alexander Harvey (Comment #108086)
January 3rd, 2013 at 8:54 pm
“Cointegration methodology may have its place in climate attribution; see the recent PhD thesis submission linked here:
https://eric.exeter.ac.uk/repository/handle/10036/4090
which may be a good general reference for the technique as applied to climate.”
I do not know what conclusions this dissertation makes or how well they might be derived, but from my reading through the first 30 pages, I would recommend the paper for explaining the statistics related to time series and temperature in particular.
I have found in the past that an advanced degree dissertation can be much more informative in the basic concepts used than what one sees in a peer reviewed publication where the space can be limited.
lucia,
The economics article I linked is in a rather obscure journal. That doesn’t make it wrong, but it does mean it carries less weight.
sarc
After all, we know the Nobel Prize committee could not possibly have made a mistake in awarding the inventors of cointegration the Nobel Prize in Economics.
/sarc
DeWitt,
Not all Nobels are born equal.
In their English forms we have:
The Nobel Prize in Physics
The Nobel Prize in Chemistry
The Nobel Prize in Physiology or Medicine
The Nobel Prize in Literature
The Nobel Peace Prize
The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel
spot the odd one out.
Alex
Re: Kenneth Fritsch (Jan 4 14:37),
Look at his toy model for CO2 concentration on page 61, equation 3.31a. He specifies it as a random walk with drift because the noise is integrated and the emission term is invariant. That hardly matches reality, where emission increases non-linearly over time, not all of it stays in the atmosphere, the fraction that stays in the atmosphere isn’t constant, and it’s not at all clear that the noise, however much there actually is, is integrated. Measurement uncertainty, for example, is not actually part of the process. An error in measurement in one year does not persist in the next year. I suspect that applies to estimates of total emission too. The actual emissions are what they are, and it’s not clear that there is a stochastic component to them at all. Not to mention that there’s feedback between temperature and the rate of CO2 increase.
The other thing is that determining the order of integration is totally dependent on signal to noise. This is easy to demonstrate by adding different levels of noise to non-linear functions. The noise level of the temperature record is much higher than the CO2 record so it’s going to appear to become stationary after one differencing. If you add enough noise, it’s going to take a really long record to reveal non-stationarity. In fact, he shows that the ice core temperature data is stationary with long term persistence and some non-linearity when switching between glacial and interglacial regimes.
It all reeks of bogosity.
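A toy version of DeWitt’s signal-to-noise point (my own code, with made-up constants): a tiny cubic trend buried in noise is invisible in the first differences of a short record, so the differenced series looks like plain stationary noise and the original would be called I(1), but the very same deterministic term dominates once the record is long enough.

```python
import numpy as np

def trend_corr(n, c=1e-7, sigma=1.0, seed=0):
    """Correlation between the first differences of a noisy cubic
    c*t**3 + noise and the curvature term t**2 that those differences
    should contain (the discrete derivative of t**3 grows like 3*t**2)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n, dtype=float)
    y = c * t**3 + rng.normal(0.0, sigma, n)
    dy = np.diff(y)
    return np.corrcoef(dy, t[1:] ** 2)[0, 1]

# Short record: the deterministic curvature is swamped by the noise,
# so differencing appears to yield stationary noise.
short = trend_corr(200)

# Long record: the same tiny cubic term eventually dominates the
# differences, revealing the deterministic trend.
long_ = trend_corr(5000)
```

So with identical generating processes, the short record gives essentially zero correlation while the long record gives a strong one, i.e. the apparent order of integration depends entirely on how much of the signal has emerged from the noise.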
Kenneth,
Now that I have read about the same amount of that dissertation as you describe, I am better informed for my pains.
Alex
Re: Kenneth Fritsch (Jan 4 14:37),
And the icing on the cake is that while the observed atmospheric CO2 concentration can’t be cointegrated with the observed temperature record, the temperature data from the CMIP5 models, which use said record to produce said output, can be. Bo. Gus.
I see what he’s doing with 3.31a. He’s purposely creating a CO2 record that is I(1) and using a linear relationship to create temperature series that are also I(1).
Haven’t read all the comments on the cointegration thingy. But it seems to me that the rate that the heat comes out of the pipeline can be considerably damped. Ergo, cointegration is probably a red-herring (yumm! assuming they’re pickled as well!).
AJ, “Haven’t read all the comments on the cointegration thingy. But it seems to me that the rate that the heat comes out of the pipeline can be considerably damped.’
Yep, things can be amplified too, starting new damped decay patterns. Heck, things can get chaotic 🙂
DeWitt, that thesis mentions in the methods section that the best estimator is the one with the least variance. In GISS LOTI, the region with the least variance is 44S-64S and the SH, according to this cointegration stuff, tends to lead the NH temperatures. So it seems to me, that using something that has less noise to begin with might be a little easier than trying to massage out all the noise of the things that wiggle around a lot.
Since Franzke noted that the only individual locations that had a significant trend were near Iceland and Scandinavia and since Iceland and Scandinavia might tend to have less variance because of the much larger thermal mass of the not as often frozen oceans near them, wouldn’t that lend just a tad of support to Franzke’s paper or is that not complex enough for statistics?
DeWitt Payne (Comments #108119 #108121)
“The other thing is that determining the order of integration is totally dependent on signal to noise.” Are you saying that “Temp is actually the same order of integration as CO2, if only we could see it”? A hypothesis to be sure.
“while the observed atmospheric CO2 concentration can’t be cointegrated with the observed temperature record, the temperature data from the CMIP5 models, which use[d] said record to produce said output, can be.”
I’d be surprised if the past output of GCMs didn’t add something over just using past Temp to our ability to predict Temp, and vice versa. The fact that GCMs used CO2 is neither here nor there.
DeWitt Payne (Comment #108119)
I have not read beyond page 30, but if the author is claiming that the trend in CO2 levels in the atmosphere is not deterministic over the past 50 to 70 years, I think you have spoiled the ending. He talked about trend stationarity in the part I read and appears to know the difference between a random walk drift and a deterministic trend.
DeWitt Payne (Comment #108084)
I’m not at all saying what you impute. What I’m saying is that one MUST know quite precisely the nonlinear “trend” in a short series before removing it; otherwise the decomposition of the data series becomes severely nonorthogonal over the entire frequency range. The removal of an estimated linear trend, on the other hand, affects only the very lowest frequencies.
FYI, competent geophysical analysts do not resort to “unit-root” stochastic models. Those are largely the province of econometricians and geophysical amateurs. I have little time for discourse with either.
dallas
One can’t just support “a paper”. So, a tad of support for what aspect of the paper? No one is saying that we necessarily think a finding of “no statistically significant trend” can’t be right. The criticisms range from “So what if the trend at any individual station is not statistically significant?” (this relates to doing an analysis on individual stations at all; one should generally try to group data to maximize power, not minimize it) to “One shouldn’t use a method that lowers the power given a particular set of data.” (this relates to not detrending; all other things being equal, one should try to pick a method that maximizes power, not one that minimizes it).
It might very well be that if Franzke pooled or detrended, he’d find more or less the same thing. But that’s not done in the paper. That’s what people are criticizing the paper for.
Lucia, “One can’t just support ‘a paper’. So, a tad support of what aspect of the paper? No one is saying that we necessarily think finding ‘no statistically significant’ can’t be right.”
From the abstract, Franzke was just looking at stations to see which had significant trends. The only stations he found with significant trends had a good reason to have a statistically significant trend: ocean heat capacity. The stations where Franzke found no significant trend would be stations with less of a thermodynamic reason to have a significant trend. “One” may not be able to support a paper, but the thermodynamics can.
As far as showing detrended results along with what is in the paper, the more the better, but his abstract basically says, “Hey! A lot of these stations don’t seem to have significant trends using this method.” If he or someone else would like to try pooling or weighting, go for it.
The heartburn associated with Franzke’s paper comes from the people who use snips of paleo reconstructions in the regions where Franzke’s paper indicates that finding a significant trend is a little tougher. I believe that the paleo selection process has been discussed from time to time with a number of questions. How was the regional instrumental data selected to find a matching trend?
If there is a serious flaw in Franzke’s method for analyzing “individual” stations, then I would agree with the pooling, but Franzkely, I think the ball is in the other court.
Dallas,
The only ‘flaw’ is that it is pretty well known that trends for individual stations are noisy enough to mask a real regional or global trend. The criticism is that the paper doesn’t make any real contribution. Franzke (or someone else) can show the same thing over and over, but that doesn’t add to understanding. It is just a technically very weak paper, and certainly not a “game changer” as some have suggested.
Re: sky (Jan 5 14:58),
Thanks for the clarification.
I had a longer reply about why cointegration isn’t really applicable eaten because I didn’t notice that my contact information had disappeared.
And everyone, please don’t tell me I should compose in Notepad or something to avoid that problem. It’s not worth the effort.
Re: HAS (Jan 5 11:40),
No. Well, sort of. What I’m really saying is that the results of order of integration tests for CO2 concentration and surface temperature aren’t valid because neither process is actually a random walk at all. We know where the extra CO2 comes from and roughly how it is apportioned among the atmosphere, ocean and biosphere over time. The only randomness is how precisely we can measure. The estimated cumulative CO2 emission time series is at best I(1). That’s because the error in estimating annual emission does integrate. But that error is small compared to the deterministic trend.
Cumulative global CO2 emission (fossil plus land use) since 1960 is very well modeled by a quadratic equation, R = 0.9999. Atmospheric CO2 concentration (MLO) correlates linearly with cumulative CO2 emission over the same time period with R = 0.9988. Run the ADF test on data from the monotonic increasing part of a quadratic equation and I guarantee you will find an integration order of 2. But there’s nothing random about it.
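For what it’s worth, the mechanics of that guarantee are easy to sketch. Below is a rough Python check (the thread uses R’s adf.test; here a hand-rolled, non-augmented Dickey-Fuller regression stands in for it, and the quadratic coefficients and noise level are invented, not from the actual emissions data):

```python
import numpy as np

def df_tstat(y):
    # Simple (non-augmented) Dickey-Fuller regression: dy_t on [1, y_{t-1}];
    # return the t-statistic on the lagged-level coefficient.
    dy, ylag = np.diff(y), y[:-1]
    X = np.column_stack([np.ones_like(ylag), ylag])
    beta = np.linalg.lstsq(X, dy, rcond=None)[0]
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

rng = np.random.default_rng(0)
t = np.arange(200.0)
# A monotonic quadratic "cumulative emissions" curve plus a dash of noise
y = 0.01 * t ** 2 + 0.5 * t + rng.normal(0, 0.5, t.size)

# The 5% critical value is roughly -2.86; more negative means reject the unit root
print(df_tstat(y))              # levels: nowhere near rejecting
print(df_tstat(np.diff(y, 2)))  # after differencing twice: rejects decisively
```

Even though nothing in the series is random apart from the small noise term, the unit-root test treats the deterministic curvature as if it were integration.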
Dallas
The main response to this is “yawn” — which is pretty much what SteveF is pointing out. The fact that very noisy locations have no statistically significant trend is “yawn”. The fact that less noisy stations have a significant trend is “well, we already know AGW causes warming…so.. this confirms it. Yawn.”
And the response to that should be “Yawn”. The issue isn’t a question of whether it’s right or wrong, but rather “who cares?” and if we find some one who cares “Please explain why anyone should care?” Because there is no reason anyone should care.
I have no idea why you are saying that the “heartburn” associated with the Franzke paper has anything to do with paleo. As far as I can see the Franzke paper has nothing to do with paleo.
Re: sky (Jan 5 14:58),
If I subtract a linear trend from quadratic data, I still have data that can be fit by a quadratic equation. And that, of course, is the problem with trying to determine integration order with a test that assumes that any drift is linear. It’s still going to get the wrong answer.
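That point is easy to verify numerically: subtract the best-fit straight line from a pure quadratic and a quadratic still fits what’s left essentially perfectly. A quick Python sketch (the coefficient is arbitrary):

```python
import numpy as np

t = np.arange(100.0)
y = 0.03 * t ** 2                      # pure quadratic "data" (arbitrary coefficient)

slope, intercept = np.polyfit(t, y, 1)
resid = y - (intercept + slope * t)    # linearly detrended

# The residuals are still exactly quadratic: a degree-2 fit recovers them
fit = np.polyval(np.polyfit(t, resid, 2), t)
print(np.max(np.abs(resid - fit)))     # ~0 up to float error
```

Linear detrending only shifts the parabola; it cannot remove the curvature that a unit-root test with linear drift will then misread.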
Re: DeWitt Payne (Jan 5 18:37),
Actually that should be an integration order of 1 for a quadratic. But if you use a longer time series for CO2, it isn’t fit well by a quadratic. That’s what gets you an integration order of 2. A noiseless exponential function would give you an infinite integration order.
DeWitt Payne:
Even something like $latex V_0 + V_1 e^{-t/\tau}$ ?
SteveF, “It is just a technically very weak paper, and certainly not a ‘game changer’ as some have suggested.”
I agree, it would take a lot more to make it a game changer. One of the major problems is that so many papers are published just to add to the list. According to his abstract, he did what he did and is perfectly happy with it. There is no connecting of the dots.
As you say, it requires a little more work to tease a trend out of the noise, everyone knows that. But why does it take so much more work to tease out that trend? Not at all covered in the paper.
As Lucia says, “As far as I can see the Franzke paper has nothing to do with paleo.” To show that it does would require an astonishingly large amount of work.
I couldn’t just say, if you use Tmin for 70% of the globe and Tave for 30% of the globe, the resulting “global mean temperature” is flawed.
I would think that would be obvious and not worth the effort. Kinda like Leif Svalgaard mentioning that there is resistance to standardizing sun spot number reconstructions. It is wrong as it is, but comfortable, why actually correct an error?
If you were to rebut Franzke’s paper with a method for pooling stations that are no closer than 100 km to a body of water versus stations that are within 100 km of a major body of water, then combine the two, you would have another paper that shows exactly what you would expect: the stations closest to the largest thermal mass have a more consistent trend, while the ones further away have greater variance.
Since Franzke’s paper does not “directly” address this (it is common knowledge, after all), you could take some of the regional paleo proxies, add the 15 to 40 years of data not included because it diverged from “instrumental”, then say to yourself, “Self, this has the same well-known issue that Franzke’s paper points out: there is no standardization of methods, and thermal mass/heat capacity is something that does have an impact.”
So since it requires a technically “strong” paper to point out the obvious, Franzke has just added to the noise.
BTW, I think Mosher is also trying to make some station distance to water/coast weighting for the BEST or his own use which is complicated because prevailing wind is a consideration.
I am sure that will be another yawner 🙂
Lucia, “I have no idea why you are saying that the ‘heartburn’ associated…” Rasmus may have an idea.
Dallas
Well.. then do the work to show some connection with paleo. It’s never mentioned in the paper itself! (I suspect you would fail.. but that’s just me.)
Rebut? The claim in the paper relates to individual stations. That’s “Yawn.” Rebutting would require showing the claim about individual stations is not true. Well.. it may be true. But “Yawn”. You can’t “rebut” it by pooling stations, because while findings based on pooling stations are much more interesting, different findings don’t rebut the (yawn) finding that individual stations might be too noisy to demonstrate a statistically significant trend.
“Directly”? It doesn’t address it. At. All. You are just making up ideas in your head that you think might be somehow or other related to … whatever.
Oh? Rasmus didn’t say anything about paleo. So whatever “heartburn” he might have over the paper, that doesn’t seem to have anything to do with your attempt to connect the finding of this paper– which have nothing to do with paleo- to paleo!
“BTW, I think Mosher is also trying to make some station distance to water/coast weighting for the BEST or his own use which is complicated because prevailing wind is a consideration.”
Wouldn’t the diurnal temperature variation (DTV) be a good proxy for the distance? With the assumption that the DTV was calculated using rural locations. I recall seeing some data on DTV a year or so ago.
Re: Carrick (Jan 5 19:32),
I dunno about exponential decay. I was just thinking about y = EXP(x), since CO2 levels are increasing.
DeWitt Payne:
Yeah I figured that. Just curious if you knew.
(Quite honestly, I haven’t spent a lot of brainpower on cointegration.)
The “bombshell” post on WUWT was a disappointment, but it did have at least one positive effect: It got me to see this post. I’ve always respected Leif Svalgaard, and it’s interesting to see a fairly large change is needed in certain records.
So thanks for the link dallas!
DeWitt Payne (Comment #108132)
All I can say is that you know a lot more than me about CO2 and all that.
As I said – it looks like a hypothesis or two, but knowledge?
Here’s an actual question for anyone who does know something about cointegration: Suppose the generation of CO2 emissions were completely deterministic but we don’t know the details of the process. That would mean it was I(0), I think. So each year we estimate the emission for the year and add it up. Because there is presumably random error in the annual estimates, the estimated cumulative emissions will be integration order I(1). Does that mean that any correlation between the estimate and the actual total emissions must be considered spurious? Or, if the relationship between cumulative emissions and atmospheric CO2 level were known, would any correlation between estimated cumulative emissions and atmospheric CO2 concentration also be spurious?
Or a more practical example: We have a three axis accelerometer which has some level of random noise in the output. If we integrate the output to get velocity, the calculated velocity will be an I(1) random walk. If we integrate again to get position, that will be I(2). One would think that would mean that accelerometers could never be used in inertial navigation. One would be wrong. If you have a drift free reference, specifically a GPS, the data from the GPS can be used with a proportional integrating filter to correct the drift in the integrated accelerometer data. The reason you do this is because you can get accelerometers that output at 100-1000 Hz while GPS is on the order of 1-10 Hz. This is important if you’re building an autopilot for a model airplane or logging data from a race car. You can improve things even more with a three axis rotation rate sensor for orientation using the GPS heading as a drift reference. And even more with a three axis magnetometer so you know the orientation even when the vehicle is not moving.
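A toy version of that GPS-corrected scheme is easy to simulate. The sketch below (Python, one axis only, with invented gains, bias, and noise levels) dead-reckons a biased, noisy accelerometer and then repeats the integration with a 1 Hz “GPS” fix feeding a proportional-integral correction:

```python
import numpy as np

rng = np.random.default_rng(1)
dt, n = 0.01, 10000                       # 100 Hz accelerometer, 100 s run
t = np.arange(n) * dt
true_pos = np.sin(0.1 * t)                # made-up trajectory
true_acc = -0.01 * np.sin(0.1 * t)
acc = true_acc + 0.02 + rng.normal(0, 0.05, n)   # sensor bias + noise (invented)

kp, ki = 0.2, 0.1                         # proportional / integral gains (invented)
raw_v = raw_p = v = p = 0.0
raw_err, err = [], []
for i in range(n):
    raw_v += acc[i] * dt; raw_p += raw_v * dt    # pure dead reckoning: drifts
    v += acc[i] * dt;     p += v * dt            # same integration, but...
    if i % 100 == 0:                             # ...a 1 Hz "GPS" fix arrives
        gps = true_pos[i] + rng.normal(0, 0.05)
        e = gps - p
        p += kp * e                              # proportional position correction
        v += ki * e                              # integral action bleeds into velocity
    raw_err.append(raw_p - true_pos[i])
    err.append(p - true_pos[i])
print(abs(raw_err[-1]))                   # dead reckoning: error blows up
print(max(abs(x) for x in err[n // 2:]))  # corrected: stays bounded
```

The uncorrected track drifts roughly quadratically from the bias, while the drift-corrected one settles to a small bounded error, which is the point: an I(2)-looking integrator becomes usable once it is anchored to a drift-free reference.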
Lucia,
There are a number of papers which, like Franzke, investigate ToE (Time of Emergence). ToE seems to have importance.
I believe it is defined for either a location, nation, region, etc., with a different ToE for each.
Once you have passed ToE, you can consider that a particular signal has emerged and you are subject to it. This has ramifications for adaptation, e.g. prior to ToE you should have made provisions for what has occurred in any case. After ToE you can attribute any damages caused by the observed change to the climate change signal. Under some framework, legal or quasi-legal, you would not have a case until after ToE.
Currently local ToE is thought to be occurring but for the most part has not yet occurred. The first locations/nations to emerge will be in the tropics with a low level of warming; the last will be in the Arctic following much greater warming. I think the dates are thought (by some) to be around 2020 (tropics) and 2050 (Arctic) for local ToE. Until those times any local phenomena should have been planned for irrespective of climate change. (I suspect that sea level rise would be an exception as it is primarily a global not local phenomenon.)
Perhaps ToE expresses the difference between whether climate change has occurred, which can be assessed globally, and whether it can be said to have caused a problem on any specific local/national issue. Would a local/national body be able to make a forensic case? The answer seems to be no, not yet.
Without forensics, I cannot see that anyone or group has a legal or moral case for damages, after ToE that might change, such is the stuff of treaties etc..
If I understand their thinking, a Russian heatwave might be evidence towards GW under consilience (the convergence of evidence), but not forensic evidence of the effect of GW (an emergence of evidence).
At this time, nothing much has occurred that a prudent person, community, or state should not have planned for. A lot of people would tell you otherwise, which could be countered by arguments based on the non-occurrence of ToE. This is a field of play with the possibility of getting more bloody, rather than less. People get upset about it.
Alex
DeWitt Payne (Comment #108146)
“Here’s an actual question for anyone who does know something about cointegration:”
I am only to page 57 of the thesis on cointegration and thus I am not quite an expert at this point.
The author talks about using OLS and TLS as I recall, and what you describe appears applicable to TLS since it involves measurement error in the dependent variable. How do you get from a random measurement error to a random walk? The measurement errors are surely not cumulative.
I like the Cohn & Lins and Franzke papers because they both do something different with temperature series that are not commonly done in the literature. That does not mean that they applied what they did correctly.
Does anyone have a link to the 190 stations data sets that Franzke used. I am afraid that if I go to KNMI I’ll have to download each station individually or code in R to do that. Does anyone have the distributions of the station series lengths and time periods covered?
I see many problems with using stations on an individual basis and a couple of problems with Franzke trying to combine these stations into a regional data set. Doing temperature anomalies as most data set producers do would require a common period of time of a reasonable time span to be used as a base period for calculating anomalies. I suppose one could apply the scalpel method that is used in the BEST data set. I would predict that the regional series would have heteroskedasticity due to the sparseness of earlier data leading to more variation in that part of the series – just as we see in the BEST series.
Kenneth Fritsch (Comment #108148)
DeWitt, on more carefully reading what you stated I see we are talking about total CO2 emissions and not CO2 levels in the atmosphere. I see that each year’s estimate of CO2 emissions has a random error that can be related to the change measured in atmospheric CO2, which will have a measurement error also. However, you have an annual estimate of CO2 emissions added to the atmosphere versus a change in atmospheric levels of CO2 that is not affected by cumulative measurement errors. Is not that relationship in question and of importance here?
Also do not you have a carbon isotope that allows a measure of anthropogenic generated CO2 into the atmosphere?
Since I have not read the thesis in its entirety yet I am probably missing some subtle point here.
DeWitt,
The autopilot on my boat uses a three-axis magnetic sensor and a three axis accelerometer (along with a cleverly programed micro-processor) to optimize response under rough (rapidly changing) conditions. It can also accept an external GPS reference as well to correct for slower changing factors like wind changes and water currents. You can buy the sensor (Honeywell?) for cheap, but programming and interfacing with a 1/2 HP DC servo-motor would be…. ahem….. not so simple.
DeWitt Payne (Comment #108146)
Would not the random walk produced by accumulated CO2 emission estimate errors have to be a random walk with drift to produce a trend that might be considered spurious? In other words, the emission estimate would have to be biased in one direction. I would also think that tests could be run to determine whether you are looking at a deterministic trend, a random walk or a random walk with drift.
http://people.duke.edu/~rnau/411rand.htm
http://people.duke.edu/~rnau/411ser2.htm
DeWitt Payne (Comment #108134)
January 5th, 2013 at 7:09 pm
“If I subtract a linear trend from quadratic data, I still have data that can be fit by a quadratic equation. And that, of course, is the problem with trying to determine integration order with a test that assumes that any drift is linear. It’s still going to get the wrong answer.”
I finally read through that dissertation and I got to the part where the author uses ADF and KPSS tests for stationarity of the observed and model temperature series.
DeWitt, you have nailed it with the statement above. Those tests will get fooled when a time series contains more than one trend or if the trend were a fit for a cubic or other polynomial function. I use linear segmented regression for temperature series to get around this problem.
The author says the following about the concentration of CO2 in the atmosphere versus time: “Therefore, ct is assumed to be a random-walk process with drift.”
Looking at the CO2 versus time relationship I would expect if I assumed a linear trend and detrended, I would find for a stationary series, i.e. the Augmented Dickey-Fuller test would reject the null hypothesis of a unit root.
Lucia, “Well.. then do the work to show some connection with paleo. It’s never mentioned in the paper itself! (I suspect you would fail.. but that’s just me.)”
You are probably right on the fail part, but I think it would not be difficult to show that NH land paleo data is more strongly related to Tmin than Tave. The real “yawn” is that Tave sucks and paleo attempts to force fit to Tave, which doesn’t have a statistically significant trend without extended pooling or smoothing with careful squinting. GHG impact should be more evident in Tmin, but Tmin is more closely related to ocean pseudo-oscillations than CO2 forcing changes.
Kenneth,
A random walk is defined as the integral of white noise. Your classic one dimensional random walk is to take a step in one direction if a fair coin comes up heads and one step in the other direction if it comes up tails. The most probable location is always the starting point, but the probability decreases with time as the standard deviation increases as the square root of the number of coin flips. This is also related to the gambler’s ruin problem.
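The coin-flip picture is easy to check by brute force. A short Python simulation (the step and ensemble counts are arbitrary) shows the mean endpoint staying near the start while the spread grows as the square root of the number of flips:

```python
import numpy as np

rng = np.random.default_rng(7)
n_steps, n_walks = 400, 20000
steps = rng.choice([-1, 1], size=(n_walks, n_steps))  # fair coin flips
endpoints = steps.sum(axis=1)

print(endpoints.mean())                   # near 0: the start stays most probable
print(endpoints.std(), np.sqrt(n_steps))  # spread ~ sqrt(400) = 20
```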
Not if you look at the series from, say, 1850 on. Does this look like removing a linear trend would make it stationary? I need to go back to all the stuff I did when B&R first appeared, but I’m pretty sure that the raw CO2 data will test as integration order 2, not sure about drift. B&R tested the temperature series against the envelope of a large number of trials of an ARIMA(3,1,2) model, or something like that, and found that the temperature series stayed inside the envelope. But it’s all model specification. And, as I remember, they didn’t do anything similar for the CO2 series. I’m betting the CO2 series will exceed the envelope at some point, probably by a lot. Which means the model must be mis-specified, i.e. not I(2).
Another thing is that while CO2 emissions appear to be non-stationary now, it can’t be true forever because the supply of fossil fuel is finite and emissions will eventually trend back toward zero. Again something I haven’t done, but I would bet a full logistic curve, for example, would test as stationary but the initial rising part wouldn’t because it’s very close to exponential.
Re: Kenneth Fritsch (Jan 6 18:43),
You did see that he did exactly that for his toy model testing. His equation for CO2 is the sum over time of a constant and random noise. That’s I(1) with drift. Then he uses a linear relationship between CO2 and temperature to create temperature series that are also I(1). But that doesn’t look very much like the real CO2 series because, for one, emissions increase over time. I’m betting that a linear increase in emission over time plus noise would test as I(2), but it’s not because the drift term isn’t random noise. It’s completely deterministic.
Kenneth,
Taking the atmospheric CO2 anomaly from 1850-2005 and using auto.arima in R, I get ARIMA(2,2,2). But the fit is horrible:
Series: CO2[, 2]
ARIMA(2,2,2)
Coefficients:
ar1 ar2 ma1 ma2
-0.1260 -0.7041 -0.3404 0.5079
s.e. 0.1299 0.1171 0.1631 0.1173
sigma^2 estimated as 0.05737: log likelihood=1.13
AIC=7.74 AICc=8.14 BIC=22.92
I haven’t done a Monte Carlo but two simulations gave final values of -200 and +150 ppmv CO2.
Ludicrous.
auto.arima on the date column works much better:
Error in arima(x, order = c(1, d, 0), xreg = xreg) :
non-stationary AR part from CSS
Series: CO2[, 1]
ARIMA(0,1,0) with drift
Coefficients:
drift
1e+00
s.e. 7e-04
sigma^2 estimated as 3.335e-28: log likelihood=4683.32
AIC=-9362.64 AICc=-9362.56 BIC=-9356.55
Warning message:
In auto.arima(CO2[, 1]) : Unable to calculate AIC offset
Re: DeWitt Payne (Jan 7 09:13),
That now looks unlikely to be true. But since there was no drift term, the ARIMA(2,2,2) model predicts that CO2 was as likely to go down as up. A bane on econometricians who treat concentrations of real substances and temperature as if they were the price of a stock. 1 ppmv of atmospheric CO2 is ~2 Gt of carbon. It can’t appear and disappear at random. It has to come from somewhere and go somewhere else. The same goes for temperature.
lucia,
I am supposed to use the estimated sigma^2 from auto.arima to calculate the sd for arima.sim, am I not?
“DeWitt Payne
1 ppmv of atmospheric CO2 is ~2Gt of carbon. It can’t appear and disappear at random.”
Total Atmospheric Carbon: 810 Gt
Terrestrial Systems:
terrestrial vegetation: 540 – 560 Gt
organic matter in soils: 1,500 – 2,500 Gt
Oceans: 38,000 Gt
Biotic Carbon (Ocean + Terrestrial + Atmospheric): 41,360 Gt
Carbonate Rocks: 60,000,000 Gt
Recoverable reserves of coal, oil, and gas: 5,000–10,000 Gt
Kerogens: 15,000,000 Gt
Annual Volcanic CO2 Primary Input approx 0.2 Gt.
Time for half of all biotic carbon to undergo mineralization and for half of all biotic carbon to be replaced by volcanic source: 100,000 years.
DeWitt Payne (Comment #108174)
“Not if you look at the series from, say, 1850 on.”
All that shows is that if there is more than one linear trend or some non-linear relationship in the series then these tests for stationarity and arima fitting will fail. I am surprised that, knowing what you do about the physics of the process, you would even attempt to model or test it in this manner.
I think you are attempting to show where a naive approach can lead, but I would rather see where the modeling leads with a knowledgeable approach.
The hard sciences have a leg up on the softer ones in that they are evidently better able to apply some deterministic relationships when science is involved. On the other hand, I have seen some hard science physicists make some god awful mistakes in attempting to apply what they think are deterministic relationships to stock values and investment strategies.
Re: DocMartyn (Jan 7 14:26),
I know that.
The point is that all the non-buried reservoirs are in equilibrium with each other, even if the exchange rates can sometimes be slow, like with the deep ocean. Until we started burning massive quantities of fossil fuels and making a lot of cement, the atmospheric CO2 in the last few million years only changed significantly between glacial and interglacial epochs and was reasonably stable outside the transition periods. If atmospheric CO2 were really subject to random walk statistics when unforced, we would have seen evidence in the ice core record. But in fact, because it oscillated about a mean value, atmospheric CO2 over the last 800,000 years had been stationary.
As of 2005 humans had released 321 Gt carbon from fossil fuel consumption and 156 Gt C from land use changes into the atmosphere. Cement production is in there somewhere too. Since that accounts for more than twice as much carbon as is still in the atmosphere, the rest of the environment must be a sink, not a source. Unless you’re a fan of Richard S Courtney, it’s pretty obvious where the increase in atmospheric CO2 has come from.
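A back-of-envelope check of that “more than twice” claim, using the figures in the comment plus the ~2 GtC per ppmv conversion mentioned upthread (the 280 and 380 ppmv endpoints are round-number assumptions on my part, not from the comment):

```python
# Figures from the comment: 321 GtC fossil + 156 GtC land use released by 2005.
# Conversion mentioned upthread: ~2 GtC per ppmv of atmospheric CO2.
# The 280 and 380 ppmv endpoints are approximate round numbers (assumption).
released = 321 + 156           # GtC emitted by humans
rise_ppmv = 380 - 280          # approximate pre-industrial-to-2005 CO2 rise
airborne = rise_ppmv * 2       # GtC of that rise still in the atmosphere
print(released, airborne, released / airborne)  # → 477 200 2.385
```

So cumulative emissions do exceed the airborne increase by more than a factor of two, which is the mass-balance argument that the rest of the environment must be a net sink.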
DeWitt Payne (Comment #108173)
The auto.arima function in R is a quick way to find the best arima model but you have to be careful as it will return a best model with ar and/or ma coefficients very near, equal to or greater than 1. Did you assign xreg to the time series or leave it as null?
Re: Kenneth Fritsch (Jan 7 15:46),
The only obvious modeling approach is to relate cumulative carbon production over time to atmospheric CO2. That’s been done. In fact, I’ve done it crudely. But it’s nearly useless for forecasting because future production is unknowable. Which is why the IPCC uses different scenarios of future carbon production like A1B1.
I ran the arima modeling because I had never seen it done and didn’t know how really bad it was. How anyone who has actually seen the envelope of possible atmospheric CO2 time series produced by a naive use of ARIMA could believe that naive ARIMA could then rule out a relationship between CO2 and temperature is completely beyond me.
Re: Kenneth Fritsch (Jan 7 16:19),
I used default. But as you can see, none of the coefficients are all that close to 1. If I set xreg = x, the calculation crashes.
“DeWitt Payne
I know that.
The point is that all the non-buried reservoirs are in equilibrium with each other’
They are not in equilibrium at all. They are a product of life. The carbon in the air, on the land and in the sea is biotic. It is there because the planet is alive.
“even if the exchange rates can sometimes be slow, like with the deep ocean.”
OK have a look at the rate at which various types of marine shit fall from marine organism bottoms to the sea floor. You know that the bottom of the sea floor is made up out of dead, decaying organisms, marine excreta and insoluble particulates don’t you?
This mixture is anaerobic and the microorganisms use a Fe and Mn shuttle to reduce transition metals at the bottom and these get oxidized at the hypoxic zone interface. The general products of anaerobic oxidation of marine detritus are CO2 and CH4. These gasses of course rise, with CH4 being oxidized at the hypoxic zone interface.
Of course, the various movements of organic particulates (i.e. marine crap) down and of CO2/CH4 up, plays merry hell with isotopic labeling techniques.
The one thing we can be sure of is that the water at the bottom of sea comes from very cold dense waters that have fallen from the surface.
The Mediterranean is the best known example of this, with striations of brine at the bottom being identified from different winter surface waters sinking.
Re: DocMartyn (Jan 7 17:38),
You can wave your hands all you want, but you can’t explain why the behavior of atmospheric CO2 changed right at the time of the industrial revolution when it hasn’t under similar environmental conditions in times past when there was no human influence, and why that behavior closely matches in every detail what you would expect from anthropogenic carbon release. Life is also an equilibrium process on average. It’s not a random walk either. Not to mention that the atmospheric oxygen level has dropped by exactly enough to match fossil fuel consumption. Oh, I know. “Life” That explains everything and nothing. You might as well say it’s just a coincidence and get on with it. But that’s not a convincing argument. In fact, it’s a fairly convincing argument that you should be ignored in the future.
I don’t deny that the change in the steady state level of atmospheric CO2 is a function of mankind’s burning of fossil fuels. My problem is you treating a dynamic steady state system as an equilibrium.
“DeWitt Payne
Not to mention that the atmospheric oxygen level has dropped by exactly enough to match fossil fuel consumption”
A drinker of the Kool-aid I see. There is no calculation which shows a drop in O2 matching the rise in CO2.
Such a postulate is quite clearly bollocks.
Keeling has shown a lovely change in the O2/N2 ratio, which is in its own right very interesting. Some people have seized on this as proof that steady state levels of O2 have dropped due to burning fossil fuels. However, Keeling has also shown a change in the Ar/N2 ratio. This latter pair is really funky. Ar is abiotic and atmospheric N2 almost so. Ar and N2 are essentially in dynamic equilibrium in the atmosphere/aquasphere (as the biotic nitrogen pool is only a tiny fraction of the total).
Sometimes equilibrium thermodynamics and the equilibrium approximation are fine for describing a system. I have no problem with you working out the bulk atmospheric temperature by sticking a liver thermometer into a day old corpse. However, it would be inappropriate for me to establish room temperature by sticking a meat thermometer into your liver.
DeWitt, it does not take long after playing with the time series of temperature and atmospheric CO2 concentrations from the thesis to determine that, while it makes sense to properly (and I emphasize properly here) extract the deterministic signal from the temperature series and model the noise residuals, it makes no sense to even consider modeling the diminishingly small residuals after extracting the deterministic signal from the CO2 time series.
You commented that a quadratic signal with little noise would yield non-stationary residuals on linear regression, and that applies in spades to the CO2 series. Even after linearly segmenting the CO2 series with 4 calculated breakpoints, the residuals of at least some of the segments remain non-stationary. The problem on examining the residuals is that the noise level is much less than the deviations of the signal from a best fit straight line, even though that straight line fits with a very high R^2 value.
I suspect if I add white and/or red noise to the CO2 series I would find the residuals from linear regression would result in rejecting the null hypothesis that the series has a unit root. I may do just that to prove something to myself, but all the evidence is so convincing of a deterministic CO2 trend over the past 60 or so years that there is no reason to consider the possibility of a random walk, even one with drift. The series does not even look like a random walk.
DeWitt Payne (Comment #108183)
If x is a time series (a ts object in R), you set xreg=time(x) and not x, i.e., you give the auto.arima function the time points to do a linear regression. When I did this for the CO2 series the model parameters were the same as when I used the default, xreg=NULL.
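A minimal sketch of the two calls being compared (the series here is synthetic, and auto.arima comes from the forecast package):

```r
# Compare auto.arima with and without a linear time regressor.
# time(x) returns the time index of a ts object, so xreg = time(x)
# asks for a linear regression on time with ARMA errors.
library(forecast)

set.seed(1)
x <- ts(315 + 0.12 * (1:600) + rnorm(600),    # synthetic CO2-like monthly series
        start = c(1958, 3), frequency = 12)

fit_trend   <- auto.arima(x, xreg = time(x))  # regression on time + ARMA errors
fit_default <- auto.arima(x)                  # default: no external regressor
```

In the default call, auto.arima can absorb a trend through differencing and a drift term, which is presumably why the two fits can end up with the same ARMA parameters.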
DocMartyn,
*plonk*
Indeed Dimwit.
One knows that ocean temperature, ocean oxygen, oceanic organic carbon and oceanic inorganic carbon all have non-equilibrium levels from the organic soup of the seabed to the life rich surface.
The disequilibrium one observes, as a function of water depth, with the gases O2, H2S, CH4, NO and N2O should provide you with some sort of clue as to whether CO2 in the atmosphere should be in ‘equilibrium’ with inorganic carbon at depth.
This might be over your head, but the physics that keeps a helium-filled balloon aloft is quite different from the physics that describes the way a helicopter can hover in the air.
The balloon is in a local equilibrium, but the helicopter isn’t. Do you get it?
A helicopter isn’t a balloon, and you can’t use equilibrium thermodynamics to describe a biotically driven steady state. In the same way that a helicopter burns fuel to sustain its disequilibrium against the Earth’s gravity well, the sunlight hitting plant photosystems sustains the disequilibrium of the Earth’s biotic gases, like oxygen, methane and carbon dioxide.
Kenneth Fritsch (Comment #108200)
Add a little white noise to the CO2 series and suddenly adf.test rejects the null hypothesis that the series has a unit root.
Doc/Dewitt:
No “plonk” or “dimwitt” exchanges.
Re: Kenneth Fritsch (Jan 8 10:30),
I found an interesting anomaly in R. If I manually difference an exponential function, y=exp(x/30) where x is i in 1:156, with no noise added, adf.test fails to reject a unit root and kpss.test rejects stationarity for 5 differences at least, as expected. OTOH, the function ndiffs reports only two differences as does auto.arima.
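The anomaly described above can be checked directly; adf.test and kpss.test are in the tseries package and ndiffs is in forecast:

```r
library(tseries)   # adf.test (null: unit root), kpss.test (null: stationarity)
library(forecast)  # ndiffs

y <- exp((1:156) / 30)   # noise-free exponential, as described above

d <- y
for (k in 1:5) {
  d <- diff(d)                   # manual k-th difference of the series
  print(adf.test(d)$p.value)     # may still fail to reject a unit root
  print(kpss.test(d)$p.value)    # may still reject stationarity
}

print(ndiffs(y))   # the automatic selection can report far fewer differences
```

The discrepancy is not too surprising: differencing an exponential leaves a (scaled) exponential, so the manual tests keep seeing trend, while ndiffs stops after its own default test sequence.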
How much is ‘a little’ white noise?
Sorry Lucia, I shouldn’t let these things get me down. I will try not to allow my feelings to get the better of me.
Let him think of helicopters as balloons.
DeWitt Payne (Comment #108212)
I used the monthly CO2 concentrations as anomalies for the period 1958-2012 where the units go from approximately minus 40 to plus 40 and white noise generated by rnorm in R with a mean=0 and sd=4.5. That produces a series a lot less noisy than the temperature series.
Re: Kenneth Fritsch (Jan 8 10:30),
And exactly what CO2 series are you testing? I’m using the 1850-2005 data as an anomaly, so the range is 0.6 ppmv to 94 ppmv. To get adf.test close to rejecting even one unit root, I have to add white noise with sd=20. Using ndiffs, I get 1 difference at sd=5 or greater and 2 at sd=2 or less.
lucia,
There won’t be any more exchanges. Period. Problem solved.
DocMartyn (Comment #108186),
There is of course a rain of organic debris in the ocean, but most of the organic material (80-90%) is oxidized to CO2 in the top 1000 meters, with oxygen concentration reaching a minimum at that same ~1000 meter depth. The very deep ocean has much higher oxygen than the minimum oxygen level. (see: http://en.wikipedia.org/wiki/Oxygen_minimum_zone)
There can of course be anaerobic decomposition in sediments on the ocean floor, especially in shallower waters near continental shelves (mainly ~200 to ~1000 meters depth), where the flux of organic material is higher and the sedimentation time is not sufficient for complete oxidation. In these sediments methane does form, and that methane is converted to methane hydrate; the methane hydrate produced in these sediments does not make its way to the surface on less than geological time scales, and does not contribute to oxygen depletion at great depth (unless you are talking about oxygen depletion within the sediments themselves). The deep ocean is mostly well oxygenated due to the thermohaline circulation. There are a few places where oxygen can become completely depleted, but this is rare in the deep ocean.
SteveF, you are indeed most correct in your description. Organic matter is oxidized all the way down, initially using oxygen but switching to alternative oxidants in the hypoxic region (from about 100 m).
There is nowhere on the planet where the oxygen in the atmosphere is in ‘equilibrium’ with oxygen in sea water.
The deepest layer of water has a higher level of oxygen than the layer above, coming as it did from the surface polar region. The fact that we know this water comes from the surface doesn’t worry people when they state that communication between the surface waters and the deep ocean is in slow ‘equilibrium’, with >decades generally bandied about as the rate of exchange.
Now every piece of evidence we have shows that O2 and CO2 are not, and never have been, in equilibrium in the atmosphere and aquasphere. However, people still insist on treating this dynamic, biotic, steady state system as an equilibrium, so they can use simple equilibrium calculations.
DeWitt Payne (Comment #108216)
DeWitt, I suspect you recognize that the series I am using is from Mauna Loa, Hawaii from March 1958 to December 2012. The difference between what you found and what I found, depending on the time period covered, illustrates well the problems that we have been discussing.
I used the CO2 record that approximates a straight line, even though it is better represented by an exponential function. Those deviations from a straight line and the very low noise level lead to a failure to reject a unit root. Add a little white noise (sd=4.5 units over a range of 80 units) and the noise swamps out the deviations from a straight line, and now ADF rejects the unit root, i.e., the series is seen as trend stationary.
When you use the CO2 record, which I assume uses some ice core measurements going back to 1850 (and whose validity I am not questioning), you see more clearly the accelerating effect of increasing CO2 levels in the atmosphere, which is better represented by some polynomial function or by a segmented linear representation using the calculated breakpoints. If you ADF test for unit roots, the deviations from a straight line are very large compared to what I found in my near-straight-line case, so your case would of course require larger sd’s to swamp out the larger deviations.
Your case is closer to what you would see with a temperature series without all the weather noise. CO2 levels do not have sufficient noise (some seasonal) to even consider modeling it, and the evidence is overwhelming that the levels are deterministic. In contrast, the temperature series have sufficient noise to make modeling that noise worthwhile, in my judgment. The unit root test must be done with consideration of the shape of the underlying deterministic signal. Even then, we know that a trend in a temperature time series should be deterministic just from the physics of the matter.
For illustration purposes I can take your 1850-2005 data and compute breakpoints to find approximate linear trends, to which I would then add noise. I suspect the added noise required for the ADF test to reject the unit root would be in the range of 4 to 5 for those segments. If I had monthly data I might have sufficient data points for a reasonable ADF test, but for annual data I doubt it. Do you have a link to your 1850-2005 CO2 data?
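The breakpoint step described above can be sketched with R's strucchange package on a hypothetical accelerating series (the coefficients are invented for illustration, not fitted to the real data):

```r
library(strucchange)  # breakpoints(): segmented linear fits, breaks chosen by BIC

set.seed(7)
t   <- 1:156                                         # stand-in for annual 1850-2005
co2 <- 0.004 * t^2 + 0.1 * t + rnorm(156, sd = 0.5)  # hypothetical accelerating series

bp <- breakpoints(co2 ~ t)   # estimate break locations in the linear regression
summary(bp)                  # BIC table and estimated segment boundaries
print(breakdates(bp))        # positions of the chosen breaks
```

Noise of a chosen sd could then be added within each estimated segment before re-running adf.test, per the procedure above.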
Re: Kenneth Fritsch (Jan 9 11:03),
https://dl.dropbox.com/u/91578766/CO2%20anomaly%20series.txt
That’s significantly smoother than the underlying data. It’s been interpolated, but not by me, to be annual too. Before 1959, it’s based on ice core data from two or three cores at Law Dome, I think. I could dig out the original data which was used to generate the annual data if you want. That will be noisier.
DeWitt Payne (Comment #108256)
DeWitt, I used the entire annual 1850-2005 series, fit it with a 4th-order polynomial (R^2=0.997), then took the residuals and added white noise until ADF rejected the null hypothesis of a unit root. The minimum noise was approximately 4 units out of the series range of 95 units. Please note that I used that high an order of polynomial because I wanted to reduce the residuals for illustrative purposes, not because I thought it was the correct model.
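For concreteness, that procedure might look like this in R, with a synthetic series in place of the actual 1850-2005 data (the coefficients and noise levels are hypothetical):

```r
library(tseries)  # adf.test; null hypothesis: unit root

set.seed(3)
t   <- 1:156
co2 <- 0.004 * t^2 + 0.05 * t + rnorm(156, sd = 0.3)  # hypothetical smooth series

fit <- lm(co2 ~ poly(t, 4))    # 4th-order polynomial fit
print(summary(fit)$r.squared)  # very close to 1 by construction

res <- resid(fit)
# Add white noise to the low-amplitude residuals and test again;
# at some sd the ADF test starts rejecting the unit root.
print(adf.test(res + rnorm(length(res), sd = 4))$p.value)
```

In a real run one would loop over increasing sd values to find the minimum noise at which rejection occurs, as described above.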
In the comments at Realclimate Kristoffer Rypdal has some interesting points. He seems to be supporting Franzke, and to some extent disagreeing with Rasmus who also comments in the thread.
He has several comments, the first is here:
http://www.realclimate.org/index.php/archives/2012/12/what-is-signal-and-what-is-noise/comment-page-1/#comment-313037