Climacograms: Hurst? No Hurst? Trend? No Trend?

Over the past week, I’ve been discussing various topics with Demetris Koutsoyiannis. During the conversation, he pointed me to his recent paper “Hurst-Kolmogorov Dynamics and Uncertainty”, which appeared in the June 2011 issue of the Journal of Water Resources. The paper contains an interesting discussion mostly focused on modeling water resource needs and compares estimates of uncertainty obtained based on Hurst-Kolmogorov Dynamics vs. several other methods. My impression is that Demetris’s intention is to make the case that it is wiser to predict water resource needs using Hurst-Kolmogorov Dynamics rather than using other methods.

Since my interest is not estimating water resource needs, I focused on subsidiary discussions which touch on questions like: What do the methods in this paper tell me about our ability to detect a forced response in a stochastic time series? If they tell me something, what is the uncertainty in our estimate of the forced response in a stochastic time series? (Of course, I am also specifically concerned with global mean temperatures. But the discussion can be more general.)

To clarify language a bit, I’ll turn to an equation. Suppose we have a time series for a measurable quantity “x”, which could be global mean temperature but, at the current level of abstraction, could be the temperature at a thermometer placed 1″ below the surface in a pot full of delicious soup of some sort. For the time being, let’s stick to soup. Suppose further that the magnitude of ‘x’ varies both because the rate of heat addition varies over time and as a result of “internal variability” inside the soup. (So, for example, if you watch soup just below a simmer, you’ll see the surface move as a result of upwelling of warm fluid.)

For such a system, one might suggest that the quantity “x” can be modeled as the sum of a forced response, $latex \displaystyle f(t)$, which varies as a function of the rate of heat addition $latex \displaystyle Q(t) $, and the internal variability $latex \displaystyle u(t) $, whose properties depend on the physical properties of the soup, the dimensions and design of the pot, and possibly (unfortunately) the magnitude of the rate of heat addition to the soup. This can then be written as:

(1)$latex \displaystyle x(t)= f(t) +u(t) $

When modeled this way, the forced response would be viewed as deterministic and predictable based on $latex \displaystyle Q(t) $. In fact, food scientists designing recipes and cooks count on the notion that the temperature in the pot is at least somewhat predictable based on the supplied heating rate. But meanwhile, the exact temperature at the thermometer is unpredictable because the quantity $latex \displaystyle u(t) $ arising from internal variability is unpredictable.

So: given a set of data, my interest is focused on questions like “Can we detect f(t)?”, “How well can we know f(t)?” and “Is f(t) = 0, or so small that one should neglect it?” In contrast, my impression is that Koutsoyiannis is somewhat more focused on “What’s the safest and best method of forecasting ‘x’ such that we can assure ourselves we’ve provided the proper level of water for the city of Athens?” The two questions are somewhat related since, if we can predict $latex \displaystyle f(t)$ with confidence, then one can get better forecasts for “x(t)”. However, if one tricks oneself into believing one can predict $latex \displaystyle f(t)$ with confidence when this is not so, this can result in over-confidence in predictions, and one can make grave errors.

My impression is that the latter is one of D. Koutsoyiannis’s points; it’s a point worth making. Having said that, I will now examine DK’s discussion of the monthly time series for the lower troposphere and explain why, because I am concerned with a somewhat different question, I would examine that figure somewhat differently.

On page 489 of his paper, DK discusses the monthly temperature series for UAH; Figure 10 shows the time series and the climacogram for the time series. I’ve inserted those to the right. DK discussed Figure 10 as follows:

The same behavior can be verified in several geophysical time series; examples are given in most related publications referenced herein. Two additional examples are depicted in Figure 10, which refers to the monthly lower tropospheric temperature, and in Figure 11, which refers to the monthly Atlantic Multidecadal Oscillation index. Both examples suggest consistency with HK behavior with a very high Hurst coefficient, H = 0.99.


When I read this and glanced at the figure, I admit that Figure 10 seems to suggest consistency with HK. It does so in the following sense: on a log-log scale, the standard deviation of non-overlapping running means of scale “k” in the series appears to decay linearly. That is: you can put a straight red line through the blue diamonds in Figure 10 and the fit looks, to use a technical term, “pretty darn good”.

Nevertheless, I couldn’t help wondering: Suppose I take a more conventional view and assume that the time series can be described as the sum of a forced response, $latex \displaystyle f(t)$, and a stochastic process, $latex \displaystyle u(t)$, and I further assume that $latex \displaystyle f(t)$ is fairly smooth and approximate it as linear: $latex \displaystyle f(t)= mt + b $. I then find the best fit linear model to the data, and find the best stationary ARIMA(p,0,q) that fits $latex \displaystyle u(t)$. How would that climacogram look? Using the same standard of “looks pretty darn good”, would I conclude the climacogram “suggests” the data are the sum of a linear trend and a stationary “ARIMA(p,0,q)” model?

I think the answer is “yes”. Moreover, I will suggest that the climacogram for the ARIMA+Trend model looks better than that for the Hurst model.

My Climacograms
I’ll begin by creating a new climacogram. Because UAH seems to be in a transitional phase, I will substitute RSS TLT data, for which I have 389 months of data. The maximum scale at which I can compute the climacogram is 2^7 (128); this permits me to compute two non-overlapping 128-month means and take the standard deviation based on these two means, $latex \displaystyle \sigma^{(128)} $. The minimum scale for the standard deviation is 1. After selecting a start year, I computed the non-overlapping values of the standard deviation of running means $latex \displaystyle \sigma^{(k)} $ as a function of scale k at all powers of 2 from 1 to 2^7. I repeated for 3 choices of start year, each shown in a different color, and then computed the mean over all three cases by taking the square root of the sum of the squares of the standard deviations:

The heavy black line represents the mean over the three choices of start dates. Readers should also note the thin vertical line; data to the left of the vertical line represent scales included in DK’s paper. Those to the right are not included. When assessing data to the right of the vertical line, it is worth knowing that for a given start year, the final point on my graph is a standard deviation based on only 2 samples, and so contains a lot of noise. The point second from the right is computed based on 4 samples, the third from the right based on 8, and so on. So, points to the left contain less noise– a fact that can be verified using Monte Carlo computations with synthetic data. Though noisy, I prefer to leave the points on the graph so as to permit readers to have some notion of where the points happen to fall.

Note that if we left off the three points to the right of the vertical line, or interpreted the information in that portion of the traces as “noise”, we might well be tempted to fit a straight line through the remaining points. The line would fit rather well– and the graph would then seem to suggest Hurst-Kolmogorov behavior.
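
For readers who want to see how such a climacogram can be computed, here is a minimal R sketch. It is my own illustration, not the script used to produce the figure; the vector ‘tlt’ standing in for the monthly RSS TLT anomalies, the start offsets, and the root-mean-square averaging over start choices are all assumptions based on the description above.

    # Climacogram: for each scale k, average the series in non-overlapping blocks
    # of length k and take the standard deviation of those block means.
    climacogram <- function(x, scales = 2^(0:7)) {
      sapply(scales, function(k) {
        n.blocks <- floor(length(x) / k)
        if (n.blocks < 2) return(NA)
        block.means <- sapply(seq_len(n.blocks),
                              function(i) mean(x[((i - 1) * k + 1):(i * k)]))
        sd(block.means)
      })
    }

    # Combine several start offsets (hypothetical choices) by root-mean-square,
    # one plausible reading of "square root of the sum of the squares" above.
    climacogram.rms <- function(x, starts = c(1, 13, 25), scales = 2^(0:7)) {
      sds <- sapply(starts, function(s) climacogram(x[s:length(x)], scales))
      sqrt(rowMeans(sds^2, na.rm = TRUE))
    }

    # Usage (assuming 'tlt' holds the 389 monthly anomalies): climacogram.rms(tlt)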

As I previously mentioned, I looked at this a different way. Instead of fitting a straight line to 4 points, I wanted to know what a climacogram would look like if I assumed the data were “trend + ARIMA”. So, I applied a least squares fit to the data, used auto.arima (an R function) to find the best stationary ARIMA model for the residuals, and then created (a rough sketch of this procedure follows the list below):

a) the climacogram one would expect from $latex \displaystyle u(t) $ alone, if it were estimated using the ARIMA process that best represents the residuals of the least squares fit to the RSS data; this is shown as the dashed grey trace. This function is approximately equal to 0.17 at a scale of 1 and declines with scale.

b) the climacogram one would expect if RSS were explained by the linear trend from the fit to the RSS data alone, with no natural variability; this is shown as the dark blue dashed trace. The function is approximately equal to 0.13 at a scale of 1 and increases with scale.

c) the climacogram one would expect from the sum of the linear trend and natural variability $latex \displaystyle u(t) $, with u(t) assumed unaffected by the magnitude of the trend; this is shown by the dashed black trace.
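
Here is the rough sketch promised above. It estimates the three “theoretical” climacograms by Monte Carlo rather than analytically, reuses the climacogram() helper sketched earlier, and again treats ‘tlt’ as a placeholder for the monthly anomalies; it illustrates the idea and is not the script behind the figure.

    library(forecast)   # for auto.arima and simulate(); assumed available

    months   <- seq_along(tlt)
    lin.fit  <- lm(tlt ~ months)                        # least squares trend
    arma.fit <- auto.arima(residuals(lin.fit),          # stationary ARIMA for the residuals
                           d = 0, stationary = TRUE)

    theo.climacogram <- function(n.sims = 200, add.trend = TRUE, add.noise = TRUE) {
      sims <- replicate(n.sims, {
        y <- numeric(length(tlt))
        if (add.trend) y <- y + fitted(lin.fit) - mean(fitted(lin.fit))
        if (add.noise) y <- y + as.numeric(simulate(arma.fit, nsim = length(tlt)))
        climacogram(y)
      })
      rowMeans(sims, na.rm = TRUE)
    }

    clim.a <- theo.climacogram(add.trend = FALSE)   # (a) ARIMA alone
    clim.b <- theo.climacogram(add.noise = FALSE)   # (b) linear trend alone (deterministic)
    clim.c <- theo.climacogram()                    # (c) trend + ARIMA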

I would suggest that if we compare the heavy black trace representing the climacogram for the observations, averaged over 3 possible start times, to the dashed black trace for “trend + ARIMA”, the two have a strong similarity. Specifically, the two resemble each other but the observations are shifted to be somewhat lower. The slight shift may be explainable by recognizing that the number of data points used to compute the standard deviation for the observations is only 128, and, moreover, the number of effective data points is even lower. So, the sample standard deviations can be expected to be both noisy and biased slightly low. Given this, the similarity in shape to the ‘theoretical’ climacogram for “trend+ARIMA” strikes me as suggesting this model may well be correct, and quite likely better than an HK model that simply posits that no trend exists and that the apparent trend can be explained by natural variability being Hurst with $latex \displaystyle \sigma^{(k)} =C k^{H-1} $

That said, though I strongly lean toward interpreting the data as having a trend, when I look at the climacogram computed based on observations of TLT from RSS, I think at best those interpreting this graph must admit the message in the graph is ambiguous. If you lean toward the conventional view that recent temperatures do contain an upward trend caused by GHGs and that natural variability is likely to be short-term persistent (in the sense that ARIMA is adequate), then the graph– including all 8 points– might seem to suggest the data are well explained by “trend + ARIMA”. If you lean toward the view that surface temperatures are likely to contain no trend, with the apparent trend arising as a symptom of long term persistence (i.e. HK), then you can judge the last three points to be too noisy to interpret, throw them away for that reason, fit a line through the remaining ones, and suggest the graph is evidence of HK and no deterministic trend.

Returning to the sorts of questions that interest me, I’ll repeat this one: “Is f(t) = 0, or so small that one should neglect it?” I would suggest that if we seek explanations that do not automatically assume f(t)=0, then the climacogram of data from RSS suggests the temperature in the lower troposphere contains a positive deterministic signal. That is, the answer is: I think the graph suggests f(t) is sufficiently large to recognize as “true”. That is: I think the climacogram is consistent with the heat addition manifesting itself as warming. Mind you: it would be nice to have a longer time series, so that we could better verify the upturn in standard deviation at a scale of 128, but I think this climacogram suggests a positive deterministic (i.e. forced) trend in the data.

So, it seems to me the climacogram is consistent with believing that “climate change is happening” (i.e. there is a trend) and that short-term persistence (i.e. ARIMA instead of Hurst) may be an adequate model to describe natural variability in TLT. Or, looked at another way: the climacogram says absolutely nothing to contradict the view that the data contain a deterministic warming trend. To my eyes, the climacogram’s features strongly resemble the shape I would expect if warming is occurring.

Now that you’ve all seen this climacogram, I bet some of you are thinking: surface temperature records go back at least to 1900. What do those climacograms look like? After that, I can introduce ‘slope-o-grams’, the nickname DK gave to some graphs of model results I ginned up for him. 🙂

218 thoughts on “Climacograms: Hurst? No Hurst? Trend? No Trend?”

  1. Why is it either or? There could be (indeed we reasonably expect) some forced component to the data in the form of a trend, but an underlying trend need not be the only contribution to the apparent trend in the data. I think really the idea that climate behaves in an HK manner but also has underlying forced components is reasonable. Or does HK somehow exclude the possibility that “real” trends (as opposed to persistent behavior that looks like trends) exist in data?

  2. yip… just like physics class… you start off in one direction and end up somewhere that was in a whole different direction … made possible by magic which took place in the middle of the equation…
    … in the mean time, if anyone has any concerns about water resources… feel free to run up to the banks of the Mississippi with a straw…

  3. Isn’t calling it H-K dynamics just another way of saying that the amplitudes of climate fluctuations obey an approximate 1/f relationship?

    That relation is actually easily seen in the temperature anomaly spectrum (and is pretty well known in the community).

    Certainly, if you have a 1/f type noise source and you try and model the measurement using white noise, you’re going to come up with the wrong confidence intervals.

  4. Lucia,

    Thanks for making public our discussions and for discussing my paper. You neatly describe my point that it is wiser to approach water resource planning and management using Hurst-Kolmogorov Dynamics rather than using other methods. In particular, as I show in the paper, a GCM-based approach may be dangerous as it may predict a future that is too stable and underestimate natural variability and uncertainty. Also, other simplistic deterministic approaches have similar problems.

    A few comments on your nice post:

    1. Your decomposition x(t) = r(t) + u(t) [note, I replaced your f(t) with r(t) to avoid confusion with f which in probability is reserved for pdf], albeit common, may be misleading. I would propose a different way to deal with the “forced response” r(t) on our process x(t). Specifically, I would propose to try to derive the conditional distribution F[x(t) | r(t)] and use this, instead of the marginal distribution F[x(t)]. In this way, we get rid of a “linear decomposition” assumption and the “signal plus noise” stereotype. The latter may be misleading, because people usually think that “noise” is a small quantity, while geophysical processes have high natural variability, even (particularly?) on climatic time scales. Actually, I think that geophysical variability should not be called noise at all. When speaking of the current state of water resources and climate, my opinion is that F[x(t) | r(t)] may not be very different from F[x(t)], so it is most important first to find F[x(t)].

    2. Of course you can devise infinitely many models (processes) that could fit a time series (data) on a statistical basis. So, it is not a surprise that the sum of a linear trend and an ARIMA(p,0,q) model provides good fit. However, Occam’s razor tells us to be as parsimonious as possible in the model type and number of parameters. The Hurst-Kolmogorov model has a very simple structure and three parameters only: mean, standard deviation and Hurst coefficient. I doubt if you can find an equally parsimonious model making an equally good fit. Note, even a linear trend plus white noise model has already three parameters.

    3. Your linear function r(t) [or f(t) in your notation] is derived from the data a posteriori, and I doubt if it deserves the name “forced response” that you use. It is just a statistical fitting on the data. If you wish to use a more deterministic model for this, I would suggest using GCM trends rather than fittings on data–I have noticed that you are an expert on extracting trends from models. In my paper you discuss, I provide more explanation about the problems in identifying trends and shifts from time series a posteriori.

    Cordially,

    Demetris

    PS. For interested readers, the link for the paper is http://dx.doi.org/10.1111/j.1752-1688.2011.00543.x (the journal’s name is Journal of the American Water Resources Association)

  5. Carrick- You’ll definitely need better than white noise, but what noise model is appropriate? You seem to think pink noise (colloquially “red”) is appropriate, based on references to 1/f spectrum. Is there any particular reason why this works better than, say, Brownian noise?

  6. May I suggest that you ignore water resources. My reading (FWIW) – Cohn & Lins(2005), DK & Keenan (see Bishop Hill) – would suggest that the more important thing is to establish whether there exists a significant trend in the temperatures. I believe DK &c. show a ‘not proven’ (i.e. the null hypothesis wins) for ‘annual mean anomalies’ (whatever that means) and I was about to ask (plead with) you to try a similar analysis on monthly data.
    DK, I think, makes a valid point in his comment (sp. [2] – long live Occam, or at least the razor).
    Assuming an AR(1) [or , indeed an ARIMA(p,d,q)] is, at best, an (ahem!) assumption.

  7. Andrew_FL–

    Why is it either or?

    It’s not. It could be ‘HK & trend’. I think the appearance of the climacogram doesn’t exclude that but I happened to use ARIMA.

    Carrick–

    Isn’t calling it H-K dynamics just another way of saying that the amplitudes of climate fluctuations obey an approximate 1/f relationship?

    I’m not sure. I don’t know if there is a one-to-one relationship. The difficulty has to do with the extremes. I do know that if we were to create a climacogram of the ARIMA processes, in the limit where scale -> infinity, the standard deviation of ‘x’ eventually decays as 1/sqrt(k). This assures that the integral time scale of the process (integral of the autocorrelation) is finite. But this is also true of some Hurst processes.

    What I do think is that generally, Hurst processes have more energy as you approach f=0 than ARIMA processes with similar total power. But there is that difficulty with f=0 for 1/f process and I haven’t looked into it enough to know what happens to power spectra.
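
    (As an aside, that large-scale decay rate is easy to check by simulation. This is just a toy sketch of my own, not anything from the paper:)

        # Simulate a stationary ARMA series and estimate the log-log slope of its
        # climacogram at large scales; it should be near -1/2, i.e. sigma^(k) ~ k^(-1/2).
        set.seed(1)
        y <- arima.sim(model = list(ar = 0.5, ma = 0.3), n = 2^15)
        k <- 2^(0:10)
        s <- sapply(k, function(kk) {
          m <- matrix(y[1:(kk * floor(length(y) / kk))], nrow = kk)  # non-overlapping blocks
          sd(colMeans(m))
        })
        coef(lm(log(s[6:11]) ~ log(k[6:11])))[2]   # slope at scales 32..1024, close to -0.5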

  8. As always, the problem is insufficient data. A short time series can be modeled more or less equally well by a trend plus noise, a unit root random walk or an H-K model. But there does seem to be evidence from longer term proxy series like ice cores that the standard deviation for higher values of k does not fall off the straight line as it would for an ARIMA (Markovian?) series.

  9. Demetris

    Specifically, I would propose to try to derive the conditional distribution F[x(t) | r(t)] and use this, instead of the marginal distribution F[x(t)]. In this way, we get rid of a “linear decomposition” assumption and the “signal plus noise” stereotype.

    I don’t think (1) implies “signal plus noise”. I’m not sure what you gain by the conditional distribution since, in the decomposition, if r(t) and u(t) do not interact (which one might assume in some problems), then F[x(t) | r(t)] is just F[u(t)] shifted by E[x(t) | r(t)], where F[u(t)] is the cumulative distribution of u and “E” is the expected value conditioned on the forcing r(t). After that, I can just write E[x(t) | r(t)]=g(t)– which is merely recognizing that E[x(t) | r(t)], while a function of forcing, appears as a function of time if I know r(t). Then, I get back exactly my equation (1) with the function “g(t)” playing the role of f(t), and other than having decided not to use “f” to reserve it for pdf’s, I have exactly the same equation.

    2) However, Occam’s razor tells us to be as parsimonious as possible in the model type and number of parameters.

    Occam’s razor is useful, but not dispositive. There are other principles to consider– like fitting in with what we know about radiative physics. Throwing out widely recognized principles by arguing “Occam’s razor”, to suggest a model that requires us to ignore a known cause-effect relationship, is unwise. The fact that higher GHGs should lead to warming is well known even if GCMs are imperfect.

    Your linear function r(t) [or f(t) in your notation] is derived from the data a posteriori,

    The magnitude of the trend is derived from data a posteriori. But the knowledge that the trend exists is a priori, not theorized on the basis of the data itself.

    Moreover, I could pull a trend out of climate models and repeat this. In that case, the trend would not be computed based on the data, and I’d still get a climacogram that “looks good”. If I restrict to AR1, I’ll use just as few fitting parameters (sd, AR1 parameter, mean) as you do and the fit will look “just as good”.

  10. Dewitt

    H-K dynamics includes persistence with higher fluctuations that is not well modeled in GCMs.

    I’m beginning to suspect GISS Model EH has HK natural variability. I’ll discuss that later. (I’d sent Demetris some “slope-o-grams” that I’d ginned up because of my suspicions about these things. I made the ‘slope-o-grams’ because of questions I had about Model EH, and I’ll be discussing them further later on.)

  11. Heretic–
    I agree it is important to establish whether a trend is statistically significant. But this must be done by doing tests that are not flawed at the outset.

    I think there is a flaw in Cohn & Lins that makes it not convincing. It has to do with their test specifically assuming f(t) in my equation 1 is linear with time throughout the thermometer record– it is not thought to be linear. So their test involves a null hypothesis (f(t)=0 in my equation 1) vs. an alternative hypothesis no one believes to be true. Moreover, unlike assuming linear trends from 2000-2030, where piecewise linear is approximately true for the IPCC forecasts, linear is not remotely close to the IPCC hindcast from 1900-2000. So, it’s not as if Cohn & Lins’s alternative hypothesis is even close to what people think might be correct.

    This has the potential to lead to mis-estimates of the HK parameters. I discussed this before– but possibly not as clearly as I could have. (I think Cohn & Lins is a good paper, but I just am not convinced their finding is meaningful, for the reason stated.)

    I think Keenan’s WSJ analysis is flawed by both assuming f(t) can only be linear and suggesting a non-stationary process for u(t). The first issue Keenan shares with Cohn and Lins. The second issue is unique to Keenan and– I think– violates known physics. If u(t) is “natural variability”, it must be stationary. (Koutsoyiannis and Cohn & Lins only use stationary processes for natural variability.)

  12. Heretic–
    I should add that I have some interest in discussing the significance of the 20th century trend– but I want to avoid doing something that is more wrong than the IPCC analysis Keenan criticized.

  13. Dewitt

    But there does seem to be evidence from longer term proxy series like ice cores that the standard deviation for higher values of k does not fall off the straight line as it would for an ARIMA (Markovian?) series.

    I discussed this with Demetris and…. I’m not convinced! But I would need the ice core data to explain why. Once again, it has to do with the possible proper form of the forced component. I think the conventional view is that Milankovitch cycles are driven by celestial mechanics, and there are interesting things that happen to climacograms if you have (ARIMA + one long periodic function).

  14. Demetris–

    If you wish to use a more deterministic model for this, I would suggest using GCM trends rather than fittings on data

    I compare observed trends to the trends in GCMs. The GCM trend is the “null” hypothesis– so from the point of view of the test, the GCM trend is deterministic. This is a rather conventional way to test a null hypothesis against data.

  15. Is disagreeing with the IPCC the ultimate sin?
    On a separate topic:
    I never thought I’d see you use argumentum ad populum. No-one?

    @Doug
    Einstein was right. But never forget ‘No amount of experimentation can ever prove me right; a single experiment can prove me wrong.’
    Tell that to Trenberth.

  16. Heretic

    Is disagreeing with the IPCC the ultimate sin?

    No. What I intend to communicate is that if a figure seems to be presented as refuting or rebutting the popular notion, one ought to consider how the figure would appear if the popular notion actually is true. In this case, the climacogram is entirely consistent with a fairly mainstream view, which is that GHGs cause at least some warming. It is also entirely consistent with “GHGs cause warming and natural variability is short term persistent”. It may be entirely consistent with all sorts of views.

    Of course the climacogram may be consistent with a wide range of theories, but one must admit that it is also consistent with the conventional view.

  17. Sorry, but this crossed:
    GCM the null hypothesis?
    Mine is ‘no significant trend’ when I’m being fundamentalist, otherwise ‘no CO2 signal’ when I’m not. All the GCMs are flawed, aren’t they? (Clouds, oceans [PDO+AMO &c.], 21st. Century temps, & many, many other reasons)

  18. Einstein was right. But never forget ‘No amount of experimentation can ever prove me right; a single experiment can prove me wrong.’
    Tell that to Trenberth.

    What experiment would that be?

  19. Heretic–
    I always test IPCC models. So, I’m always doing stuff “ad populum” in that sense.

    Mine is ‘no significant trend’ when I’m being fundamentalist,

    In frequentist statistics, when testing a hypothesis you always make it the “null”. Then you reject or fail to reject. So, yes, to test IPCC trends, I make them the null. If you don’t make it the null for the test, you can’t test it!

    I’ve sometimes tested no warming– in that case, I make that the null. Other than being ‘the hypothesis being tested’, there really isn’t anything spectacularly special about ‘the null’. However, it is true that the IPCC trends are in the vicinity of what many consider “true”. So, I am testing what is thought by many to be “close to true” against data.

    All the GCMs are flawed, aren’t they?

    Sure– to at least some extent they could be called either “flawed”, or “imperfect”. But under frequentist statistics, to test their results, you have to make them the null.

  20. BTW: For those wondering: The corrected AIC criterion says “ARIMA + Trend” is “better” than “fractional differencing and no trend”.

    AIC is a fairly conventional test to implement Einstein’s razor (quoted by David L. Hagen)

    “Make things as simple as possible, but not simpler.”
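
    For anyone curious how such a comparison can be set up, here is a rough sketch. It is not the script I actually ran; ‘tlt’ is a placeholder for the monthly anomaly series, and the ARFIMA orders (one AR and one MA term) are assumptions chosen only for illustration.

        library(forecast)    # auto.arima
        library(fracdiff)    # fracdiff

        t.idx     <- seq_along(tlt)
        fit.trend <- auto.arima(tlt, d = 0, xreg = t.idx)         # linear trend + stationary ARMA errors
        fit.fd    <- fracdiff(tlt - mean(tlt), nar = 1, nma = 1)  # fractional differencing, no trend

        # Small-sample corrected AIC: AICc = -2 logL + 2k + 2k(k+1)/(n - k - 1)
        aicc <- function(logL, k, n) -2 * logL + 2 * k + 2 * k * (k + 1) / (n - k - 1)

        fit.trend$aicc                          # forecast reports AICc for the trend + ARMA fit
        aicc(fit.fd$log.likelihood,
             k = 4,                             # d, one AR term, one MA term, innovation variance
             n = length(tlt))                   # AICc for the no-trend fractional-differencing fit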

  21. Heretic (Comment #76728)
    June 6th, 2011 at 3:59 pm

    Sorry, but this crossed:
    GCM the null hypothesis?
    Mine is ‘no significant trend’ when I’m being fundamentalist, otherwise ‘no CO2 signal’ when I’m not. All the GCMs are flawed, aren’t they? (Clouds, oceans [PDO+AMO &c.], 21st. Century temps, & many, many other reasons)

    Of course they are ‘flawed’; the scientists have never said otherwise, nor stopped research on improving them. You will find that all models are ‘flawed’, in fact, and research continues in all areas of science.

  22. Lucia,

    Interesting post. It seems that more than one noise model can fit the data reasonably well. Long term pseudo-cyclical temperature variation (apparent ~70 year period, apparent ~0.2C magnitude) over the last 140 years is indicative of the problems with this kind of analysis. Occam’s razor aside: we do expect some warming as a response to increasing radiative forcing. Without sufficient data to show otherwise, I am personally more comfortable with a mainly causal explanation… especially when there are clear theoretical justifications for that expectation.

  23. @lucia
    I imagine you hate it when argumentative commenters just disappear. You may not. Anyway:

    OK. I can’t argue.

    I think we disagree from whence we start. Anyroad.

  24. Lucia we have a very interesting post from Dr R Spencer
    http://www.drroyspencer.com/
    Satellite data Channel 5 is showing that there is no significant warming for the last 9 years. There you go, AGW is not happening, it’s now official (my context)

  25. Re previous
    This quote from Dr Spencer
    “For now, though, I think the tropospheric (AMSU ch. 5) data are pretty clear: there are no signs of warming in the last nine years in those regions where the strongest warming in the last 30 to 40 years has occurred, that is, in the Northern Hemisphere mid- and high-latitudes. And, there might even be signs of recent cooling over the last few years in the mid-latitudes, but whether this will persist is anyone’s guess. ” That’s all Folks Disney Tunes!

  26. I imagine you hate it when argumentative commenters just disappear. You may not.

    It depends on what you call “argumentative”. I don’t consider you argumentative and don’t feel any desire for you to just disappear. I like people discussing flaws to what I might write. That said… there are certain people who do so in obnoxious ways. You aren’t one of them.

  27. Rebecca–
    When UAH posts I can discuss whether the cooling is statistically significant — I bet it’s not.

  28. SteveF

    It seems that more than one noise model can fit the data reasonably well. Long term pseudo-cyclical temperature variation (apparent ~70 year period, apparent ~0.2C magnitude) over the last 140 years is indicative of the problems with this kind of analysis.

    Bear in mind that ARIMA (aka “short term persistence”) does not preclude the spectral density function for the natural variability from having energy at 70 years. The “short term/long term” dichotomy has to do with the rate of decay of the autocorrelation function at large lag times. But “short term persistence” can have energy at quite long periodicities.
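
    Here is a toy illustration of that point (my own sketch, not anything from the thread): even a plain AR(1) process, the textbook “short term persistence” model, has plenty of spectral density at multidecadal periods.

        set.seed(2)
        x  <- arima.sim(model = list(ar = 0.6), n = 12 * 500)      # 500 "years" of monthly AR(1)
        sp <- spec.pgram(x, spans = c(11, 11), plot = FALSE)       # smoothed periodogram
        plot(1 / sp$freq / 12, sp$spec, log = "xy",
             xlab = "period (years)", ylab = "spectral density")   # nonzero power at long periods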

  29. lucia (Comment #76740),
    A pseudo-cyclical behavior near 70 years does not seem to me to require anything at all of a short term noise model, especially if that behavior is in fact causal in nature, rather than ‘noise’ based on a noise model. For example, variation in thermohaline circulation rate might drive the pseudo-70 year cycle.

    Lucia: I did not say cooling, but no warming, i.e. flat, etc… AGW means it’s supposed to KEEP warming as CO2 rises, and it’s not. There is no warming for all TLT and mid troposphere for the last 9 years. However, if you want to play with the cooling part, you can analyse 400mb mid troposphere since 1979 to current; I think you will find quite a significant cooling trend in that area, especially 2011 to date. I would say very significant cooling currently, but check it yourself.
    AMSU site click on ALL years for the 400mb pressure temp band. I think you can get the data text files from site you probably know anyway…

  31. Rebecca–
    Nine year cooling trends are not inconsistent with some degree of longer term warming. They may be inconsistent with some particular projection for warming, but they aren’t inconsistent with, for example, 0.00001C/century of warming.

    Is the mid-troposphere supposed to warm? Or cool?

  32. For example, variation in thermohaline circulation rate might drive the pseudo-70 year cycle.

    Yes, but in my equation (1) this would be the “u” term– natural variability. It’s not caused by anyone changing forcings.

    It’s a bit like vortex shedding in a turbulent flow. These have fairly long time scales (relative to the turbulence). But the vortices aren’t shed as a result of any change in forcing.

    In contrast, something like the earth receiving greater or less insolation as a result of celestial mechanics could be viewed as causal– in the “f(t)” term in (1).

    That’s where I see the “natural” vs “forced” distinction.

  33. Lucia,

    I am not sure how to differentiate between ‘natural’ and ‘forced’ responses with so little historical data. The Roman warm period/Dark Ages/Medieval Warm Period/LIA/20th century variation (70 year cycle) history suggests substantial temperature variation on all time scales. Without other explanation (or a lot more data), I’m not sure what is ‘natural’ and what is ‘forced’; I don’t see how one can draw that distinction.

  34. SteveF–
    First: I think it’s important to separate two issues.
    1) There is a question of whether something is or is not a forced response and
    2) There is a question about whether we and scientists can correctly identify whether something is forced or natural.

    These are two different things. But once we have a conceptual definition of a forcing, then we can argue about whether something is or is not a forced response.

    I think conceptually the division between forced/natural is based on the existence of forcings that are somehow seen as external to the response of the atmosphere, ocean, etc. So, if the planet earth moved closer to the sun and insolation increased: that’s a forcing. If it moves away: that’s a forcing. If GHGs increase as a result of man’s action: forcing.

    Volcanoes are a bit touchy– but when looking at climate models, they are at least forcing on the model.

    Things like mean temperature, thickness of ice caps, the thermohaline circulation are all emergent. Things of this nature would exist even if the earth could somehow magically stick at a particular nearly constant forcing (or constant annual cycle.) El Nino and La Nina type oscillations would happen even if the external forcings were constant. I think the thermo-haline circulation could speed up or slow down or display quasi-period oscillations even if the external forcing stayed constant.

    If I’m correct about those being emergent properties which exists even if the earth’s external forcing stays the same, then they are all natural variability and not forced response of the sort I describe by f(t) in 1.

    On the 70 year cycle: As far as I am aware, people don’t suggest this is due to the earth getting closer or further from the sun or anything external to the climate system. The thermohaline circulation is part of the climate system, and the 70 year cycle seems to be attributed to natural variability in that circulation pattern. If so, I would say it’s “natural variability” of the climate system– not “forced”.

    Mind you, I may be mis-identifying the cause of the 70 year oscillation. If it turned out to be due to the earth’s passing through some region of magic pixie dust every 70 years, then it might be forced. But as far as I’m aware, most people suggest it’s just an emergent response of the climate system– and so natural variability, albeit with a long time scale.

    With respect to forecasts of weather: Knowing something like the AMO, or NAO can help us with forecasts of weather. But I don’t think something being long term or helping us predict the future for a while turns it into a “forcing”.

    As for Roman warm period/ Dark ages etc: I don’t think you can figure out what’s forced and what’s natural by looking at the temperature series itself– and certainly not on the basis of data we have. I also don’t think we know the expected value of the response to forcing at any time back in history. This means that to some extent, figuring out the statistical properties of the natural response based on historical data may be well nigh impossible.

    But despite that, I think it’s fair to suggest that we know that right now there exists a positive forcing due to GHGs, and we expect that will manifest itself as at least some positive trend in temperature. I think it’s also fair to suggest the Milankovitch cycles are partly forced in the sense that we know the earth travels around the sun, and its orbit and precession affect insolation. So, we do expect to see some contribution to the forced “f(t)” arising from that. I think we can say this even if we don’t know whether the Roman warm period was “forced” or “natural” or a combination of both. (At least I certainly don’t know whether those were “forced” or “natural”.)

  35. Lucia, what I would expect (from experience) is that if you go to low enough frequencies, you enter the “source” region, and the spectrum should flatten out.

    One place that a 1/f distribution (pink noise) (in power not amplitude) differs from a Hurst distribution is that the variance does not grow with integration period.

    As DeWitt observes, it may be the case that we have an insufficiency of data…the proxies suggest this 1/f characteristic continues to very low frequencies (down to about 1/55 yr–1, i.e., periods of 55 years and longer).

    See e.g., this, which may be the data he was thinking of.

    (Reference is here)

    What you’ll notice from the various spectra is that there are well defined peaks (which themselves obey a 1/f-like spectrum) overlying what appears to be a continuous lower-amplitude 1/f climate noise spectrum…. this is typical of what happens if the spectral peaks correspond to modes (or pseudo-modes) of climate, e.g., the ENSO and PDO components being themselves driven by a 1/f noise spectrum.

    I would think if you model the noise correctly (and if what you are trying to measure is the long-term temperature trend, climate fluctuations are a source of noise), I think you’d get the uncertainty bounds correct.

    I would also think an ARIMA based model would be an appropriate way of analyzing this (although it wouldn’t get the discrete part of the spectrum right).

    Hope this makes sense….

  36. Lucia:

    Carrick– that pdf seemed to be damaged. My mac wouldn’t open it.

    Then your mac is possessed. 😉 Actually it appears that Dropbox is malfunctioning….

    Anyway, try this.

  37. Without other explanation (or a lot more data), I’m not sure what is ‘natural’ and what is ‘forced’; I don’t see how one can draw that distinction.

    They spend a bit of time on that in the IPCC report. FAQ 9.2 Fig 1. They analyse the forcings using models and the recorded temperature. As several people have shown already, a simple forcing model to reproduce recent global average temperatures does track the record well.

  38. @lucia (#76738)
    You are gracious. I shall continue (as always intended) to lurk, read & learn (& where apposite, to comment).

  39. Carrick

    Lucia, what I would expect (from experience) is that if you go to low enough frequencies, you enter the “source” region, and the spectrum should flatten out.

    That’s what happens in turbulent flows. But I also think it might matter for identifying what is literally called “long term persistence” or “Hurst”. I think (not know) that whether or not it “matters” will depend on whether the very longest cycles present are literally forced or not. (So, for example, if you are trying to infer LTP or not based on ice cores: are the Milankovitch cycles “forced”? Or are they “internal variability”? If they are forced and the spectrum has flattened long before you get to those frequencies, that will affect whether we really think an analysis treating those cycles as “internal” provides evidence of “LTP” in the sense that we should use LTP for planning now in the short term.)

  40. lucia (Comment #76749),
    .
    Thanks for that thoughtful comment; it helps to clarify the issue.
    .
    I still wonder (absent more data) about how to differentiate between a long term “emergent” behavior (say a hypothetical multi-hundred-year oceanic circulation pseudo-cycle) and a response of the system to a known external forcing (eg, the 120+ year rise in atmospheric CO2). Without data covering a period at least as long as the longest term emergent behavior, the response to an identified forcing and an unidentified emergent behavior seem to me impossible to separate.
    Which is not to say I think all (or even most) warming over the last 120 years was caused by a longer term pseudo-cyclical behavior. OTOH, I do think that credible information about quite long term temperature changes (MWP, LIA, etc) needs to be considered as a source of uncertainty about the measured response to a forcing.

  41. On the ice core record and H-K dynamics: Doesn’t a system with H-K properties have the potential of effectively amplifying small cyclic forcings? Or is that true for ARMA as well? Global annual insolation doesn’t vary much, if at all, over the Milankovitch cycles. It’s all based on changes in insolation at high latitudes, 65N e.g.. The correlation of the change in ice volume with the Milankovitch cycles looks pretty solid most of the time. The exception is the glacial/interglacial transition where the change in volume is much larger than the correlation elsewhere would predict.

  42. Dewitt–

    On the ice core record and H-K dynamics: Doesn’t a system with H-K properties have the potential of effectively amplifying small cyclic forcings? Or is that true for ARMA as well?

    I don’t know. All the “exploratory” montecarlo stuff I do assumes that the natural variability (“u” in eq. 1) is unaffected by the forcing and so independent of f(t) in equn 1.

    My guess– and it’s a guess– is whether or not this is true has little to do with whether the character of ‘u’ is white, STP, LTP. The argument I would advance is this:

    Consider a turbulent system (or one with deterministic chaos) that we can drive somehow. (Say a lava-lamp where we measure the temperature at a specific point and create a time series.)

    In such a system, we can adjust the power. If we set the power to a constant value, we’ll see some sort of “natural variability”, u(t), with whatever character it has.
    If we set it to a different value of power, we know from experience with lava lamps that u(t) will change. In fact, if it’s a lava lamp, we expect that the power in u(t) will increase if we increase the forcings. I suspect this is true irrespective of the nature of u(t)– that is, at least some properties of u(t) are affected by the magnitude of the forcings– and so– parametrically, some properties of u(t) must be affected by the magnitude of the current expected response to forcings.

    So, I’ve just explained why my assumption that u(t) is independent of the forcings (and f(t) in 1) is wrong.

    However, if we assume that the changes in forcing, f(t) are small relative to the absolute magnitude, then we might expect that the general character of u(t) is unaffected by f(t). This sort of assumption is not uncommon as a simplification. Obviously, it’s not “right”– and for all I know it could be way “wrong”. But it is one that could be tested against models– and what exploratory test I’ve done over the 20th and 21st centuries suggest that the assumption might be ok.

  43. DeWitt Payne (Comment #76777),
    The glacial/interglacial cycle appears to be a transition from one locally stable climate regime (dominated by high albedo snow/ice) to another locally stable regime (dominated by low albedo open land)….. the dreaded tipping point argument. It is interesting that any even moderately warmer period (say more than 3 million years ago, maybe 2C warmer than today), shows no sign of cyclical ice ages, even though Antarctica seems to have been continuously glaciated. http://www.globalwarmingart.com/images/d/d3/Five_Myr_Climate_Change_Rev.png

  44. Lucia, I think Demetris has eloquently expressed the point that I will now clumsily try to restate: If natural hydroclimatological processes possess LTP (+STP), which seems to be the case (see Demetris’s work and citations therein), then the patterns observed in recent hydroclimatic datasets are not particularly unusual. Choice of alternative hypothesis is irrelevant.

    As a result, I take strong exception to your comment that: “I think there is a flaw in Cohn & Lins that makes it not convincing. It has to do with their test specifically assuming f(t) in my equation 1 is linear with time throughout the thermometer record– it is not thought to be linear. So their test involves a null hypothesis (f(t)=0 in my equation 1) vs. an alternative hypothesis no one believes to be true. Moreover, unlike assuming linear trends from 2000-2030, where piecewise linear is approximately true for the IPCC forecasts, linear is not remotely close to the IPCC hindcast from 1900-2000. So, it’s not as if Cohn & Lins’s alternative hypothesis is even close to what people think might be correct.”

    To repeat: It makes no difference what alternative hypothesis is employed. If the observed data are unsurprising in the context of LTP+STP, then, if you believe hydroclimatological processes possess LTP+STP, we have an explanation for what we’ve observed.

    Could there be a better explanation — for example, one related to some forcing (greenhouse gas; solar; etc.)? Of course! I have no argument with that.

    However, it is still absurd to say “the recent record can only be explained by xxx” when, in fact, the observed record contains nothing exceptional demanding explanation.

  45. Lucia:

    Ok. It looks like the spectrum may flatten above 0.2 1/year. (So 5 years and longer.) If it does, I think this “matters” for the purpose of identifying what we call LTP.

    I believe it is an artifact of how the spectral periodogram was being produced (I was using a shorter time window with a zero pad factor of four, I think it was probably 20-years, but not sure anymore), and more importantly the data are detrended in each window (so that they are constrained to go through zero at f=0).

    Here are the same data (GISTEMP) with a 100-year window and a zero-pad factor of 8. I’ve put a knee-point in at 0.2 years-1, and what I find is the data “on paper” actually have an increased slope:

    Spectrum here.

    However, I would interpret that as a shift from primarily a continuous spectrum in the GISTEMP series (which itself is likely to be partially artifactual… I believe it is obliterating the sub-annual harmonics of the annual forcings) to a spectrum dominated by quasi-stable climate oscillations (and external forcings like the 22-year solar cycle).

    If you treat it as two different 1/f series with your 0.2 yr-1 break point,
    this is what you see.

  46. Lucia, great article and comments! DK’s stuff is very thought provoking. I wish I had more time to ponder your article and this discussion.

  47. Tim–

    I think Demetris has eloquently expressed the point that I will now clumsily try to restate: If natural hydroclimatological processes possess LTP (+STP), which seems to be the case

    I don’t disagree with this. My point is not that the process is not LTP, but rather that climacograms give extremely ambiguous information. This is particularly true if you are trying to determine whether or not a trend actually exists.

    That someone might wish to plan for the future on the basis that the trend does not exist does not bother me. But to suggest that a climacogram like the one from RSS actually provides supporting evidence of LTP is inappropriate. It does not. Worse, to suggest it supplies supporting evidence for no trend is absurd.

    The RSS climacogram is perfectly consistent with a trend– whether the natural variability is LTP or STP. It is also perfectly consistent with a trend of comparable magnitude to those expected to be induced by AGW.

    To repeat: It makes no difference what alternative hypothesis is employed.

    Sorry, but I think you are wrong. Beginning a sentence with “to repeat” is not the same as providing an argument to support the claim you repeat.

    But you now go on to provide a “rebuttal” that does not address what I am trying to explain. What you write is this:

    If the observed data are unsurprising in the context of LTP+STP, then, if you believe hydroclimatological processess possess LTP+STP, we have an explanation for what we’ve observed.

    You do this to an extent. But it’s incomplete.

    You only show that if you both assume
    (1) the alternative to f(t)=0 is that f(t) is linear, and
    (2) that all deviations from linear are natural variations, then you can explain the rise as being due to natural variations.

    However, since no one thinks (1), this means your method is bound to overstate the power in the natural variations (u) and also the relative amount of power at lower frequencies (longer periods). This will tend to result in more “fail to reject” outcomes.

    You do in fact do (1)– and there is no getting around the fact that if f(t) is not linear, then your method is “overcounting” the power in natural variability. If represented as a test of the conventional view– that f(t) has tended to increase temperatures and that it is not linear in time– this is a flaw in your approach.

    This is not to say that what you did is not valuable. It is just as valuable– or more so– than the similar demonstration of statistical significance assuming AR1. But both are to some extent flawed– and it would be nice to see a better analysis.

    “the recent record can only be explained by xxx” when, in fact, the observed record contains nothing exceptional demanding explanation

    Well, I haven’t said the recent record can only be explained by xxx. But I also think your analysis has a shortcoming, which you could improve if you had interest to do so, funding to do so and so on. I’m not trying to ream you out when saying it is flawed– I am only saying that when presented as evidence that warming falls within the range of natural variability, my view is your analysis is suggestive but not fully convincing. “Not fully convincing” is all I mean by “flawed”.

    BTW: I think being suggestive makes it sufficiently interesting as a publication, and it’s a good thing it was published. The analysis in the AR4 is also just “suggestive”.

  48. Tim–To elaborate a bit.

    Suppose we do separate what we consider “forced” response from “natural” as I suggested in equation (1)

    (1)$latex \displaystyle x(t)= f(t) +u(t) $

    When you use the approximation $latex \displaystyle f_a(t)= mt + b $, but the “true” forced response is something else, then (1) can be written as:

    (2)$latex \displaystyle x(t)= mt+b + [f - f_a](t) +u(t) $

    When you now throw this into any subroutine to analyze the residuals, those residuals will be partly the result of internal variability, i.e. u, and partly the result of [f - f_a](t).

    Because the prevailing notion is that f(t) is highly nonlinear, if the prevailing notion is correct, the features of the residuals of your fit can be greatly affected by [f - f_a](t).

    More specifically, if the prevailing notion is correct then it can be shown that:
    1) The rms of your residuals will tend to be higher than that of the true natural variability,
    2) Because [f - f_a](t) is positively temporally autocorrelated and rather persistent, the approximation will introduce autocorrelation at large scales even if it is not present in the ‘internal variability’.
    3) It doesn’t matter whether the internal variability is white, ARIMA or LTP; these two factors combined will tend to result in excess “fail to reject” results when the prevailing notion is correct.

    This is why to test the prevailing notion, you need to actually test the prevailing notion, not test an alternative hypothesis that clearly differs from the prevailing notion.
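
    A toy R sketch (entirely my own construction, for illustration only) of points 1) and 2): take a smooth, accelerating “forced” response, add white internal variability, remove a straight line, and look at the residual autocorrelation.

        set.seed(3)
        n      <- 1200                                   # 100 "years" of monthly data
        t.idx  <- seq_len(n)
        f.true <- 0.6 * (t.idx / n)^3                    # smooth, accelerating forced response (assumed shape)
        x      <- f.true + rnorm(n, sd = 0.1)            # internal variability taken as white noise
        resid.lin <- residuals(lm(x ~ t.idx))            # residuals after removing a straight line

        sd(resid.lin) / 0.1                              # rms inflated relative to the true u(t)
        acf(rnorm(n, sd = 0.1), lag.max = 60)            # true u(t): essentially no autocorrelation
        acf(resid.lin, lag.max = 60)                     # [f - f_a](t) leaks in: apparent persistence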

  49. DeWitt Payne (Comment #76782),
    By tipping point, I meant the transitions from glacial to interglacial states over the last million years, when the Isthmus was closed all that time (I think). Maybe the transition from the 41000 year cycles to the 100,000 year cycles represents when Isthmus changed from mostly to fully closed.

  50. Thanks, very interesting discussion indeed. Clear, pointed, and good tempered with it. Super.

  51. Lucia — Just to be clear: I agree that the RSS dataset does (almost) nothing to prove LTP, and we draw no inference about LTP from the RSS dataset. We assume LTP because, as noted by Hurst, Mandelbrot, Koutsoyiannis and countless others, large natural hydroclimatological systems seem to exhibit LTP. In the 2005 paper we note: “Given the LTP-like patterns we see in longer hydroclimatological records, however, such as the periods of multidecadal drought that occurred during the past millennium and our planet’s geologic history of ice ages and sea level changes, it might be prudent to assume that hydroclimatological processes could possess LTP.”

    In the 2005 paper we loosely define the alternative hypothesis, “trend,” as “an upward or downward tendency in the data over time” — which we then tighten up in our model as a linear function. That does raise a question: Would we have gotten different results if we had used some other function to represent “trend”? Actually, we did conduct the study on ranks (and zscores of ranks) as well, and got nearly identical results. Does that clarify the point about why the alternative hypothesis does not matter (so long as the alternative hypothesis involves “trend” as defined above)?

    That does not mean one cannot employ a complex alternative hypothesis, possibly based on observed data. If one had an uncalibrated physically based model for the trend, this would be an appropriate way to proceed. However, we are usually looking at models calibrated to the data under consideration, and in that case Demetris’s point about Occam’s Razor becomes relevant — as do other concerns.

  52. Actually, we did conduct the study on ranks (and zscores of ranks) as well, and got nearly identical results. Does that clarify the point about why the alternative hypothesis does not matter (so long as the alternative hypothesis involves “trend” as defined above)?

    It might clarify if that work was shown in the paper. However, it’s not in the paper, so I can’t know the details that would permit me to say whether what you claim about the form of the trend does or does not matter.

    All I know is what is presented somewhere where I can read it. (The paper, a blog post etc.)

    in that case Demetris’s point about Occam’s Razor becomes relevant — as do other concerns

    I think David Hagen’s response about Einstein’s razor is relevant.

    FWIW, it would be one thing to point to Occam’s razor if the LTP + no trend model fit very, very, very well.

    But the fact is, those climacograms only show a qualitative agreement, and equally good or better qualitative agreements happen if we include a trend. It happens that if we fit to the RSS data, AIC for “ARIMA + Trend” is considerably better than for “pure LTP”. AIC is specifically designed to deal with the fact that the simplest possible explanation may be simple, but can also be wrong. So suggesting Occam’s razor is a rule that somehow strongly points to our preferring LTP for temperature series seems rather odd to me. That water levels in the Nile might look LTP notwithstanding, it would be nice to read a better argument.

    Now… let me go continue to post my slope-o-grams which do provide very strong evidence that many climate models are LTP.

  53. Tim Cohn–

    That does not mean one cannot employ a complex alternative hypothesis, possibly based on observed data.

    You don’t need to construct the alternative hypothesis based on observed data. You can use a process in the vicinity of what modelers have suggested– a smoothed version of the multi-model mean from the IPCC. In that case your alternative hypothesis is connected to the form that is thought to exist.

    It’s true the alternative hypothesis might be wrong — but at least in that case you would be rejecting or failing to reject ‘no trend’ with internal variability that has the power and properties that would hold if an alternative close to what people believe is true were true. (It’s conventional to estimate the ‘noise’ or ‘error’ based on the alternative being true.)

  54. lucia (Comment #76719) wrote:

    I think Keenan’s WSJ analysis is flawed by both assuming f(t) can only be linear and suggesting a non-stationary process for u(t).

    That is an incorrect characterization of my WSJ analysis. I did not assume f(t) can only be linear; the IPCC and CCSP did. Also, an example of a time series that had to be differenced (and hence was non-stationary) was given in the article–global ice volume. Maybe ice volume has some relation with temperature?

    For my WSJ article, see
    http://www.informath.org/media/a42.htm
    The last line links to R programs, which give a full implementation.

    Lucia, you might study that more.

  55. Douglas–

    Also, an example of a time series that had to be differenced

    You differenced once.

    Your method permits the internal variability to be non-stationary. That’s what violates physics. The mean trend can be non-stationary without violating physics. I’ve read your R program– studying it more isn’t going to make an analysis that permits the internal variability to be non-stationary useful.

    What’s your point about ‘ice’?

  56. Douglas– to be specific, this violates physics:

    Herein, the IPCC/CCSP model is compared, via AICc, to a driftless ARIMA(3,1,0) model.

    If your point is the IPCC’s model ain’t great, ok. But d=1 with no trend cannot be right. It doesn’t matter how many statisticians you consult, it’s just not in the category of processes that could possibly describe natural variations.
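
    To make the physics complaint concrete, here is a small Monte Carlo sketch (the AR coefficients are illustrative, not the values fitted in Keenan’s Supplement): for a driftless ARIMA(3,1,0), the spread across realizations keeps growing with time, i.e. the process wanders arbitrarily far from where it started.

        # Sketch: across-realization variance of a driftless ARIMA(3,1,0) grows without bound.
        # AR coefficients are illustrative only.
        set.seed(2)
        nsim <- 500
        n    <- 1000
        sims <- replicate(nsim, as.numeric(arima.sim(list(order = c(3, 1, 0),
                                                          ar = c(0.3, 0.2, 0.1)), n = n)))
        # variance across the 500 realizations at a few times: it keeps increasing
        apply(sims[c(10, 100, 500, 1000), ], 1, var)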

  57. ARIMA(3,1,0), wasn’t that Beenstock and Reingewertz’s model? I’m reasonably sure it was one of the unit root guys.

    VS was the (3,1,0) guy while B&R were (0,1,2). That should be B&V, Breusch and Vahid, rather than B&R.

  58. Dewitt-

    Yes. The slope-o-gram slope for Brownian motion (a.k.a. a random walk) is -0.5. Note above the slope is -0.6. These get very noisy, particularly as the integral time scale (the integral of the autocorrelation) explodes, which is what happens for Brownian motion.

    (You’ll catch me sometimes accidentally multiplying by 2 because initially, I did fits to variances instead of standard deviations. So…. I’ll try to be careful! )
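
    For anyone who wants to check the -0.5, here is a quick Monte Carlo sketch. It assumes a slope-o-gram is the log-log plot of the standard deviation of fitted OLS trends against record length; for a pure random walk the recovered slope comes out near -0.5.

        # Sketch: SD of OLS trends vs. record length for a random walk, in log-log space.
        set.seed(3)
        trend_sd <- function(n, nsim = 400) {
          slopes <- replicate(nsim, coef(lm(cumsum(rnorm(n)) ~ seq_len(n)))[2])
          sd(slopes)
        }
        lens <- c(25, 50, 100, 200, 400)
        sds  <- sapply(lens, trend_sd)
        coef(lm(log10(sds) ~ log10(lens)))[2]   # expect a slope near -0.5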

  59. I have found this thread to be a thought provoking one.

    I agree with DeWitt that we need to look at longer term temperature/climate series to determine whether we have LTP. I have looked at reconstructions and longer climate model series, and even there it is not that clear whether LTP or some other model like ARIMA(1,1,1) or (0,1,2) fits best. As I recall, the fractional differencing (ARFIMA) almost always gave a better fit, but I do not recall at this moment whether the difference in fits was significant.

    Of course the problem with reconstructions, besides the other problems associated with their being reasonable proxies for temperatures and with cherry picking, is that they do not necessarily reproduce the instrumental period very well, or do not even cover it well during the period of accelerated warming.

    Also, Lucia, when you argue against assuming a linear fit as the null hypothesis, on the grounds that the historical instrumental temperature series is not linear, are you not under the onus to say how non-linear and with what certainty? A little bit nonlinear is almost linear.

  60. Kenneth–

    Also, Lucia, when you argue against assuming a linear fit as the null hypothesis, on the grounds that the historical instrumental temperature series is not linear, are you not under the onus to say how non-linear and with what certainty? A little bit nonlinear is almost linear.

    Sort of yes and sort of no. It’s perfectly fair to mention that something is missing in an analysis, or that something that needed to be attended to and discussed was not. I don’t have to actually solve that problem to make this observation, and I don’t have to solve it to remain unconvinced by an analysis that is missing what would be required to convince me it showed much.

    If Tim Cohn says that all he has shown is this: if one assumes the only alternative to “no underlying trend” is a linear underlying trend, with the residuals treated as “natural variability”, then the observations can be explained by natural variability– my response is yes.

    But I’m only saying the prevailing notion is that the forced response (i.e. underlying trend) during the 20th century is non-linear. It seems to me the prevailing notion is the red trace below:


    It’s not linear.

    Have I quantified how much difference using that as the alternative hypothesis would make: No.

    But I can observe that the red line is a relevant alternative hypothesis that some people believe (or at least come close to believing). I can also observe that, relative to that alternative, his method will
    1) overstate the power in the residuals, and
    2) introduce long-term correlation into the residuals,
    relative to true natural variability.

    I don’t have to quantify or solve the problem to observe this, and I don’t need to solve the problem to point out that C&L have not shown that the observed warming falls inside natural variability.

    They haven’t done so. They’ve done something interesting– but they have not shown the observed warming falls inside the range of natural variability. If they (or someone) want to show that, they need to modify the method of analysis.

    I can say this without quantifying. The response could be for someone to quantify and show that the magnitude of the thing I’m worried about is tiny– but until they do, I can certainly continue expressing my opinion that it looks like it would not be tiny and for that reason I remain unconvinced about larger claims. Meanwhile, if you want to be convinced they’ve shown the observed warming falls inside natural variability, that’s ok with me. But I’m certainly going to tell you I disagree and state why.

  61. “But I’m only saying the prevailing notion is that the forced response (i.e. underlying trend) during the 20th century is non-linear.”

    A prevailing notion without anyone bothering to calculate some CLs?

    “..I can certainly continue expressing my opinion that it looks like it would not be tiny and for that reason I remain unconvinced about larger claims.”

    That is a near certainty. Your point about non linearity and how it can be confused with the residuals in determining a linear trend is well taken and thought through. My point is that scientists continue to talk about linear trends in temperatures and those who do not have not been very clear (in my reading of the literature) about the details of a non linear trend.

    Model trends into the future, as I recollect, tend to be near linear. Why is that?

  62. What about segmented linear trends with breakpoints? Do they fit anyone’s models of the instrumental period of temperatures?

  63. Kenneth–
    Beats me. Why do you ask?

    My point is that scientists continue to talk about linear trends in temperatures and those who do not have not been very clear (in my reading of the literature) about the details of a non linear trend.

    Maybe. Or not. Talking about linear trends can make sense in some contexts. But it’s important to recognize that a test of the significance of a linear trend is not interchangeable with a test of whether the warming we have seen is explicable by the level of natural variability that we would expect if climate models correctly estimate the underlying trend. (There is an odd asymmetry with respect to the answer you get. What the assumption tends to do is make uncertainty intervals too large. So you get too many “fail to rejects”.)

    But generally, as a matter of literal truth, I don’t think any climate scientist believes that the “underlying trend” during the 20th century– that is, the forced response to the accumulating Tyndall gases along with solar forcing and volcanic eruptions– was linear or even close to it. So you have to be careful when you use it.

    Model trends into the future, as I recollect, tend to be near linear. Why is that?

    1) No volcanic eruptions.
    2) Slowly varying and mostly increasing forcing functions. They don’t introduce aerosols to make forcing go negative for a while, then turn positive again.
    3) The current level of forcing is above what would be the pseudo-equilibrium level. So temperatures tend to move smoothly up even if there are slight variations in increases in forcing from year to year.
    4) Possibly other reasons I can’t think of.

    But the fact is, the “underlying trend” in the projections is quite smooth and nearly linear for the first 30 years of this century. The “hindcast” is not linear especially not over the full century.

  64. Oh gosh, more commentary on the blogosphere about DK and I’m missing out on it. I’m a bit late to the party, I think. Also, I think (evidenced by later posts) Lucia has more up her sleeve than this post lets on to, so I might be jumping to conclusions, but here is my 2 cents worth.

    I like to think about things in frequency terms, it is natural to me as I spend a lot of time in frequency space, and it is more intuitive to me. Intuition is a dangerous thing though, so drawing on this too much could be a bad idea!

    In frequency terms, the difference between an autoregressive random series and an HK random series is all captured at the low frequencies. Essentially, an autoregressive random series will have some time constant; in frequency space, frequencies below the reciprocal of this time constant will be flat, and above this they will roll off at a constant dB/octave (i.e., a straight line in log-log space). An HK series looks similar to the autoregressive series above the reciprocal of the time constant, but below it looks very different (continuing the straight line in log-log space rather than flattening out).
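
    (A rough R sketch of the two theoretical spectral shapes being described, with made-up values of the AR coefficient and Hurst exponent: the AR(1) spectrum flattens below the reciprocal of the time constant, while the power-law spectrum keeps going as a straight line in log-log space.)

        # Sketch: theoretical spectra, AR(1) vs. a Hurst-like power law.
        # AR(1) with lag-1 correlation phi: spectrum ~ 1 / (1 - 2*phi*cos(2*pi*f) + phi^2);
        # an fGn-like process: spectrum ~ f^(1 - 2*H).
        phi <- 0.9                                   # time constant of roughly 10 steps
        H   <- 0.9                                   # illustrative Hurst exponent
        f   <- 10^seq(-4, -0.31, length.out = 200)   # frequencies up to ~Nyquist
        s_ar <- 1 / (1 - 2 * phi * cos(2 * pi * f) + phi^2)
        s_hk <- f^(1 - 2 * H)
        matplot(log10(f), cbind(log10(s_ar), log10(s_hk)), type = "l",
                xlab = "log10 frequency", ylab = "log10 spectrum")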

    From this perspective, an autoregressive time series which is shorter than the time constant will be almost indistinguishable from a HK time series. In order to tell the difference between them, you really need to look at a series much longer than the time constant in question. But then, if you see your autoregressive function becomes a poor match at that length, you can always make your time constant longer.

    This is really the problem we run into. If we only have a short time series (say, 30 or 100 years), what is the correct time constant to apply? Is it 100-200 years? Or perhaps a thousand? A million? Hundreds of millions? Certainly, the ice core records suggest more than 100 kyr, geological records perhaps 100 Myr.

    Furthermore, as the time constant becomes very large, the confidence intervals and uncertainty become more Hurst-like anyway.

    So our ARIMA model has an implicit parameter, the time constant, which can be used to make the series look Hurst-like, but on which we cannot reasonably place a value from such a short time series. We cannot be confident that the value is right even to within several orders of magnitude. Better, I think, to use a model which does not have the parameter, than one in which we use a parameter that we are so far in error for.

    But this is merely a re-expression of Occam’s razor. And in this sense, it is a strong argument (your data cannot support a meaningful estimate of the time constant necessary for your analysis).

    If you wished to stand ground and say – well, it might be ARIMA with some (unknown) deterministic forcing, or it might be ARIMA with a large time constant, or it might be HK – we just don’t know yet – then I would recommend, for confidence intervals at least, adopting the one with the largest CIs. Which would, most likely, be achieved by assuming HK dynamics.

  65. Spence_UK
    1. One of my main points is that if there is a trend, this will automatically make an ARIMA model look Hurst-like. See the dashed blue line in the figure above.

    Better, I think, to use a model which does not have the parameter, than one in which we use a parameter that we are so far in error for.

    What is “better” depends on what one is trying to learn. If you are trying to detect whether or not a forced term exists, it is not “better” to rely on a method that reduces the statistical power of a test by making the residuals a) look Hurst-like, b) take on an apparent Hurst parameter above whatever its true level is, and c) have inflated power– basically doing everything possible to make it difficult to detect the existence of a true trend.

    In short: if the entire purpose of the analysis is to determine whether the trend is or is not there, it is not “better” to use a method that increases type II error and then suggest that “fail to reject” should be taken to mean the trend does not exist. The purpose of doing an analysis should be to try to identify the most likely correct answer, not to drive the actual type I error rate to levels well below the stated one at the cost of type II error.

    Better, I think, to use a model which does not have the parameter, than one in which we use a parameter that we are so far in error for.

    I’m not sure I agree, but, for what it’s worth, Hurst noise has parameters. You need to know the standard deviation at scale (1), the Hurst parameter, and, if the process has a non-zero mean, you need to know that. All can be mis-estimated.
    My concern is not that the natural variations are not “HK”. My concern is that the method will a) tend to overestimate the HK parameter and b) tend to overestimate the standard deviation at scale (1).
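
    Here is a rough Monte Carlo illustration of that concern (parameter values are made up): estimate the fractional-differencing parameter d with the fracdiff package (H is roughly d + 0.5) for an AR(1) series, with and without a modest linear trend added. The trend tends to push the estimated d upward.

        # Sketch: a deterministic trend inflates the apparent fractional-differencing
        # parameter d (and hence the apparent Hurst exponent, H ~ d + 0.5).
        library(fracdiff)                        # CRAN package
        set.seed(4)
        n     <- 1200
        noise <- as.numeric(arima.sim(list(ar = 0.5), n = n, sd = 0.1))
        trend <- 0.0006 * seq_len(n)             # modest drift; value is illustrative
        d_no_trend   <- fracdiff(noise,         nar = 1)$d
        d_with_trend <- fracdiff(noise + trend, nar = 1)$d
        c(d_no_trend, d_with_trend)              # the second is typically pushed upward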

    then I would recommend, for confidence intervals at least, adopting the one with the largest CIs.

    Oddly enough, that’s what I do when I post my confidence intervals for 10-year trends. The uncertainty intervals computed based on 10 years’ worth of data are often larger when I model the residuals as ARIMA rather than using FRACDIFF ones, which assume Hurst (fractional differencing).

  66. Lucia,

    As noted above, it is possible for an ARIMA model to match LTP quite well, so it is entirely possible that you have got a good match here. As the series that you have fit your parameters to contains no information about lower-frequency terms, that may be through chance alone.

    Comparatively, although the Hurst exponent is indeed difficult to estimate, the evidence (which relies on much more than just the instrumental records) seems to suggest much greater consistency across scales. This is a huge advantage.

    As for the better model, the better model is the one that best captures natural variability, on all scales. As Dr Cohn has shown, getting the scales from 1kyr out to 100Myr (and beyond) right has a surprisingly large impact on what we might expect in the last 100 years. Getting these right is important. I don’t see how your analysis can capture this, although perhaps you have more analyses planned. I am busy just now, but will try to keep half an eye out for further posts.

  67. Lucia:

    Your method permits the internal variability to be non-stationary

    I don’t get it.

    A physical, nonlinear system can have its internal variability scale with amplitude. Can it not?

  68. Carrick–

    A physical, nonlinear system can have its internal variability scale with amplitude. Can it not?

    Hmmm… I guess. But do you think that’s what d=1 is doing? After differencing, are the steps heteroskedastic (i.e. increasing in size)? We could look at that.

    I think he’s just got a random walk/diffusion process. I don’t think that’s the same as internal variability scaling with amplitude. Plus, he’s got zero trend. If the amplitude doesn’t change, then how does scaling with amplitude do anything at all?
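
    One quick way to “look at that” would be something like the following sketch; y is a placeholder for whichever temperature series is under discussion (synthetic data used here just so the snippet runs).

        # Sketch: are the steps of the differenced series growing over time?
        # 'y' is a placeholder; substitute the actual temperature series.
        set.seed(5)
        y  <- as.numeric(arima.sim(list(ar = 0.6), n = 1500, sd = 0.1))
        dy <- diff(y)
        summary(lm(abs(dy) ~ seq_along(dy)))   # a clearly positive slope would suggest
                                               # heteroskedastic (growing) steps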

  69. SpenceUK

    As the series that you have fit your parameters to contains no information about lower-frequency terms, that may be through chance alone.

    Maybe. But here’s the thing: no matter what the pattern of the persistence, my main issue is what the trend does to the shape of the climacogram. What I want you to focus on is this:

    Notice the blue dashed line, which represents the contribution to the climacogram from the linear trend. That blue dashed line is concave upward. Meanwhile, the ARIMA is concave downward. If we had lots and lots and lots of data generated by an ARIMA process plus a very small trend, we would get a climacogram that was initially concave downward, then linear, then concave upward, then increasingly linear.

    The concave-upward portion, ending with a distinct uptick at the end of the series, suggests a trend. Or it could be noise. Nevertheless, I think it is worth noting that this portion does exist. It exists in the portion of the graph that DK omits in his papers. It’s omitted to some extent for good reason, but I’ve made a whole bunch of these using Monte Carlo at home and they aren’t as hugely noisy as one might think. So, the turn to “concave up at the end with an uptick” is pretty suggestive of “something + trend” rather than “Hurst”.
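
    Since the figure itself isn’t reproduced here, a rough Monte Carlo sketch of the shape argument (parameter values made up): compute the sample climacogram, i.e. the standard deviation of block means versus aggregation scale, for an AR(1) series with and without a small linear trend. With the trend, the large-scale end turns concave upward and ticks up.

        # Sketch: sample climacogram (SD of block means vs. scale), AR(1) with/without trend.
        climacogram <- function(x, scales) {
          sapply(scales, function(k) {
            nblk <- floor(length(x) / k)
            sd(colMeans(matrix(x[1:(nblk * k)], nrow = k)))   # SD across block means
          })
        }
        set.seed(6)
        n      <- 4096
        scales <- 2^(0:8)
        ar1    <- as.numeric(arima.sim(list(ar = 0.6), n = n, sd = 0.1))
        plain  <- climacogram(ar1, scales)
        trendy <- climacogram(ar1 + 0.00015 * seq_len(n), scales)
        round(cbind(scale = scales, log10_plain = log10(plain),
                    log10_trendy = log10(trendy)), 3)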

    I’m clearly going to have to show some graphs of GISTemp and emphasize the “concave upward” issue, because people aren’t quite getting that.

  70. the evidence (which relies on much more than just the instrumental records) seems to suggest much greater consistency across scales. This is a huge advantage

    Owing in large part to this “not displaying the last 4 points” issue, I’m dubious of evidence based on figures I have not made myself. If you could point me to the “evidence” along with the underlying data I would like to make my own climacogram and comment. Because I’ve done some “toy” stuff and there are huge potential issues with doing this with Vostok cores where one might be mixing a deterministic signal with “noise”.

  71. Lucia, if you add CO2 to the atmosphere, you could argue that the internal variability should increase with increased CO2 concentration. (Since this is a form of parametric forcing, the system will respond nonlinearly to it, regardless of whether the underlying system is linear.)

    This type of increased internal variability might show up for example in the strength of Hadley cell convection…in fact it is seen. Another would be ENSO amplitude. (Exercise left to the reader.)

    Perhaps Douglas could clarify what he was thinking. I was left with some doubts…

  72. Carrick–
    But I’m not sure differencing is the way you deal with that. Isn’t that an issue of heteroskedasticity? Not differencing?
    With d=1 and no drift, you have a random walk– like diffusion. The ARIMA function doesn’t permit the character of the residuals to change over the time series. It only permits the series to drift– as in a diffusion process.

    So, to turn to a diffusion analysis: your notion of CO2 changing the character of internal variability is equivalent to letting u(t) have different properties as a function of t. (Parametrically, of course.)

    Here’s a thought problem: Let’s concoct a Brownian motion problem where that could arise parametrically. Suppose you sprinkled dust on the surface of a bathtub with a temperature gradient. You would expect the agitation by molecules to be stronger where the water is warm, and weaker where it is cool. Now, your particles move, and the variance of total distance from the start point, E(XX)(t), increases over time. But you might see a departure from what you would expect if the temperature in the tub were constant, and you might speculate this is because E(u’u’)(t) varies as the dust particle moves to a location in the bathtub with a different temperature.

    This would be a plausible physical argument. So yes, the properties of u’ can vary with temperature of the bath tub water– just as in your hypo about CO2.

    BUT, is this what d=1 does in ARIMA? I don’t think so. The d=1 in ARIMA would be required in the classic Brownian motion problem with constant temperature, because d=1 permits us to account for the fact that X(i+1)-X(i) is white noise and so X is a cumulative process (and non-stationary).

    What d=1 does not do is account for the possibility that the properties of E(uu)(t) vary with t. That is, it does not account for the possibility of “heteroskedasticity”.

    So, even though your argument that CO2 might affect the properties of the noise is plausible, and in fact, CO2 may very well affect the properties of the noise– it does not justify “d=1” and “driftless”. Because that’s not what d=1 accounts for. (Or at least I think it does not.)
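
    A toy numerical version of the bathtub distinction, in case it helps (everything below is made up for illustration): a plain random walk has E(XX)(t) growing in proportion to t, while letting the innovation standard deviation grow with t, which is the “temperature gradient” case, makes the growth faster. d=1 captures the first situation, not the second.

        # Sketch: pure accumulation (d=1) vs. accumulation with growing step variance.
        set.seed(7)
        nsim <- 2000; n <- 400
        walk_const <- replicate(nsim, cumsum(rnorm(n, sd = 1)))
        walk_heter <- replicate(nsim, cumsum(rnorm(n, sd = seq(1, 3, length.out = n))))
        t_check <- c(100, 200, 400)
        rbind(constant_sd = apply(walk_const[t_check, ], 1, var) / t_check,  # roughly flat
              growing_sd  = apply(walk_heter[t_check, ], 1, var) / t_check)  # keeps growing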

    Thanks, Lucia. I’ll try to process this after I’ve had something to eat!

    Carrick

  74. lucia (comments #76810, #76811, #76864)

    If I understand correctly, you no longer have statistics-based criticisms of my WSJ essay. Rather, your criticism is based on physical realism of the alternative model–a driftless ARIMA(3,1,0). I do not have much competence in physics, but following are some thoughts.

    Your criticism, as I understand it, is that the ARIMA model lacks physical realism because the variance grows without bound as time increases and no physical process is like that. The physics here is correct, but unimportant. Consider the IPCC model: its mean grows without bound as time increases and no physical process is like that; so is the IPCC model physically unrealistic for that reason?

    In reality, both models are only assumed to be valid over some limited time, say centuries. This is like assuming an area of land is flat: a useful approximation over a limited space, e.g. England, but lousy over the whole Earth.

    Moreover, the WSJ essay was not proposing ARIMA as a viable model. It was merely using the ARIMA model to show that the IPCC model is not viable. A driftless ARIMA(3,1,0) is literally 1000 times better than the IPCC model (in a sense made precise in the Supplement). That is enough to reject the IPCC model.

    There is also a more general issue here. Suppose that we had two models, M1 and M2. Model M1 is extremely realistic physically and model M2 is extremely unrealistic physically. Yet model M2 fits the data 10,000 times better than model M1 (more precisely, the probability, as determined by AICc, that model M1 minimizes the information loss is 1/10^4, and the probability for model M2 is 1-1/10^4). Shouldn’t we conclude that there are solid grounds for rejecting both models?
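
    (For readers unfamiliar with the AICc bookkeeping behind statements like “1000 times better”, here is the arithmetic in a rough R sketch; the two AICc values below are invented for illustration and are not the numbers in the Supplement.)

        # Sketch of the AICc arithmetic behind "model A is N times better than model B".
        # The AICc values are invented for illustration.
        aicc    <- c(M1 = -280.0, M2 = -293.8)
        delta   <- aicc - min(aicc)
        rel_lik <- exp(-delta / 2)          # relative likelihood of each model
        rel_lik / sum(rel_lik)              # Akaike weights ("probabilities")
        exp(diff(range(aicc)) / 2)          # how many times "better" the best model is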

  75. Doug–

    If I understand correctly, you no longer have statistics-based criticisms of my WSJ essay. Rather, your criticism is based on physical realism of the alternative model–a driftless ARIMA(3,1,0).

    First: In context, recall I was responding to Heretic who made claims about what you showed which you seem to not make. Specifically, you say “Moreover, the WSJ essay was not proposing ARIMA as a viable model. It was merely using the ARIMA model to show that the IPCC model is not viable. ”

    My response that your analysis is flawed was in context of Heretic’s suggestion that

    May I suggest that you ignore water resources. My reading (FWIW) – Cohn & Lins(2005), DK & Keenan (see Bishop Hill) – would suggest that the more important thing is to establish whether there exists a significant trend in the temperatures. I believe DK &c. show a ‘not proven’ (i.e. the null hypothesis wins) for ‘annual mean anomalies’ (whatever that means) and I was about to ask (plead with) you to try a similar analysis on monthly data.

    My statement that your analysis is ‘flawed’ was meant to say that if one claims your analysis shows what Heretic claims above, then, with respect to demonstrating the claim in italics, your analysis is flawed. That’s what I meant– and I continue to think that if one thinks it shows what Heretic claims, your analysis falls short of that.

    Also, if someone like, say, you starts a blog post like this
    Two months ago, I published an op-ed piece in The Wall Street Journal. The piece discussed the record of global temperatures, illustrated in the figure.
    and ends it with this:

    In other words, it is unlikely that we will be able to find any empirical evidence for significant global warming. The case for global warming therefore rests almost entirely on computer simulations of the climate.”

    I would say your analysis in the WSJ doesn’t come close to showing this sort of thing either.

    This is not the same as saying the analysis is flawed in all possible ways. If one makes the much more modest claim that you are showing the IPCC model is not viable, your WSJ article is maybe ok. (There are many ways to show the IPCC model is not viable. I don’t think showing that a physically unrealistic model fits better from a statistics point of view is the best way to do that– but the IPCC model is not a good one. And if that’s all that one claims, it can be shown many ways and I guess your way isn’t too bad.)

    But another thing, on the “no longer” part of your phrasing: I always did have a physics-based criticism, and it’s an important one.

    Also, I could still advance other statistics-based criticisms– for example: Why pick ARIMA(3,1,0) rather than some others that have better AIC values? Or, if you are presenting an argument that at least appears to say more than that there is a flaw in the way the IPCC went about things, and in fact gives the impression to people like Heretic (and, I would suggest, others) that the statistical significance of warming is in doubt, why compare driftless ARIMA(3,1,0) to nothing other than a linear model? I know you can respond to my observation that the only alternative hypothesis you considered was linear by saying the IPCC picked that one. But it’s still true you chose to compare driftless ARIMA to that model only– and so only considered a linear model as the alternative.

    So, I still have statistics-based issues– but I don’t think it’s worth belaboring them, because if you resolved every single one of the statistics criticisms we would still have the remaining, very important, fact that I think driftless ARIMA with d=1 is physically unrealistic.

    The physics here is correct, but unimportant.

    I assume you do not mean to say the physics is unimportant, because that’s just nuts. (I note you go on to then explain that you think the physics are correct or approximately correct. So, in fact, I don’t think you really believe the physics are unimportant.)

    Consider the IPCC model: its mean grows without bound as time increases and no physical process is like that; so is the IPCC model physically unrealistic for that reason?

    This is an inaccurate statement about what IPCC models do.

    The IPCC “model” does not grow without bound forever as time increases. IPCC projections show warming when the forcings are positive relative to what would be radiated away at the current global temperature. This is exactly what happens in real physical systems.

    Second: If you look at model outcomes in general, if forcings are fixed above the pre-industrial level, the earth’s temperature rises and then levels out, just like the temperature in your oven would do. That is: The temperatures do not rise without bound in AOGCMs. (That is, they do not do so unless forcings rise inexorably without bound. But that behavior is realistic too. If the earth were drawn inexorably toward the sun, the temperature would rise without bound until the earth was drawn into the sun and melted. Mind you– I don’t think AOGCMs contain that physics, but that doesn’t bother me so much.)

    Qualitatively, the models do exactly what engineering models for objects like ovens heated by resistive heating elements, homes heated by furnaces and air conditioners, and other garden-variety objects do. The AOGCMs may be flawed, and they may get the magnitude of warming wrong– or even very wrong. But they don’t do the equivalent of “driftless ARIMA with d=1” and they do not result in unbounded warming unless forcing itself is unbounded.

    It’s nuts to suggest that it’s physically unrealistic for temperature to keep rising if you keep increasing the rate of heat addition because in fact, it is totally realistic for temperature to increase forever if forcings increase forever.
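
    A toy zero-dimensional energy balance makes the point, I think. With heat capacity C and feedback parameter lambda (values below are made up, chosen only to be of a plausible order), C dT/dt = F(t) - lambda*T: hold the forcing fixed and the temperature levels out near F/lambda; let the forcing rise forever and the temperature rises forever.

        # Toy zero-dimensional energy balance: heat_cap * dT/dt = F(t) - lambda * T.
        # Parameter values are illustrative only; the point is the qualitative behavior.
        ebm <- function(forcing, lambda = 1.2, heat_cap = 8, dt = 1) {
          temp <- numeric(length(forcing))
          for (i in 2:length(forcing))
            temp[i] <- temp[i - 1] + dt * (forcing[i - 1] - lambda * temp[i - 1]) / heat_cap
          temp
        }
        yrs <- 1:300
        T_fixed  <- ebm(rep(3.7, 300))      # fixed forcing: levels out near 3.7/1.2 ~ 3.1
        T_rising <- ebm(0.02 * yrs)         # ever-rising forcing: keeps rising
        round(rbind(fixed = T_fixed[c(50, 150, 300)], rising = T_rising[c(50, 150, 300)]), 2)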

    In reality, both models are only assumed to be valid over some limited time, say centuries. This is like assuming an area of land is flat: a useful approximation over a limited space, e.g. England, but lousy over the whole Earth.

    I think you are mistaken. I think assuming d=1 is not the equivalent of assuming land is flat. It’s not like assuming Illinois is flat. It’s not like assuming England, which is less flat than Illinois, is flat. I suspect it’s more like assuming the Himalayas are flat, claiming that’s a good model for predicting the direction in which meltwater from glaciers will flow, and then suggesting we can’t predict that water will run down the sides of mountains because… well, mountains are flat.

    d=1 means there is absolutely no connection between the temperature of the earth and the insolation from the sun, and/or that the heat radiated away from the earth is not a function of the earth’s temperature. None.

    (Out of curiosity, do you understand why some people think d=1 is physically unrealistic? This is not a rhetorical question as those are forbidden at this blog. If you do know then maybe you can explain to me why you think d=1 is permissible at short time scales. The analogy that we can assume land is flat is not exactly a convincing argument for letting earth’s surface temperatures act like a diffusion process.)

    There is also a more general issue here. Suppose that we had two models, M1 and M2. Model M1 is extremely realistic physically and model M2 is extremely unrealistic physically. Yet model M2 fits the data 10,000 times better than model M1 (more precisely, the probability, as determined by AICc, that model M1 minimizes the information loss is 1/10^4, and the probability for model M2 is 1-1/10^4). Shouldn’t we conclude that there are solid grounds for rejecting both models?

    Not necessarily, no. It is often possible to find physically unrealistic models that fit data by data mining, curve fitting or other methods. If you hunt around enough you can almost always find a physically unrealistic model that fits the data. So gauging how well a physically unrealistic model fits– even using probabilistic measures– is unwise. One of the reasons science and engineering work is that physically unrealistic fits are heavily disfavored no matter how good they might look statistically. (And by heavily– I mean, really, really heavily.)

  76. One of the reasons science and engineering work is that physically unrealistic fits are heavily disfavored no matter how good they might look statistically. (And by heavily– I mean, really, really heavily.)

    If it is possible, I would go further: physically unrealistic fits are always wrong. All proposed explanations for physical behavior MUST be consistent with well known physics… there are no exceptions.

  77. SteveF–
    I agree with you. The reason I say “heavily” instead of “always” is that there is the possibility that we are mistaken about what is or is not physically realistic. It is for this reason I have asked Douglas if he understands why many people think d=1 is physically unrealistic. If he does, then it’s possible he can explain why he thinks permitting the assumption d=1 for “short time” (like slightly more than a century) is ok. I can’t think of a reason why it would be permissible when applied to short time periods– but maybe I’m overlooking something. (Is someone speculating there might be undetected forcings that also look like diffusion processes, and that those rose during the 20th century? Or what?)

    Because I really can’t think of why d=1 would be permissible over short time scales. But if I’m mistaken about this, then that reservation would go away.

  78. Because I’ve done some “toy” stuff and there are huge potential issues with doing this with Vostok cores where one might be mixing a deterministic signal with “noise”.

    Deterministic signal?

    Are we referring to a deterministic signal which we have inferred in-sample by looking at the data? Because that is not a sound basis for science.

    Or are we referring to a deterministic signal that we robustly know has a linear response in climate validated by out-of-sample data? Such behaviour is possible in a complex system but highly unlikely. I’m not aware of strong evidence for anything like that in the climate system. If you are, by all means remove them from the data prior to analysis.

    Or are we considering the real world, in which we have “forcings” which have unpredictable consequences (or at least, a finite – and short – prediction horizon) through a complex non-linear system? Or perhaps unknown forcings? If so, they should be (deservedly) incorporated into the stochastic model, not separate from it. Those are exactly the things we need to capture in our model.

    Or the final option: are we attempting to separate natural variability – which may include, say, non-anthropogenic changes in GHGs – from AGW? In that case, once again, the “deterministic forcings” in the ice ages belong in the null hypothesis of “no anthropogenic effect”.

    The arbitrary separation of components into a linear summation of “deterministic forcing” and “noise” is an unhelpful model for a complex, coupled non-linear system. Unpredictable “deterministic forcing” should be inside our stochastic model, not external to it.

  79. Spence_UK

    Are we referring to a deterministic signal which we have inferred in-sample by looking at the data? Because that is not a sound basis for science.

    Answer to direct question: no.
    Rhetorical question on the sentences that follow: since when has that silly rule been in effect? Answer to the rhetorical question: that silly rule has never applied to “science”. I have no idea who thinks it applies to anything, but they are historically wrong, and if it applied, science would devolve into Aristotle’s philosophy, which never had the power of science. That’s why science took its place, in a very wide sense. Creating theories to explain observations has a very respectable position in science and has had one since the time the scientific method displaced things like Plato’s theories about spheres, etc.

    In that case, once again, the “deterministic forcings” in the ice ages belong in the null hypothesis of “no anthropogenic effect”.

    I am trying to distinguish deterministic from non-deterministic. Some deterministic forcings are not due to humans.

    Unpredictable “deterministic forcing” should be inside our stochastic model, not external to it.

    No. The uncertainty in forcing that might occur in the future can be dealt with. But it’s still helpful to separate the deterministic from stochastic.

  80. If you hunt around enough you can almost always find a physically unrealistic model that fits the data. So gauging how well a physically unrealistic model fits– even using probabilistic measures– is unwise. One of the reasons science and engineering work is that physically unrealistic fits are heavily disfavored no matter how good they might look statistically. (And by heavily– I mean, really, really heavily.)

    If it fits the data, then it is physically realistic. Just because you choose a different basis than Nature doesn’t make a model “physically unrealistic.” What is unrealistic is expecting extrapolations to have good predictive capability. Choice of basis is a simple matter of convenience rather than mysticism.

    If you get right down to it, none of the curve fitting on this site (which I enjoy, both for the curve fitting and the commentary, don’t get me wrong) is “physically realistic”. You aren’t solving conservation laws, you’re fitting polynomials with different flavors of noise. The series you are using is so highly integrated (aggregated) that almost none of the variation should be “noise”, but a lot of it seems to be complicated bias introduced by the processing methodology. IIRC Carrick has linked graphs of the spectrum for some different GMST products before. I think the differences are a clue that the “noise” ain’t.

    According to my school of mysticism, the curve-fitting exercises that have a claim on being “physically realistic” are regressions on forcings, or maybe some deterministic nonlinear time series analysis (like dimension embedding).

  81. Re: jstults (Jun 10 06:06),

    If it fits the data, then it is physically realistic.

    I believe the classic Fermi quote as reported by Freeman Dyson applies here:

    “I remember my friend Johnny von Neumann used to say, with four parameters I can fit an elephant, and with five I can make him wiggle his trunk.”

    Read the link for the full story on why pseudo-scalar meson theory, while it fit the data, was physically unrealistic.

  82. jstults–
    In addition to what DeWitt says– define “fits”. The fact is: if we use AICc and hunt around for the best fit based on AICc, it’s not driftless ARIMA(3,1,0). Also, if we permitted ourselves fits with forced trends that differ from linear we could probably find a better fit that is not physically unrealistic. But Douglas didn’t search for those. So why should we think driftless ARIMA(3,1,0) is physically ‘realistic’ merely because it fits better than AR1+linear trend, when we know the trend isn’t linear? Just because a fit known to be physically unrealistic fits better than a fit we know to be implausible, we don’t go around concluding the physically unrealistic fit is physically realistic.

    If you want to assess the probability that the excursion we have seen is statistically significant, it’s best to stick to fits that are both a) physically realistic and b) not implausible.
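
    In case it isn’t obvious what I mean by “hunt around”, here is a rough sketch of ranking a small grid of ARIMA orders by AICc. The series y is a synthetic placeholder (trend plus AR(1) noise); one would substitute the temperature series of interest.

        # Sketch: rank a small grid of ARIMA(p,d,q) fits by AICc.
        # 'y' is a synthetic placeholder; substitute the series of interest.
        set.seed(8)
        y <- 0.006 * (1:130) + as.numeric(arima.sim(list(ar = 0.5), n = 130, sd = 0.1))
        aicc_of <- function(fit, n) {
          k <- length(coef(fit)) + 1                  # +1 for the innovation variance
          AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
        }
        grid <- expand.grid(p = 0:3, d = 0:1, q = 0:1)
        grid$aicc <- apply(grid, 1, function(o)
          tryCatch(aicc_of(arima(y, order = unlist(o)), length(y)),
                   error = function(e) NA))
        head(grid[order(grid$aicc), ])                # best (lowest AICc) fits first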

  83. Also, if we permitted ourselves fits with forced trends that differ from linear we could probably find a better fit that is not physically unrealistic.

    I think this is my source of confusion with the discussion. No fit can be physically unrealistic because the fit is just a low dimensional description of the observations. The slope is no more physically realistic than the curvature.

    With a physical model you have parameters that *mean* something (like a density or a collision frequency or a diffusivity). What physical semantics are attached to these ARIMA coefficient values?

  84. DeWitt Payne, thanks for the link; good one as always.

    My question to Lucia is still with me though: why is (b0 + b1 * t) physically realistic, but (b0 + b1 * t + b2 * t**2) unrealistic? Is the fit value for b1 diagnostic for some physical quantity? From the dimensions it’s just dT/dt, and b2 is d^2 T/dt^2. We don’t have reason apriori to think that any of the rates of T are identically zero. In fact, we “know” the rates exist and are just given by the “right hand side” of our conservation laws, which means they can sometimes be zero, for some amount of time, but assuming they are all zero except the first order term is just as unphysical as assuming the first order term is zero!

    (after writing that, it strikes me as consistent with Fermi’s criticisms in that article you linked, thanks again)

  85. jstults–
    To clarify: The fit is being used to estimate something unknown. In the case of Keenan’s analysis and the IPCC analysis, that unknown thing is the variability in the amount of warming that might arise as a result of the “internal variability” of the climate system, and we are interested in variability over roughly century scales.

    One can fit any ol’ model to the data– higher-order polynomials, white noise, white noise + trend, etc. Every single fit will give an entirely different estimate of the amount of observed warming that might fall inside the range of natural variability. If I fit a 117 order polynomial through 118 data points, the estimate for natural variability would be zero. (Everyone would understand this is nonsense.) If I fit “white noise + no trend” I’ll get lots of variability– but it might not be maximal. I have a huge range of fits that I could apply, each of which would give different answers for the amount of warming explained by natural variability.
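
    (A direct check of the polynomial example, using arbitrary made-up data: the fit goes through every point, the residuals are essentially zero, and so any residual-based estimate of variability collapses to zero.)

        # Sketch: a 117th-order polynomial through 118 points "fits" with ~zero residual.
        set.seed(9)
        n <- 118
        y <- cumsum(rnorm(n))                    # arbitrary series standing in for data
        fit <- lm(y ~ poly(seq_len(n), n - 1))   # orthogonal polynomials, degree 117
        sd(residuals(fit))                       # ~ 0, up to numerical round-off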

    The fits I could try can have several properties:

    1) Any fit can be interpreted as giving (a) explanatory power, or it can just be (b) “something that fits the points”. To claim (a), the qualitative nature of the fit must be the sort that would be permitted by the physics that govern the process. Otherwise it’s type (b). If it’s type (b), it doesn’t make any sense to suggest that the answers obtained using that fit can be used to estimate something like the uncertainty inherent in the process. The fit is just the fit.

    2) Any fit can be interpreted as matching characteristics expected of the process based on the alternative hypothesis one wishes to consider against the null. So, for example: if you are testing whether data are “natural variability + deterministic trend” with the deterministic trend assumed linear, the analysis of the properties of the natural variability would be based on the residuals left over after the linear fit. Then you can test the null hypothesis relative to the alternative hypothesis. You get an answer. But this answer tells you nothing relative to other alternative hypotheses not considered.

    The Cohn and Lins paper considers models for natural variability which may well be physically realistic. We don’t know if they used the correct model (no one knows what the correct model is), but at least the one they pick is plausible. However, their test only involves comparing the null of “deterministic trend is zero” to “deterministic trend is linear”. As far as that goes, what they did is fine. But if someone in comments says they show something more general: Nope. And in fact, their method does not engage the prevailing notion about the trajectory of warming, which was non-linear during the 20th century. To engage that notion, more needs to be done.

    The shortcomings of Keenan’s analysis seem greater. I could be wrong, but I’m pretty darn sure d=1 is physically unrealistic at all time scales. So, even if that model “fits” in the sense of “you can put the fit near the individual data points”, it fails under standard (1) above. So, we cannot expect it to give realistic estimates of the things we do not know: that is, the variability of 100-year (or so) trends.

    Also, whether or not Douglas shifts responsibility for testing only a linear hypothesis to the IPCC, since they were the ones who proposed it, he only tested ARIMA(3,1,0) against a linear trend + AR1– which was the one discussed by the IPCC. And even if he says his point is only to show the deficiencies of the IPCC argument (which I think was deficient), in his guest post over at Bishop Hill he certainly gives the impression that his WSJ article should be interpreted as showing more than that the IPCC method was deficient.

    So: if we are to assess whether Keenan has demonstrated anything about whether recent warming falls inside natural variability, whether physics permits natural variability to be “driftless ARIMA(3,1,0)” is a valid question. I think that process violates physics, and so one must seek some other models. That one is excluded by physics.
    (Or one must show I am wrong about d=1. But I would suggest while I may be wrong, I am not alone in this.)

  86. Jstults

    My question to Lucia is still with me though: why is (b0 + b1 * t) physically realistic, but (b0 + b1 * t + b2 * t**2) unrealistic?

    I haven’t said that. I’ve said that (b0 + b1 * t) is not the prevailing hypothesis, which it’s not. So Cohn and Lins did not test the prevailing hypothesis. FWIW: (b0 + b1 * t + b2 * t**2) is also not the prevailing hypothesis and I have not suggested anyone test that either.

    The prevailing hypothesis is something similar to the multi-model mean in the hindcast in the AR4. A proper method to test the prevailing view requires either
    a) devising a method that makes no assumption about the form of the forced trend or
    b) performing a test that actually uses the prevailing view as an alternative hypothesis.

    (a) is superior to (b)– I think Zorita did this, testing the probability of records, and concluded warming is statistically significant.

    We don’t have reason apriori to think that any of the rates of T are identically zero.

    No one assumes they are identically zero. Some people are testing this null against a specifically chosen alternative. So the statistical test goes sort of like this:

    At the outset, assume A (the null) is true. Now, under the assumption B (the alternative) might be true, can we reject A? If we reject A, then we make B our operating hypothesis. If we fail to reject A, we stick with A.

    So: in a test A was tested relative to a specific alternative. If you happen to test something no one believes (A) against something else no one believes (B), your test will still give an answer. But… to some extent, who cares?

    For this reason, it is more traditional to test A– the null– vs. something someone might believe.
    I think Zorita did this, devising a test of the null of no warming (A) against the alternative of sufficient warming to get a certain number of record-breaking temperatures at the end of a series (B). He found warming statistically significant. (But I can’t remember the paper. I’ll have to look.)
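
    In the same spirit (though this is a toy of my own, not Zorita’s actual method), one can count record-breaking values near the end of a series and compare with what a no-trend null produces. Everything below is simulated with made-up parameters, just to show the mechanics.

        # Toy records-based check (not Zorita's actual method): count records in the
        # last 30 values of a 130-value series under a no-trend null vs. with a trend.
        set.seed(10)
        records_in_tail <- function(x, m = 30) {
          rec <- vapply(seq_along(x),
                        function(i) x[i] > max(x[seq_len(i - 1)], -Inf), logical(1))
          sum(tail(rec, m))
        }
        n <- 130
        sim_ar1 <- function() as.numeric(arima.sim(list(ar = 0.5), n = n, sd = 0.1))
        null_counts  <- replicate(3000, records_in_tail(sim_ar1()))
        trend_counts <- replicate(3000, records_in_tail(0.006 * (1:n) + sim_ar1()))
        c(mean_null = mean(null_counts), mean_trend = mean(trend_counts),
          p_null_3plus = mean(null_counts >= 3))   # how often the null gives 3+ late records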

  87. If I fit a 117 order polynomial through 118 data points, the estimate for natural variability would be zero

    Here’s part of the reason for my confusion. You are conflating “natural variability” with unexplained variation (commonly called “noise”). When you regress against “natural” forcings, you’ve explained some of the natural variability, that doesn’t make it any less natural, and it’s certainly not “noise”. If the only factors in the model are time and the lagged series itself, then you can’t separate “natural” from “unnatural” variability. It would make sense to answer such a question by regressing against forcings, some of which are natural and some unnatural, and then proceed to “allocate blame.” I think sharp hypothesis testing doesn’t make much sense in this case (and the ones you mention certainly don’t answer any question about the source of variability); it’s much more of an estimation problem.

  88. “When you regress against “natural” forcings, you’ve explained some of the natural variability, that doesn’t make it any less natural, and it’s certainly not “noise””

    I agree and am similarly confused. If you want to explain the temperature, what seems natural to me is to try to explain it via some sort of regression of temperature on all the natural forcings. If, over a given finite sample period, the forcings have wandered about in what looks like a random sort of way, then so will temperature. All that unit root stuff looking at the behaviour of a variable in terms of its own past history was descriptive preliminary work, not explanatory.

    But there are a lot of deep issues here. There are some provocative musings from one of the world’s leading econometricians here:

    http://cowles.econ.yale.edu/P/cd/d17b/d1771.pdf

    Climate gets a look in around pages 7-10. Just to get people started here is one quote, after considering data for the last 420,000 years

    “The temperature graph reveals many well known features: (i) the (relative) stability of temperatures over the holocene (the last 12kyr), considered to be decisive in the neolithic revolution and the emergence of human civilization; (ii) the long but variable cycle (with periods between 80-120kyr) between major glacial epochs; (iii) the relatively short interglacial periods; (iv) some less dominant subcycles, also of variable period; and (v) evidence of random wandering behavior between episodes of deglaciation.”

  89. There is broad scientific agreement about human impact on the level of greenhouse gases (GHG) in the atmosphere, manifested in the popular “hockey stick” graphic that shows the trend in greenhouse gases over the last two centuries as a sharp spike against the blade of little change over the previous two millennia.

    Well, that (top of pp6) might hurt his credibility a bit…

  90. jstults

    You are conflating “natural variability” with unexplained variation (commonly called “noise”).

    I don’t think I am.

    When you regress against “natural” forcings, you’ve explained some of the natural variability, that doesn’t make it any less natural, and it’s certainly not “noise”.

    Who’s regressed “natural forcings”? I haven’t– and I haven’t suggested anyone do so. I certainly don’t think what’s left is “noise”. So, while you seem to be presenting a counter-argument, I don’t know what argument you are countering. Maybe there is something I can clarify, but I don’t know what that might be.

  91. Who’s regressed “natural forcings”? I haven’t

    I know, so it really makes no sense to say fitting a 117 term polynomial in time has anything to do with what’s natural variability and what isn’t. Claiming that it does is exactly the “conflating” that I’m talking about. I think I’m almost in “violent agreement” with the points you are trying to make, but some of the stuff you say has me confused. Just trying to figure out how not to be confused.

  92. I’m confused because this,

    If I fit a 117 order polynomial through 118 data points, the estimate for natural variability would be zero

    seems to indicate that you equate the distribution of the residuals in your model fit (commonly called “noise”) with “natural variability.”

    I would define “natural variability” as something that happens as the dynamic system responds to variations in natural forcings (as opposed to “unnatural” ones), maybe that’s not what you mean when you say, “natural variability”.

  93. Jstults-

    I know, so it really makes no sense to say fitting a 117 term polynomial in time has anything to do with what’s natural variability and what isn’t.

    We agree on this. I gave that as a counter example to your claim that

    if it fits the data, then it is physically realistic

    That “fit” would fit the data. It’s not “physically realistic”. In fact it’s silly. So silly that no one would claim it is physically realistic even if it fits the data. So, your claim that if it fits, then it is physically realistic, is false.

    Claiming that it does is exactly the “conflating” that I’m talking about. I think I’m almost in “violent agreement” with the points you are trying to

    I didn’t claim it has anything to do with natural variability. It is, quite precisely, a fit that “fits” and yet has nothing to do with natural variability. Likewise, driftless ARIMA(3,1,0) may “fit” well in some sense, yet it is deficient as a description of natural variability– if you are trying to estimate uncertainty in trends.

    Are we in violent agreement now?

  94. I would define “natural variability” as something that happens as the dynamic system responds to variations in natural forcings (as opposed to “unnatural” ones), maybe that’s not what you mean when you say, “natural variability”.

    I consider the same thing “natural variability”. But “natural variability” itself can be subdivided into:
    1) Variability that would exist with no forcings external to the climate system.
    2) Variability in (1) plus ‘natural’ variations in forcings. (These could include the ‘annual cycle’, solar forcings, and volcanic forcing.)

    Depending on what one is testing, either (1) or (2) is the variability of interest, but I don’t think we have a ‘name’ to distinguish between the two.

    That said: driftless ARIMA(3,1,0) is certainly unphysical for (1) and, unless we postulate really weird, inconceivable forcings, also unphysical for (2). (Off hand, I can’t imagine any natural forcings that would make driftless ARIMA(3,1,0) reasonable for natural variability defined either way.)

  95. My reading comprehension must be really bad, because this

    If I fit a 117 order polynomial through 118 data points, the estimate for natural variability would be zero

    sure seems a whole lot like you’re claiming the residuals have something to do with natural variability. It sure seems a whole lot like you’re saying, “you used up all your dof estimating parameters, so you’ve got none left to estimate ‘natural variability’.” Maybe that’s not what you’re saying at all, but you haven’t told me what you actually meant by that, so I’m left in my state of confusion about what Lucia means when she says “this model is physically realistic and lets me make meaningful statements about natural variability.”

    Here’s what I think: no autoregressive/moving-average model lets you make meaningful statements about natural variability. Why? Because the only explanatory variables (the dependent variables) are time and lagged/averaged/differenced versions of the original response variable. No natural/unnatural factors are included as explanatory/dependent variables, hence no meaningful statements about what variation is “natural” or “unnatural” can be made.

  96. sure seems a whole lot like you’re claiming the residuals have something to do with natural variability.

    Note that the sentence immediately following the bit you quote is

    (Everyone would understand this is nonsense.)

    That sentence was actually meant to convey that the 117th-order polynomial fit would… you know… be nonsense.

    What I mean is: You can postulate a model of whatever sort you like. Based on that model, you estimate natural variability.

    In the case of the 117th-order polynomial model, you’ve postulated a model that has the property of fitting. Based on that model, you estimate natural variability. Now, previously, you seemed to claim that “fitting” is sufficient to be physical.

    But I’m pointing out that merely fitting can’t be the sole criterion, because the 117th-order polynomial fit clearly fits and yet will give absolute nonsense answers that have nothing to do with natural variability. Not one thing. This is what I intended to communicate with “(Everyone would understand this is nonsense.)”

    So, despite the fact that the model fits, the estimate based on that model– which fits– is nonsense. Why is that? It’s because, contrary to your suggestion that if the model fits it must be physical, models that fit can, without meeting other criteria, in fact be utter nonsense. Or, put another way, fitting is insufficient to assure us the model will be physically realistic.

    The 117th-order polynomial model is an example of a model that has the property “fits” but is nonsense.

    If you read the rest of the comment you will see I discuss features, other than fitting, that a model must have to give estimates of natural variability that have some hope of not being nonsense. It is simply the case that “fitting” is not a sufficient reason to deem a model physical.

    It seems to me it’s pretty clear you recognize that the example of a model that “fits” might be utterly unphysical, because you are explaining to me that the 117th-order polynomial fit is a nonsense fit. But I thought I was saying this fit was clearly a nonsense, unphysical fit when I wrote “(Everyone would understand this is nonsense.)”

  97. jstults,

    Because the only explanatory variables (the dependent variables) are time and lagged/averaged/differenced versions of the original response variable.

    Well… what did you fit the ARIMA to? If you fit it to a series that is generated by a process that is “natural”, then, presumably, the fit is an attempt to describe properties arising from natural variability over time. If you fit it to a process that is not “natural”, then the fit will come up with properties that aren’t in accord with ‘natural variability’.

    So whether or not an ARIMA estimate has any hope of teasing out the properties of “natural variability” depends on whether it was fit to a series that “varies naturally”. (I actually don’t think we disagree on this.)

  98. What are you calling nonsense, including all those higher order terms, or making statements about natural variability based on a model with only time as an explanatory variable (obviously I should have said “independent” in the previous comment)?

  99. The other thing I’m groping for is what your metric is for physically realistic. It’s not at all clear to me that the model including all those high order time derivatives of temperature would be “unphysical”, since we “know” that temp varies continuously, and the highly integrated metric we’re modeling will have very small measurement error.

    What I am calling nonsense is making statements about natural variability based on assuming a statistical model that itself makes no sense– on whatever basis it happens to make no sense. I don’t know where the notion of “only time as an explanatory variable” comes in here.

    I’m not criticizing driftless ARIMA(3,1,0) as unphysical because it contains time only as an explanatory variable. I haven’t criticized a linear model for that reason. Time only might be a problem– or not. It’s not what I suggested as the main concern. My main concern with linear is not the same as my main concern with driftless ARIMA(3,1,0), and in neither case is my concern “time only”.

    My concern with linear: People are suggesting testing the hypothesis of “no linear trend” against “linear trend”, with a “fail to reject” result meaning we should doubt the statistical significance of warming. But the problem here is that the theory of AGW does not suggest the trend during the 21st century is linear. So that test can’t be a test of the statistical significance of AGW, because no one thinks the warming would be linear during that period. To test the prevailing notion, you have to test a theory that is at least remotely close to the theory people believe, and this test did not involve that.

    So, this has absolutely nothing to do with “time only”, “ARIMA” or anything like that. It has to do with doing a test that does not involve the prevailing notion about the trajectory of warming and then interpreting the result as telling us something about a theory that was not involved in the test.

    As for driftless ARIMA(3,1,0): if we consider a natural process that obeys conservation of energy and radiative heat transfer, and we watch it evolve over time, it cannot be ‘driftless ARIMA(3,1,0)’. The reason it can’t is that d=1 violates conservation of energy plus radiative physics unless the forcings themselves have d=1. So unless you want to kick the can down the road and suggest the energy from the sun follows a diffusive process (which also makes no sense), d=1 is unphysical. (I guess you could also suggest that, unbeknownst to us, all sorts of secret unnoticed volcanic eruptions happened during the 19th and 20th centuries or something. But to justify ARIMA(3,1,0) you need some majorly weird forcings that I just can’t even begin to imagine, or you need to suspend radiative physics (and I don’t just mean the CO2 bits– I mean the whole thing), or you need to suspend the 1st law of thermo.)

    This is true even if “ARIMA(3,1,0)” has the property of “fits”.

  101. Jstults

    It’s not at all clear to me that the model including all those high order time derivatives of temperature would be “unphysical”, since we “know” that temp varies continuously, and the highly integrated metric we’re modeling will have very small measurement error.

    You previously claimed that fitting alone meant the model must be physical. I assume you at least recognize the 117 order polynomial might fit and yet be unphysical. Of course, I’m willing to admit that it might, accidentally, by pure luck, occasionally result in a fit that is not unphysical.

    But I should think you would admit that, in many cases, though the model “fits”, after doing the fit we could easily show it has properties that make no sense. If you don’t see this, I’m not sure what I can say. But I continue to suggest– strongly– that driftless ARIMA(3,1,0) is unphysical in the sense that natural variability cannot be governed by that type of model.

  102. re jstults (Comment #76991)

    I agree that most people use hockey stick for temperature. But all that Phillips is saying is that GHGs exhibit a hockey stick shape. This may be confusing to many readers, but it is what he said.

    “There is broad scientific agreement about human impact on the level of greenhouse gases (GHG) in the atmosphere, manifested in the popular “hockey stick” graphic that shows the trend in greenhouse gases over the last two centuries as a sharp spike against the blade of little change over the previous two millennia.”

  103. You previously claimed that fitting alone meant the model must be physical. I assume you at least recognize the 117 order polynomial might fit and yet be unphysical.

    Why would a first order polynomial be “physical” and a nth order one be “unphysical”? None of the “models” we’re discussing are “physical” or “unphysical”. They are all simple curve fits with different flavors of noise. They can either fit well, or fit poorly. Be empirically adequate, or not. Be useful, or not. You are attaching semantics to noise that make little sense.

  104. jstults–

    Why would a first order polynomial be “physical” and a nth order one be “unphysical”?

    This question is a bit off. First order polynomials may also be unphysical. Both can be unphysical. Consequently, unless you provide a specific example, I can’t explain “why” one would be unphysical while the other would not. Detection of “unphysical” is based on something other than the order.

    But many orders have a very, very, very strong chance of being unphysical. For example: suppose you fit a 119 order polynomial to a sine wave and then project into the future. The polynomial can fit very, very well, but it will give incorrect projections. Why does this happen? Often, if one understands the physics, one can examine the fit, notice its unphysical behavior in the limits, and detect that the perfect-looking fit is unphysical. There are many examples where you could fit something to data, the fit looks great if evaluated over the data, but the analyst can tell the fit is unphysical.
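
    Here is a minimal sketch of that sine-wave example in R (made-up numbers, and a 15th-order fit rather than a 117th or 119th, just to keep poly() well behaved; the point is the same):

    # Fit a high-order polynomial to a sine wave: it looks fine over the data
    # window, then runs off to huge values the moment you project forward.
    set.seed(1)
    t_fit <- seq(0, 4 * pi, length.out = 200)       # fitting window: two periods
    t_new <- seq(4 * pi, 6 * pi, length.out = 100)  # projection region
    y_fit <- sin(t_fit)

    fit <- lm(y_fit ~ poly(t_fit, 15))
    max(abs(residuals(fit)))                 # small: the fit "looks" excellent

    pred <- predict(fit, newdata = data.frame(t_fit = t_new))
    range(pred)                              # enormous: the projection is nonsense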

    I am not talking about “noise” at all. I am discussing whether the model used to fit a curve to points generated by a physical process could possibly describe that physical process. I am saying
    a) One must examine whether the model used to fit the data clearly violates the physics governing the process. If the model violates the physics governing the process, the model is unphysical. So that fit– no matter how good– should be ignored. Also
    b) You cannot gauge whether the model used to create the fit is physical or unphysical by seeing how well it fits. Lots of people create perfect-appearing unphysical fits all the time. You have to look outside the fit itself to gauge whether the model is unphysical.

    A model that is unphysical will not result in a useful fit– no matter how good the fit.

    This has nothing to do with “noise”– despite the fact that you keep using the word. (I mostly avoid it unless I am discussing synthetically generated problems where the process I use to create data is white noise. Noise is one type of process, but I’m talking about processes, not noise specifically.)

  105. Detection of “unphysical” is based on something other than the order.

    Yes, so what is that something? Your further comment seems to indicate that it is poor extrapolations that make a fit “unphysical”, maybe you’d also include large oscillations between data points (like you’d get with a high order polynomial).

    There are many examples where you could fit something to data, the fit looks great if evaluated over the data, but the analyst can tell the fit is unphysical.

    Is the analyst relying on some metric of “physicality”? Are they using the “eye-ball norm”? How do we distinguish physical/unphysical?

    I don’t think any of the models we’re discussing (not just the particular flavor of ARIMA you find distasteful) will give “good” extrapolations, because they’re all “unphysical” (by which I mean not based on conservation laws).

  106. Yes, so what is that something?

    That depends on the way in which the model or the fit from the model is unphysical. In the case of “driftless ARIMA(3,1,0)”, it cannot describe natural variability for the earth’s climate system because that process must be governed by both a) radiative physics and b) the first law of thermo. The combination precludes a random walk; that is, it precludes the d=1 in “driftless ARIMA(3,1,0)”. (Or more specifically, it precludes it unless “natural forcings” are also a random walk– which is both a) inconceivable and b) something we have no evidence for anyway.)

    Is the analyst relying on some metric of “physicality”?

    Yes. They have to describe why they think a statistical model or the fit from that model is unphysical. (One or the other or both can occur.) The explanation will involve mentioning things we believe we know about the laws of physics governing the process that created the data.

    I don’t think any of the models we’re discussing (not just the particular flavor of ARIMA you find distasteful) will give “good” extrapolations, because they’re all “unphysical” (by which I mean not based on conservation laws).

    Could you be more specific: describe a model we are discussing and explain why that specific model is unphysical? I have repeatedly provided my explanation for why driftless ARIMA(3,1,0) is unphysical and mentioned which physical laws would be violated if natural variability were governed by that model.

  107. I have repeatedly provided my explanation for why driftless ARIMA(3,1,0) is unphysical and mentioned which physical laws would be violated

    Yes, you keep saying that. I think you said ARIMA(3,1,0) would violate conservation of energy? I think any flavor of ARIMA could be an acceptable, empirically adequate description of physical observations. Rather than keep talking past each other, could you just show that a particular flavor of ARIMA violates a particular conservation law (maybe in general, maybe in this particular case of a temperature series for the globe)?

  108. would violate conservation of energy?

    I didn’t say alone, I said combined with radiative physics. (Actually, all we need is combined with the 2nd law of thermodynamics which requires heat transfer to go from hot things to cold things.)

    I’ve also said that the exception is if forcings undergo a random walk.

    Eventually, I might show why d=1 violates this using math– but probably not.

    The qualitative answer is: If temperatures get high, an object will radiate away more energy. If too low, it will radiate away less. (That’s the radiative physics part– or, more generally, the 2nd law of thermo requires this.)

    If it radiates more energy, it tends to cool. If it radiates less, it tends to warm. (That’s the conservation of energy part.)

    The result is that the equations governing the evolution of temperature are such that– as long as forcings are not themselves undergoing a random walk– the response of temperatures cannot undergo a random walk. That is: unless forcings have d=1 (or greater), the temperature response cannot have d=1.

    But no, I’m not going to type all the equations and proof into a wordpress blog post.

    Similar arguments can be made to indicate natural forcing from the sun can’t be a random walk. Forcing from volcanic aerosols can’t be a random walk either. So d=1 seems unphysical, and I would suggest it is unphysical. I’m not going to pursue it to a greater extent than this at the blog because a) I don’t think it’s worth it, and b) I want to devote time to examining problems with climacograms.

    But there are lots and lots and lots of problems where, if the feedback for the system is negative, you can’t get d=1. That is: if the system is such that you don’t get runaway greenhouse effects, then d can’t be greater than 0.5!
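
    The one-line version of that (a zero-dimensional linearized energy balance; a sketch, not a proof): if $latex \displaystyle C\,dT/dt = F(t) - \lambda T $ with feedback parameter $latex \lambda > 0 $, then a forward-difference discretization with time step $latex \Delta t $ gives

    $latex \displaystyle T_i = \left( 1 - \frac{\lambda \Delta t}{C} \right) T_{i-1} + \frac{\Delta t}{C} F_{i-1} $

    so the autoregressive coefficient on temperature is strictly less than one. You only get a unit root (d=1) if $latex \lambda = 0 $ (no restoring radiative response) or if the forcing $latex F_i $ itself wanders with d=1.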

    If you want to believe d=1 is physical, ok. I’m not going to devote more time to showing otherwise. I’ll admit I didn’t show you a full and complete mathematical proof. But I’m darn sure you are wrong and I will continue to say so. I’m also darn sure that your principle– that a statistical model that looks like it “fits” must therefore be physical– is flat out wrong, and people know this.

    BTW: I haven’t read DK’s thermodynamics paper yet, but I wouldn’t be at all surprised if he shows d can’t be 1. 🙂

  109. Ok… I read DK’s 2011 paper. He just assumes at the outset d<0.5. That's required by this:

    "Let x_0i be a stationary stochastic process at discrete time i = 0, 1, …"

    Stationary processes have d<0.5. Period. Random walks (i.e., d=1) are non-stationary.
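
    For reference, if I have the standard fractional-differencing result (Hosking's) right, an ARFIMA(0,d,0) process has variance

    $latex \displaystyle \mathrm{Var}(x_i) = \sigma_\epsilon^2\, \frac{\Gamma(1-2d)}{\Gamma^2(1-d)} $

    which is finite only for d < 0.5 and blows up as d approaches 0.5. d = 1 isn't even in the running.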

    I guess I'm not surprised that the paper doesn't waste time actually proving d=1 is unphysical, since pretty much everyone accepts this. But likewise, I'm not going to do convolutions to prove it. It is very widely accepted that d=1 is unphysical for things like temperature series for the earth's surface -- and for sound reasons.

  110. Eventually, I might show why d=1 violates this using math– but probably not.

    It is very widely accepted that d=1 is unphysical for things like temperature series for the earth’s surface — and for sound reasons.

    If it’s so widely accepted, the reasons so sound, a trivial, even obvious result, then I’d really appreciate it if you (or any of the lurkers) could point me at an actual demonstration of this unphysicality. I don’t mind being the slow student in class.

  111. I’m also darn sure that your principle– that a statistical model that looks like it “fits” must therefore be physical– is flat out wrong, and people know this.

    If it “fits” then it describes the data well, by some metric. If the data is physical, then the description is as well (by some metrics the description may be poor, by some it may be good). All of the models you are discussing are unphysical. They take a step away from the deterministic equations and model some of the dynamics as a stochastic component. We *know* the dynamics are deterministic, and the stochastic approximation is poorly supported theoretically, and can only be an empirically adequate (Box’s wrong but useful model) description of some particular data sometimes. The particular flavor of ARIMA you find objectionable is non-unique in this failing.

  112. jstults–
    I’d love to do that. But unfortunately, sometimes the most difficult things to find proofs or demonstrations for are the obvious ones. No one writes papers on these things, and people don’t bother to cite a paper for the obvious because it’s obvious. Sorry.

  113. jstults-

    All of the models you are discussing are unphysical.

    My definition of “unphysical” is: a model or fit that, if true, would violate known physical principles governing the system. Based on my definition, many models are perfectly physical as descriptions of the process that generates ‘natural variability’, but ARIMA(3,1,0) is unphysical. What’s your definition of “unphysical”? Unless you tell me what it is, I can’t begin to know what you are talking about.

    We *know* the dynamics are deterministic, and the stochastic approximation is poorly supported theoretically, and can only be an empirically adequate (Box’s wrong but useful model) description of some particular data sometimes.

    The fact that the dynamics are deterministic at some level doesn’t make a stochastic approximation unphysical. But even as an approximation, driftless ARIMA(3,1,0) is unphysical. Its failing is of a different nature from whatever failing you see for AR1 etc.

  114. jstutls–
    I’m not sure what you’ll find in the intertubes. But you could hunt for articles on random walks, diffusion processes etc. Compare the physics governing the velocities of particles suspended in a turbulent diffusion process to the physics governing their displacements. d=1 works for displacements but would be unphysical for velocities. The reason is there is a restoring force causing particles’ velocities to move toward the velocity of the surrounding fluid. In contrast, there is no restoring force for the ‘location’ of the particle. Similar physics will apply for heat transfer problems, and all sorts of problems. If there is a restoring force that tends to pull ‘something’ (i.e. velocity, temperature whatever) toward a particular level, d=1 isn’t going to be physical.
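
    A crude numerical cartoon of that distinction in R (made-up parameters, nothing tuned to a real flow):

    # Velocity has a restoring force toward the surrounding fluid: AR(1)-like and
    # stationary. Displacement is accumulated velocity: no restoring force, so it
    # wanders (d=1 is fine for displacement, unphysical for velocity).
    set.seed(7)
    vel <- stats::filter(rnorm(10000), 0.95, method = "recursive")
    pos <- cumsum(vel)

    var(vel[1:2000]); var(vel)   # roughly the same: velocity is stationary
    var(pos[1:2000]); var(pos)   # keeps growing with the length of the record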

  115. If there is a restoring force that tends to pull ‘something’ (i.e. velocity, temperature whatever) toward a particular level, d=1 isn’t going to be physical.

    Sure, if the “level” were constant. When the level is dynamic, and results from the combination of various forcings which are the result of loosely-coupled, nonlinear dynamic systems themselves (the sun and clouds come readily to mind), then it’s far from obvious (to me at least) that d=1 couldn’t be an empirically adequate description.

    I appreciate you making the attempt to explain things to me.

  116. Sure, if the “level” were constant.

    d=1 is not possible if the forcing is stationary.

    When the level is dynamic, and results from the combination of various forcings

    We are discussing distinguishing between results arising because forcings are perpetually increasing (as with ghg’s) and results arising from natural variability (where they are not perpetually increasing).

    But anyway, even if forcings are increasing d=1 is not possible unless the forcings have d=1.

    nonlinear dynamic systems themselves (the sun and clouds come readily to mind), then it’s far from obvious (to me at least) that d=1 couldn’t be an empirically adequate description.

    If the sun had d=1, it would be possible for natural variability to be d=1. However, note that in comments above I discussed the sun and said it’s possible to argue the sun also can’t have d=1 (thank heavens!). Basically, it seems unlikely to me that the power emitted by the sun doesn’t have some sort of restoring force toward a current ‘preferred’ level. That is: we don’t expect the sun’s energy output to wander aimlessly from that of a blue giant to a white dwarf etc.

    If the forcing level is just increasing (as with ghg’s), that goes into the forced trend and you still can’t get d=1 for the natural variability.

  117. jstults–
    Stationary doesn’t mean steady state. What Dan wrote has to do with whether or not we can reach a true steady state; it doesn’t engage the point about ‘stationarity’.

    Lucia

  118. it doesn’t engage the point about ‘stationarity’

    Nope; it is exactly on point. It has to do with whether the moments of the distributions (the temporal statistics) are constants with time (stationary). Nonlinear dynamics allows nonstationarity without any need to violate “the physics”. In fact, that is often the case. This is similar to the closure problem of turbulence modeling (based on your comments and the notation of your previous “basics” post on averaging this is something you seem to be familiar with).

  119. Thanks DeWitt, I was waiting for that one (check’s in the mail) ; – )

    Both a linear trend and a random walk are unbounded if you extrapolate to infinity, but we’re talking about finite times.

  120. Jstults.

    HA! I just started reading your blog. So funny that you mentioned TASSM.

    I walked off that project after being shown the survivability analysis and being asked to “justify” what they had done. The analysis was, of course, unjustifiable. They assumed stupid countermeasures deployed in suboptimal fashion. About the only thing supportable was the probability of clobber.

  121. In another life I had a bunch of old dudes who worked for me that did the live fire testing for that program, lots of “fun” stories. The buffoonery continues, only the names change…

  122. Re: jstults (Jun 13 13:22),

    Stock market adage: nothing moves in a straight line forever. We’re also not talking about a linear trend. We’re talking about the underlying process. Absent an unbounded forcing, the underlying process cannot realistically be described by ARIMA(3,1,0). Also, there’s a finite amount of fossil fuel so that forcing isn’t unbounded even if you can fit the recent atmospheric CO2 concentration time series with an exponential function.

    Assuming stellar models are correct and the sun will continue to brighten over time, a runaway greenhouse is in the cards eventually. I’m not holding my breath.

  123. We’re talking about the underlying process.

    If that were so, we wouldn’t be using linear stochastics. Once you’ve made that deal with the devil you’ve lost any grounds for excluding a particular value for p,d,q on a physical basis. They’re all approximations over finite time periods with varying levels of “goodness.”

    I’m a simple guy. If I can’t boil it down to addition and subtraction, then I’m not likely to understand it. Attempt at Latex follows (someone might want to check my math):

    ARIMA(3,0,0),
    $latex {x}_{i}={\alpha}_{1}\,{x}_{i-1}+{\alpha}_{2}\,{x}_{i-2}+{\alpha}_{3}\,{x}_{i-3}+\epsilon $ (1)

    ARIMA(3,1,0),
    $latex {x}_{i}=\left( {\alpha}_{1}+1\right) \,{x}_{i-1}+\left( {\alpha}_{2}-{\alpha}_{1}\right) \,{x}_{i-2}+\left( {\alpha}_{3}-{\alpha}_{2}\right) \,{x}_{i-3}-{\alpha}_{3}\,{x}_{i-4}+\epsilon $ (2)

    ARIMA(0,1,0),
    $latex {x}_{i}={x}_{i-1}+\epsilon $ (3)

    So 1 is “physical” but 2 is not (3 is an actual “random walk” for reference)?
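
    (Sanity check on (2): the four coefficients sum to one,

    $latex \left( {\alpha}_{1}+1\right) +\left( {\alpha}_{2}-{\alpha}_{1}\right) +\left( {\alpha}_{3}-{\alpha}_{2}\right) -{\alpha}_{3}=1 $

    which is just the unit root from the differencing written out as a constraint on an unrestricted AR(4).)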

  124. Sorry, been busy and I see the thread has kept busy since then. Waaaay back when, Lucia said about my comment about a posteriori inferences:

    Rhetorical question on sentences that follows: since when has that silly rule been in effect? Answer to rhetorical question: That silly rule has never applied to “science”.

    Hmm, well first I didn’t say it was a rule. To me, it is just common sense. Dr von Storch explains it much more ably than me in a link I often refer to, section 2.2 (the one about the Mexican hat) in his paper “misuses of statistics in climate science”. Slightly different example, same basic principle.

    No. The uncertainty in forcing that might occur in the future can be dealt with. But it’s still helpful to separate the deterministic from stochastic.

    I think my meaning of unpredictable deterministic forcing is different to what you have understood. I am referring to the situation when the outcome of a deterministic forcing becomes decorrelated with the forcing itself. This happens when the forcing – even one which should have a linear effect, such as an increase in insolation – is small compared to the natural variability. Because climate is sensitive to initial conditions, the “deterministic forcing” causes the system to follow a different trajectory. That trajectory, due to the sensitivity of future outcomes to the current state, will be quite different from the trajectory without the forcing, and the delta between the two trajectories will be uncorrelated with the original forcing.

    So, even if you know what the forcing is, and even if the equations are linear, the consequence for the system is unpredictable and therefore belongs as part of the stochastic element, not part of the deterministic element.

    And, of course, if your system exhibits LTP, then no amount of scale averaging will improve the accuracy of your estimate of the population mean, in other words, the forcing and its outcome is decorrelated on all scales.

  125. jstults–
    Your blog posts don’t seem to even discuss moments being or not being stationary. They just seem to show things like recurrence plots etc.

    So 1 is “physical” but 2 is not (3 is an actual “random walk” for reference)?

    Correct.

  126. Spence–
    The hypothesis that there is warming is not based on the data in the way of “the mexican hat”.

  127. Re: Spence_UK (Jun 13 15:19),

    This happens when the forcing – even one which should have a linear effect, such as an increase in insolation – is small compared to the natural variability

    Yes, well that’s the question isn’t it: How small is too small to see? We know that relatively small forcings, Milankovitch cycles, e.g. do cause correlated behavior, rate of change of ice volume for one.

  128. Lucia, “time average” is another term for “first moment”. I also didn’t really get into the distinctions between time averages and ensemble averages, but that doesn’t mean those aren’t important to what I show in that post : – )

  129. jstults–

    * In ensemble averaging, “time average” isn’t the first moment.
    * No one has said the first moment of the forced component is stationary. The whole purpose is to separate the forced component from the unforced (natural variation). It’s the natural variation that is stationary. It’s the natural variability that can’t be ARIMA(3,1,0). (The forced component isn’t even stochastic.)
    * As far as I can tell, your post does not show that the first moment for a chaotic process is not stationary– not even if you assume the process is ergodic and that you can switch time averaging with ensemble averaging!

    As far as I can tell, all you are doing is showing things are steady state.

  130. Lucia – I never said the warming was based on the Mexican hat. The Mexican hat is an example of why statistical tests on a posteriori inferences are meaningless, which was the original point I was making (and Prof. Koutsoyiannis was making upthread), and that you were unconvinced by. I guess if the three of us can’t convince you, we’ll have to agree to disagree on this one.

    DeWitt – I’m not convinced by the Milankovitch cycles being solar-driven. It doesn’t make much sense. Why would the orbital cycle with the smallest forcing dominate over such a large period? Why are the matches (generally) so poor? Why do they suddenly switch? Why are the centre frequencies of the peaks slightly off what they should be? Of course, if you have LTP present in natural variability, and your time series is (some small multiple of) the length of said orbital cycles, then you expect the time series to be dominated by some peaks, close to the orbital cycles. Also, the LTP explanation doesn’t require positive feedbacks so large that even climate modellers shy away from them.

    Carl Wunsch discusses this nicely in this paper. Curiously, he fits an autoregressive series (as I note above, an AR series with sufficiently long time constant makes a fair approximation to LTP). But his Fig 2. is absolutely spot on what I would expect from Hurst stochastics. It would be quite a coincidence if forcing happened to match this. Not impossible, but quite a coincidence.

  131. Spence–
    Von Storch doesn’t say you can’t make a posteriori inferences.
    Warming is not an a posteriori inference.
    The test of the trend is not a statistical test based on an a posteriori inference. It is a test based on a theory that there will be warming. The warming is not postulated based on the appearance of the data.

    the original point I was making (and Prof. Koutsoyiannis was making upthread), and that you were unconvinced by.

    And my response to Koutsoyiannis included

    The magnitude of the trend is derived from the data a posteriori. But the knowledge that the trend exists is a priori– not theorized on the basis of the data itself.

    I’ll elaborate:
    In his paper, Koutsoyiannis gives examples of tests on Mexican-hat-type identifications. If doing statistics on “Mexican hat” identifications is the definition of doing a posteriori statistics on features identified in the data, then I agree you can’t do that.

    But if the theory pre-exists the data (as the theory that ghg’s cause warming does), one is not forbidden from doing a test on the data merely because it appears the data agree with the theory. If someone is going to define that as a test based on a theory developed from the data, that’s nuts. In fact, it’s flat out insane. The rule does not exist. All science would grind to a standstill if you were not permitted to do a statistical test on a feature that was predicted by a theory that predated the data, merely because the data turned out to give good support to the theory!

  132. Re: Spence_UK (Jun 13 16:09),

    I’m not convinced by the Milankovitch cycles being solar-driven. It doesn’t make much sense. Why would the orbital cycle with the smallest forcing dominate over such a large period? Why are the matches (generally) so poor?

    See this powerpoint presentation, particularly slide 19. The fit of June 65N insolation anomaly with rate of change of ice volume is excellent. Obviously there’s more than that going on, particularly for the glacial/interglacial transitions, but the evidence of an effect is compelling to me.

  133. It’s the natural variability that can’t be ARIMA(3,1,0).

    The whole point of the post is that linear transformations of a chaotic series (which the n’th moments are: linear in the response, the integral is a linear operator and the kernel is just $latex t^n $) are chaotic as well. There is no guarantee that the $latex n^{th}$-order moments
    $latex \int_{t0}^{t1}{\left( t-\mu\right) }^{n}\,\mathrm{f}\left( t\right) dt $
    are independent of the averaging window (i.e. stationary). In fact, it’s a rather special circumstance for the averages not to depend on the averaging window for nonlinear dynamics.

    I guess I’d really be interested in a demonstration that either an $latex n^{th}$-moment must not be a function of t, or that the update equation I posted for ARIMA(3,1,0) results in a violation of a conservation law (that isn’t also violated by all the other models we’re discussing when $latex t \rightarrow \infty $).

    Also, wiki claims that ARIMA with $latex d > 0 $ means not wide sense stationary, so I’m not sure why you’re fixated on $latex d=1 $. It seems especially weird since ARIMA(3,1) is just a special case of ARIMA(4,0).

  134. jstults

    are independent of the averaging window (i.e. stationary).

    What you are showing is not based on the definition of stationarity in statistics. The $latex n^{th}$-order moments need to be expected values. It is the expected value, that is:
    $latex E[\int_{t0}^{t1}{\left( t-\mu\right) }^{n}\,\mathrm{f}\left( t\right) dt ]$
    that must be independent of time. The fact that I(t) varies with time along a trajectory is entirely unimportant to identifying stationarity. (And, moreover, if the value of the integral were independent of time, it wouldn’t even look stochastic. So we wouldn’t even be discussing this!)

    I guess I’d really be interested in a demonstration that either an n^{th}-moment must not be a function of t, or that the update equation

    Well… then why don’t you take your post a step further and take expected values? First: pick a series of t’s using some sort of random number generator. Then evaluate your integral.

    $latex I(t)= \int_{t0}^{t1}{\left( t-\mu\right) }^{n}\,\mathrm{f}\left( t\right) dt $

    Plot $latex I(t) $ as a function of t. See if you can find any dependence of the integral on your values of t– I’d suggest spacing the t’s out quite a bit. Does the integral increase with t? That would mean the mean is not stationary. If you were to compute an autocorrelation of $latex I(t) $, is that a function of t?

    After you actually check for stationarity, get back to me. But right now, what you have does not reveal anything at all about whether or not the process you plotted out is stationary.

    I’m not sure why you’re fixated on d=1 .

    I’m “fixated” on 0.5<d.

    It seems especially weird since ARIMA(3,1) is just a special case of ARIMA(4,0).

    Is it? But even if the answer is yes, so? It’s this special case that is not physical. A special case can be unphysical even if other cases in the general case aren’t.

    What the heck: while you are at it, why don’t you make a climacogram of $latex I(t1) $ of the process in your blog post and report back the value of d based on fitting the slope.
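
    In case the mechanics of that aren’t clear, here is roughly the calculation I mean, sketched in R on plain white noise just to show the bookkeeping (the slope of the log-log plot is H-1, and d = H - 0.5):

    # Climacogram: standard deviation of block-averaged values versus block
    # length k, on log-log axes. For white noise the slope should come out
    # near -0.5 (H about 0.5, d about 0); persistent series decay more slowly.
    set.seed(123)
    x <- rnorm(2^14)

    scales <- 2^(0:8)
    sd_k <- sapply(scales, function(k) {
      m <- floor(length(x) / k)
      sd(colMeans(matrix(x[1:(m * k)], nrow = k)))  # means of blocks of length k
    })

    fit <- lm(log(sd_k) ~ log(scales))
    coef(fit)[2]                    # slope; H = slope + 1, d = H - 0.5
    plot(log(scales), log(sd_k))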

  135. You’d agree that $latex E\left[ \right] $ is just another linear operator, right?

    The fact that ARIMA(3,1,0) is a special case of ARIMA(4,0,0) really is obvious when you scribble it in the margin like I did above (see equation 2). I may have mucked up the application of the lag and difference operators, but I’m pretty sure that’s right.

  136. Lucia, I agree for the most part with your main point, but I just think you’re bordering on the sort of silliness the Antarctic boys were guilty of when they thought they had sound physical reasons for their truncation procedures. Being so certain of what’s allowed from these gross linear approximations (I mean the models are all linear in the parameters, not that the model has only a first order term) to nonlinear dynamics is a recipe for fooling yourself. I’ll try to keep my disruptions of your continued climacogrametry to a minimum.

  137. 1) Of course E is a linear operator. So apply it and see what you get. Or don’t, and keep arguing that I’m mis-defining the meaning of “stationary”. (I don’t know why you are hung up on “linear” anyway.)
    2) I have no idea which “Antarctic” boys you are talking about or which truncation procedure, nor why you think that someone sometimes being incorrect in decreeing something unphysical means such declarations aren’t allowed in cases where they make sense.
    3) The natural variability should be stationary and bounded. Finding ‘no statistical significance’ with silly models for natural variability is not something I’m going to pay any attention to. It’s not a useful test.

  138. “It seems especially weird since ARIMA(3,1) is just a special case of ARIMA(4,0).”

    I have not been following these threads closely, and additionally I am not near being versed in the use of ARIMA models, but from what little experience I have using them and reading the literature about using them, I have frequently seen that when the orders get past 2 in these models a close analysis shows overfitting with near-unit coefficients. What do the acf and pacf say about these higher order models? I get my analyses from the link:

    http://www.duke.edu/~rnau/411arim.htm
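
    The sort of check I mean, on toy data rather than a temperature series (just the standard residual acf/pacf and AR-root diagnostics):

    # Fit an AR(4) by maximum likelihood to a series that is really AR(2), then
    # look at the residual acf/pacf (should be flat if nothing is left over) and
    # at the moduli of the AR roots (values close to 1 would flag a near unit root).
    set.seed(4)
    y   <- arima.sim(model = list(ar = c(0.6, 0.2)), n = 500)
    fit <- arima(y, order = c(4, 0, 0), include.mean = FALSE, method = "ML")

    acf(residuals(fit))
    pacf(residuals(fit))
    Mod(polyroot(c(1, -coef(fit))))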

  139. I don’t know why you are hung up on “linear” anyway.

    Because the interesting parts of nonlinear dynamics are invariant to linear transformation. This fact can be used to “test” whether a signal is stochastic or deterministic (unfortunately we don’t have enough data to establish the dynamics for climate empirically, certainly not from a single scalar series), but more important to this discussion: your intuition about averaging (a linear operation) is misleading you when it comes to nonlinear deterministic time series (maybe climate isn’t deterministic?). I thought this might be of interest to someone who says they’re interested in separating the “deterministic forced response” from “something I prefer not to call noise.”

    I bet you could show that the fit of ARIMA(3,1,0) isn’t really all that good compared to other similarly complex models (maybe it would be worth comparing it to ARIMA(4,0,0) that adds a degree of freedom and removes the constraint on the parameters), that would be far more credible than “decreeing” it “silly” or “unphysical”, because so far those are your opinions which you haven’t given anyone good reason for sharing if they weren’t already so inclined.

    I’d love to see a demonstration that certain types of ARIMA violate conservation laws (or even the second law); that would be pretty neat because it would probably result in a useful model selection criterion. Right now I’m left with “anything but d=1, and Lucia is not alone in this”, and I’m fine leaving it at that; I don’t expect you to discover new model selection methods on your blog. I’m sorry for being a bit abrasive in my previous comments.

    Kenneth Fritsch: you can easily show this to yourself by grouping the terms of the ARIMA(1,1,0) model on that page, you get an ARIMA(2,0,0) model with a constraint on the parameters (one less degree of freedom).
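
    A quick numerical version of that grouping, if it helps (a toy simulation, not the temperature data):

    # Simulate a driftless ARIMA(1,1,0) with ar = 0.5, then fit an unconstrained
    # AR(2). The estimates should land near (1 + 0.5, -0.5), i.e. coefficients
    # that sum to about 1: the differencing reappears as a constraint.
    set.seed(99)
    x <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.5), n = 1000)

    fit <- arima(x, order = c(2, 0, 0), include.mean = FALSE, method = "ML")
    coef(fit)             # roughly ar1 = 1.5, ar2 = -0.5
    sum(coef(fit)[1:2])   # close to 1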

  140. Jstults–
    I think you are confusing things like “there is no guarantee that X is invariant to the averaging window” with showing something about the stationarity of statistical moments. The fact that time averaging happens to be linear doesn’t mean you’ve shown chaotic processes don’t have stationary moments.

    I bet you could show that the fit of ARIMA(3,1,0) isn’t really all that good compared to other similarly complex models (maybe it would be worth comparing it to ARIMA(4,0,0)

    For some values of the parameters, ARIMA(4,0,0) is also non-stationary, unbounded and so unphysical. I haven’t said d=1 is the only thing that will make an ARIMA model unphysical.

    If you are saying I haven’t fully explained: that’s right. I said I didn’t. I said I’m not going to. I didn’t say you are required to accept what I say– but I know that driftless ARIMA(3,1,0) is not physical and I won’t accept estimates of uncertainty intervals based on it. But you are seriously confusing yourself with your chaos stuff.

    If you wanted to spend the time, you could go through the various chaotic analyses on your page, selecting anything with an attractor. Plot climacograms and see if you ever get d=1 (or even d=0.5). Or don’t do it.

  141. …you are seriously confusing yourself with your chaos stuff

    You’re probably right. I’m sure you’ll grant it’s easy for simple folks like me to get confused when you write things like
    $latex E \left[ \int_{t0}^{t1}{\left( t-\mu\right) }^{n}\,\mathrm{f}\left( t\right) dt \right] $
    as if it is something different than the integral by itself. Because of course it is just the result of the definite integral. I know you already know this, because as you pointed out on your points post, “the average of the average is the average”. I’m left confused yet again by what you may have meant by all those words you put around that snippet of math.

  142. jstults

    it is something different than the integral by itself.

    It is something different than the integral itself.

    An ensemble average of an ensemble average is the ensemble average. The ensemble average of a time average is not the time average you started with. In my previous post I had not discussed any other sort of average, so it didn’t occur to me to insert “ensemble” everywhere to distinguish it from “time averages”.

    (Oddly, the time average of a time average is not the time average itself. Odd… but true.)

  143. There’s no ensemble, we have a single realization to model. The distribution in question is that of the residuals.

    Make it easy for me, the math you and I both wrote says, “what is the expected value of a constant?”. To which I answer, “that constant”, and you answer, “I’m taking the expectation over something I’ll never actually have, and that we weren’t actually discussing.”

  144. jstults–
    There is an ensemble. We have only 1 sample drawn from the ensemble but that doesn’t mean there is no ensemble.
    Well… that’s not a quote of anything I said. Also, I certainly discussed ensemble averages, particularly in the post from which you took my quote.

    FWIW: The expected value of a constant is the constant itself.

  145. So you’re not talking about the moments of the residual distribution when you talk about things being “stationary”?

  146. jstults–
    The moments of the process are estimated from the observations. So, the moments of the process are estimated from the residual distribution. But that’s an estimate of the actual moments based on the sample.

    I am saying the moments of the actual process must be stationary. It doesn’t make sense to talk about whether the moments of the residual distribution are stationary or not. It’s just a sample. The observed values are what they are. But if your goal is to learn something about the moments of the process, you need to use a statistical model whose properties do not violate the physics governing the unfolding of the real process.
    (And, for what it’s worth, if you are trying to gauge statistical significance, you are, by definition, thinking of your sample as one drawn from an ensemble of possible samples, and you are thinking of the process that created the residuals as being governed by something. In the case of surface temperatures, that should be physics. It’s the process that dictates whether the moments must be stationary, bounded, normal, Poisson or whatever features seem to be required of the process.)

    So, in selecting statistical models you first restrict yourself to models that are not unphysical for the process that generated the residuals. Then you fit a model to those residuals and get your estimate for whatever feature it is you wanted to estimate.
    Driftless ARIMA(3,1,0) does not qualify as a process that could govern the actual process that generated the residuals. So you can’t use that.

  147. The process for the residuals for these fits (including all the lagged/averaged predictors too) is assumed to have zero mean, fixed variance, and no dependence on the predictors right? One of my points is that for nonlinear dynamics, no set of linear transformations (which is what you’re applying to the data to arrive at the residuals) will give you such stochastic residuals. The left-overs will always be chaotic (deterministic) if that’s what you started with (unless you get lucky and subtract off the correct dynamics, but that won’t happen treating things in a linear stochastic framework, it’s not even all that likely otherwise because of misspecification).

    I read your time average quip as “average of constant isn’t constant”, so I’m left confused again.

  148. zero mean, fixed variance, and no dependence on the predictors right?

    Not zero mean. But also, you need to state these under ensemble averaging, not time averaging.

    One of my points is that for nonlinear dynamics, no set of linear transformations (which is what you’re applying to the data to arrive at the residuals) will give you such stochastic residuals.

    Why do you think this? Under ensemble averaging you don’t even need the linear transformation to get these things for the Lorenz system.

    The left-overs will always be chaotic (deterministic) if that’s what you started with (unless you get lucky and subtract off the correct dynamics, but that won’t happen treating things in a linear stochastic framework, it’s not even all that likely otherwise because of misspecification).

    Look, whether the climate residuals are chaotic or truly stochastic, if you are using ARIMA anything you are already accepting using a stochastic framework to model a chaotic system.

    But you still can’t pick a model that does things like permit the temperatures of the system to wander to infinity with no forcings, or that has an unbounded standard deviation in temperatures, etc. The physics doesn’t permit this. Driftless ARIMA(3,1,0) has unphysical properties, and saying “chaos” doesn’t get around this fact.

    Look: Take your Lorenz model, start it with some particular randomly chosen initial condition, run it for 2^26 points and make a climacogram. Look at the value of d for one of the Lorenz state variables. Repeat this with another initial condition. See what value of d you get for that realization.

    Do this 100 times. See what d’s you get.

    Then, afterwards, compute the standard deviation of that same Lorenz variable over the 2^26 points for each realization. See if you get an infinite standard deviation (as you would with d=1). That’s the estimate of the ensemble average of whichever quantity you are trying to estimate.

    You aren’t going to get d=1.
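
    A scaled-down sketch of that recipe (one realization, far fewer than 2^26 points; deSolve is just one convenient integrator):

    # Integrate Lorenz-63 from a randomly perturbed initial condition, then look
    # at the x component: its standard deviation settles down as the record
    # grows, which is what a bounded, stationary process does and d=1 does not.
    library(deSolve)

    lorenz <- function(t, state, parms) {
      with(as.list(c(state, parms)), {
        list(c(sigma * (y - x),
               x * (rho - z) - y,
               x * y - beta * z))
      })
    }

    set.seed(11)
    out <- ode(y = c(x = 1 + rnorm(1, sd = 0.1), y = 1, z = 1),
               times = seq(0, 2000, by = 0.05),
               func = lorenz,
               parms = c(sigma = 10, rho = 28, beta = 8 / 3))

    xs <- out[, "x"]
    sd(xs[1:10000]); sd(xs)   # nearly the same: bounded, not wandering off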

  149. zero mean, fixed variance, and no dependence on the predictors right?

    Not zero mean. But also, you need to state these under ensemble averaging, not time averaging.

    Huh? What model are you actually fitting in R? The “ensemble” is the iid draws from the distribution for what I called $latex \epsilon $ when I wrote out those ARIMAs. It really seems like you’re BS’ing me with nonsense.

    One of my points is that for nonlinear dynamics, no set of linear transformations (which is what you’re applying to the data to arrive at the residuals) will give you such stochastic residuals.

    Why do you think this?

    Because it’s a “well known result” ; – ) James Annan actually has a paper that mentions it in passing, but this margin is awful narrow…

  150. See if you get an infinite standard deviation (as you would with d=1).

    It’s difficult to demonstrate numerically that the integral won’t converge, this is a “well known result” in the MCMC lit; you’re right about not having that problem for the toy.

  151. Like I said, it’s easy to get confused when you says things like

    What you are showing is not based on the definition of stationarity in statistics. The n^{th}-order moments need to be expected values. It is the expected value, that is:
    E[\int_{t0}^{t1}{\left( t-\mu\right) }^{n}\,\mathrm{f}\left( t\right) dt ]
    that must be independent of time. The fact that I(t) varies with time along a trajectory is entirely unimportant to identifying stationarity.

    While my understanding of what we’re talking about is based on things like (my emphasis added)

    The weakest but most evident form of stationarity requires that all parameters that are relevant for a system’s dynamics have to be fixed and constant during the measurement period (and these parameters should be the same when the experiment is reproduced). This is a requirement to be fulfilled not only by the experimental set-up but also by the process taking place in this fixed environment. For the moment this might be puzzling since one usually expects that constant external parameters induce a stationary process, but in fact we will confront you in several places in this book with situations where this is not true. If the process under observation is a probabilistic one, it will be characterised by probability distributions for the variables involved. For a stationary process, these probabilities may not depend on time. The same holds if the process is specified by a set of transition probabilities between different states. If there are deterministic rules governing the dynamics, these rules must not change during the time covered by a time series.

    […]

    A strictly periodic modulation of a parameter can be interpreted as a dynamical variable rather than a parameter and does not necessarily destroy stationarity.

    Unfortunately, in most cases we do not have direct access to the system which produces a signal and we cannot establish evidence that its parameters are indeed constant. Thus we have to formulate a second concept of stationarity which is based on the available data itself. This concept has to be different since there are many processes which are formally stationary when the limit of infinitely long observation times can be taken but which behave effectively like non-stationary processes when studied over finite times.
    Nonlinear Time Series

    I suppose having a different definition of stationary could be part of the reason why I’ve had such trouble understanding what you are saying.

  152. What data set is being modeled with ARIMA(4,0,0)? I would like to analyze that result.

  153. I have not read in detail the thread discussion of stationarity and the unphysical nature of a first difference in an ARIMA model and therefore my comments here may miss some of the more subtle parts of the discussion.

    I was under the impression that if a process contains a trend, the first step in fitting an ARIMA model to the series is to do a first difference (obviously, if one is doing an ARFIMA model the differencing is fractional, and I think it assumes there is no trend, with d<0.5), and only then would one look at the coefficients and orders for the autocorrelation and moving average parts of the model. I guess I am not following what is meant by the first differencing not being part of a physical process. A trend in a temperature series would need to be removed with a first difference for ARIMA modeling, I think. Would not the autocorrelation of a series with a trend, without first differencing, be inflated?

  154. Kenneth–
    As far as I am aware, no one is modeling anything with ARIMA(4,0,0). For some reason, jstults seems to be suggesting I do this. I don’t find the suggestion attractive.

    JStults-
    I don’t think my definition of stationarity differs from that book’s.

    The reason for dividing the system into forced + natural variability is to treat the system as one where the natural variability is stationary. The forced component doesn’t have to be stationary. It’s just there. There is no problem with this. All that happens with your book is that the things it says about a stationary probabilistic process apply only to the natural variability.

    But you seem to want to resist this division and seem to keep forgetting that I make it. This division is extremely common in engineered systems– and it works. If we couldn’t do it, we couldn’t predict things like pipe flow or aerodynamics, or even describe how food might cook on a stovetop.

    I am saying the model for the natural variability needs to be stationary under this model.

    With respect to the driftless ARIMA(3,1,0) issue: In Keenan’s result, he ends up decreeing the forced component is zero in favor of using driftless ARIMA(3,1,0) for the natural variability. But if so, under his model natural variability is not stationary, is not bounded and has all sorts of properties that are impossible for natural variability. So it doesn’t work for natural variability.

    So the statistical model he favors violates physics. That means it should be discounted.

    In whatever description you have, the part you identify as natural variability has to be stationary, bounded etc. But the deterministic (i.e. forced) component doesn’t have to be. It could be driven by ghg’s. This isn’t a matter of our definitions of stationary being different. It’s a matter of connecting the model components to the physics.

  155. If you’ve decided on using linear stochastic models like ARIMA, then you should be open to the possibility that your sampling period is too short with respect to important low-frequency variations in the parameters governing your process (in the words of the book snippet, the data is “effectively nonstationary”; the sampling period isn’t close enough to infinity relative to the periods of all the important variations). If you exclude a priori the ARIMA models that would indicate that you need a longer sampling period (i.e. the fit you get is nonstationary or close to it), then all you’ve done is made an implicit assumption that your sampling period is long enough that you’ve adequately resolved any important low frequency behavior. Since the whole point is that we’re uncertain whether we’ve got enough data to rule out low frequency natural oscillations, this seems like a dishonest analysis procedure, a way to cook the books behind a pseudophysics fig leaf.

    If you fit an ARIMA(4,0) and got roughly the same parameter values that would be some evidence that the ARIMA(3,1) wasn’t overfit (since you removed the constraint and let things “float”).

    I still find your handwaving about what’s physical and what’s not to be unsupportable nonsense (granted you did say you weren’t going to support it in the first place so I guess I should just shut up about it).

  156. If you’ve decided on using linear stochastic models like ARIMA, then you should be open to the possibility that your sampling period is too short with respect to important low-frequency variations in the parameters governing your process

    I am open to that possibility. That has nothing to do with the physicality of d=1.

    Since the whole point is that we’re uncertain whether we’ve got enough data to rule out low frequency natural oscillations, this seems like a dishonest analysis procedure, a way to cook the books behind a pseudophysics fig leaf.

    I have no idea why you think it’s dishonest.

    If you fit an ARIMA(4,0) and got roughly the same parameter values that would be some evidence that the ARIMA(3,1) wasn’t overfit (since you removed the constraint and let things “float”).

    This does not follow. The evidence that ARIMA(3,1,0) isn’t going to be a good model is that it is unphysical a priori.

    Some ARIMA(4,0) models can also be unphysical for certain values of the parameters. I’ve said this several times now, but for some reason you seem to be ignoring that and somehow suggesting I should fit with ARIMA(4,0,0). If you want to check that, you can go do whatever it is you think you could learn with ARIMA(4,0,0). (Heck, ARIMA(1,0,0) is unphysical if ar1=1.01. So, after you do the scut work to fit the ARIMA(4,0,0), come back and we can discuss whether the one you found is physical.)

    (I assume you know ARIMAs aren’t unique anyway; you can find this in any book on ARIMA. I don’t think showing that ARIMA(4,0) also fits tells us anything about overfitting, but if you want to do it, and think it makes your point, you do it.)

    I still find your handwaving about what’s physical and what’s not to be unsupportable nonsense (granted you did say you weren’t going to support it in the first place so I guess I should just shut up about it).

    It’s not unsupportable nonsense. I’m just not going to go to the effort of writing a math-filled post to discuss why ARIMA(3,1,0) is unphysical. But I should note that you also don’t want to bother to look at all your chaos stuff and see whether d=1 could ever fit the chaotic processes you seem to think somehow show d=1 is possible. In all the time you have spent coming back, saying things like “non-linear dynamics” and intimating this somehow means d can equal 1, you could have gone ahead and checked whether those chaotic processes have d=1. But you don’t seem to want to do that.

  157. “With respect to the driftless ARIMA(3,1,0) issue: In Keenan’s result, he ends up decreeing the forced component is zero in favor of using driftless ARIMA(3,1,0) for the natural variability. But if so, under his model natural variability is not stationary, is not bounded and has all sorts of properties that are impossible for natural variability. So it doesn’t work for natural variability.”

    Using a first difference implies a trend in the data series, and if a trend is forced it cannot be natural; therefore the proposed ARIMA(3,1,0) model is unphysical if it is being used to describe a system with no (forced) trend. Am I interpreting this correctly? Are you further suggesting that with no forcing, Keenan should or could have proposed an ARFIMA model with a fractional difference and LTP?

    What series was Keenan using? That’s the one I want to analyze.

  158. Re: Kenneth Fritsch (Jun 15 17:29),

    As far as I can tell, he’s doing the same thing VS did during the whole unit root fiasco. He tests the data for the presence of a unit root and finds he can reject the hypothesis that d = 0. So then he fits an ARIMA model and found the “best fit” to the raw data was (3,1,0). When you do that, the 95% confidence limits for the data include the entire data set. There’s lots more detail on Bart’s blog (start here, I think ). There were also several articles here on unit roots. The conclusion being that the underlying physical process isn’t consistent with a unit root and the tests don’t rule out a fractional root, not to mention that if there’s a non-linear deterministic trend, you have to remove it before testing for a unit root. So it amounts to begging the question. The tests used assume that there is no deterministic trend other than a linear trend (which is also ruled out) so the conclusion that there isn’t a trend isn’t surprising.

  159. Using a first difference implies a trend in the data series

    No. Using a first difference does not imply a trend in a data series. Driftless ARIMA– which is what Keenan finds– is trendless, but wanders aimlessly. It’s true that d=1 takes out deterministic trends when they are present (they would appear only in the drift term). But it does more than that. It’s the ‘more than that’ part that is the problem.

    and if a trend is forced it cannot be natural

    A forced trend could be natural if you had evidence that the forcing increased. But driftless ARIMA(3,1,0) isn’t an appropriate model unless the forcing has random walk elements. And that just bumps things back to: what part of natural forcing could be a random walk? Not the sun. Not volcanoes. Etc.

    Are you further suggesting that with no forcing, Keenan should or could have proposed an ARFIMA model with a fractional difference and LTP?

    I’m only saying he should restrict the processes to describe natural variation to things that are not unphysical. ARFIMA with d<0.5 can be physical. Fractional differencing with d<0.5 can be physical. So, those choices could be valid.

    What series was Keenan using?

    His pdf says:
    “The annual global temperature data was downloaded via http://data.giss.nasa.gov/gistemp/
    on 2010-11-17. The available data was for years 1881–2009. It is given as differences from the mean, in hundredths °C. (The mean used in Figure 1 is from NASA’s Earth Fact Sheet; the accuracy of that mean is irrelevant for the analysis herein.)”

    He gave this data:
    > # Assign the annual global temperature data (source: NASA)
    > gistemp <- ts(c(-21, -26, -27, -32, -32, -29, -36, -27, -17, -39, -28, -32, -33, -33, -25, -14, -11, -26, -16, -8, -15, -25, -30, -35, -24, -19, -39, -33, -35, -33, -34, -32, -30, -15, -10, -30, -39, -33, -20, -19, -15, -26, -22, -22, -17, -2, -15, -13, -26, -8, -2, -8, -19, -7, -12, -5, 7, 10, 1, 4, 10, 3, 9, 19, 6, -5, 0, -4, -7, -16, -4, 3, 11, -10, -10, -17, 8, 8, 6, -1, 7, 4, 8, -21, -11, -3, -1, -4, 8, 3, -10, 0, 14, -8, -5, -16, 12, 1, 8, 19, 26, 4, 25, 9, 4, 12, 27, 31, 19, 36, 35, 13, 13, 23, 37, 29, 39, 56, 32, 33, 47, 56, 55, 48, 63, 55, 58, 44, 57), start=1881)

  160. Kenneth–
    I also don’t really understand his criteria for picking his model. If it’s purely AICc, and we just use that and nothing else, no physics, and just fire up “auto.arima”, I find ARIMA(0,1,4) with drift fits better than ARIMA(3,1,0). Now, I’ll admit, I think ARIMA(0,1,4) has the same problem as ARIMA(3,1,0). But if the argument is “pick the one with a better AICc,” why not pick ARIMA(0,1,4), which has drift? Not only does it have drift, the drift is statistically significant.

    So, there you go: Statistically significant drift in a model with a better AICc than ARIMA(3,1,0). Why doesn’t Keenan pick ARIMA(0,1,4)? (I know why I don’t pick it. But why doesn’t Keenan?)

    Of course, I can also get different answers using monthly data.
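
    For reference, a minimal sketch of the auto.arima exercise described in the preceding comment, assuming the forecast package and the gistemp series above; the crude z-ratio on the drift term is only an illustration of the significance check mentioned:

    library(forecast)
    fit <- auto.arima(gistemp, ic = "aicc", stepwise = FALSE, seasonal = FALSE)
    fit                                          # selected order, coefficients, and AICc
    # Rough Wald-style check on the drift term, if the selected model includes one:
    drift_hat <- fit$coef["drift"]
    drift_se  <- sqrt(diag(fit$var.coef))["drift"]
    c(drift = drift_hat, se = drift_se, z = drift_hat / drift_se)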

  161. Folks seem to be getting really hung up about what a good fit for a nonstationary model means. All it means is that there’s probably a slowly varying (wrt the sample period) parameter for the process. What could that possibly be? Hell, if I were an EarthFirster I’d be spinning the unit root thing as, “Inexorable Growth in Emissions Forces Use of Unstable Models!!!111!!eleventyone!”

    Lucia, I think you’re right, much of our disagreement comes down to your bookkeeping approach; adding forcing based predictors (xreg) is likely to do away with the AR parts you find objectionable and statements about what’s forced and what isn’t would make much more sense with that sort of analysis.

    I’m flattered that you’d delegate ‘scut work’ to me, maybe someday I’ll take you up on it. For now I’ll just point out that if parametric or external forcing is allowed, it’s possible to generate arbitrarily long periods where a d=1 model is the “right” approximation.

    I think if we keep the discussion constrained to the AR parts as we have been, then we shouldn’t have too many problems with uniqueness… In fact, let’s drop the MA altogether; an infinite number of terms shouldn’t be too much indulgence to expect from a crowd worried about $latex t \rightarrow \infty$

    jstults– If adding forcing-based predictors would solve the problem, then it needs to be done. Leaving it undone and estimating uncertainty in trends with ARIMA(3,1,0) anyway means the model is wildly deficient, and that should not be done.

    I’m flattered that you’d delegate ‘scut work’ to me, maybe someday I’ll take you up on it.

    You’re the one who is proposing looking at ARIMA(4,1,0), suggesting I do it, suggesting I show proof, etc. I’m only suggesting that if you are sufficiently interested in something mysterious about ARIMA(4,1,0) that you are suggesting I look into, you do whatever it is you think needs to be done. Also, you seemed to be trying to suggest those chaos things somehow or another permit ARIMA(3,1,0), but the chaos work you did doesn’t touch on how to translate those results into stochastic-process terms. I’d suggest that if you want to promote what you did as telling us anything about how to model that stuff as stochastic processes (or about the distribution functions), you do the work. To learn whether it could possibly justify d=1, just go look at the climacograms. You don’t have to do so, but I have other interests right now.

    What I’m seeing in your comments is that you claim sufficient interest in d=1 to ask me to write proofs or look at ARIMA(4,0,0), etc. So, if you really are that interested, I would think you would have interest in looking at the stuff you think suggests d=1 would be ok and seeing if it ever actually gets d=1. But if you aren’t interested, you aren’t. That’s ok. But clearly, I’m not going to be interested in doing them, nor do I think that I need to do a whole bunch of work because you have developed some incomplete notion you don’t really want to do the work to explain (either, if you wish that modifier).

    external forcing is allowed, it’s possible to generate arbitrarily long periods where a d=1 model is the “right” approximation.

    So…instead of ‘suggesting’ it, go do it and show what you mean. Then we can discuss what the heck you actually mean and whether it makes any sense in the context of what Keenan did. I would suggest it won’t.

    But your doing the work to flesh out your suggestion would clarify the discussion. Once again: it’s up to you to figure out the form of the external forcing that would give d=1. (I’ve always said some forms would, but those kick the problem back to another place. So your revealing the forcing could let us discuss the features of that forcing.)

    So… go do it. Reveal your forcings and then we can discuss them.

    On the last paragraph: Huh? Why drop MA altogether? Using Keenan’s method of preferring based on AICc alone, ARIMA(0,1,4) with drift fits better than Keenan’s fit. It also has a statistically significant trend, so even if we weren’t arguing about d=1, that would kill Keenan’s “proof” and raise the question: Why did he pick ARIMA(3,1,0)? It’s got a worse AICc than ARIMA(0,1,4). He clearly doesn’t mind complicated functions, so is (3+1) parameters his magic threshold? What?

    I don’t like either– both have d=1. But if we got past that, we’d be at: So, why not ARIMA(0,1,4), which does show statistically significant warming? Is the ‘magic’ of (3,1,0) that it doesn’t have warming?

    Anyway: I’m worried that d=1 has weird properties even at finite times. This has nothing to do with mixing AR and MA.

  163. Lucia @ Post #77359:

    Thanks for the reply and the data set that Keenan used. I see now in detail what your point is about unphysical and random walk. I agree that the walk can be with or without growth and that an ARIMA(p,d,q) model would need a d term to stationarize either form of a random walk.

    I want to look at the ARIMA modeling for the Keenan data (while I wait for my grass to dry before cutting). What struck me as a novice in ARIMA modeling was that you can get reasonably close fits to the data with two and perhaps more models for the same data. The Duke link I gave above points to problems with fitting ARIMA models without reference to the ACF and PACF analyses, and the overfitting that can result.

    AIC or AICc is a criterion for determining the best model, but it is not a test for statistical significance, as I recall. Are there ways of determining a statistically significant difference using maximum likelihood, as I recall Cohn and Lins did in their paper discussed at the Blackboard?

  164. What I’m seeing in your comments is you claim sufficient interest in d=1 to ask me to write proofs or look at AR(4,0,0) etc. So, if you really are that interested

    You misunderstand my intent. I’m not interested in either of those simple models. I’m not really even interested in global mean surface temperature. I’m interested in model selection. You claimed to have a unique insight into that based on physics. I thought maybe you had an interesting result that I hadn’t heard of, but you’re not willing to do more than handwave. That’s fine. It’s your Blackboard.

    Gems like this keep me coming back,

    I’m not going to be interested in doing them nor do I think that I need to do a whole bunch of work because you have developed some incomplete notion you don’t really want to do the work to explain

    Point taken, I’ll stop calling an analysis bunk because it doesn’t follow my magical model selection criteria.

    I used the GISS annual data from 1880-2010 and attempted many ARIMA models using the arima function in R. First off, I have to state that R will not do an arima fit on a series that it knows is not stationarized; it will stationarize it (first difference in this case) and then do the arima. Therefore, for the GISS series d had to be at least 1. I found that the AIC rating for the Keenan-suggested model of ARIMA(3,1,0) was the best and, to my surprise, it does not show any signs of overfitting.

    I can show my results if anyone is really interested. I can also attempt to use an ARFIMA function to model the GISS data, even though we know the series is rather too short to do that properly.
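
    A minimal sketch of the ARFIMA attempt contemplated here, assuming the fracdiff package and the gistemp series given earlier; the AR and MA orders are arbitrary illustrative choices, and the short series is a real caveat:

    library(fracdiff)
    fd <- fracdiff(gistemp, nar = 1, nma = 0)   # ARFIMA(1, d, 0); orders chosen only for illustration
    summary(fd)
    fd$d                                        # estimated fractional d; d < 0.5 corresponds to a stationary process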

    My current take away from Keenan’s publication and Lucia’s critique of it is that while Keenan may have shown that the IPCC’s AR1 model is not necessarily the best fit for the data, his interpretation of a random walk (d=1) without forcing cannot be correct because it does not describe a physical process.

  166. jstults

    You claimed to have a unique insight into that based on physics.

    There is nothing unique about my insight about d=1. It’s widely held that temperatures governed by conservation of energy and radiative heat transfer can’t go on random walks. My explanation is verbal, not mathematical. Is that handwaving? Maybe. But there’s nothing disreputable about handwaving. It’s done all the time.

    You don’t buy it? Fine. If you wanted to learn more about model selection and wanted to see if what I’m saying holds water, you could set up a bunch of simple models and see what you get. You don’t want to do that? Fine. You want to say you’re interested in model selection but you don’t want to explore stuff about model selection? Also fine.

  167. Kenneth–
    I can get R to do all sorts of fits on that GISS data. I can get it to fit AR(1), ARIMA(1,0,1), etc. Keenan also got it to fit.

    Auto.arima claims ARIMA(4,1,0) has a better AICc than ARIMA(3,1,0). (But auto.arima may only be estimating the criterion. I’m puzzled about inconsistencies in the AIC criteria out of auto.arima and arima.)

    Lucia, I have not used auto.arima in R, but arima in R gives a better AIC score for ARIMA(3,1,0) than ARIMA(4,1,0), though not by much. Maybe the difference is between AIC and AICc. Actually, the difference for several ARIMA models I tried was small for the log-likelihood and AIC scores and the standard deviation of the log-likelihood. I think Cohn and Lins used the log-likelihood ratio to determine statistical significance between models.

  169. jstults,

    Any process at steady state where a restoring force exists cannot be a random walk (d=1 or an AR coefficient ≡ 1). The restoring force for the climate is the negative feedback from the increase in radiant emission with temperature. If you have a tank from which or to which you randomly remove or add a bucket of water at regular intervals, the level in the tank remains constant between additions or subtractions. The level in that tank will be a random walk. But the Earth bucket has a hole in it. Water must be continually added to maintain a constant level. The level in that tank isn’t a random walk; it’s bounded, because the flow out of the tank increases as the level increases. An unbounded increase or decrease in level is only possible if the flow into the tank isn’t stationary or the flow out of the tank isn’t stationary. On a billion-year time scale, the flow into the tank, TSI, is thought to increase with time, but this is unimportant on a millennial time scale.

    You have proposed no physical process that contradicts these facts. A random walk model then is simply an arbitrary fit and has no physical significance. It cannot be used to argue that the variation in temperature for the last 150 years is purely stochastic.
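
    A toy numerical version of this leaky-bucket argument, purely illustrative: feed the same random shocks into a pure random walk (no restoring force) and into an AR(1) with a restoring coefficient below one, and compare the paths:

    set.seed(1)
    n <- 1000
    shocks <- rnorm(n)
    walk <- cumsum(shocks)                                           # d = 1: the bucket with no hole
    damped <- filter(shocks, filter = 0.9, method = "recursive")     # AR(1), phi = 0.9: the leaky bucket
    matplot(cbind(walk, damped), type = "l", lty = 1, xlab = "time", ylab = "level")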

    “lucia
    It’s widely held that temperatures governed by conservation of energy and radiative heat transfer can’t go on random walks.”

    I have always wondered about that.
    We know that pressure, volume, and temperature are related through the ideal gas law (PV = nRT).
    We know atmospheric pressure is variable.
    We know that the volume is nominally fixed but elastic, since water can undergo a phase change from gas to liquid/solid.
    We know that the phase change from gas to liquid/solid is subject to nucleation, which can be caused by atmospheric particulates or other causes (e.g., cosmic rays).

    So overall, I find that a sweeping statement. I have seen huge clouds disappear in minutes (Texas) and seen large white fluffy clouds form in tens of seconds (Scottish Highlands).
    Can you have a random walk in a system that includes pressure and water phase changes?

  171. Re: DocMartyn (Jun 16 17:59),

    Atmospheric pressure locally is variable. Globally, not so much. Only a small fraction of the total water in a cloud exists as droplets so it only takes a small change in temperature to make clouds appear and disappear.

    “DeWitt Payne (Comment #77433), June 16th, 2011: Only a small fraction of the total water in a cloud exists as droplets so it only takes a small change in temperature to make clouds appear and disappear”

    Quite wrong. A mole of water can change in volume from 0.018 L to 22.4 L with no change in temperature; a roughly 1,250-fold volume change occurs between the two states. There is a heat change, but there need be no temperature change.
    The whole of thermodynamics was kicked off by people using the phase change of water from liquid to gas. You might note that in a typical high-pressure water boiler the temperature of the water is essentially constant, despite steam being removed or cold water being pumped in. Energy fluctuations in the system are manifest as pressure changes.

  173. DeWitt,

    You have proposed no physical process that contradicts these facts.

    I have not, and I won’t. Your facts contain assumptions about steady state, and equilibrium, and dare I say it, stationarity. All of the pieces are here in this thread already. Look at Carrick’s PSDs; look at this graph and read the words around it.

    All the linear models you guys are using are “just fits”. Not a one has diagnostic value. They’re all just low-dimensional descriptions, codings, compressions, however you want to say it. For a bunch of engineers, I really can’t understand how you guys aren’t comfortable with “wrong but useful.”

    It cannot be used to argue that the variation in temperature for the last 150 years is purely stochastic.

    Of course I agree with this (especially since I think it’s deterministic); knaves will always twist the truth to make a trap for fools. I hope you don’t think that’s what I was doing.

    Lucia,
    You are right: temperatures governed by hyperbolic conservation laws and all the mighty gods of radiation can’t go on a random walk. But they aren’t linear stochastic processes either; sometimes it’s just useful to pretend they are and that they do.

    I’m tired, sorry to bother you guys, I’ll go back to quietly watching people play with R (it is better than anything on TV ; – )

  174. jstults–
    Carrick’s PSDs don’t imply d=1. Notice prior to that we were discussing fractional differencing, specifically the possibility of d=0.5.

    I really can’t understand how you guys aren’t comfortable with “wrong but useful.”

    Everyone is comfortable with “wrong but useful”. But the fact that some models are wrong but useful doesn’t preclude the existence of “wrong and useless”.

    sometimes it’s just useful to pretend they are and they do.

    Just because some approximations are useful doesn’t mean others automatically are. Some approximations are abominations.

  175. Lucia, I loaded the library(forecast) and used auto.arima to give the best fit model. The results are below and the summary shows the best fit is ARIMA(3,1,2) with drift. I have not tried that model in my previous modeling selection methods and I have not looked at the acf graph for the ARIMA(3,1,2), but will now.

    Thanks for mentioning this new tool (for me) to use and abuse.

    Series: GS[, 2]
    ARIMA(3,1,2) with drift

    Coefficients:
    ar1 ar2 ar3 ma1 ma2 drift
    -0.6906 -0.0470 -0.2689 0.2205 -0.5489 0.6695
    s.e. 0.1601 0.1904 0.1058 0.1565 0.1561 0.2765

    sigma^2 estimated as 85.79: log likelihood = -474.25
    AIC = 962.51 AICc = 963.43 BIC = 982.58

  176. Lucia, when I used the trace in auto.arima, I obtained the following. The selection process is supposed to be carried out in a sequence, but this trace would seem to indicate that the process had skipped over other possible combinations.

    ARIMA(2,1,2) with drift : 960.9171
    ARIMA(0,1,0) with drift : 988.862
    ARIMA(1,1,0) with drift : 980.6944
    ARIMA(0,1,1) with drift : 961.8989
    ARIMA(1,1,2) with drift : 961.6892
    ARIMA(3,1,2) with drift : 959.2775
    ARIMA(3,1,1) with drift : 962.0665
    ARIMA(3,1,3) with drift : 961.2424
    ARIMA(2,1,1) with drift : 959.841
    ARIMA(4,1,3) with drift : 963.354
    ARIMA(3,1,2) : 962.216
    ARIMA(4,1,2) with drift : 961.5593

    Best model: ARIMA(3,1,2) with drift

  177. Kenneth–
    Do stepwise=FALSE. Then it won’t skip.

    The other thing I noticed fiddling with auto.arima yesterday is that the results differ depending on whether I center my time or not! The problem is that if you set stepwise=FALSE, it will check “with zero mean” and “without zero mean”. For some choices you get the zero-mean value as best, because it fits just as well as ‘with mean’ and you have one fewer parameter to knock down the AIC value.
    If you redefine the time (or the baseline) this changes the answer because you’ll kick zero mean out of the choices.

    Given the arbitrariness of the baseline, I think we want ‘without zero mean’, but auto.arima won’t let us insist on computing the mean! That’s a bit of a pain in the neck.

    Of course it also makes a difference how high we are letting auto.arima go. The defaults are: max.p = 5, max.q = 5, max.order = 5. So you must have changed the order since you are getting 6.
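
    A minimal sketch of the sort of call being described; stepwise, trace, allowdrift, and the max.* limits are standard auto.arima arguments, and raising max.order above the default of 5 is what lets higher-order models such as ARIMA(4,1,3) into the search (exact defaults can differ between forecast versions):

    library(forecast)
    fit <- auto.arima(gistemp, d = 1, ic = "aic",
                      stepwise = FALSE,             # search all combinations rather than the stepwise path
                      trace = TRUE,                 # print the criterion for every model tried
                      allowdrift = TRUE,            # include the "with drift" variants
                      max.p = 5, max.q = 5, max.order = 7)
    fit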

    When I make stepwise=FALSE in auto.arima, the process goes through all model combinations and trace=TRUE will give a list. In the following result ARIMA(0,1,4) with drift gave the best AIC score, and the method now selects that model. Notice the difference in AIC scores with/without drift.

    ARIMA(0,1,0) : 987.3992
    ARIMA(0,1,0) with drift : 988.862
    ARIMA(0,1,1) : 964.1158
    ARIMA(0,1,1) with drift : 961.8989
    ARIMA(0,1,2) : 961.4206
    ARIMA(0,1,2) with drift : 957.9285
    ARIMA(0,1,3) : 963.3635
    ARIMA(0,1,3) with drift : 959.9282
    ARIMA(0,1,4) : 958.9829
    ARIMA(0,1,4) with drift : 956.9333
    ARIMA(0,1,5) : 960.9035
    ARIMA(0,1,5) with drift : 958.468
    ARIMA(1,1,0) : 979.5233
    ARIMA(1,1,0) with drift : 980.6944
    ARIMA(1,1,1) : 964.1523
    ARIMA(1,1,1) with drift : 961.6007
    ARIMA(1,1,2) : 964.492
    ARIMA(1,1,2) with drift : 961.6892
    ARIMA(1,1,3) : 961.902
    ARIMA(1,1,3) with drift : 958.3664
    ARIMA(1,1,4) : 961.7214
    ARIMA(1,1,4) with drift : 959.3938
    ARIMA(2,1,0) : 971.6828
    ARIMA(2,1,0) with drift : 972.1901
    ARIMA(2,1,1) : 963.2175
    ARIMA(2,1,1) with drift : 959.841
    ARIMA(2,1,2) : 964.0394
    ARIMA(2,1,2) with drift : 960.9171
    ARIMA(2,1,3) : 1e+20
    ARIMA(2,1,3) with drift : 960.0243
    ARIMA(3,1,0) : 962.3245
    ARIMA(3,1,0) with drift : 961.4936
    ARIMA(3,1,1) : 963.7946
    ARIMA(3,1,1) with drift : 962.0665
    ARIMA(3,1,2) : 962.216
    ARIMA(3,1,2) with drift : 959.2775
    ARIMA(4,1,0) : 964.9767
    ARIMA(4,1,0) with drift : 964.0146
    ARIMA(4,1,1) : 966.237
    ARIMA(4,1,1) with drift : 962.7137
    ARIMA(5,1,0) : 963.729
    ARIMA(5,1,0) with drift : 961.1846

    Series: x
    ARIMA(0,1,4) with drift

    Coefficients:
    ma1 ma2 ma3 ma4 drift
    -0.4677 -0.2644 -0.1115 0.2006 0.6642
    s.e. 0.0862 0.0930 0.0950 0.0872 0.2956

    sigma^2 estimated as 87: log likelihood = -475.12
    AIC = 962.23 AICc = 962.92 BIC = 979.44

    So, “with drift” is nearly always lower (better).

    ARIMA(3,1,0) : 962.3245
    ARIMA(3,1,0) with drift : 961.4936

    Keenan’s preferred model would fit better with drift.

    I still don’t like d=1. But it’s interesting that AIC nearly always prefers “with drift”, which is saying “trend more likely than not”.

    This is hardly the message you would get from Keenan’s WSJ article, which makes it sound like the AIC ratio of {driftless ARIMA(3,1,0)} vs. {ARIMA(1,0,0)+trend} so favors no trend that we should seriously doubt the trend. But it fits better with DRIFT!

    I’m going to have to elevate your comment to a post. Probably tomorrow. 🙂

  180. DeWitt–
    I’m aware of that, but I hadn’t explored how that interlaces with “auto.arima” which returns answers “with drift” and “without drift”. I’ll have to run the examples and see.

    DeWitt–Auto.arima seems to deal with ‘drift’ properly. You still have to be careful to understand whether you are getting back ‘drift’ rather than ‘intercept’, but auto.arima seems to give better labels. In arima, ‘intercept’ is really the mean value (I think). In auto.arima, the term called ‘drift’ seems to really be the drift. I’d have to confirm that, but the values for ma1, ma2, AIC, and the standard errors on the MA terms that auto.arima gives for ARIMA(0,1,2) with drift are the same as if I difference first and then fit ARIMA(0,0,2).

    So, it looks like auto.arima doesn’t share the weirdness of arima. (Why arima doesn’t let someone add “allowdrift”, I don’t know. It should.)

    The R function arima gives the model without drift, and auto.arima will give it either way. If you set allowdrift=FALSE in auto.arima you obtain the same results as you obtain from arima. I do not see a parameter in arima that will allow modeling with drift.

    My other concern is that the trace parameter in auto.arima gives a list of aic values (I think it is aic) for all models but that aic value does not match the one that is computed when one uses arima or uses auto.arima without the trace. I think it has to do with relative aic values being useful in a specific evaluation of models but not as an absolute value that can be used in other evaluations. That still does not explain (to me) what the aic value in the arima function is really comparing.

    Kenneth–I’ve noticed the same problem with the AIC values. Sometimes the AIC values match and sometimes they don’t. I’ve been trying to diagnose the shared features of the cases where they don’t match and I haven’t quite figured that out. I have noticed that auto.arima permits “with zero mean” and “without zero mean”, and this can make a difference in some cases. I also notice the differences seem to be more frequent with higher-order ARIMA models.

    For now when I say “model X” has a better AIC than model Y, I’m checking with ARIMA afterwards!

    I also noticed that the AIC for “AR(1)+trend” in Keenan’s analysis was worse than if I use auto.arima or arima. He used one module to get the AIC for AR(1)+trend and a different one for ARIMA(3,1,0). Why use different ones? I find that a bit of a mystery. (Using arima to get the AIC for both cases compared doesn’t result in the driftless ARIMA(3,1,0) model being nearly so many times more probable than the other one. Guess in which direction the value moves? 🙂 )

    Also, one reason I’ve been noticing some of the same things you have been is that I wanted to run a script to answer this question:

    Suppose I KNOW the data was generated with AR(1), so that’s the ‘true’ model. I generate 100 years of ‘data’. Then I use auto.arima and find the ‘apparent best model’ based on AICc for those 100 years of data. Sometimes the ‘best’ model will be AR(1); that’s correct. But sometimes the ‘best’ model will be something entirely different (AR(2), ARMA(1,1), whatever. Just wrong.)

    Now, if I compare the probability of the ‘best’ model to the known correct one using the AIC comparison Keenan did, how often does that test decree the ‘best’ (but wrong) model to be 10 times more likely than the known ‘right’ model? Or 100?

    I’m getting that it happens more often than one might suppose. But I’m having trouble wrapping it up because of the weird features we are both seeing in arima and auto.arima.
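
    A rough sketch of that experiment (not the actual script): the ‘truth’ is an AR(1) with an arbitrary coefficient of 0.5, auto.arima picks an apparent best model by AICc for each synthetic 100-point series, and exp(delta-AICc/2) is used as the ‘times more likely’ factor of the kind being discussed:

    library(forecast)
    set.seed(42)
    n_sims <- 100
    big_wins <- 0
    for (i in seq_len(n_sims)) {
      x <- arima.sim(model = list(ar = 0.5), n = 100)            # known truth: AR(1) with phi = 0.5
      best <- auto.arima(x, ic = "aicc", stepwise = FALSE, seasonal = FALSE)
      truth <- Arima(x, order = c(1, 0, 0))                      # refit the known-true form
      rel_like <- exp((truth$aicc - best$aicc) / 2)              # how many times "more likely" the winner looks
      if (rel_like >= 10) big_wins <- big_wins + 1
    }
    big_wins / n_sims    # fraction of runs where the winner looks at least 10 times more likely than the truth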

    Lucia, you remember that Cohn and Lins used the xreg setting in arima to obtain an arima model that assumes a linear trend and used that to compare, for statistical significance, with an arima model where a linear trend was not assumed. The xreg setting merely allows you to model the residuals of the regression. I attempted this with auto.arima and obtained a best fit with ARIMA(1,0,4). Obviously d=0 is no surprise since the series was detrended. Also, that models with and without xreg give different results and probabilities should be no surprise either, since a random walk with drift (growth?) is a more general case than a linear trend but not necessarily the same. The GISS annual temperature series departs significantly from a straight line, and I assume that must detract additionally from an ARIMA fit, even when it assumes a linear trend.

    Finally, I think, if I recall how to do this with the log likelihoods, one can show that the general ARIMA model for a random walk with drift and 4 orders of MA is a better fit to the GISS data than an ARIMA model assuming a linear trend. What I would really like to do would be to determine what the fit would be if we assumed a linear segmentation with a breakpoint (or more) in a series.

  185. Kenneth–
    Did I send you Cohn and Lins’ code? The code snippet shows it. I think their paper explains it too. My memory is fuzzy, but I *think* they specified a fractional-difference (or ARFIMA?) model for the residuals from the line. Then they iterated over a range of assumed trends to find the trend that had the maximum likelihood. Then, I think, they took the ratio of the maximum-likelihood case to the case with no trend. But I’m not sure; I’d have to re-read.

    On my to-do list is “check that the test means the same thing” as other ways to do it. (I’d be surprised if it doesn’t, because their paper includes graphs of power and false-positive rates.)

    Anyway, there is an element that is similar to what Keenan did: take ratios of likelihoods. But there is also a difference. Cohn and Lins constrain the comparison to situations where whatever is not attributed to the trend is explained by the same form of model.

  186. DeWitt,
    I thought about it some more; how about this:

    If human economic activity influences the temperature, and the temperature influences human economic activity, then the temperature series we’re discussing is little different than the prices on futures in Chicago… where random walks are king ; – )

    Probably not an argument most skeptics would embrace, but fun to think about.

    Lucia,
    What would the low-frequency portion of the spectrum look like if you have a long enough sampling period? What would it look like if the sampling period were too short?

  187. jstults–
    Are you trying to argue by rhetorical question? The answer is: These questions are irrelevant to the issue of d=1.

  188. “Are you trying to argue by rhetorical question?”

    Delicious, lucia. 😉

    Andrew

    Lucia, you did send me the code, and Cohn and Lins used the ratio of log-likelihoods to measure statistically significant differences for both arima and fracdiff models. Twice the difference in log-likelihoods follows a chi-square distribution, and probabilities are found from that distribution.

    For arima they compared models with and without the xreg argument. I’ll have to look further at what exactly they did for the fracdiff (ARFIMA) analyses.
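
    A minimal sketch of the with/without-xreg comparison described here, using an AR(1) error structure purely for illustration (Cohn and Lins’ actual error models and details differ):

    with_trend <- arima(gistemp, order = c(1, 0, 0), xreg = time(gistemp))
    no_trend   <- arima(gistemp, order = c(1, 0, 0))
    lr <- 2 * (logLik(with_trend) - logLik(no_trend))       # likelihood-ratio statistic
    pchisq(as.numeric(lr), df = 1, lower.tail = FALSE)      # p-value; 1 degree of freedom for the added trend term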

  190. Are you trying to argue by rhetorical question?

    No.

    These questions are irrelevant to the issue of d=1.

    Why don’t you think the low-frequency part of the spectrum matters for a question about stationary/nonstationary?

    Andrews and DeWitt: High Five!

  191. jstults–

    Why don’t you think the low-frequency part of the spectrum matters for a question about stationary/nonstationary?

    Why do you think I think the low-frequency part of the spectrum doesn’t matter to the question of stationary/nonstationary?

    Stop trying to argue by ‘asking questions’ and advance your own positive arguments for whatever point you are trying to make.

  192. Hi Lucia,

    Not sure where I should post this question – apologies if this is an inappropriate thread?

    I have been trying to understand whether the change in Global Mean Temperature (whatever this parameter might actually be) can be said to be “statistically significant”?

    Clearly, if you do a “line of best fit” you get a significant slope (which is often quoted as XX degC per decade, etc.). But for this to be valid you need to assume that the actual data points are statistically independent and not correlated. Put another way, we could say that the change in Global Mean Temperature is completely consistent with a random process and a forcing (presumed to be CO2-related).

    However, if the data are actually correlated (e.g., the system cannot adopt “any” value the following year but is constrained by the value of the previous year, etc.), then is the process not also consistent with a “natural” process (some form of red noise) with little or no statistically significant trend?

    As such, given the current limited data that we have, are we unable to distinguish between these?

    Many thanks for any insight you may be able to give me.

    N.B. Apologies for my probable lack of appropriate terminology which I guess will make any response harder?

  193. Neil

    But for this to be valid you need to assume that the actual data points are statistically independent and not correlated.

    This is halfway right. If the data are correlated, you need to properly account for the temporal correlation.

    As such given the current limited data that we have then we are unable to distinguish between these?

    That’s what all the arguing is about. The IPCC has a section where they account for autocorrelation by using a simple statistical model that assumes the autocorrelation can be described by something called “ARIMA(1,0,0)” and then tests for a linear trend. This can be criticized on two counts:
    1) No one thinks the forced trend since the 1880s is linear.
    2) ARIMA(1,0,0), which accounts only for lag-1 autocorrelation, may very well be an inadequate model for the unforced (or the natural, which is a slightly different thing) portion of the time series.

    The combination of (1)&(2) clearly results in the IPCC analysis being flawed. Really, I don’t think anyone can disagree with that.

    But the question then is: How to overcome the flaws to figure out if the temperature rise (looked at in some particular way) is statistically significant. In my opinion, papers finding no significance are either

    (a) overlooking the fact that no one thinks the trend is linear. By doing so, they are over-inflating our estimate of the uncertainty. (I wouldn’t mind this so much. An argument can be made that we should use the upper bound if we don’t really know the forced trend. But they don’t seem to even want to try repeating the same analysis using the prevailing assumption about the temporal evolution of the response to the forcing.)
    or they are:
    (b) suggesting statistical models for the unforced or natural portion of the temperature variability that violate what we know about physics. (This is what Keenan does when he advances a solution based on suggesting residuals be modeled using ARIMA(3,1,0). It’s the d=1 part that violates physics.) In addition to violating laws of the physical universe, the reason for picking ARIMA(3,1,0) vs ARIMA(4,1,0) is left unexplained. Both violate physics equally, but at least ARIMA(4,1,0) fits the data better by the AICc criterion. So, if the only criterion for picking the model is “best AICc”, why not pick ARIMA(4,1,0), which says the trend is statistically significant? This gives the appearance of picking a model based on the answer one ‘likes’.

    I think the best paper discussing the statistical significance gets around the trend issue. That paper is by Zorita. They evaluate the probability that a series with no warming, but whose natural variability is described by “fractional differencing”, would have a whole bunch of record highs at the end, and conclude that would be a very rare event. This suggests that the cluster of record highs is a statistically significant event, and so the recent spate of highs is plausibly explained as forced warming.
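
    A rough sketch of that style of calculation, not Zorita’s method: simulate trendless fractionally differenced noise (d = 0.4 and the record-count threshold of five are arbitrary illustrative choices) and ask how often the last decade contains that many record highs:

    library(fracdiff)
    set.seed(7)
    n_years <- 129; last_k <- 10; n_sims <- 2000
    hits <- 0
    for (i in seq_len(n_sims)) {
      x <- fracdiff.sim(n_years, d = 0.4)$series    # trendless ARFIMA(0, d, 0) surrogate series
      is_record <- x == cummax(x)                   # TRUE wherever a new record high is set
      if (sum(tail(is_record, last_k)) >= 5) hits <- hits + 1
    }
    hits / n_sims    # a small fraction means such clustering of records is rare without any forcing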
