Thinking about uncertainty intervals for smoothed curves.

Over the past week, many of us have been thinking about how to interpret smoothed curves. Recently, David turned his attention to Rahmstorf’s projection of sea level rise which is, evidently, based on projections from a smoothed curve. Rumor has it SteveM will be posting a discussion of uncertainty intervals, which will ultimately matter to the confidence one places in any projections using smoothed curves.

Today, I wanted to ponder how I might think about uncertainty intervals for smoothed curves. Thinking about how to think might seem odd, but it’s the first step to a) determining whether someone else’s uncertainty intervals could possibly be large enough and b) figuring out how to include all the factors necessary to compute the uncertainty intervals. So, today, I’ll discuss the bare-bones minimum uncertainty intervals for ‘predicting’ the future value of a smoothed curve over the final 174 data points of an M=175 month curve. To make the thought process concrete, I’ll do this using HadSST2 as an example.

Best fit up to end points

Applying a triangular filter to data far from the endpoints is straightforward. I create a set of weights; an M=3 triangular filter looks like this: (1, 2, 3, 2, 1). Then, if I have a string of 10 data points, (y1, y2, y3, y4, y5, …, y10), I compute the ‘smoothed’ value for the 3rd data point as Y3 = (1*y1 + 2*y2 + 3*y3 + 2*y4 + 1*y5)/(1+2+3+2+1). If y1-y5 are numbers, this is just a number. Likewise, to compute the value for the fourth data point, I compute Y4 = (1*y2 + 2*y3 + 3*y4 + 2*y5 + 1*y6)/(1+2+3+2+1). This works for Y3 through Y8. But when I hit Y9, I can no longer apply the rule, because I don’t know the value of y11.
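
To make the arithmetic concrete, here is what that interior step might look like as a short R sketch (with made-up numbers standing in for real data):

    w <- c(1, 2, 3, 2, 1)                       # triangular weights for M = 3
    y <- c(0.12, 0.31, 0.22, 0.41, 0.52, 0.33, 0.61, 0.72, 0.48, 0.80)  # made-up y1..y10
    half <- (length(w) - 1) / 2                 # need 2 points on each side
    Y <- rep(NA_real_, length(y))
    for (j in (1 + half):(length(y) - half)) {  # j = 3, ..., 8
      Y[j] <- sum(w * y[(j - half):(j + half)]) / sum(w)
    }
    Y                                           # Y3..Y8 are defined; the last two stay NA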

Some wise people might suggest I simply stop smoothing at Y8; more often than not, that is the more rational course. However, let’s assume I am bound and determined to smooth because… well… just because.

Estimating the best fit for the extrapolation

Now, there are many possible ways I could deal with the end point. But one of them is to pad the end points. That is: make up data after y10. The question is: What’s the best way? Of course there is no “best” way, so a better question might be: “What’s a not-too-crummy way?”

One possible “not too crummy” way is to assume the trend near the end of the data persists. Of course, this is itself a guess, but it often seems less addled than assuming the trend will not persist. So, in this case, I might a) apply a least squares fit to the final M data points and b) assume the upcoming 174 future data points will fall on the extension of that line. I’ve illustrated this below. (In the graph, blue squares are HadSST2 observations, the thick orange trace is smoothed HadSST2 data using an M=175 triangular filter, the lime green line is the least squares fit to the final 175 observations, and the pink extension is the “guessed” data that will be used to make the smooth curve.)

Figure 1: Smooth up to end points; least squares fit.

Of course no one really thinks the future data will fall on the pink line, but those specific values will be used to create a smoothed curve that begins at the end of the orange “real” smoothed line and ends in June 2009, the last month for which I have data.
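
For readers who want to see the mechanics, here is a rough R sketch of that padding step, with made-up numbers standing in for the HadSST2 anomalies (the real calculation uses the actual series):

    M <- 175
    set.seed(1)
    y <- 0.0005 * (1:600) + as.numeric(arima.sim(list(ar = 0.6), 600, sd = 0.05))  # stand-in data
    n <- length(y)
    t_obs <- (n - M + 1):n
    fit <- lm(y[t_obs] ~ t_obs)                 # least squares fit to the final M points
    t_pad <- (n + 1):(n + M - 1)                # 174 "guessed" future months
    y_pad <- c(y, predict(fit, data.frame(t_obs = t_pad)))  # observations plus the pink extension
    w <- c(1:M, (M - 1):1)                      # triangular weights, peak M
    smooth_at <- function(j) sum(w * y_pad[(j - M + 1):(j + M - 1)]) / sum(w)
    Y_end <- sapply(t_obs, smooth_at)           # smoothed values right up to the last month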

I get this:

Figure 2: Sort of "best" smoothed curve in end region.

The smoothed curve is shown in fuchsia. Notice it approaches the least squares line as time increases from 1995 to 2009.

Now, depending on my proclivities, I will either go ahead and start drawing conclusions about what that smoothed curve shows me about what HadSST2 did recently, or I can fret about uncertainty intervals. If I do the former, I will tell people that the multi-decadal trend in HadSST2 slowed in recent times. See how the pink curve is concave downward? (And maybe, if temperatures rise next year but I want to avoid showing that, I will increase M=175 months to M=200 months, right?)

Before diagnosing any slowdown in the multidecadal trend in HadSST2, it might be wiser for me to realize that the smooth fuchsia line will change next month when July’s real data arrive, replacing my guess. That smooth curve is, oddly enough, not an “observation”. It is, in some sense, a prediction of that portion of the M=175 smoothed curve as it will appear 175 months from now.

Since it’s a prediction of the future form of the smooth curve, it might be useful to begin thinking about the uncertainty in that curve. It turns out there is uncertainty (and flat out errors) from several sources. An incomplete list of these sources includes:

  1. When creating the pink extension and guessing future values, I did not account for the fact that the residuals from the best fit line (shown in green) exhibit temporal autocorrelation. Because the lag-1 temporal autocorrelation for observations is strong, I know that the pink line is not the best guess for next month’s temperature based on knowledge of the HadSST2 observations. In reality, I expect the July 2009 HadSST2 data will be a bit higher than the pink line. I could account for this. I looked at it and it makes very little difference in this specific example, so I’ll defer further discussion. But, it is something I might need to consider in some cases.
  2. When using the pink extension to create the smooth fuchsia line, I have assumed that there is no uncertainty in my knowledge of the “true underlying trend”. That is: I have assumed that the final M=175 observations were generated by a process with an unknown, constant, linear trend which is the “true underlying trend”, but that each data point contains some sort of “noise”, resulting in a deviation from that trend. Even if this assumption is true, because I observed the trend over a finite period of time (175 months), there is some statistical uncertainty in my estimate of the trend. That is: The trend shown in green may not equal the “true underlying trend”.

    If my estimate of the trend and intercept is wrong, then the pink extension does not represent the actual most likely value of future data. The pink extension represents the most likely value based on the past data. That’s a slightly different thing. So, I need to consider the uncertainty in where the correct “pink” line ought to be.

  3. Assuming my pink line extension happens to match the “true underlying trend”, I know that the next 174 data points will contain noise. So, the best fit trend to the future data may not fall on that line, and so the future smooth fuchsia M=175 curve will shift. This is important, but I’m not going to discuss it today.

The uncertainty range in the best fit for the extrapolation

So, now, I’m going to discuss the second issue above: How much uncertainty is there in the fuchsia line due to the uncertainty in my estimate of the “true underlying trend” only? To estimate this, I computed the red-corrected uncertainty intervals for the mean slope and intercept associated with the final M=175 observations using the Nychka method (and Lee & Lund correction). Then, assuming the uncertainties in the slope and intercept are uncorrelated, I concocted 95% uncertainty intervals for possible trends. (This is a bit hokey. What I did was take the square root of 95% to compute the multiplier.)
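
The flavor of the red correction can be sketched in a few lines of R. This is only an illustration of the idea (AR(1) residuals, the usual effective-sample-size inflation, made-up stand-in data), not necessarily the exact Nychka / Lee & Lund recipe:

    set.seed(2)
    M <- 175
    t_fit <- 1:M
    y_fit <- 0.001 * t_fit + as.numeric(arima.sim(list(ar = 0.6), M, sd = 0.05))  # stand-in data
    fit <- lm(y_fit ~ t_fit)
    r1 <- acf(residuals(fit), plot = FALSE)$acf[2]      # lag-1 autocorrelation of the residuals
    inflate <- sqrt((1 + r1) / (1 - r1))                # red-noise inflation factor
    se_b <- summary(fit)$coefficients["t_fit", "Std. Error"] * inflate
    ci_b <- coef(fit)["t_fit"] + c(-1.96, 1.96) * se_b  # ~95% bounds on the slope
    # the intercept gets the same treatment before drawing the bounding trend lines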

The uncertainty intervals for the possible trends are illustrated with green dashed lines. The maximum and minimum possible trend lines at p~95% have been added in pink:

Figure 3: Uncertainty intervals on the least squares trend.

What the graph means is that even if the “true underlying trend” really is linear in the region where we fit the curve, and if we use p=95%, we must consider the possibility that it might have a slope as large or as small as any of the three pink extensions shown. That’s a pretty large range!

(Mind you, we could narrow the range by fitting the curve to more than 175 observations. The difficulty with this is that the only reason we might filter instead of just fitting a straight line is that we “think” the true underlying trend may vary with time. So, by reducing our estimate of the uncertainty in the computed mean and trend this way, we might, at least in principle, pollute our estimate of the trend with observations from a time when the rate of warming was either faster or slower.)

The smoothed curves based on the uncertainty ranges

I can now determine the smoothed curves associated with the ±95% uncertainty bounds on my knowledge of the “true underlying trend”. These are shown in fuchsia below:

Figure 4: Pseudo-uncertainty.

So, based on these three curves, notice that once the upcoming 174 monthly values are actually observed rather than guessed from past data, the future M=175 smoothed curve could suggest anything from HadSST2 increasing slightly to decreasing fairly briskly from 1995 to 2009.

Of course even that diagnosis will depend on whether or not we believe we can learn a whole lot from a smoothed curve.

Bear in mind: The uncertainty intervals I showed above represent the lower bounds on the uncertainty intervals in the future location of the M=175 smoothed curve associated with the specific realization of weather on the real earth. Like annual averaged surface temperature, the value of temperature along the M=175 smoothed curve is, itself, an observation. The temporally smoothed average will have different properties from the annual averaged surface temperature, but it’s still just an observation of something.

Things not even included yet!

Now, I bet some of you are already thinking, “But she didn’t even account for the fact that the future 174 data points will not fall on any straight line because they are noisy”. Nope. I haven’t. If I account for that, the uncertainty intervals for the future location of the smoothed M=175 curve from 1995 to June 2009 become even wider.

So, despite the very smooth noiseless appearance of the first “best fit” prediction of the future location of the M=175 smoothed curve, that location is quite uncertain.

I mention the “smooth and noiseless” issue because it conceals one of the dangers of smoothing up to the end points. When we don’t smooth data using a filter, the scatter in the data naturally alerts us to the fact that any trend is uncertain. If the data were smooth before applying a filter, we might justifiably have confidence the future is fairly predictable. One danger with smoothing is that we risk believing the filtered curve has revealed the “signal”. It does so to some extent, but it is very easy to misjudge the amount of uncertainty in a smoothed curve.

Some might wonder: Is it possible to figure out the full uncertainty intervals? … Yes. I can think of a way, but it’s a bit of a pain in the neck (and also involves some assumptions). It also involves running Monte Carlo simulations. I’m trying to think of an analytical shortcut. If someone happens to know one, let me know.
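
For concreteness, here is one crude way such a Monte Carlo might look, sketched in R with made-up stand-in data. It assumes the fitted line is the “true” trend and the residuals are AR(1), and it folds in only the scatter of future data, not the uncertainty in the trend itself (which would have to be added by also drawing the slope and intercept from their distributions):

    set.seed(4)
    M <- 175
    y <- 0.0005 * (1:600) + as.numeric(arima.sim(list(ar = 0.6), 600, sd = 0.05))  # stand-in data
    n <- length(y)
    t_obs <- (n - M + 1):n
    fit <- lm(y[t_obs] ~ t_obs)
    phi <- acf(residuals(fit), plot = FALSE)$acf[2]      # lag-1 autocorrelation
    sd_innov <- sd(residuals(fit)) * sqrt(1 - phi^2)     # innovation sd for the AR(1)
    w <- c(1:M, (M - 1):1)                               # triangular weights, peak M
    smooth_last <- function(ypad) sum(w * ypad[(n - M + 1):(n + M - 1)]) / sum(w)
    t_pad <- (n + 1):(n + M - 1)
    sims <- replicate(1000, {
      noise <- as.numeric(arima.sim(list(ar = phi), M - 1, sd = sd_innov))
      smooth_last(c(y, predict(fit, data.frame(t_obs = t_pad)) + noise))
    })
    quantile(sims, c(0.025, 0.975))   # spread in the future value of the smoothed endpoint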

In the meantime: Should you ever see anyone diagnosing a hypothesis using the tail end of a smoothed curve, and they show no uncertainty intervals at all, be wary. There is a good chance they don’t understand what smoothing does. Ask them to add uncertainty intervals of some sort.

If they have added uncertainty intervals, ask at least these two questions: a) Did they account for the randomness of the future data? and b) Did they account for the uncertainty in the estimate of the “true” underlying trend?

If the answer to either is “no”, you can be certain their uncertainty intervals are too small, possibly much too small. Be wary of any predictions they make.

54 thoughts on “Thinking about uncertainty intervals for smoothed curves.”

  1. Another outstanding one, this. Many thanks. Specific enough to illustrate the point at a concrete level, but with a very clear exposition of the underlying logic of what is going on. Great stuff, a model of how to explain this sort of thing clearly and without either oversimplifying or getting lost in endless detail.

  2. I know a real dirty shortcut. It assumes that you do not know what the trend is, and puts all the “noise” in m. There are several ways to do it. One is to just take the absolute values of the 2nd highest trend for a short run and the 2nd lowest trend for a short run, add them, then divide by 2. You can program this with z = 9, 5, whatever, or choose it by the “eyeball” method.

    If you can add uncertainty to this, use sqrt(summation of i^2). It also assumes each month has been standardized. I would add that uncertainty in the sqrt function as well. This is how the data you use is published in your smoothing routine?

  3. John

    This is how the data you use is published in your smoothing routine?

    Is this a question? Or are you trying to tell me something about HadSST2 data?

  4. It is both. IIRC, most, if not all, temperature anomaly metrics are standardized by month. But I could be wrong, so I ask. If months have been standardized, they will have an associated error, which I think should be included. It may not be much; it may not be ascertainable; but I think it should be included.

  5. John —
    I don’t know how the uncertainty you suggest I add relates to the uncertainty I want to compute (i.e., the uncertainty in the future value of the M=175 smoothed trend).

    It’s true the individual observations contain errors, but they also vary due to variability associated with weather. I don’t think we can estimate the measurement uncertainty in an individual monthly value the way you suggest because it’s unlikely the variability over 175 months tells us very much about measurement errors for individual months.

  6. In an mx+b equation, it would be the b noise, is how I view it. I had assumed a method for greater uncertainty intervals for m and now want to do that for b.

    It is probably not a good choice. But one of the considerations of a linear least squares fit, is well, the assumption that it is a linear fit.

    As far as each month, a possible, if not humorous approach, would be to use the dCO2/dt of Matt Briggs http://wmbriggs.com/blog/?p=122 to give you an estimate of how much this variance is.

  7. I estimated the uncertainty in “b” for the fit using the normal method for least squares, and then red corrected. That’s why the dashed green lines form V’s but don’t touch. That was easy to do, so I don’t need a substitute method for that.

    What I’m trying to figure out is whether there is an easy way to figure out how much uncertainty there is in the future M=175 smoothed line based on future scatter in the data. I know how to do this using Monte Carlo, but that’s a pain in the neck.

  8. I can’t think of a way. In fact, by smoothing the data the way it is done, it would seem any other method, unless it had standard tables somewhere, just wouldn’t answer the question you are asking.

    I wonder if these considerations start approaching LTP phenomena. I think that is a research question that is interesting, but a whole can of worms in and of itself.

    I imagine if I knew R, it would be easy. Claims of R’s ability are quite good. Easy to have an opinion, without a surfeit of knowledge.

  9. Just some thoughts on end trend estimation, which are more connected than they might at first appear:
    Prop 1. For any of the linear filters you speak of, the smooth points are just weighted averages of a set of known data points, and you can write down the weights.
    Prop 1a. The weights should be centered – zero lag.
    Prop 2. Any smoothing technique to estimate trend should leave a straight line unchanged. That’s not just because it sounds right. As you say above, you’re thinking of the data as a smooth change plus noise. You want to quieten the noise to see the change, but that won’t work if the quietening alters the change.
    Prop 2a. If the line is unchanged, the filters are zero lag.

    Now I’ve been enthusing about Mann MRC. It’s easy to apply, and has all these properties. It has the apparent down side that the filters get lop-sided towards the end, and the final point is pinned; it must be just the latest data point. And I have said that there is a reason why that must be so. Your version, and Grinsted’s, have these properties and avoid pinning. At what cost?

    The answer is that some weights become negative. Now negative weights in averaging lead to counterintuitive results. A value at one point goes down – the average goes up! In PDE solution, it’s a sure sign of instability. But more practically, in your case it boosts the variability of the result.

    OK, let me give a demo. Just 3 end points (y1, y2, y3), with a 5-pt triangle filter. The least squares extrapolates are y4 = (8*y3 + 2*y2 - 4*y1)/6 and y5 = (11*y3 + 2*y2 - 7*y1)/6. The resulting weighted average that you apply to get the smooth at y3, after applying the triangle filter, is (5*y3 + 2*y2 - y1)/6.

    Maybe it doesn’t make a huge difference. But it’s there.
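
    For concreteness, a quick numeric check of those weights in R (assuming the 5-pt triangle (1,2,3,2,1)/9 and the least squares extrapolates above):

    w <- c(1, 2, 3, 2, 1) / 9
    smooth_y3 <- function(y1, y2, y3) {
      y4 <- (8 * y3 + 2 * y2 - 4 * y1) / 6    # least squares extrapolates
      y5 <- (11 * y3 + 2 * y2 - 7 * y1) / 6
      sum(w * c(y1, y2, y3, y4, y5))
    }
    # unit impulses read off the effective weights on y1, y2, y3
    wts3 <- c(smooth_y3(1, 0, 0), smooth_y3(0, 1, 0), smooth_y3(0, 0, 1))
    wts3                 # -1/6, 2/6, 5/6 (they sum to 1)
    sqrt(sum(wts3^2))    # L2 norm = sqrt(5/6), about 0.913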

  10. So answering your question as to how to compute the full uncertainty intervals of the smoothed endpoints, here’s a way. If you’ve identified the noise parameters of the residuals, then it’s just the uncertainty of a weighted average. If the residuals are iid normal, the variance of that mean is just the residual variance multiplied by the squared L2 norm of the weights. If the residuals are correlated, you’ll have to make the sort of adjustments you did for linear regression.

    That’s why the negative weights hurt. The weights have to add to 1, so the L2 norm gets worse with negative values.

  11. Nick–
    1a) So what if I can do algebra and make the extrapolation of future data points implicit? One must make the assumption before doing the algebra. Algebra cannot magically erase assumptions like fairy dust.

    On the final comment– I’ll think about that and see if it reproduces the right answer. If so, I think the uncertainty bands are going to increase by root(2).

  12. Well, let’s just say that if you do the algebra to find the weights for the smoothing average, it brings it back into the world of things you can analyse.

    I may need to revise my preference for Mann MRC. In the example I gave, the L2 norm of the weights is sqrt (5/6), while Mann’s is 1. So least squares is better. In fact in that case it is the minimum L2 value you can get while preserving centering. Grinsted’s method is in between. So while positive weights seems like it ought to be a plus, your least squares may have lower variance.

  13. Lucia, you are on the right track. First, the easy question is “What happens to white noise if it goes through this tfilt?” (as in ssatrend.m). In ssatrend.m they apparently made a mistake when computing this; after the mistake is corrected, the variance increases near the endpoints (as you also noticed):

    http://www.climateaudit.org/wp-content/uploads/2009/07/s2_comp.png

    ( see http://www.climateaudit.org/?p=6533#comment-348882 )

    This figure is easy to update with other correlation structures. Note that the LS-smooth yields the smallest variance among these smoothers that preserve the trend (e.g. S(Xb)=Xb). Proof is left to the reader. So why pick any other smoother? Because they want to allow the trend to bend. Can we now find a smoother S for time series t, such that S(t)=t, i.e., no distortion of the signal? Should we know something about t (other than that it is not a straight line) before proceeding? If we know that t and the noise are stochastic processes, we can continue with
    http://en.wikipedia.org/wiki/Wiener_filter or a Kalman smoother. If nothing is known about t, then the handwaving continues (which allows this kind of nonsense: http://www.climateaudit.org/?p=6473#comment-348430 ).

  14. Nick– How does doing algebra to hide the assumptions underlying padding do anything but make the analytical process more opaque and difficult to understand? If you want to do the algebra to write this method in closed form, go ahead. But I think it’s much easier just to admit that we are extrapolating instead of trying to hide what we are doing.

  15. Lucia,
    I had a try at the algebra. The weights come out explicitly for a triangle filter with your least squares fit.
    With N=M-1, and n the point of the smooth, numbering back, I believe this R function gives the weights:
    wts <- function(N, n){
      m <- N - n                         # number of extrapolates
      N1 <- N - 1
      N2 <- N1/2
      i <- 0:N1
      a1 <- m*(m-1)/N/N/N/2
      a2 <- 12*(N2 + (m+1)/3)/N1/(N+1)   # a1 and a2 are numerical factors
      v1 <- (N - abs(i+1-m))/N/N         # start with the points before extrapolation
      v1 <- v1 + a1*(a2*(i-N2) + 1)      # add in the extrapolates
      if (0 < n) v1 <- c(1:n/N/N, v1)    # complete with the tail of the symmetric filter
      v1                                 # return the filter, indices from -N1-n to 0
    }
    I’ve checked that the filters are centred and the wts add to 1.
    The L2 norm of the weights is then the factor which would multiply the sd of the residuals (if they are iid) to get the se of the smoothed points as a weighted average. I found that in your case of M=175 it didn’t grow by that much. In the interior it is 0.0619, and at the endpoint it has grown, gradually, to 0.1510.

    But your residuals do seem highly correlated, so a version of your Cochrane-Orcutt would be needed.
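
    For instance, to reproduce the interior and endpoint numbers quoted above with the function as written:

    sqrt(sum(wts(174, 174)^2))   # interior of the M=175 smooth: about 0.0619
    sqrt(sum(wts(174, 0)^2))     # at the very endpoint: about 0.1510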

  16. Lucia, algebra is just algebra – it doesn’t hide assumptions. However it’s derived, you are actually calculating those smooth points as weighted averages of data.

    The philosophical issue doesn’t add anything helpful here, but I would put it like this. You have to determine the weights, for which equations are needed. It’s reasonable, as you suggested, to require that the trend calc gives a consistent result for a continuing trend. That doesn’t mean that you’re assuming (or determining) that the trend will continue – it’s just the condition that you use to construct the filter.

  17. Alas, html swallowed a line of the R code as I first posted it. The line
    v10)v1<-c(1:n/N/N,v1) # complete with the tail of the symmetric filter
    should have been:
    v1 <- v1+(a1*(a2*(i-N2)+1)) # add in the extrapolates
    if(0<n) v1 <- c(1:n/N/N,v1) # complete with the tail of the symmetric filter

  18. Nick–
    I don’t use R.
    Now that you gave me R code, I have to admit I misinterpreted what you meant by this

    Well, let’s just say that if you do the algebra to find the weights for the smoothing average, it brings it back into the world of things you can analyse.

    Obviously, I already knew how to compute weights for a triangle filter and used it to smooth. I thought you were trying to make some point about your asymmetric filters and the lack of need to extrapolate. I thought you were proposing that we write down a closed-form solution for the weights so we can pretend we never extrapolated.

    It’s nice to see someone requesting Cochrane-Orcutt. 🙂
    I’d thought that might be the best way to deal with estimating the errors due to random weather in the future.

    I also did suspect that the major uncertainty is the bit I already posted, and not so much the uncertainty due to the random variations of future weather. But, I’ll do the numbers in a few days.

    That doesn’t mean that you’re assuming (or determining) that the trend will continue – it’s just the condition that you use to construct the filter.

    Of course you are assuming this. You are making this assumption in order to compute the weights. If you made a different assumption, you would obtain different weights. Calling “making an assumption”, “imposing a condition” does not change the fact that you chose which condition to impose by “making an assumption”.

    We could make the assumption the trend over the last M points is real and impose it. We could make the assumption the trend is false and revert to the mean. We could assume the autocorrelation is real and use Cochrane-Orcutt to guess the future. We could use ARIMA and use that to predict the future (as discussed in the paper you described.)

    You can’t apply the concept of reducing L2 errors in the future without first making assumptions about the functional form that will underlie the trend in the future. Imposing the condition is inextricably linked to making an assumption about what will happen in the future.

    Of course the weights are all computed based on the current value.

  19. Speaking of R code. Did you ever get your copy of “The R book”? Do you think you might try using it?

  20. An article with discussion at RC and WUWT is interesting in its assumptions as related to this thread. You may have seen this already. I have thought on what Nick has said, but can only conclude that, wrt smoothing, smoothing for variance and trend smoothing when used past the real data points (triangular, or any other) is just fraught with assumptions. Bad assumptions, I would say. And it keeps coming back to something you have pointed out: the assumption of linear with AR(x) noise is, well, just an assumption. The problem is that though the variance is called “noise”, its characteristics do not appear to match the assumption. When you use a smooth to predict, if you don’t go back to the unsmoothed data to predict variance, you can be right if and only if your assumption(s) are correct. Otherwise, sooner or later, chance says it will HAVE to deviate from the smoothed-variance predicted envelope.

  21. Nick Stokes:

    In PDE solution, it’s a sure sign of instability

    Could you expand on this?

    At face value, this statement is pure nonsense.

  22. John C. On PDE solution. Suppose you’re doing a simple explicit solution of the simplest diffusion equation T_t = T_xx with unit x intervals. The result after a timestep dt is the result of a smoothing:
    T(t+dt, x) = T(t, x) + dt*( T(t, x-1) - 2*T(t, x) + T(t, x+1) ) = dt*( T(t, x-1) + (1/dt - 2)*T(t, x) + T(t, x+1) )
    As is well-known, the upper limit on stability for dt is 1/2, after which the weight on the central term becomes negative.
    For more general pde it is still true that explicit solution is equivalent to a filter operation, and if there are negative coefficients, some oscillation can grow.
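
    For concreteness, a small R sketch of that stability limit, using the explicit scheme above on a unit grid with the ends held at zero (made-up initial noise):

    step <- function(u, dt) {
      n <- length(u)
      u[2:(n - 1)] <- u[2:(n - 1)] + dt * (u[1:(n - 2)] - 2 * u[2:(n - 1)] + u[3:n])
      u
    }
    run <- function(dt, nsteps = 200) {
      set.seed(3)
      u <- c(0, rnorm(48), 0)        # noisy start on 50 grid points, ends held at zero
      for (k in 1:nsteps) u <- step(u, dt)
      max(abs(u))
    }
    run(0.4)   # dt < 1/2: amplitudes stay bounded and decay
    run(0.6)   # dt > 1/2: the central weight goes negative and oscillations blow up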

  23. Lucia,
    I’ll try once more with the assumption issue, although it’s a distraction from the use of the filter coefs in error analysis. Think about numerical integration. Say Simpson’s rule – it’s a filter with weights (1,4,1)/6. Where did those numbers come from? Well, Simpson got them by saying: assume the integrand is polynomial. What would give an accurate answer for orders up to quadratic? But that doesn’t mean he’s requiring the function to be quadratic (or polynomial). It’s a good general rule, and you can derive it in other ways (e.g. trapezoidal). (A quick numerical check is below.)

    Carrick – sorry, that last response was for you.
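
    A quick check in R that those (1,4,1)/6 weights integrate a quadratic exactly on [0, 1] (an arbitrary made-up quadratic below):

    f <- function(x) 3 * x^2 - 2 * x + 1        # an arbitrary quadratic
    simpson <- (f(0) + 4 * f(0.5) + f(1)) / 6   # Simpson's rule on [0, 1]
    exact <- integrate(f, 0, 1)$value
    c(simpson, exact)                           # both equal 1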

  24. Nick Stokes #16237, sorry, I don’t see the relevance of this comment to your original remark “In PDE solution, it’s a sure sign of instability”.

    Are you talking about a series solution to a PDE? Approximation of the solution with a polynomial?

    For neither of these cases is your statement in general true (there are classes of problems for which it holds, of course), so I am assuming you are referring to something else.

  25. Lucia

    You can’t apply the concept of reducing L2 errors in the future without first making assumptions about the functional form that will underly the trend in the future. Imposing the condition is inextricably linked to making an assumption about what will happen in the future.

    I agree completely with this statement. Unless you are imposing additional physical constraints, once you get within N/2 of the endpoint, there are essentially arbitrary assumptions that must be made to allow you to obtain values in that interval.

    I’ll point out though that if you have additional physical constraints on the system that you are trying to find the inverse solution to, then one can do better than this. For example, with Kalman filtering you can impose physical constraints on the solution outside of the region for which you have data (using e.g., conservation of energy, inertial constraints etc, requiring the solution follow a specific set of physical laws, etc.).

    An alternative approach when you are lazy (like me 🙂 ) is to use a recursive filter, which works better because N can in general be made much smaller for the same qualitative filter behavior than with a traditional FIR filter. You still have to dress up the data at the ends to avoid large transient ringing (the main disadvantage, as you probably know, with recursive filters)…and this is done for example in MATLAB’s FILTFILT implementation…but I find the resulting smoothed data to be much better behaved, with fewer ambiguities than are typically present in FIR filter implementations.

  26. Carrick,
    It was a remark in passing, and I didn’t spell out everything that perhaps I should have. I meant in timestepping (single step) numerical PDE solution. Thinking of explicit, although the ideas work for implicit too, and one space dimension. Every new timestep approx is the result of applying a weighted average to the previous approx. If the weights are positive, you can show that any high space frequency oscillation is attenuated in amplitude. If not, some oscillation is very likely to grow.

    Incidentally, I raised the PDE analogy earlier, and the use of ghost particles for boundary condition enforcement in fluids, which is somewhat like extrapolation here. In fact, smoothing a time series is very like numerical solution of a diffusion equation, as in the example above. And this issue of smoothing to the boundary without distorting the trend is very like trying to implement a neutral boundary condition in pde. Something like an outflow condition in fluids, or a non-reflective condition in acoustics.
    I’m not trying to prove anything there – just an analogy.

  27. Nick Stokes (Comment#16240) July 14th, 2009 at 3:25 pm

    In fact, smoothing a time series is very like numerical solution of a diffusion equation, as in the example above. And this issue of smoothing to the boundary without distorting the trend is very like trying to implement a neutral boundary condition in pde. Something like an outflow condition in fluids, or a non-reflective condition in acoustics.

    The analogy is well made, but if you do something like place a sponge layer in your simulation you won’t report the propagation of the acoustic wave or whatever through the damping region as part of your “result.”

    For the more apt diffusion example, you usually have some physical reason to choose the form of the right boundary — but for this particular time series I think the choice basically hinges upon what result you expect (which I think has been Lucia’s point all along).

  28. Nick, thanks for the explanation of what you meant.

    Let me throw out this example to see if we’re on the same wavelength (so to speak). Take the 1-d homogeneous Helmholtz equation:

    y''(x) + k0^2(x) y(x) == 0

    subject to the BCs

    y'(0) = f(t)
    y'(L) = 0

    where 0 <= x <= L, and k0(x) is a real function of x (known as the “wavenumber function”).

    Discretizing this ODE involves letting x -> n dx, y(x) = y(n dx) -> y[n], k0^2(n dx) -> k0^2[n] and y''(x) -> (y[n+1] - 2*y[n] + y[n-1])/dx^2.

    You end up with a recurrence relation for the interior solution of the form

    y[n+1] = (2 - dx^2 k0^2[n]) y[n] - y[n-1]

    Since k0 and dx are positive definite, for small enough dx, you have a case where you have both positive and negative signs.

    Whether this iteration is numerically stable depends on the behavior of the original k0(x). If k0(x) = constant, it is always stable. If k0(x) has a form like k0 exp(-k1 x), you can run the iteration forward in x (and it is numerically stable), but not backwards in x.

    (If you don’t like the fact I’m using position rather than time here, just let x->t mutatis mutandis and you have your time evolution version instead.)

  29. oliver, wow, you do sound propagation?

    But I agree with you, one doesn’t report the solution in the absorbing layer, just as I don’t report the smoothed version of a solution within N/2 of the boundaries.

  30. Carrick, I think that example is something else. The pde case does create a filter with clear repeated averaging. You’re creating a 2nd order recurrence relation. The reason for the instability there is that you have two linearly independent solutions, one growing rapidly relative to the other. Any numerical error will be interpreted as a component of the growing solution, so you can’t track the non-growing solution for long. When you reverse, the solutions swap roles.

    Incidentally, I think it is still unstable, generally, with constant k0, except in the special case where the characteristic quadratic has roots of near-equal magnitude.

  31. Nick–

    although it’s a distraction from the use of the filter coefs in error analysis.

    Sure. But for some reason, you felt the need to bring up this issue of thinking about the use of extrapolation as not involving extrapolation, but just recomputing weightings that are applied to existing data. I don’t mind going a bit off topic, so I’m trying to figure out why you thought the topic you brought up was important.

    Of course when solving PDEs numerically and applying methods to discretize the equations, one is not assuming the solution takes any particular functional form. The discretization methods assume your grid size is small relative to the scales of the function you are trying to resolve. So, it makes sense to think of expanding the solution field in a Maclaurin series locally. In this case, you specifically want to capture all the smallest scales in whatever field you are trying to resolve.

    But I think it’s a mistake to look at filtering as analogous to methods of figuring out how to discretize equations. There are many differences. For example, with filtering, we specifically want to smooth out what is happening at scales that are small relative to the overall filter width (here M=175 months). So, when we extrapolate with the goal of later smoothing (or do algebra to make the extrapolation implicit) we are specifically trying to guess a “smoothed” property about the future.

    With MRC of any type, we are trying to ensure the recent trend is maintained.

    I also think that if you are troubled by the negative weights you obtain after you do algebra to hide the fact that we are extrapolating, and you think something horrible will happen as a result, you should sit down and show examples of the bad things that happen.

    Suppose we use a 3-point filter with weights (w-1, w0, w1), with w1 = w-1 for interior points. Next, suppose we number the final 3 available observations (-1, 0, 1) and then use least squares to smooth the last point (1). We compute the smoothed value Y1 as

    Y1 = (w-1*y-1 + w0*y0 + w1*y1) = w1*(y-1 + y0 + y1)/3 + w0*y1 + w1*(y1 - y-1).

    So what if, after shifting everything around, the weight on y-1 is negative? The goal was to maintain the trend; the w1*(y1 - y-1) term does that. The w0*y1 term makes sure the local point contributes to the smoothed value, and the (y-1 + y0 + y1)/3 term smooths out variations due to noise.
    (I guess I’ll see these equations after I hit submit! heh.)

    We aren’t going to use these values to march forward in time, so… why are you worrying about things that go wrong when solving PDEs?

  32. Carrick (Comment#16246) July 14th, 2009 at 4:20 pm
    oliver, wow, you do sound propagation?

    Ha, more like “attempting” than “doing.” 😉 The acoustics are mostly an excuse to look at internal gravity waves. (Or is it the other way around?).

  33. Lucia, I backed off on the negative weight issue in #16219 above. It sounds like something to be avoided, and could give problems with the L2 norm. But in fact your filter seems to do better in that respect.

    Your last example is a good way of looking at it. You apply an integrating filter which gives good smoothing and lag. Then you add a differentiating filter, which reduces the smoothing, but moves the centre forward.

  34. oliver:

    The acoustics are mostly an excuse to look at internal gravity waves.

    Ah! A man after my own heart.

    I go at it from the other side, mostly acoustics, infrasound and gravitoacoustic waves (I have a wavenumber integration code that solves this problem based on Allan Pierce’s original papers and inspired by Joe Lingevitch’s more recent work), but internal waves are of great interest to our group (in part because they modify the propagation path of acoustic waves).

  35. Nick, negative weights (I call them “coefficients”) are a common feature of FIR filters, the windowed sinc filter being a classic example of this:

    They are very stable, produce few surprises, and can be pushed to incredible performance levels.

    Off topic, perhaps you could explain why you would expect the recurrence relation I wrote down to be unstable for k0=constant. As far as I’m aware, it is stable both forward and backwards. (I think there is a problem for k0 = imaginary, but I’ve explicitly excluded that in my statement of the problem.)

  36. Carrick,
    Recurrence relations with constant coefficients have as solutions just powers of constants (and linear combinations). Just substitute c^n. After cancellation, you have a quadratic in c with two roots, c=a and c=b. If |a| is greater than |b|, then any solution which has a component of a in the linear combination will become dominated by that component. You can only stably track b^n using exact arithmetic. Backwards, it is a that you can’t track.

  37. Carrick –

    “You still still have to dress up the data at the ends to avoid large transient ringing ”

    If the IIR filter is causal, I don’t quite understand why one would have to pad the data at the end point to avoid ringing. The start padding is clearly a separate issue as not only are the values of X(n-1) missing but also the values of Y(n-1).

    If I remember correctly from previous discussions on Butterworth filters in Matlab, they do use the trick of passing the filtered data a second time, in reverse time order, through the filter.

    As I understand that, the order of the filter is doubled and the phase lag from the first pass is removed in the second pass. In this particular version of a recursive filter I can see why you have to pad the end as well, because the end becomes the beginning on the second pass. In addition, it seems that it is no longer a causal filter.

    From the point of view that one is trying to sort out error bars, I would have thought that an acausal IIR filter is even more of a nightmare as the assumptions of the endpoint padding propagate back to the very beginning of the series.

    My experience is that of an old fashioned analogue electronics guy and I claim no expertise in the new world of digital filters. 🙂

  38. Nick, for k0 > 0 being real, the exact solutions are exp(+/- i k0 x).

    If k0 -> i k0, the solutions are exp(+/- k0 x)

    You see now why it is stable for k0 > 0?

  39. Jorge:

    If the IIR filter is causal, I don´t quite understand why one would have to pad the data at the end point to avoid ringing. The start padding is clearly a separate issue as not only are the values of X(n-1) missing but also the values of Y(n-1).

    Try applying a Butterworth to data that has a large DC offset. You get massive transients at the endpoints if you simply zero-pad the data (both x and y, as you point out, have to be done).

    For one method on how to pad a recursive filter, take a look at Matlab’s implementation filtfilt.m (basically reflect the data about the endpoints, then start the filter at n = -N, end it at N points past the end point, toss out first and last N data points).

    The filtfilt implementation is also acausal, because it runs the Butterworth both forward and backwards in time. I do this with my own code too, because I prefer the linear phase response one gets.

  40. Carrick (Comment#16254) July 14th, 2009 at 7:31 pm

    I go at it from the other side, mostly acoustics, infrasound and gravitoacoustic waves…

    Way cool, maybe we can exchange contact info sometime/somehow.

    Carrick (Comment#16268) July 15th, 2009 at 7:50 am

    For one method on how to pad a recursive filter, take a look at Matlab’s implementation filtfilt.m…

    Hey wait, isn’t that the Minimum Roughness thing again? 🙂

  41. Carrick,
    Well, k>0 can’t matter, since you’re squaring it. But yes, the recurrence relation generally behaves like the de that it approximates. Your Helmholtz eq has 2 sinusoidal solutions – neither is growing relative to the other. For the recurrence relation, the two quadratic solutions that I referred to (a and b) are close in magnitude. So it’s fairly stable.
    I don’t think this changes so much if k0 varies (but stays real). Your exponential makes it a bit like a Mathieu equation.

  42. Nick Stokes:

    Well, k>0 can’t matter, since you’re squaring it.

    You think? LOL. Stating that k0 > 0 is a convention here that helps define left versus right moving waves (in addition to of course pinning k0 on the positive real axis). Of course I know that k0^2 is a positive real number if k0 is real, silly-type person.

    Not to beat a dead wookie, but how would you propose testing for “fairly stable” versus “stable”?

  43. Carrick, I don’t propose to go on with this. It’s way off topic, and for some reason, seems to have strayed into gotcha land.

  44. Nick,

    That’s a quite ungracious concession. Instead of playing the “off topic” card, why not just openly admit you were wrong (which seems clear from the discussion) and then everyone can move on?

  45. Carrick –

    “Try applying a Butterworth to data that has a large DC offset. You get massive transients at the endpoints, …”

    I hope you do not mind if I ask for further clarification on this as I still cannot see that the problem of a transient exists at the end of the series, as opposed to the start, with a causal IIR filter. It is certain that a Butterworth filter, particularly of high order, will ring on both rising and falling edges of a square wave. In my experience, a Bessel filter is the only choice if one wishes to maintain the fidelity of pulses in a waveform.

    I can clearly see the problem at the start of the series as the filter tries to deal with the sudden appearance of the DC offset, and this can only be dealt with by some form of tapered padding to precede the arrival of the first real data. What is causing a problem for me is the need for any padding at all at the end of the series.

    By definition, a causal filter has no knowledge of the data beyond the end of the series. Certainly, if one placed data values of zero as the potential next values one would get ringing if one tried to process them. If in fact, the observed values do include a sudden change approaching the end point, this undesirable property of the Butterworth will undoubtedly show up.

    Provided the DC offset is still present at the last observed value, there is no problem and no need to pad with invented tapered data as it would not be used in any case.

    I have the feeling that the transients you find at the end are really coming from the particular observations themselves. Clearly, if we are talking about the two pass acausal filter, tapered padding at both ends is essential.

  46. Nick,

    So you stand by this statement: “Incidentally, I think it is still unstable, generally, with constant k0, except in the special case where the characteristic quadratic has roots of near-equal magnitude.” ?

  47. Jorge:

    I hope you do not mind if I ask for further clarification on this as I still cannot see that the problem of a transient exists at the end of the series, as opposed to the start, with a causal IIR filter.

    I apologize for missing this from your original comment, which was regarding padding past the ending point of the series in a causal IIR filter, not “end points” in general.

    You are of course correct in your comment that you only need to worry about transients at the beginning of the series, unless (as I usually do) you time-shift the series to remove the delay introduced by the filter.

    Since I often overlay the original data with the filtered data, time shifting is usually necessary to make sense of it.

    In general (meaning there are causal IIR filters for which this isn’t much of an issue), if you want the filtered series time-aligned with the original one, you still need to do something with points past the end of the original series and a problem with transients still exists, or you have to truncate the series.

    Clearly, if we are talking about the two pass acausal filter, tapered padding at both ends is essential

    We are in complete agreement there. Matlab’s FILTFILT as I mentioned implements acausal (foward+backwards) IIR filters so it has to pad both ends.

  48. Carrick –

    Thanks for the explanation. We are now on the same page 🙂

    What does seem clear is that every smoothing/low pass filtering method has problems at the end point. Either one has to effectively truncate the series or invent some future data to avoid a lag.

    This is a big problem when your time series is from 1990-2008 where the smoothed curve for the last 14 years is somewhat dubious because of invention. The alternative of truncation would not leave one much of a series at all.

    Anyway, if this kind of nonsense can be published in a prestigious journal and then be cited hundreds of times, it must have something going for it. As they say, it’s a funny old world.

  49. fla,
    Yes, that statement is correct, and relates to the general discussion I gave of stability of recurrence relation solution. I hadn’t looked at the differential equation in detail, because I thought C was just asking about reasons for a particular instability. He then pointed out that the de solutions were sinusoids, which implied a stable de. I then said “the two quadratic solutions that I referred to (a and b) are close in magnitude. So it’s fairly stable.” This relates it to the exception in the quote that you gave.

    He then started querying the diff between “fairly stable” and stable, and at that stage I decided it was pointless gotcha stuff. But yes, for the record, I could have omitted the cautious qualifier “fairly”.

  50. Nick Stokes, asking you to explain your own terminology is hardly playing gotcha. Also, “fairly stable” is more like “weasel words” than a “cautious qualifier”: Either it’s stable or it’s not.

    Also the (full) Helmholtz equation is one of the more standard PDEs in science. I’m surprised you haven’t encountered it yet.

    I’ve reached my quota on beating dead wookies for the day, so I’ll stop with that.
