One sign of an important paper is the number of immediate responses it draws. By that measure, Schwartz 2007, “Heat capacity, time constant and sensitivity of the Earth’s climate system”, appears to have made quite an impact.
The impact is not surprising, as this paper represents one of the relatively rare attempts to estimate climate sensitivity empirically rather than using models. The empirical value turned out to be much lower than generally predicted by climate models.
It appears that three comments on Schwartz’s paper will soon appear in J. Geophys. Res., along with Steven Schwartz’s response. The various papers are: a) Schwartz’s response, b) Scafetta’s comment, c) Foster, Annan, Schmidt and Mann, and d) Knutti (for which I have no link). I’ve looked at all but Knutti.
Since these things are likely to be discussed, and I have posted my initial thoughts in comments, I thought I’d comment on Scafetta’s very good paper and on Schwartz’s response to Scafetta’s comment. Later I will comment on FASM (though I have already said much of what I think in comments on Friday’s blog post).
Comments on Scafetta
Scafetta’s approach to Schwartz’s paper was to use the idea in the paper as a springboard for further analysis. In some ways, I see the initial Schwartz paper as “the great idea”, that idea being “Let’s try to estimate the climate time constant based on data. Here’s my first shot!” I see Scafetta’s paper as one that takes the great idea and improves the method of applying it to get a more accurate answer.
This is the way things work in science. Mistakes may be made along the way, but we need to start out with ideas that permit further analysis.
What’s the great idea?
So, Schwartz’s idea was to treat the earth’s climate as a simple “lumped parameter” system with one time constant. He further assumed that this “climate lump” was being heated at a rate that, on average, increased linearly with time, but also varied randomly. The random variation in heating was assumed to be “white noise”.
Under this assumption, Schwartz concluded the random component of the planet’s surface temperature would be something called “red noise”. More importantly, if one detrends the average temperature and calculates the autocorrelation, ρ, of the lagged residuals, the time constant inferred from ρ should come out the same at every lag. Based on that constant, we could estimate the time constant of the planet.
So, basically, one does some calculations, makes a plot, and reads a number off the plot. Schwartz concluded the time constant of the climate was about 5 ±1 years.
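For readers who want to see the mechanics, here is a minimal numpy sketch of the single-lag version of that calculation. The “temperature” series in it is made up (a small linear trend plus AR(1) noise with a known 5-year time constant), standing in for the real GMST data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up "annual GMST anomalies": a linear trend plus AR(1) noise with a
# known 5-year time constant, standing in for the real observations.
n_years = 125
dt = 1.0                       # years between samples
tau_true = 5.0                 # assumed time constant, years
phi = np.exp(-dt / tau_true)   # lag-1 autocorrelation implied by tau
noise = np.zeros(n_years)
for i in range(1, n_years):
    noise[i] = phi * noise[i - 1] + rng.normal(scale=0.1)
years = np.arange(n_years, dtype=float)
temp = 0.005 * years + noise   # 0.05 C/decade trend, purely illustrative

# Schwartz-style recipe: detrend linearly, compute the lag-1 autocorrelation
# of the residuals, and convert it to a time constant.
resid = temp - np.polyval(np.polyfit(years, temp, 1), years)
rho1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
tau_est = -dt / np.log(rho1)
print(f"lag-1 autocorrelation = {rho1:.2f}, estimated tau = {tau_est:.1f} years")
```

With only 125 points, the recovered value usually comes in somewhat short of the 5 years that went in, which is the bias issue discussed further below.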
But there were problems…
Scafetta’s paper noted that Schwartz’s analysis of the time series for Global Mean Surface Temperature is imperfect and might not give the correct result for the time constant for three reasons.
- One is that the time constant inferred from the autocorrelation (ρ) in Schwartz 2007 is not constant as a function of lag time, as it would be if the temperature series obeyed the simplified equation for the earth’s climate system described in Schwartz.
- Scafetta also noted that the data do not suggest the underlying trend in temperature is a linear increase over time.
- Finally Scafetta also discussed the known biases that arise when estimating the correlation in lagged residuals from data.
To deal with these issues, Scafetta did three separate things:
- He treated the temperature series as having two time constants: one faster, one slower. This would make sense if, for example, we thought the air and ocean made up two interconnected systems. (Alternatively, one could imagine the air, ocean and deep ocean making up three interconnected systems, or break the planet into more and more pieces.)
- He de-trended the temperature series two different ways. First, he assumed a linear trend, as did Schwartz. But, recognizing the shortcomings of this method, Scafetta also detrended using the estimate for the mean surface temperature as a function of time predicted by GISS model results.
- He performed an analysis to estimate the possible bias in estimating the autocorrelation.
Scafetta’s results for the time constant are: a) τ1=0.4 ±0.1 years and τ2=8.7 ±2 years under linear detrending and b) τ1=0.4 years and τ2=8.1 ±2 years when detrending using GISS model data. Scafetta then considers the effect of bias in estimating the autocorrelation from short data samples, and concludes the bias could be 3 years. Therefore, accounting for bias, the time scale for the climate may be 12 ± 2 years. (Note: the bias may be smaller than 3 years, as its magnitude depends on the magnitude of the actual time constant.)
All three extensions by Scafetta appear to me to be improvements on Schwartz.
Then, assuming Schwartz’s estimate for the heat capacity is correct, Scafetta estimates the climate sensitivity most consistent with the data is 1.7 C for a doubling of CO2. Accounting for the possibility of bias, the sensitivity may be as large as 2.7 C for a doubling of CO2.
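As a rough illustration of the two-time-constant idea (this is not Scafetta’s actual fitting procedure, and the autocorrelation values below are fabricated from assumed values τ1 = 0.4 yr and τ2 = 8 yr), one can fit a sum of two exponentials to an autocorrelation curve with scipy:

```python
import numpy as np
from scipy.optimize import curve_fit

def two_tau_acf(lag, a, tau1, tau2):
    # Autocorrelation for a mix of a fast and a slow relaxation process.
    return a * np.exp(-lag / tau1) + (1.0 - a) * np.exp(-lag / tau2)

# Fabricated "observed" autocorrelation at monthly lags out to 3 years,
# built from assumed values tau1 = 0.4 yr and tau2 = 8 yr plus a little noise.
rng = np.random.default_rng(1)
lags = np.arange(1, 37) / 12.0      # lag, in years
rho_obs = two_tau_acf(lags, 0.4, 0.4, 8.0) + rng.normal(scale=0.02, size=lags.size)

popt, _ = curve_fit(two_tau_acf, lags, rho_obs,
                    p0=[0.5, 0.5, 5.0], bounds=(0, [1.0, 5.0, 50.0]))
a_fit, tau1_fit, tau2_fit = popt
print(f"fast-mode weight = {a_fit:.2f}, tau1 = {tau1_fit:.2f} yr, tau2 = {tau2_fit:.1f} yr")
```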
How does Schwartz respond to Scafetta?
Schwartz seems to more or less endorse Scafetta’s observations.
Schwartz discusses the physical basis for the possibility of time scales of a few months and accepts the two time scale system as both plausible and an improvement on his own first analysis. After accepting this possibility, Schwartz then explains that, owing to the shortness of the 4 month time scale, we could compute the time scale as a function of lag as
τ(Δt) = −Δt / ln[ ρ(Δt)/ρo ],   (1)
where ρ(Δt) is the autocorrelation measured as a function of lag time, Δt, and ρo is the zero intercept at Δt = 0.
Rearranging this equation and plotting ρ as a function of lag time on semi-log paper, we would expect to see that at lag times large compared to 0.4 years, ρ should fall on a straight line. This is illustrated in figure 2a from Schwartz, reproduced below:

From Schwartz 2008: In the figure above, Schwartz shows the data series both before and after correcting for bias, but uses a different method of bias correction than the one selected by Scafetta.
To obtain time constants, Schwartz plots the time constant τ calculated using equation (1) above as a function of lag time Δt, and then (I think) he eyeballs the data to obtain estimates of 8.8±2 years and 7.2±1.5 years.
Ultimately, Schwartz revises his estimate of the time constant to 8.5±2.5 years, which is noticeably larger than the 5 ±1 years reported in the original paper.
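In code, that lag-by-lag calculation is just the following; the autocorrelation values and the intercept ρo = 0.75 are placeholders, not numbers from Schwartz’s response:

```python
import numpy as np

# Placeholder autocorrelations at annual lags 1..8 (invented, not real data).
lags = np.arange(1, 9, dtype=float)        # lag time, years
rho = np.array([0.662, 0.584, 0.516, 0.455, 0.401, 0.354, 0.313, 0.276])
rho0 = 0.75                                # assumed intercept at zero lag

tau_at_lag = -lags / np.log(rho / rho0)    # equation (1), evaluated lag by lag
print(np.round(tau_at_lag, 1))             # roughly flat for these made-up values
```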
How does this relate to my earlier blog post?
Those familiar with my first blog post discussing Schwartz will recall I observed that, if the earth’s climate obeyed the equation described by Schwartz, the data should obey the relationship discussed above, and I performed a linear fit on ln(ρ) vs time. I obtained 7.2 ± 2 years based on annual average GISS land-ocean data and 8 ± 2 years based on annual average MetStation data. These magnitudes are comparable to the values obtained by Schwartz– and well they should be, as the two methods are equivalent.
However, I did not account for bias in any way. (This is because I was entirely unaware of the issue.)
Because I seem compelled to always look at data myself, I will be examining different methods of accounting for bias to see how they might affect my estimates and my uncertainty intervals.
However, as my analysis currently stands, my estimates are comparable to those of Scafetta and Schwartz, and were obtained entirely independently. I’m reasonably sure neither read my blog back in December (when the blog was experiencing 2 visits a day, mostly from me). I had certainly not read their papers, which were posted on the web quite recently. So, it’s clear that people with different backgrounds are all making similar (though slightly different) arguments and coming forward with similar results.
What does a time scale of 8 years mean?
First, this estimate for the time constant of the earth, and the climate sensitivity that goes with it, is much lower than what models suggest. So, there is an inconsistency here.
All other things being equal, a lower time constant means:
- Less “heat in the pipeline” than predicted by models.
- Possibly lower total climate sensitivity– if we also believe Schwartz’ estimate of the climate heat capacity.
Obviously, the methods used by Schwartz and Scafetta involve a great deal of approximation, but approximation is common in science. However, despite the huge amount of simplification, when the very simple model in Schwartz’s first paper is corrected by recognizing either measurement noise or very short time scale weather noise, the simple model seems to give a fairly decent representation of the major features of the autocorrelation in lagged residuals for measurements of GMST.
Given the reasonable agreement with data, it is difficult to wave away the estimate of 8 years as simply “wrong”.
So, what are we to make of the mismatch between models and this estimate?
Steven Schwartz discussed four possible explanations for the mismatch: a) The 8 year estimate could be in error due to uncertainties in the measured data, b)The time record may be too short to use this method to estimate the time constant, c) The method used to infer the time constant could be inappropriate and, d) Climate models may be inaccurate.
Figuring out which of the four applies will require thought and work. Obviously, researchers will continue to strive to resolve the mismatch between the models’ estimates of the time constant and Schwartz’s estimate. In the meantime, it appears a time constant of 8 years falls in the range of “plausible” based on 125 years of measurements of the Earth’s mean surface temperature.
Hi Lucia – reasonable analysis. But realize this kind of analysis only provides a lower bound on climate sensitivity, not an upper bound, because there certainly are slow processes that are not accounted for. The variations of temperature within the 100+ year record up to now do not include any substantial amplitude (aside from the long-term trend, which is subtracted out!) of low-frequency variation. Think of what you’re doing with the correlation calculation in the frequency domain: if you do a Fourier transform of the temperature data then the amplitude of the fast variations is large, and the amplitude of the slow variations is small. When you look at the correlations, any relaxation processes on a particular timescale are weighted by the associated frequency-domain amplitude, so with the data series we have, you’re going to strongly favor time-scales similar to the largest amplitude components. In particular, there’s certainly seasonal variation which would weight relaxations around the 1-year timescale, and there’s known variation with the 11-year solar cycle which would weight relaxations around that length. But I doubt there’s enough data of sufficient quality at this point to detect any relaxation time longer than about 10 years…
Arthur–
Yes. There are difficulties, though I’m not entirely sure this gives a lower bound, particularly not after correcting for bias in the autocorrelation and figuring out what it is.
I’m not entirely sure what you are suggesting with the spectral discussion. I think I might know though.
Ordinarily, in a lab we use an instrument with a very small response time. We often even know that response time. Then, if we wish to know the amount of energy at long wavelengths, we need to sample for a long, long time. Preferably many cycles of the longest wavelength. But in this case, we know the instrument time constant, but want to know the features of some other system we are studying. (Possibly turbulent flow in the wake of a bluff body? Whatever.)
But here, we are doing the opposite problem. We are trying to detect the time constant of the instrument.
In this case, what we need is many multiples of the time constant of the instrument. Then, we need to have some clue what the forcing function applied to the instrument might be.
Believe it or not, sticking an instrument in a known controlled ‘forcing’ field and watching the response is something we actually do. Depending on the “forcing”, you don’t always need to take data for very long.
That said, I think there is a “time” problem– but it’s one that is discussed by Scafetta and by Schwartz.
I’m going to be testing with manufactured data to answer the question: if the response of the system is very long, what do we get as an answer? (And how much do we underestimate it?) I’ll tell you what I find after I do it. I’ve set my spreadsheet up to do this.
It’s sort of repeating what Scafetta did, but in my own “lucia” way, because I guess I like doing this sort of thing.
Lucia: I believe Nicola Scafetta is a she. If you discover I’m right and make the pronoun corrections, feel free to delete this.
Regards
Bob– I emailed to ask Nicola. I’d done a google search, and couldn’t find photographs or any use of he or she in articles! So, I don’t know if “Nicola” is the equivalent of Nicolas or Nicole. It’s very embarrassing to be wrong on this. 🙁
Lucia,
thank you for a very good summary of my paper. I could not have done a better job! 🙂
BTW, I am “he”. “Nicola” is the Italian male name for English “Nicholas” 🙂
My mistake.
Steve McIntyre wrote in http://www.climateaudit.org/?p=2451: “I’ve not spent as much time on Nicola Scafetta’s work as I would like to have. I spent about an hour with him at last year’s AGU and he gave an impressive explanation of his ideas.”
I should’ve checked beforehand. Sorry to waste your time.
Lucia, could you please delete the above. The conundrum above was solved in writing, and my attempt to delete rather than edit my comment obviously failed. Remove this comment as well, would you? Thanks.
Lucia says: I deleted it.
Do we have a dataset where we are sure all the heat island effect has been excised? If we had such a dataset and we went through the same processes, it would be interesting to see what the supposed effect of a doubling was then. We wouldn’t want to have this 1.7 getting out there and being quoted as gospel if it’s just a function of the way they rig up their data at Goddard.
A puzzle to me with Schwartz’ original paper was the lack of attention given to his query on p 14
It seems to me that the answer is clearly yes, it is a high-pass filter, and takes out longer term processes. Before detrending, SS got a relaxation time τ of 15-17 years, which would have returned a sensitivity right in the usual range. Afterwards, the time reduced to 5 years, leading to a low sensitivity.
Pliny–
Scafetta addressed the detrending issue by detrending two ways. First, he did a linear detrend as in Schwartz’s first paper. Second, he detrended using GISS model hindcasts for the trend. The time constant was shorter using the non-linear detrend, but only slightly so. So, the linear detrend seemed to lengthen the time scale. (That is, the opposite of what you suggest.)
Actually, it makes sense that the linear detrend lengthens the time scale, because the slowly varying deviations from the trend associated with the slow variations in GHG’s etc. look like low-frequency noise. A climate with a short time scale tracks those. So, if you assume the climate is red, but it tracks a “slow” period, the time scale of the temperature signal looks slow.
The Schwartz method, as in paper 1, really needed external noise to be white, not white with a huge slow cycle added to it, not white with measurement noise, but white!
The reason the newer estimates for the time scale are higher is because Scafetta, Schwartz (and me in my blog) correct for the high frequency noise in the data.
Graeme: I could be wrong, but I don’t think the urban heat island effect is a big issue for this method, except insofar as it affects detrending. The urban heat island is important to estimating the precise underlying trend from the data.
The discussion of Nicola’s gender was a neat example of the scientific method in action. First there was the assumption of male, this was questioned by a skeptic, evidence was sought but only via proxies, finally the matter was settled when the subject spoke for himself. Empiricism rules! Cheers everyone.
Well, Lucia, the first thing to say about Scafetta’s using two detrending methods is that, as he says, the difference in results is insignificant. I think the reason is that they really do the same thing. Whenever you subtract out a smooth function which has been fitted to the signal, you will tend to reduce any low-frequency signal present. That is because that signal will be identified (in part) with the smooth function. That is why detrending acts as a high-pass filter. If you did a linear detrending of a half-sinusoid, from its minimum to its max, you would greatly reduce the amplitude of the signal.
So I’m not reassured that Scafetta tried two methods of detrending; I think any method is likely to suppress the longer time constant processes. Redness or autocorrelation is not the issue here; it is just frequency.
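The half-sinusoid point is easy to check numerically; here is a quick sketch, using a synthetic signal chosen only to show how much amplitude a linear detrend removes:

```python
import numpy as np

# Half a sinusoid, from its minimum to its maximum, sampled at 100 points.
t = np.linspace(0.0, 1.0, 100)
signal = np.sin(np.pi * (t - 0.5))     # rises monotonically from -1 to +1

# Subtract a least-squares straight line, as in a linear detrend.
detrended = signal - np.polyval(np.polyfit(t, signal, 1), t)

print(f"peak-to-peak before detrending: {np.ptp(signal):.2f}")    # 2.0
print(f"peak-to-peak after detrending:  {np.ptp(detrended):.2f}") # much smaller
```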
I have to say that the second thing that bothered me about Scafetta’s approach is that he introduces two time constants, and then after fitting, arbitrarily takes the larger one to be the one that determines sensitivity. I’m still trying to work that one out.
Pliny–
On the issue of the two time constants in Scafetta, I agree there are “issues”. The reason I didn’t go for a two time constant process in my Dec. comment on Schwartz is that I strongly suspect the system has quite a bit of measurement noise. (In fact, if we subtract GISS from HadCrut on a year by year basis, we get back an amount of “noise” of the order of magnitude I find in my analysis.)
It’s possible the “measurement noise” has color for a variety of physical reasons related to how the instruments are deployed. But I don’t think it’s possible to resolve the difference between my guess that the extra process is measurement noise or climate physics.
If the shorter time constant is climate physics, the physical mechanism does need further exploration. Neither Schwartz nor Scafetta did that fully. This leaves that issue “open”. But that’s no real criticism of either the comment or the response. Comments and responses to published papers are necessarily short, and more incomplete than full papers. Editors want these things to be short.
So, we may see more from both Schwartz and Scafetta in that regard.
On the detrending: Your comment brings up an issue similar to that discussed by Arthur. I could be wrong, but I’m pretty sure this issue doesn’t much affect the accuracy of measuring the time constant of the climate. It is a well known issue when measuring energy spectrum of the system using an instrument with a known problem. One is sort of the “inverse” problem compared to the other. (I don’t mean inverse in the pure mathematical sense; hence the scare quotes “”.)
However, I admit I could be wrong on my opinion about this, and you and Arthur could be right. So I’m going to be doing some informal “blog quality” testing. (Using EXCEL!) It will take me a little while, but afterwards, I’ll be able to state an opinion based on actually looking into the issue specifically.
My blog philosophy is: I post back-of-the-envelope stuff people often do and shove in drawers. Many of these things don’t merit writing up formally, but they are worth making available. (Soon, I need to organize my categories so people can find them!)
It seems to me that using the GISS model to detrend really amounts to assuming what the forcings are. The linear detrend assumed a linear forcing (say exponential CO2 with logarithmic response).
That said, there are two things that one should keep in mind. First, nothing says that the forcings are only GHG. It’s the combination of all forcings, including solar. The sensitivity one gets is K/(W/m2), and is really quite agnostic in terms of what forcing gives the W/m2 part.
Secondly, the GISS model itself makes a lot of assumptions regarding forcings. In particular, it assumes aerosol forcings that literally mimic the temperature evolution of the 20th century, to the point that it’s ridiculous, given that the uncertainty of our knowledge on aerosols pre-1990 is about 1000%. But that’s the only way that Hansen can get a good agreement with observations. In the end, therefore, it is really a high-pass filter, disguised as the “actual” forcing.
The choice of two time constants seems a bit arbitrary. Why two and not three or four? Actually, I’ve had to deal before with effects in disordered media (like glass or polymers) that involve a distribution of time constants. For example, when one looks at the relaxation curve, it’s no longer an exponential, but rather a “stretched” exponential, of the form exp(-a*t^b), with exponent “b” being smaller than one. I wonder if that could be applied here.
A final comment. It would be interesting to use either ocean temperatures, or ocean heat content, as the basis for the analysis. Since oceans are where the heat is actually stored, maybe the result would have more physical significance.
Francois O
Yes. Detrending this way is basically following the assumptions that
a) the mean forcings are well understood and correctly applied to the AOGCM code, and
b) the climate model results in the correct result for the ensemble average for the temperature variation over time when driven by the mean forcings.
On the one hand, there are those who don’t believe either (a) or (b). But those people also don’t believe the climate models’ estimates of time constants near 20-30 years. So, that leaves no estimate whatsoever. In which case, 8 years would at least represent some sort of empirical estimate. It might be wrong, but otherwise we are left with no estimate at all. (Which is ok. Sometimes we just don’t know.)
On the other hand, there are those who believe both (a) and (b). In that case, detrending using the model estimate seems plausible. So, what Scafetta did is precisely correct under the assumptions that both (a) and (b) are correct.
This observation will be very important when discussing the FASM response to Schwartz. Applying the initial Schwartz method to GCM data would only be plausible if the AOGCM runs included “noise” in their forcings. The TSI has loads of noise. It’s not exactly white, but there are “shitwads” of energy at high frequencies. There are other external factors making the external forcing term noisy. These include farmers in Nebraska plowing their fields in spring, people in various countries spewing more or less aerosols depending on the state of the economy, the occasional minor volcanic eruption that washes out quickly. All of these things are smoothed in the input files for GCM’s. So, the assumption that external forcings are white is not replicated in GCM’s.
On the choice of two time constants: I am of the opinion the shorter one is measurement uncertainty. So, that makes it non-arbitrary, and I set mine to zero. But I could be wrong. Also, I could easily concoct a physical argument for the short one and the long one. But there will be some arbitrariness.
On the general issue: I agree that it would be interesting to do more analyses and use ocean temperatures or ocean heat content. I bet we’ll see some soon– provided the data are available.
The reason I think Schwartz’s paper was mostly brilliant is that he’s got people trying to get empirical estimates to test models. His first paper was very approximate, but he admitted most of the “flaws” the critics are bringing forward in the paper itself.
Hi Lucia – yes, definitely further analysis is warranted – I’ve been looking at FFT’s of the HADCRU temperature series (since it’s the longest) – 1899 points, so close to 2048 which works for Excel’s “Fourier analysis” procedure. That comes to just over 170 years time-span, so the minimum frequency interval you can look at in the analysis is 1/170 years, and realistically you don’t get a reliable curve until you’re at several multiples of that, at least with this procedure. So it’s hard to tell anything about frequency patterns in the temperature data at longer than 50-year time scale.
Do you know if anybody else has just looked at the spectrum of temperature variations like this in the past? It seems a pretty obvious thing to do.
I’ll need to spend a bit more time before I can show a graph, but the main features seem to be:
(1) there’s a very robust peak at a frequency of 1 year^-1: you see that on the 2048 graph, and on FT’s of 1024 and 512 points corresponding to 85- and 43-year time spans in the CRU data. I haven’t looked at the other data sets yet, but so far the strong annual peak (and subsidiary peaks at multiples of that frequency – i.e. 6 months, 4 months, 3 months…) is clear.
(2) There’s another robust peak at about 0.28 year^-1 (3.6 year period)
(3) Another robust peak is at about 0.12 year^-1 (8.3 year period). However, it’s missing in one older data series I looked at (1922-1965, 512 points), where it seems to shift to about 0.094 year^-1 (10.6 year); the full series seems to show two peaks, one at 0.12 and a second (about half the height) at 0.094.
(4) There appears to be another robust peak around 0.05 year^-1 (20 year period)
(5) The low-frequency pattern is very unstable when analyzing the data different ways.
In particular, the raw and de-trended temperature series differ considerably at low frequencies – of course the 0-frequency component is 0 for the de-trended graph (it’s just the average value across the series) but detrending also removed considerable weight from the next 4 points in the 2048 FFT, i.e. I wouldn’t consider any of the spectrum of temperature variation at lower than 0.025 year^-1 (40 year period) reliable from this data.
Correction to my feature #3 – the two peak frequencies in the 2048 FFT’s are at 0.094 and 0.111 year^-1 (10.6 year and 9 year period, respectively). Of course the resolution is only 0.006 year^-1 so that would be 9 +- 0.5 years, and 10.6 +- 0.7 years – anyway, just wanted to note that the shorter (stronger) period is probably slightly longer than the 8.3 I said originally.
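For anyone who wants to repeat this sort of spectrum outside Excel, a bare-bones numpy version of the procedure might look like the sketch below; the monthly series in it is synthetic (an annual cycle plus white noise, with made-up amplitudes), standing in for the HADCRU data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic monthly series, 2048 months (~170 years): an annual cycle plus noise.
n = 2048
t = np.arange(n) / 12.0                          # time, years
series = 0.3 * np.sin(2 * np.pi * t) + rng.normal(scale=0.2, size=n)

# Linearly detrend (this synthetic series has no trend, but real data would),
# then take the FFT and look at the amplitude spectrum.
series = series - np.polyval(np.polyfit(t, series, 1), t)
amp = np.abs(np.fft.rfft(series))
freq = np.fft.rfftfreq(n, d=1.0 / 12.0)          # frequency, cycles per year

peak = freq[np.argmax(amp[1:]) + 1]              # skip the zero-frequency bin
print(f"largest peak at {peak:.2f} cycles/year") # ~1.0 for this synthetic input
```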
Arthur–
I absolutely agree that you need more time to accurately get the correct amount of energy, and identify the correct shape of the spectrum particularly at the large scales. What I’m not sure of is the amount of error that propagates into the estimate of the response time of the system under this particular method.
Post the graphs, and I’ll add known deviations at different frequencies to my blog quality exploratory uncertainty estimates. That hasn’t been done, and it will be interesting to see what happens. (If you don’t have a place to store the graphs, I’ll post.)
I think empirical estimates are important– and I think the most positive contribution of Schwartz is to get the ball rolling and motivate people to get them. Even he doesn’t claim his numbers are beyond reproach– and he says so in his article!
(I’m also going to have to dig out “Lumpy” again!)
Oh– Arthur, if we can think of a better approach to the estimate, it could be worth writing a paper describing the method and providing the results. As the shortcomings of the current approach are identified and methods to correct them are implemented, we may actually eventually find the correct value.
Lucia,
Great to hear that you’re thinking of experiments. I’d like to do some too, but won’t be able to for a day or so. But what I would suggest is this. Make up some synthetic data with an IPCC-type time constant (just one) of about 15 years and some noise. Put it through the detrender, and try to recover the time constant.
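A quick way to run that experiment outside a spreadsheet might look like the numpy sketch below; the 15-year time constant and linear trend are the assumptions pliny names, while the noise level and trial count are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)

def recover_tau(tau_true, n_years=125, trend=0.005, n_trials=200):
    """Generate AR(1) pseudo-temperatures with a known time constant plus a
    linear trend, linearly detrend, and recover tau from the lag-1
    autocorrelation.  Returns the median recovered value over many trials."""
    phi = np.exp(-1.0 / tau_true)
    estimates = []
    for _ in range(n_trials):
        noise = np.zeros(n_years)
        for i in range(1, n_years):
            noise[i] = phi * noise[i - 1] + rng.normal(scale=0.1)
        t = np.arange(n_years, dtype=float)
        temp = trend * t + noise
        resid = temp - np.polyval(np.polyfit(t, temp, 1), t)
        rho1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
        if 0.0 < rho1 < 1.0:
            estimates.append(-1.0 / np.log(rho1))
    return np.median(estimates)

print(f"true tau = 15 yr, typical recovered tau = {recover_tau(15.0):.1f} yr")
```

In runs like this the recovered value generally comes in noticeably short of the 15 years that went in, which is consistent with the bias FASM and Scafetta discuss.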
Arthur (again, heh!)
Oddly enough, a peak near 1 year has been noted in the difference in anomalies between measurement groups. Atmoz first noticed it here:
http://atmoz.org/blog/2008/04/21/interannual-divergence-in-satellite-temperature-records/#more-506
And because I was intrigued, I tossed out my theory explaining it here:
http://rankexploits.com/musings/2008/uha-vs-rss-temperature-differences-why-energy-at-1-year/
I don’t know if my explanation is correct. But I think it will be worth comparing the amount of energy at different frequencies by doing the test Atmoz did. This will give us insight into whether that energy is more likely “climate physics” or “measurement uncertainty”. (Go tell Atmoz he was brilliant for looking at this in the first place. )
But as you can see, my first inclination when I see odd things is to ask, “Could it be measurement uncertainty?” And when testing theories I always ask myself: “How would measurement uncertainty affect my confidence in the answer?”
Depending on the analysis method, even small amounts of measurement uncertainty can really screw things up in ways people who always use models and synthetic data sometimes don’t anticipate. These things are important to know when you are designing an experiment because you want to make sure you could conceivably answer the question you asked before you build your test rig and purchase your equipment.
pliny– Your thoughts are more forward looking than my first steps. First, I’m going to see what happens with no trend at all. I’ll report that. If we find lots of error this way, we’ll know that detrending only makes things worse. But, if we get ok answers with no detrending, then we’ll move on to your suggestion.
Lucia, I agree with some of the things Francois has said, in particular looking only at SST as opposed to land and sea. Just a thought.
Steve– I’ll look at SST only. But, it will be deferred until I get a handle on the uncertainties in the method using “pseudo-data”. The reason is that I think most of us will find any future results more convincing if we figure out a plausible method to estimate uncertainties before we get an answer!
One of the reasons I prefer more standard methods of hypothesis testing to idiosyncratic ones is that the standard methods give us a systematic way of estimating biases and uncertainties. Even in the event that we know the standard method is not perfect, this is still better than everyone just giving their opinion without doing numbers.
Of course, this is a blog, so I’ll be doing it all at “blog quality”. (OTOH. Every rational person does things “blog quality” first to at least figure out which things couldn’t possibly work if you throw all the analytical bells and whistles at the data hoping to get a “peer reviewed” quality result.)
So… you’ll probably all laugh when you see my “excel montecarlo” estimates of biases based on 10 sets of generated “pseudo-data”. But it will at least let me know which things deserve more detailed investigations. Ten is often enough to see if a hypothetical problem is so small we don’t need to look at it, or so large one could never ever figure out the correct result using the data at hand.
With regard to the things Arthur is asking about the limitations of 125 years per se, this is a real open issue.
Hi Lucia – actually, I just started doing the same fourier analysis process with the RSS data (first pass was HADCRU) and there’s *NO* 1-year peak in that, but there are clear 3.6 and 9-year peaks at exactly the same positions as in the HADCRU data. So I suspect there’s something in the Hadley data that’s skewing things in a seasonal fashion (improper weighting of the land stations?) that’s not there in reality… or maybe it should be there and RSS is missing it? Anyway, I want to look through the other datasets first to see how stable these patterns are.
Arthur, somewhere recently on one of the blogs someone showed that there was an ~1 year peak in the FT of the difference between UAH and RSS.
Scafetta’s use of a model to “detrend” the temperature record seemed strange to me given his and Bruce West’s earlier work in which they concluded that models (like the GISS model) may underestimate the contribution of solar to 20th century warming.
apsmith, yes, I think FFT’s are the best approach. But shouldn’t you be looking for peaks in the complex Fourier domain? These would correspond to the decay time constants that SS and co are looking for. The imaginary part would be the inverse time constant.
Lucia, I proposed using synthetic data to test the algorithm for recovering relaxation time constants. I see now that it has been done. Tamino, at RC, did it even before the SS paper had appeared in JGR. He used time lags of 5 years, and shows that the autocorrelations look quite different to those from real temp data, but also that the recovered relaxation times are substantially less than five years. In this case it reflects bias; I think that at 15 years, the effect of detrending would add significantly.
The comments to appear that you link to, by Foster et al (not unrelated, I know) have a more thorough working with AR(1) in their section 5. They show, for example, that if you put in a relaxation time of 30 years, you get back about 13 years. Detrending would hurt there.
The SS paper has been criticised for poorly representing the atmosphere. However, these criticisms do not relate to that; they say that even if the temperature really was AR(1), the analysis would give a badly wrong answer.
On Scafetta, I’ve realised what is wrong with adding a second AR(1). It isn’t initially obvious why sensitivity should be related to the relaxation time. The reason is explained (not well) in SS eqns (4-7), or better by Foster et al in their eq (4):
C dT/dt = F − λT + e
To relate the constants, you need both the small-time (initial) and large-time limits of this differential equation. The problem with Scafetta’s model is that the temperature with two AR(1)s does not satisfy this equation, or any simple modification. You can’t say that the short-term process can be ignored in the long-term limit, because the initial condition is also needed.
The second paragraph from above:
“The impact is not surprising as this paper represents the relatively rare attempts to estimate climate sensitivity empirically rather than using models. The empirical value turned out to be much lower than generally predicted by climate models.”
is followed later by:
“Steven Schwartz discussed four possible explanations for the mismatch: a) The 8 year estimate could be in error due to uncertainties in the measured data, b)The time record may be too short to use this method to estimate the time constant, c) The method used to infer the time constant could be inappropriate and, d) Climate models may be inaccurate.”
Surely part of the validation of the climate models is to test the validity of the prediction against the data?
D. R. Williams, if Schwartz wants to stay in good graces with the modelers (and I’d say he was on thin ice already) the last thing he would suggest is that the reason for his empirical results mismatching with the models has anything to do with the ~models~ being wrong. And so he quite literally suggested this last. I know I should be more disturbed by this, but I’ve come to expect this sort of thing. Consider it astounding that he acknowledged that possibility at all!
Pliny–
Using the synthetic data is a good idea.
However, the results Tamino got are wrong.
I addressed Tamino’s post almost as soon as he posted it. His analysis comes to the wrong conclusion because he makes a classic mistake (as does every freaking peer reviewed paper on this subject).
Measured data is never precisely the same as the underlying thing. It always includes noise.
If you assume the measurement, T, is the sum of the “real” GMST, Θ, and white noise, u, you find
ρ_T = ρ_Θ * { 1 / (1 + <uu>/<ΘΘ>) }
Many people forget about measurement noise before launching off and doing analyses, but it’s always there!
When you add the noise, the data look almost precisely the way they should if they follow the IPCC equation. (Arthur’s looking at some peaks in the spectrum– but the peak Tamino saw was due to another feature– integration.)
I’m repeating those steps but adding the noise. I’ll be explaining more about the noise and the integration when I have the first “chunk” done. (Then, I’m hoping Arthur will tell me where he finds robust peaks!)
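The damping factor in that expression is easy to check numerically; the sketch below adds white “measurement” noise to a made-up AR(1) “true” temperature series (all the variances are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

# "True" climate signal: an AR(1) series with lag-1 autocorrelation 0.8 (assumed).
n = 50_000
rho_theta = 0.8
theta = np.zeros(n)
for i in range(1, n):
    theta[i] = rho_theta * theta[i - 1] + rng.normal()

# Measured series = true signal + white measurement noise u.
u = rng.normal(scale=0.5 * theta.std(), size=n)
T = theta + u

def lag1(x):
    return np.corrcoef(x[:-1], x[1:])[0, 1]

predicted = rho_theta / (1.0 + u.var() / theta.var())
print(f"lag-1 autocorrelation of the true signal:  {lag1(theta):.3f}")
print(f"lag-1 autocorrelation of the measurement:  {lag1(T):.3f}")
print(f"predicted rho_T = rho_Theta/(1 + <uu>/<ThetaTheta>) = {predicted:.3f}")
```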
@Lucia,
Again from a numerically challenged ecologist.
When doing measurements I am used to dealing with concepts like precision (i.e. the inherent variability between replicates) and accuracy (the trueness of the measurement WRT the “real” value), and the eventual distributions of these.
Could you elaborate a bit on “white noise”? How is it related to accuracy and precision (if related at all)? Is it an amalgam of the two?
Does the *white* signify that it is distribution-free?
Cassanders
In Cod we trust
What are the results if you do the analysis for the Northern Hemisphere alone and for the Southern Hemisphere alone?
Cassanders,
With regard to individual temperature measurements, white noise mostly relates to precision. If we examined a daily temperature measurement, and for some reason, we expected the imprecision of the individual measurement to have a standard error of 0.1 C, then 0.1 C would be the magnitude of the standard error of the white noise.
But, the 0.1 C only tells you how big the white noise is. The feature that makes it white is that the error from one measurement is uncorrelated with the error from any other measurement. This happens when the errors on, say, “day 2” have nothing to do with the errors made on “day 1”.
In contrast, if the errors on “day 2” are somehow related to the errors on “day 1”, then the noise will not be “white”. It might be red, pink, blue, 2nd order blah, blah, blah. (For example, if you sample temperature once a second using a thermocouple with a 5 minute response time, the errors will look “red”. This is because it takes the thermometer 5 minutes to respond to a change in temperature. However, if you sample every 24 hours using this thermocouple, the noise will look “white”, because the error due to the response time is tiny and all you are left with is error due to other factors, like calibration error, electrical noise, etc.)
Most decent experimentalists do their best to make sure the random component of errors in the data is white. This makes it much easier to deal with the data for most downstream processing purposes.
Because most experimentalists do this, many downstream users then forget that even white noise doesn’t always average out. It averages out when you are calculating the mean. But, for example, in turbulence, the experimentalist is actually trying to measure standard deviations and correlations. When measuring these, you must never forget that the white noise doesn’t average out. You need to know how big it is and subtract it. (Better yet, make sure it’s small enough to not matter.)
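The thermocouple example can be mimicked with a first-order sensor model; the numbers below (a 30-minute weather fluctuation time, a 5-minute sensor response, once-per-second sampling) are all invented, but they show why fast sampling of a slow instrument gives “red” errors while daily sampling looks “white”:

```python
import numpy as np

rng = np.random.default_rng(5)

# "True" temperature sampled once per second: fluctuations with an assumed
# 30-minute correlation time (all numbers here are invented).
n = 300_000                           # seconds of record (~3.5 days)
phi = np.exp(-1.0 / 1800.0)
true_temp = np.zeros(n)
for i in range(1, n):
    true_temp[i] = phi * true_temp[i - 1] + rng.normal(scale=0.05)

# First-order thermocouple with a 5-minute (300 s) response time.
alpha = 1.0 / 300.0
measured = np.zeros(n)
for i in range(1, n):
    measured[i] = measured[i - 1] + alpha * (true_temp[i] - measured[i - 1])

error = measured - true_temp          # the measurement error

def lag_corr(x, lag):
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]

print(f"error autocorrelation at a 1-second lag: {lag_corr(error, 1):.3f}")      # close to 1: 'red'
print(f"error autocorrelation at a 1-day lag:    {lag_corr(error, 86_400):.3f}") # small: looks 'white'
```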
Doug– I don’t know the answer to that question. I think Schwartz compared them in his first paper and found they were similar in magnitude. However, I’m not sure.
Lucia,
Thanks for the pointers to your previous comments on Tamino. I’ve been reading them, and also the post by UC at CA.
However, the part of Tamino’s work that I was noting was where he generated AR(1) data with a 5 year lag, put it through the analysis, and came up with a shorter relaxation time. I can’t see that that would be affected by the noise of experimental measurement. It’s just a statistical processing issue. Incidentally, I note that UC seems to have done the same thing, but with sometimes longer relaxation times as a result. Neither T nor UC were very explicit about what they did at this stage (though UC gave R code for the GISS treatment).
It’s actually a bit unclear to me what you are suggesting. Is it that data should be synthesized with an AR(1) process, then have white noise added? Is it the difference from the pink noise of the AR(1) that matters?
Anyway, I’ll keep trying to get through your Dec post, UC’s posts and the CA discussion. Incidentally I noticed that Francois O and UC both brought up the detrending issue, and I haven’t seen a real answer to it.
Pliny–
I know what I’m saying is unclear– because I plan to document it in more detail after I run the synthetic data with the white noise.
However, what I am saying is:
a) there is a bias.
b) adding white noise to the synthetic process may make a difference in the magnitude of the bias. (I don’t know how much, but it’s not zero.)
c) adding white noise to the synthetic process may make a difference to the standard deviation in the population of time constants determined experimentally. (I don’t know if it makes it larger or smaller, and in particular, I don’t know if this will matter to my method or Schwartz’s.)
d) Tamino (FASM) didn’t determine the magnitude of the bias or standard deviation with white noise added to the synthetic process.
So, while there is a bias and there is a large standard deviation, I don’t currently know how large the bias or the standard deviation is!
I want to run my method with a bias correction. Also, for my method, I need to know the bias over a large range of ρ because I apply a linear fit to the ρ vs lag time data, so the information in FASM isn’t enough for me to go on. (And given what Schwartz said in his response, it appears there is no standard method. So, as long as I’m doing the synthetic data thing, I’m just going to estimate the bias with the synthetic data!)
I’m going to deal with detrending issues after looking at the problem with no detrending. I’ll look at the issue Arthur addressed then too. The reason for this is that the bias/scatter issue exists both with and without detrending, and with/without the “spikes” in the spectrum. So, to deal with all of them, I need to do this first.
I’ll probably be posting today. 🙂
I’m feeling still quite new to this empirical temperature analysis business, so please forgive me if this is an obvious question – I’m also re-reading Schwartz’s 2007 paper to try to understand it all better. If somebody here has better references on the subject I’d love to read them! But anyway, here’s my question:
Isn’t the whole point of the relaxation time (tau) to be a measure of the response of temperature to forcing? I.e. don’t you somehow need to separate out the driving causes of temperature change from the temperature response? So how can we possibly hope to extract it from the temperature time series alone? I feel I’m missing some key insight here!
Yep!!! In the full problem, you need to know the spectral properties of both the random component and the mean component.
Schwartz gets around the need to know the forcing using the classic method: Making an assumption! 🙂
Schwartz assumes the random component of the external forcing for the temperature anomaly is white. This is certainly not exact, and may or may not be a good approximation. In principle, the definition of “anomaly” takes care of the annual periodicity (and this can be shown.) However, if the random component of the external forcing is not spectrally white (or nearly so), this will introduce errors.
In his first paper, and in his reply to comments, Schwartz assumed the mean component is a linear ramp function that’s been going on “forever”. That’s the assumption that permits linear detrending.
So, Schwartz made two assumptions to deal with the exact issue you are pondering. As with all assumptions, either one or the other could be sufficiently wrong to result in inaccurate answers. (This is why showing the autocorrelation looks “right” is an important fiduciary step. If the data looked self-inconsistent with the assumptions in any way, that would cast serious doubt on the method.)
Scafetta tried to account for the non-linear nature of the mean component by detrending using GISS model mean data, but he also did a linear detrend. This made very little difference to the answer. (This likely means the amount of ‘false noise’ introduced by detrending is relatively small compared to the amount of real, honest-to-goodness random “noise” in the external forcing to the system.)
Lucia – that makes sense – but the variations in the temperatures, with or without detrending, which is just the simple thing I’ve been looking at (and I thought that’s what Schwartz etc were looking at), are definitely not “white”! The periodic variations are surely from periodic forcings, whether external (year, solar cycle?) or internal oscillations in the climate system – so do you have to also fit the magnitudes of those, and subtract them out, to properly do Schwartz’s method? I notice Schwartz 2007 does mention using “deseasonalized” monthly temperature records, which I don’t think he explains – that would presumably eliminate the yearly cycles.
Is there a theoretical basis for the “white noise” assumption? I don’t see the term “white noise” in Schwartz 2007 at all…
Arthur — Schwartz cites underlying references for the white noise assumption.
The physical reasons I can think of are imperfect, but here they are:
1) If you examine the TSI charts, you’ll see loads of high frequency noise. That might tend towards “white”.
2) Over the course of the year, we have somewhat random things happen, which are not related specifically to weather. These include different people spewing out more or less particulates, those particulates falling on snow in different places, volcanoes erupting etc.
3) Internal behavior of the climate could add more noise to the “f” term.
So, collectively a bunch of things might tend to result in the noise being “white” or approximately so. (Or, the assumption could be bad.)
BTW. I’m adjusting what I did for the bias– still assuming the white noise is ok. I’m doing it in this order not because the stuff you are suggesting is irrelevant, but because I need to have a clue how to deal with bias in my method first.
I’m up to 15 years or so. I need to proofread though.
I have a question (and I apologize to all you who understand the physics of the Schwartz and respondents better than I — which should be just about everybody).
Everybody appears to be working with the empirical AutoCorrelation Function (ACF). That is the estimate of the autocorrelations at each lag from the detrended time series. Coming from a different discipline, economics, I almost never work with the ACF. Rather, I work with the estimated ARMA model. Then once I have an estimate of the ARMA (that is stationary), I calculate the theoretical ACF from the ARMA coefficients. The reason for this is that just about all the coefficients of the empirical ACFs are full of the small but pervasive random autocorrelation one gets with the ACF of a purely random, non-autocorrelated series. (Look at the probability of the Q-statistic move up and down for the ACF of white noise, and you will get an idea of the problem.)
Further, if you are fitting a pure AR(…) model, i.e. no MA terms, then don’t you get the time constant (or is it time constants when there are more than one AR terms?) directly from the AR coefficient?
Any help (with explicit equations if possible) would be appreciated.
Thanks, Marty
Martin-
I think the problem is that the physical scientists want to connect each term in the fit to a physical meaning, or at least explain where it comes from. So, we would either need some explanation of what the other terms in the ARMA mean, physically, or an explanation of why they are required to get rid of some known feature that comes up in the process of handling data.
So, under Schwartz’s model, the time constant of the climate is claimed to have a physical basis and be meaningful. I can justify white noise based on measurement imprecision.
The time constant comes from saying the climate obeys
dT/dt = -T/τ + α F where F is forcing and τ is the time constant and α is just another parameter.
That’s basically conservation of energy.
The idea that we can just pull that out of the series the precise way we are doing arises if we say F is white noise. This is because T is an AR(1) process if and only if F is white noise. Otherwise, it’s a different process.
But I can’t connect higher order terms to the physics. Or I could come up with a physical basis for higher order terms, but I then need to say that I believe the “forcing” to be red noise or something that creates the other terms in the ARMA. (And you can see Arthur is arguing F is probably not white because T isn’t red.)
Now, mind you, the forcing isn’t white. But scientists will often accept the possibility of white noise because it’s the simplest thing you can pick. But if you start advancing other types of noise, they are going to demand a physically sound reason for that particular noise being appropriate for the forcing, F.
And the problem is that to get scientists to accept other terms in F, we need to explain what they might be. I can explain adding white noise to the temperatures– that’s measurement uncertainty.
On the other hand, if there is a good statistical explanation for why the terms “just arise” in a string of data even when the real underlying process for temperature, T, is red, then that can fly. (Sort of like, we use N-1 in the denominator of a standard deviation because of what happens when we estimate the variance using the sample mean.)
I just don’t know the reason for the other terms in the ARMA yet. (Doesn’t mean there isn’t one. I just don’t understand the physics or the math.)
Arthur,
I had trouble with that sensitivity-relaxation connection too, and Schwartz explains it poorly. But here is how I understand it. It follows from the linear system assumption. Suppose that initially T=0, and you add a new steady flux F. Then, with C=heat capacity:
dT/dt = a + bT (call this Eq 1 (eq 4 in FASM), it’s the general linear assumption, with a,b to be found)
Initially T=0, so you’re just steadily heating a body, so a=F/C
and with the assumption that the response to any perturbation has relaxation time τ, then b=-1/τ
therefore, when finally T has settled and dT/dt=0, then F/C=T/τ, or sensitivity T/F = τ/C
Note that it is essential that, for this logic, T actually satisfies Eq 1. That is what is quite wrong, I think, with Scafetta’s mixing of two AR(1) processes. You lose this essential requirement.
Andrew, my question:
“Surely part of the validation of the climate models is to test the validity of the prediction against the data?”
is directed at the modeling process, not the research that provides an estimate by examining actual data.
I’m a bit mystified at the lack of such an elementary notion as model validation. If the models do indeed predict a single time constant, then the empirical methods that do likewise should not be subject to examination and discussion as to whether one, two, or more time constants.
Indeed, it seems that the cart has been put before the horse. Why was the empirical data fit not done in 1993? Why no previous discussion about the number of time constants necessary to describe the system. Would it not be appropriate to do the analysis and have the discussion before building a model? And are these model predictions themselves an end-product, or are they used as inputs elsewhere?
sorry, a bit seems to have been lost in editing:
…or more time constants are needed.
Lucia,
Thanks, I’m halfway there. And now let me cause more problems, albeit the same idea.
The physical basis for an AR(1) seems to me the same as that for an AR(3) process. (Let me be clear on notation here: AR(1) means a first order autoregressive process. AR1 is the first order coefficient of any autoregressive process, i.e. an AR(1), AR(6) and an ARMA(3,3) all have an AR1. A higher order AR process, say an AR(24), might have only a few non-zero coefficients. I do not believe there is any standard notation, but I use AR(1, 3, 12, 24) to denote a 24th order AR process with non-zero coefficients at lags 1, 3, 12, and 24.) An AR(1) process has geometric decay every period upon which the process is defined, e.g. on monthly data. An AR(4) process has geometric decay every 4th period. This is elementary. But with regard to the physical world the time unit is arbitrary. Why couldn’t the fundamental unit of time be gamma0*PI*seconds? Or maybe, within our conventional time scales, why isn’t the decay nested, i.e. a decay of a decaying process? I haven’t done any work with such series, but I suspect that they have some distinct differences from a pure AR process.
Or let me get a bit nuttier here: suppose our data were continuous (we can always fit the N observations perfectly with an N-1st degree polynomial in time). Now we can look at the “delta t” — sorry, I am not certain how to paste a Greek delta — as an interval that can vary continuously instead of varying only over the natural numbers. Of course, as the nuclear physicists used to say to me, “Die ganze Zahl schuf der liebe Gott, alles Ubrige ist Menschenwerk” (God made the integers; all else is the work of man). (But they were just quoting Leopold Kronecker, the tormentor of Cantor.) But maybe the positive half of the real line is a better approximation for time than the natural numbers, and then what is the fundamental delta? Not clear. We could solve for the delta which gives the largest first order autocorrelation subject to its being greater than 2*(1/sqrt(N)) and say that best fits the exponential decay model. Or, maybe better, keep varying delta, then create the discrete time series and test the AR(1) model versus the AR(K) or ARMA(K,P) model in a likelihood ratio test. If we get a clear cluster of AR(1) non-rejections, we have a division of time consistent with an AR(1) view of the world. But right now, whether it is monthly or annual, Schwartz is fitting an AR(1) shaped model to a non-AR(1) shaped data set. (Oh, in case I hadn’t mentioned it: the serial correlation Lagrange Multiplier test — the Breusch-Godfrey test — fails for lag inclusion of 3 through 12 with annual data, and an ARMA(1,1) model can’t be rejected in favor of an AR(1) one.)
I guess if I wanted to do that, I think the way to proceed is to fit a trend plus AR(K?) or ARMA(K?,P?), then take the AR(1) process out of the fit and subtract the remaining part from the data. Then you have — in theory — an AR(1) process plus the residual from the “true” model. The AR1 then gives you your tau: -1 / ln(AR1). End of story. No empirical correlograms or whatever. Just an AR(1) simplification and an estimate consistent with that simplification.
Martin,
I think you’re missing the primary purpose of this activity. It isn’t to find the best model for the time series. Schwartz’ model is a first order linear differential equation with constant coefficients – as simple as you could get. If you allow all those simplifications, the equation gives you a single time constant, and relates it to the climate sensitivity. The AR(1) model is a way of identifying this time constant. Other time series models don’t model this particular differential equation, and you lose the link to climate sensitivity, which is what it is all about.
Pliny,
You may be right that I am missing the point. But I am trying. The Schwartz article is an empirical estimate of the time constant, and what I am looking at is how he is pulling a number out of the data. So let me try saying my point another way. Look at Schwartz Figures 5 and 6. I’m saying that those patterns are AR(1). We probably all are willing to accept that whatever is generating those patterns has a positive AR1 coefficient and I am not questioning the underlying model, but I am questioning the way Schwartz goes about pulling his tau — which is just -1/ln(AR1) if the model fit well — from the data. If you say OK, I am going to force the AR(1) model on the detrended data, then why not do exactly that: force it, make an estimate of the AR1, which is around 0.6 for the annual data and gives a tau of about 2 years. But taking the correlogram of the series is neither consistent nor inconsistent with the underlying theory; it is — and this is only my opinion, albeit a somewhat informed opinion — bad statistics.
If you build yourself a little model of the DGP, you will see that if you use an AR(1,4,10,14) model with the fitted coefficients, you will generate ACF patterns similar to Schwartz’s on a regular basis. But if you stick in the AR(1) model coefficients, you will usually get a much lower tau and seldom get the pattern. Thus, if you are using the correlogram from the data, you are implicitly contradicting the AR(1) assumption on the DGP. This is what you tell me not to sweat. OK, but by using the correlogram from the data you will also be estimating something other than the time constant of an AR(1) process. Hence, I believe that it is necessary to explicitly go about extracting, estimating or whatever the AR1 value underlying the model.
Martin, I’m sure you’re right that much can be done to improve the statistical identification of the time constant. My worry is that we have to be able then to go to the differential equation, of which the AR(1) process is a kind of discretisation, and say that the time constant determines the climate sensitivity. If a more elaborate model were fitted, we might have a better estimate of the coefficient, but with model assumptions that are inappropriate for the DE. In effect, it is a better estimate of a different number. This is why I was suggesting that you can’t divorce the statistical analysis from the subsequent differential equation interpretation.
I like your idea of forcing the AR(1) model and estimating AR1, because the error could then give an indication of how good the DE model relating sensitivity to AR1 is.
Pliny,
Let me ask a question here which refers to Lucia’s next post (“What is the Climate Time Constant? Refining the Estimate I”) and to her previous post on estimating the time constant. How should one force the AR(1) model on the data?
As a non-physical scientist, my instincts are to just go ahead and estimate the AR(1) model temp(detrended) = Constant + B*lag1(temp(detrended)). No need to even say AR1 in the model since there are no other explanatory variables. For the GISS data, the AR1, i.e. the coefficient B of the lagged, detrended temperature, for annual data is about 0.63. For the monthly it is about 0.70.
But I could go to the ACF of the detrended data and do a nonlinear fit of the equation ACF(lag i) = B^(lag i), plus error of course. Now, because I am using the geometric form, I don’t have to worry about negative ACF values and can use the whole series. If I do that, the AR1 for the annual data goes up to 0.78 and the monthly goes up to 0.95.
From the lowest to the highest, that is a range on tau of 2 to 20. That range has nothing to do with the theory arguing for the estimation, just the question of what is the best AR1 representation of the data. As an econometrician I am going to say the annual AR(1) model has the best “information” criterion, although I really think the model is either some higher order AR process, an ARMA process, or maybe a fractionally differenced (long memory) ARIMA process (ARFIMA — AutoRegressive Fractionally Integrated Moving Average) — or maybe we just have the wrong units for the time scale altogether.
I should also note that when one uses the whole ACF, the effect of higher order AR coefficients in the process affects one’s estimate of the AR1 value unless they are accounted for explicitly. Thus, I would again come back to the direct estimation, which comes back to the excluded explanatory variable bias; e.g., the AR1 coefficient in an annual ARMA(3,3) model is around 0.35, a tau of less than a year.
Now what I don’t know is: suppose we have an AR(1) process. How often will we reject that process as the true model in comparison to some higher order ARMA model, AND when we do reject the AR(1), what is the best way to estimate the AR1 value? If the answer to the first part is a large percent, say 20-30%, then the answer to the second part, which is in the realm of statistics, will give us guidance. If the answer to the first part is a trivial percent, then it is back to the physics and the physical explanation of the higher order terms in the time series.
Marty
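For concreteness, here is a small numpy/scipy sketch of the two routes Marty describes (lag regression vs. fitting ACF(k) = B^k), run on a fabricated AR(1) series rather than the GISS data; the AR1 value of 0.65, the noise level and the series length are assumptions:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(6)

# Fabricated detrended "annual temperature" series: an AR(1) with AR1 = 0.65.
n = 125
ar1_true = 0.65
x = np.zeros(n)
for i in range(1, n):
    x[i] = ar1_true * x[i - 1] + rng.normal(scale=0.1)

# Route 1: regress x(t) on x(t-1); the slope is the AR1 coefficient.
ar1_reg = np.polyfit(x[:-1], x[1:], 1)[0]

# Route 2: compute the empirical ACF out to some maximum lag and fit ACF(k) = B**k.
max_lag = 15
lags = np.arange(1, max_lag + 1, dtype=float)
acf = np.array([np.corrcoef(x[:-k], x[k:])[0, 1] for k in range(1, max_lag + 1)])
(ar1_acf,), _ = curve_fit(lambda k, b: b ** k, lags, acf, p0=[0.5])

for name, b in [("lag regression", ar1_reg), ("ACF fit", ar1_acf)]:
    print(f"{name}: AR1 = {b:.2f}, tau = {-1.0 / np.log(b):.1f} time units")
```

The two routes generally give different numbers on the same finite sample, which is the range-of-tau issue Marty raises above.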
Martin,
I look at this more from a numerical DE perspective. I’ve put an electrical analogy on the latest thread. We have a linear inhomogeneous DE:
dT/dt + T/τ = H, where we need to think of two cases for H:
1. H=F/C, where F is a steady (extra GHG) heat flux, and C is heat capacity, and
2. H=w(t), some stochastic process which has generated the observed values.
Case (1) gives us the sensitivity. In the latter case, to get a value of τ out of what is observed, it is assumed that w(t) is a white noise process, or as nearly so as we can make it, by varying τ.
Discretised, to first order we have (T(n+1)-T(n))/dt + T(n)/τ = W(n), which becomes an AR(1) process.
How to force it, i.e. make W(n) as near iid as possible? A reasonable requirement is that W(n) should have zero autocorrelation at lag 1, and I think, though I haven’t checked, that this is equivalent to what the existing process is trying to achieve. But you could adopt other limits on the A-C of W(n), and vary τ to achieve them.
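A minimal numerical check of that discretisation (the τ = 8 years, the one-year time step and the unit-variance white forcing are all invented) is below; it just confirms that the first-order scheme with white forcing behaves like a stationary AR(1):

```python
import numpy as np

rng = np.random.default_rng(7)

tau = 8.0        # assumed relaxation time, years
dt = 1.0         # time step, years
n = 50_000       # long record, so sampling error is small

# Discretised dT/dt + T/tau = W with white forcing W:
#   T(n+1) = (1 - dt/tau) * T(n) + dt * W(n)
T = np.zeros(n)
W = rng.normal(size=n)
for i in range(1, n):
    T[i] = (1.0 - dt / tau) * T[i - 1] + dt * W[i - 1]

rho1 = np.corrcoef(T[:-1], T[1:])[0, 1]
print(f"lag-1 autocorrelation of T: {rho1:.3f} (first-order scheme predicts {1 - dt / tau:.3f})")
print(f"implied tau = -dt/ln(rho1) = {-dt / np.log(rho1):.1f} years, vs the 8 years put in")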
In terms of going to higher order, as I said earlier, I don’t think that is useful except insofar as it improves the approximation of dT/dt, and I doubt if this is critical.
I have to say that hanging this whole analysis on the assumption that observed temperatures were generated by a white noise flux seems a slender reed, but I think that is what is done.