The subject of the correlation of temperature anomalies over distance came up in the NCDC thread, and I figured it would be a good excuse to run an analysis that I’ve been meaning to do for a while. This is not particularly novel; indeed, it was done as far back as Hansen and Lebedeff in 1987 (and probably before). But it is instructive to examine nonetheless.
To start with I used the USHCN v2.5 data from the NCDC’s FTP site. I converted the observations into anomalies relative to a 1971-2000 baseline (which is not really needed for a correlation analysis, but I was doing another analysis at the time as well and it won’t affect the results). I next created all possible combinations of the 1218 USHCN stations, yielding upwards of 700,000 unique pairs. I calculated both the distance between each pair (based on the lat/lon) and the correlation between the anomalies for the 1895-2012 period, and did a simple scatter plot of one against the other:
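The pairwise machinery here is simple enough to sketch. The following is an illustrative Python version, not Zeke's actual code (the function names and the haversine distance formula are my own choices); per pair, a great-circle distance and a plain Pearson correlation are all that's needed:

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    # Great-circle distance between two lat/lon points, in miles.
    r_earth = 3958.8  # mean Earth radius, miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2)
    return 2 * r_earth * math.asin(math.sqrt(a))

def pearson(x, y):
    # Pearson correlation of two equal-length anomaly series.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

With 1218 stations, the number of unique pairs is 1218 × 1217 / 2 = 741,153, consistent with the figure above.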
(Click to embiggen)
Even with the smallest possible marker size, it’s not really possible to make out the mean and distribution particularly well from the plot. To plot the mean and confidence intervals, I looked at the data in increments of 10 miles (e.g. all pairs 0-10 miles apart, 10-20 miles apart, etc.) and calculated the mean, 5th percentile, and 95th percentile of the correlations. This yielded the following figure for monthly anomalies:
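The binning step described above could look like the sketch below (a nearest-rank percentile is assumed here; the post doesn't say exactly how the 5th/95th percentiles were computed):

```python
from collections import defaultdict

def percentile(sorted_vals, q):
    # Nearest-rank percentile on a pre-sorted list (q in 0..100).
    n = len(sorted_vals)
    idx = min(n - 1, max(0, int(round(q / 100.0 * (n - 1)))))
    return sorted_vals[idx]

def bin_correlations(pairs, width=10.0):
    # pairs: iterable of (distance_miles, correlation) tuples.
    # Returns {bin_start: (mean, 5th percentile, 95th percentile)}.
    bins = defaultdict(list)
    for dist, r in pairs:
        bins[width * int(dist // width)].append(r)
    out = {}
    for start, rs in sorted(bins.items()):
        rs.sort()
        out[start] = (sum(rs) / len(rs), percentile(rs, 5), percentile(rs, 95))
    return out
```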
Unsurprisingly, temperature anomalies are strongly correlated over distances of less than 750 miles or so (coefficient > 0.5). Correlations remain above 0.8 for the first 300 miles. Interestingly enough, at distances over 1600 miles they are slightly negatively correlated, which might reflect peculiarities of weather patterns between one side of the U.S. and the other.
If we look at annual anomalies, the picture is similar, though the error bars are a tad wider. Mean correlations never drop below zero, and there is a notable uptick near the end (a feature present in the monthly data as well). I’m not sure why this uptick occurs, though my half-baked explanation would be that at those extreme distances you are probably comparing coastal stations to one another, and ocean currents might drive higher correlation.
If we directly compare mean correlations of annual and monthly anomalies over distance, we see that monthly anomalies correlate better over short distances but worse over long distances than annual anomalies. If we were to plot decadal anomalies, I’d suspect it would be even better over long distances (and slightly worse over short ones).
I want to apologize to folks for not commenting much of late. I just left my old job to join a new start-up, and the transition ended up taking quite a bit of time. Now that it’s done, however, I will be around more and will be doing a post discussing the difficulty of estimating absolute CONUS temperatures (and why anomalies are useful) some time later this week.
________________________________________________
UPDATE
As mentioned in the original article, this is a first pass analysis, and does not reflect things like weather patterns that will affect correlations, especially on monthly timeframes. I’m making the monthly correlations available here, along with the latitude, longitude, and impermeable surface area percent (in a 1 km cell) for each station pair so folks can play with the data (note that station_1 and station_2 are not USHCN station IDs, but rather the order (out of 1218) in which they appear if station_ids are sorted in ascending order): http://dl.dropbox.com/u/79856625/conus_station_correlations.csv.zip
Here are CONUS correlations within specific latitude bands:
Will update with a figure showing urban and rural station correlations shortly.





The negative correlation between temperature anomalies at distances >1,800 miles needs to be explained.
The line should surely end at 0 +/- noise.
That it finishes at minus 0.25 is very odd.
Indeed, the rise at the end is odd. I’d suspect it has something to do with comparing coastal stations to other coastal stations, but I have no evidence that is the cause.
I’ve been doing shade plots of monthly (and time averaged) GHCN readings. They give a strong visual effect to the spatial correlations.
One thing that shows up is that CONUS shows much less apparent correlation than ROW.
Zeke,
It’s very difficult for me to comprehend what the significance of what you’ve done might be – but I’m not nearly as sharp as a lot of other commenters here.
I suspect that the effects you see may be influenced by the geometry as well as the geography of the US. It might be interesting to see what would happen if you ran your code on bands say 500 miles wide running from Canada to southern border, or horizontal bands say 300 miles wide extending from east to west coast. Or say sweep a 500 mile band across the US with running analyses – assuming this isn’t prohibitively difficult – I can’t tell.
I think the thing I’m trying to get at is that the possible pairings are constrained by the shape of the US and if you try different shapes, the possible pairing combinations will be affected and so may be the correlations.
If this seems nuts, it could be.
DocMartyn:
It’s rather common in turbulent flow.
Some discussion here (see figure 1).
Steven McIntyre encountered a similar form when he computed the autocorrelation function for Antarctica. See this figure.
I think Zeke is making a mistake by assuming the correlation function is isotropic. Commonly there’s a difference in correlation length in the direction of flow versus perpendicular to the direction of flow.
Carrick,
How would you suggest altering the analysis to separate them out? I suppose I could look at correlations within defined latitudinal bands, but the choice would be somewhat arbitrary.
I think this is a diagnostic of weather system size. If it were UK data I’d separate them according to knowledge of prevailing weather direction. Generally weather comes in from the south-west, so I’d expect to see better long-distance correlation on that axis and better short-distance correlation on the perpendicular axis. As far as the uptick goes, figure 3 in the paper Carrick linked caught my eye.
It seems a good way to handle the US is to make a gridded map of the US that displays the annual 30 year temperature for each grid. Then, to find a correction/adjustment station, find stations nearest to the corrected station by following the isotherm.
Curious:
This is what I would try too. Basically break it down into “with direction of flow” (±45°), “against direction of flow” (±45°) and “perpendicular to direction of flow”. Zeroth order, “direction of flow” = along lines of equal latitude, West to East.
Zeke, I have found what you found in the differences in station correlations for monthly and annual data, and had supposed, without really thinking about it very hard, that it had to do with seasonal differences that show up more using individual months, whereas with annual data those differences are averaged out.
I also think it might surprise some readers and posters here that trends of a reference station and near neighbor stations can vary dramatically while the correlations remain high.
“trends of a reference station and near neighbor stations can vary dramatically while the correlations remain high”
I’m sure that happens. But overall trends do correlate, rather similarly to monthly readings. There is a corresponding GHCN station trend map here. (This one isn’t WebGL and you have to click on the top right map to rotate).
As Carrick observed above, I did a post at CA a few years ago showing spatial autocorrelation of Antarctic station temperatures – see http://climateaudit.wordpress.com/2009/02/20/antarctic-spatial-autocorrelation-1/
There was other contemporary discussion at CA, Jeff Id and Lubos: see http://climateaudit.wordpress.com/2009/03/27/omatic-correlations/ ; http://noconsensus.wordpress.com/2009/03/27/auto-matic-correlation/; http://motls.blogspot.com/2009/03/spatial-correlations-in-antarctica.html
One of the important corollaries of spatial autocorrelation is that principal components applied to a matrix of data will yield Chladni patterns – a point discussed at CA for Antarctica here: http://climateaudit.wordpress.com/2009/02/25/steig-eigenvectors-and-chladni-patterns-2/
http://climateaudit.wordpress.com/2009/02/24/steig-eigenvectors-and-chladni-patterns/
and for other networks elsewhere.
Steig’s supposedly “significant” eigenvectors were nothing more than the first three Chladni patterns. We reported this in our first draft of O’Donnell et al 2010.
Unfortunately, as a then “anonymous” reviewer of O’Donnell et al 2010, Steig required that we remove this important discussion– without disclosing his conflict of interest.
Or just add a third dimension for alignment. If I grok what Carrick is saying, “follows flow direction” could be approximated by the difference in longitude divided by the distance between the two stations (a rough approximation of the “cosine” between the two-station “vector” and the equator, disregarding curvature).
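That rough alignment metric might be sketched like this (hypothetical helper; an equirectangular flat-earth approximation that, as noted, disregards curvature):

```python
import math

def ew_alignment(lat1, lon1, lat2, lon2):
    # Rough cosine of the angle between the two-station "vector" and
    # an east-west line (the zeroth-order flow direction).
    # 1.0 = pair aligned with flow (W-E); 0.0 = perpendicular (N-S).
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = (lon2 - lon1) * math.cos(mean_lat)  # east-west separation, degrees
    dy = lat2 - lat1                         # north-south separation, degrees
    return abs(dx) / math.hypot(dx, dy)
```

Pairs could then be split into along-flow and cross-flow bins by thresholding this value (e.g. above/below cos 45° ≈ 0.707), matching the ±45° breakdown suggested above.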
lots of possibilities pop to mind…
anisotropy (NS/EW), hole effects [coastal, mountain ranges], using pair variances involving different times*, fewer pairs at greater distances…all in all quite a few things could be looked at…and maybe will be in time.
given that correlation functions or semivariograms are used to model spatial variance, the behavior at larger distance may well have little relevance to method [particularly given sample density in many areas]…leads into understanding search parameters used in any subsequent kriging, but that is downstream here.
lots of interesting things to consider. should be further parsed in time and space–just a big bag as is…
——-
* yes or no on the different-times aspect was not stated, but upwards of 700K unique pairs certainly suggests this was the case–if so, that seems like a very iffy assumption. At least an assumption that should be validated somewhere….
Zeke,
somewhere around here I have a reference to the direction of flow,
hmmm found it in a paper done on ghcn daily by the guys at cru.
the correlation length is a function of latitude, season and direction of flow as I recall.. a bunch of geographical features can also play havoc with it
Nick Stokes (Comment #109495)
February 2nd, 2013 at 7:54 pm
Thanks, Nick. I took a look at your site, and it doesn’t seem to me to show what you say at all. You have places with a trend of 2° per century right next to places with almost no trend at all … here’s a screenshot
http://i863.photobucket.com/albums/ab195/weschenbach/stationtrends_zpsaf2c2ce1.jpg
I don’t see that as long distance correlation at all. I’m also surprised that all of Canada correlates so well, and all of the US correlates so poorly. That seems doubtful.
In any case, it doesn’t seem to support your contention. I showed here that correlation between nearby trends can be quite poor.
All the best to you,
w.
Willis,
I mentioned above that monthly readings in the US seem less correlated than in ROW. That is true for trends too. If you look at the other side of the globe, you’ll see much better correlation. A screencap is here; it’s the view you get by clicking on the Caspian Sea in the upper right map.
Some of the more spectacular exceptions are caused by station problems – I’m showing unadjusted data, though there is an adjusted option. In fact, it shows such situations up. I traced the story of Nitchequon, Ca, here.
A useful “trick” is to use a transparent color here.
E.g. in R, specify col=”#00000010″
Willis:
That’s probably really true:
The correlation length increases as you go further north.
see this from BEST.
The correlation also gets worse when you are near a land-ocean boundary (within 500 miles).
One issue is that low station density makes the correlation look better (when it’s shown as an interpolated field) in regions with sparse coverage.
Unfortunately Nick’s program does not work on my Mac, so I can’t play with what he’s got.
I am still a little confused about all this. Each point in the plot has to represent the correlation between the temperature record of a ‘pair’ of stations. Each station will be compared to every other station.
Now what it appears to be saying, for each and every station, is that nearby stations share the same change in temperature.
As the distance increases, the similarity between the anomaly profiles of pairs of stations falls until there is no relationship whatsoever. Then at distances >1,800 miles the correlation becomes negative.
This relationship is the same for all stations, which I think shows there is a problem with the math.
As for Canada, this is the 15 year trend mapped in both 1×1 and 5×5 grid squares using the stations that EC calculates anomalies for in their monthly summaries.
http://sunshinehours.wordpress.com/2012/12/09/canada-grid-square-choices-5×5-warming-and-1×1-cooling/
West is cooling. North and East are warming.
Kenneth Fritsch (Comment #109494)
“I also think it might surprise some readers and posters here that trends of a reference station and near neighbor staions can vary dramatically while the correlations remain high.”
I should have noted that my most recent analysis used GHCN adjusted monthly station temperature data with the longest series available (90-100 years). I suspect over shorter time periods (series) the deviation of good correlation to different trends would be less obvious. I can link my analysis if anyone is interested, but it is a rather easy proposition to do your own.
What at first surprised me was that a series in general can have a very good correlation and yet have what I found to be very different trends. I would suppose that someone more capable than I in math could demonstrate how this occurs.
These occurrences, I think, touch on the situation of temperature reconstructions and doing validation on high frequency proxy responses or low frequency ones that are more attuned to trends. The withdrawn Gergis et al. paper on the Australasian temperature reconstruction comes to mind.
Doc
Carrick already answered, but in turbulent flows, something called “conservation of mass” requires that the spatial correlation goes negative for velocity in some circumstances.
I continue to be perplexed by the possibility that this view may be entirely dominated by unrecognized (by Zeke’s post) signal – not the same as noise. Distance seems meaningless to me unless it is cleansed of other distractions in the station pairs, such as urban-rural, or inland-seaside (say within 100 miles), or altitude(which might not matter).
I apologize for this very ignorant observation, but I suspect that this post is a prime example of an analysis whose result is more driven by unrecognized, or unidentified, or undiscovered, or ignored signal.
Some signals are characterized as noise in some of the climatology which I think I understand – admittedly not much. Assuming that I do understand what I’m reading above, if it didn’t get past stupid me, there isn’t anything there – yet. I am not accusing Zeke of being disingenuous, just not having thought about it enough.
Carrick,
“Unfortunately Nick’s program does not work on my Mac”
I’ll look into that. Does the plot fail to show, or is it a problem with the controls?
Strange, all those long distance dots and I don’t see any teleconnecting.
Nick:
I can’t get the plot to show. Your other tools work though, except for zooming.
I was thinking putting in more command keys might be an interesting alternative to mouse-clicks that are system dependent.
E.g., a = pan left, d = pan right, w = pan up, x = pan down, + = zoom in, – = zoom out, etc.
” continue to be perplexed by the possibility that this view may be entirely dominated by unrecognized (by Zeke’s post) signal – not the same as noise. Distance seems meaningless to me unless it is cleansed of other distractions in the station pairs, such as urban-rural, or inland-seaside (say within 100 miles), or altitude(which might not matter). ”
altitude splits will not cause any significant change, basically it’s a linear offset. coastal versus inland changes amplitude, so you might gain a little from that. urban/rural is a minor difference in amplitudes.. biggest issues are going to be.. latitude, season, direction of flow, and then geographic discontinuity. like..
one site in a rain shadow of mountain range and the other site on the other side of the ridge.
Think of it this way: if urban and rural were correlated differently, finding UHI would be simple. It’s not.
lucia:
My guess for a sufficiency condition is narrow band turbulence.
I’ve seen this formula used:
$latex \rho(r) = e^{-\alpha r} \cos k r$.
Not ever seen it derived, though it seems plausible enough.
Doc
It doesn’t suggest any problem with the math. The exact same thing happens in turbulent flows. Not only that, conservation of mass requires it for certain properties under certain circumstances.
Thanks, Carrick,
Yes, I’ll do that. But first I’ll try to find why it isn’t showing.
Mosh:
Thanks for the help. I still can’t see the value in Zeke’s analysis given all the other influences. It looks like looking at distances alone doesn’t really shake anything out.
I used Environment Canada data for BC from 2001-2012 to do the same exercise as Zeke.
http://sunshinehours.files.wordpress.com/2013/02/bc-correlations-of-anomalies-over-distance-2001-2012.png
Two of the worst pairs in terms of distance/correlation were:
3 km apart correlation .716:
BURNABY SIMON FRASER U + PORT MOODY GLENAYRE
11 km apart correlation .691:
N VANC GROUSE MTN RESORT + N VAN SEYMOUR HATCHERY
In each case, one site is at the top of a mountain and the other was at the bottom nearby.
What’s funny is that the 15 km apart top-of-mountain pair had a correlation of .683.
BURNABY SIMON FRASER U + N VANC GROUSE MTN RESORT
I also had some negative correlations when there was very little data.
and …
FERNIE WARDNER KTNY HATCHERY 112
28 km apart. Cor: 0.746
300m difference in elevation.
Your method description isn’t particularly clear on how you established common baselines for stations with short histories–i.e., those that did not operate continuously over 1971-2000 without a change in siting or instrumentation. Anomalies are useful, but it isn’t clear that their benefits outweigh the complications they introduce compared to other methods, such as attempting to normalize by altitude (e.g., estimating the lapse rate) and siting (a model of microsite biases).
I agree with Carrick’s concern that the “mean” correlation and intervals reflect an isotropic assumption. There is nothing directly wrong with plotting the correlation of all station pairs, but once you start computing the mean and c.i. you are implicitly claiming all those pairs belong to the same population. That assumption is decidedly unphysical.
Carrick is concerned about “weather system size”. Mosher mentions geographic basins. There is also some discussion of water edge effects in your original post. McIntyre seems right on point drawing your attention to the Chladni patterns. The ocean is fairly isothermal in a given location and forms a boundary condition there, giving rise to the Chladni patterns in a PCA as Steve mentions–basically the harmonic decomposition. You also need to consider water coverage between stations. Roy Spencer had a nice angle on that one back in 2010.
Lucia, I know what your words mean, but I think that the description is nonsense.
We have two stations 10 miles apart (A&B) and 1,800 miles away from another pair (X&Y).
Now over a course of DECADES the temperature signal of A tracks B beautifully and over the same time period X tracks Y. However, X inversely tracks A.
What this means is that the anomaly value of Any Station is generated from nearby stations, so the further out you go the less like Any Station you are.
This would seem to be a fingerprint of massive cheating. Some stations would be expected to have a similar anomaly just by chance, and these should be randomly distributed, and so we should see some points at Correlation = 0.9 and 1-2000 miles.
Stations have been forced to look like nearby stations, so the correlation is higher.
Doc
“What this means is that the anomaly value of Any Station is generated from nearby stations, so the further out you go the less like Any Station you are. This would seem to be a fingerprint of massive cheating.”
No, as Zeke said, for this analysis taking anomalies is redundant. It just adds a constant which won’t affect the correlation coefficient.
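Nick's point is easy to check numerically: subtracting a constant baseline from either series leaves the Pearson correlation coefficient unchanged. The temperatures below are made up purely for illustration:

```python
import math

def pearson(x, y):
    # Plain Pearson correlation coefficient.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up monthly temperatures for two nearby stations.
temps_a = [51.2, 53.9, 60.1, 68.4, 75.0, 70.3]
temps_b = [48.8, 52.1, 59.5, 66.0, 74.2, 69.9]
r_raw = pearson(temps_a, temps_b)

# Subtracting a constant baseline from each series (i.e., taking
# anomalies) leaves the correlation coefficient unchanged.
anoms_a = [t - 12.3 for t in temps_a]
anoms_b = [t - 9.87 for t in temps_b]
r_anom = pearson(anoms_a, anoms_b)
assert abs(r_raw - r_anom) < 1e-9
```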
I live in the mountains in Utah and I know that most of the time my home temperature anomalies correlate strongly with the airport in the valley.
However there is a random event in the winter called an inversion where the valleys get trapped under a layer of clouds and the mountains have nice bright sunshiny days, and there can be as much as a 60 degree difference in absolute temperatures. The Paiute Indians called it the white death.
I have been somewhat tracking the frequency for 14 years now and as far as I can tell the effect is random. Some years there are just a few days of inversions and other years it lasts months. I have done some research and for at least the last hundred years or so the occurrence and effect is random (other than it occurs primarily in the winter).
I have two stations whose anomalies track well for 9 months or so, but then track very differently for 3 months out of the year, and there is only a couple of miles of separation. I am guessing about 6 miles and 1,500 vertical feet.
As a sailplane pilot I have also noticed a somewhat similar effect over larger population centers that seem to have a random dome of stagnant air over them, PHX, LA, SLC, especially coastal cities whose air gets trapped against the mountains. This dome of air is always either significantly colder or hotter than the air outside of it. I often use that little tidbit of knowledge to gain a few extra feet of altitude.
I don’t believe the dome of air is the UHI effect because as often as not the air inside the dome is colder than the air outside. Often though the dome of air temps are predictable, Phoenix in the summer plus 15, SLC in the winter minus thirty.
Do these areas’ anomalies track well with the surrounding areas that aren’t affected? I really doubt it unless they are tracked for a long enough time for random weather to cancel out, and I don’t think the cities have been big enough and around long enough for that to have happened.
DocMartyn
Pretty much– yes. That’s what the negative correlations at long distances mean.
It’s not only not cheating, this sort of spatial correlation happens all the time in turbulence.
More like the finger print of plain silly.
Real data. Negative correlation happens.
Priestley’s curve (thin line) is of the form
$latex \exp(-\alpha r) \cos k r$.
Carrick (Comment #109543)
Thanks for letting me know about the problem on the Mac. I think it was just a problem Safari was finding with the sylvester matrix library. I think it’s OK now. I’ll fix similar cases where I used that library.
Bruce,
I tried to point it out to you at your blog. You’re not using homogenized data, are you? Vincent et al 2011 is the discussion of the homogenized (might be Vincent et al 2012 actually, hmm) EnvCanada dataset. Also correlations between stations increase with latitude to nearly 1200 km near the poles.
Nick:
Thanks! That fixed the problem.
OK Lucia, as the only dumb kid in the class, can you explain to me what this ‘turbulence’ actually is?
What is the medium, how is it being moved and what constrains the flow?
Methinks that you are stating, ‘have seen that line-shape’, have explanation in previous system, use explanation for this system.
DocMartyn
As the truly dumbest kid in the class, what I think Zeke, Lucia et al are trying to say without actually saying it is that correlations exist within systems (which they obviously do). The $64,000 question then is whether or not the earth’s atmosphere is a single system or a group of systems that can be correlated.
It appears to me that the different systems like the Northern and Southern Hemisphere do not correlate very well because of the boundary condition in the equatorial area and as demonstrated by the increasing Ice levels in the Antarctic and the decreasing ice levels in the Arctic.
Or the ENSO, PDO effects, as well as surface effects like continents, water and ice. Zeke’s plot clearly shows that correlation declines with distance, and if the distance is great enough the correlation goes negative, just like the differences in the ice at the poles, hence conservation of mass (energy).
To me, what this all means is that the total energy budget for the earth is relatively constant and the negative correlation should match the positive correlations on a global scale.
Genghis:
This does make sense to me, although I thought we were talking about anomalies not temperatures so it is the indicators of change that would correlate globally, not indicators of total energy (if that expression actually means anything).
You can’t be the dumbest commenter. I have it locked up.
R, I am using the monthly summaries produced by Environment Canada. And I am only using the stations that EC calculates anomalies for (see the D column).
http://climate.weatheroffice.gc.ca/prods_servs/cdn_climate_summary_report_e.html?intMonth=1&intYear=2013&prov=BC&txtFormat=text&btnSubmit=Submit
I’m more interested in very low correlations when the stations are only a few kilometers apart than high correlations.
Elevation differences can cause low correlation.
Doc
Read this book:
http://www.amazon.com/First-Course-Turbulence-Henk-Tennekes/dp/0262200198
Title “A First Course in Turbulence [Hardcover]” by
Henk Tennekes (Author), John L. Lumley (Author)
Watch this video.
And this one, which first shows dye injected into a laminar flow (which would be very low velocity if that’s water). You’ll see the stream of dye is very narrow all the way along the flow. Then they turn up the flow and it becomes turbulent. You’ll see the stream of dye suddenly diffuse to the edges of the pipe.
Put very simply, turbulence is ’caused’ by the non-linear (convective) terms in the Navier Stokes equations, but it can be avoided if convection is sufficiently low to keep something called the “Reynolds number” low. Air flow in the atmosphere is pretty much… well.. turbulent.
One of my favorite demonstrations of turbulence is Rayleigh-Benard convection, because it self-organizes.
youtube simulation.
You actually see this in real world measurements as a line spectrum (for example) in the measured pressure over time of the fluid (when people see it in their measurements, generally they don’t go “yeah! forced convection!” though).
Carrick–
That is a particularly good example because you can actually see there is a characteristic length: the correlation will be positive for points that are close to each other, drop to zero, go negative, and so on. That said, to make the point we need a very wide plate and we need to let the quasi-steady state run a long time. The video highlights the transient, and the aspect ratio is too close to a square. (Looks like width is twice height?)
Turbulence is a feature of the atmosphere. I think it has to do with the characteristic length being very large, making the Reynolds number also very large even at relatively low air flow velocities. It’s the reason why stable trace gases with molecular weights significantly different from nitrogen and oxygen like CO2 and helium are well mixed up to about 100km altitude. It’s also the reason why it only takes a few strokes with a teaspoon to uniformly mix cream into your cup of coffee. Mixing by molecular diffusion is many orders of magnitude slower.
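A back-of-the-envelope illustration of that point, with round numbers chosen purely for illustration (kinematic viscosity of air near the surface is about 1.5 × 10⁻⁵ m²/s, water about 10⁻⁶ m²/s):

```python
def reynolds(velocity_ms, length_m, nu=1.5e-5):
    # Reynolds number Re = U * L / nu.
    # For pipe flow, turbulence sets in somewhere above Re ~ 2300.
    return velocity_ms * length_m / nu

# A gentle 5 m/s breeze over a 1 km characteristic length:
re_air = reynolds(5.0, 1000.0)              # ~3.3e8: deeply turbulent
# A teaspoon stirring coffee (~0.5 m/s over ~1 cm, in water):
re_coffee = reynolds(0.5, 0.01, nu=1.0e-6)  # ~5000: also turbulent
```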
R,
“EnvCanada dataset. Also correlations between stations increase with latitude to nearly 1200 km near the poles”
except in arctic regions during some seasons it falls to below 500km
as I recall. Don’t have the cite handy..
“It’s not only not cheating, this sort of spacial correlation happens all the time in turbulence.”
Since the references from Carrick seemed to show that the negative correlations appear at approx. 1/2 a wavelength of something apart, does this mean that the 1800 miles -ve correlations are something to do with some climate-related ‘thing’ with a wavelength of around 3600 miles?
‘Cos that is not unlike the size of the oscillations in the jet streams.
lucia:
Yes that’s a good point. If you looked at fluid velocity (or pressure) you would see an oscillatory behavior, and if you picked the distance just right, the oscillations would have opposite signs (that is they would anticorrelate).
It’s hard to see from this figure, but if you took relatively close points, you’d see them tracking more closely than as you separated the points, hence the drop in the magnitude of the correlation with distance. It’s easy to see how this sort of oscillation would give rise to a correlation function like $latex \exp(-\alpha r) \cos(k r)$.
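A minimal numerical sketch of that form (parameters arbitrary, just to show the shape): the model is positive at short separations, crosses zero at a quarter "wavelength", and has a negative lobe beyond it.

```python
import math

def rho(r, alpha, k):
    # Damped-oscillatory correlation model: rho(r) = exp(-alpha*r) * cos(k*r)
    return math.exp(-alpha * r) * math.cos(k * r)

k, alpha = 1.0, 0.2           # illustrative values, not fitted to anything
r_zero = math.pi / (2 * k)    # first zero crossing (quarter "wavelength")
r_min = math.pi / k           # most negative point (half "wavelength")
assert rho(r_zero - 0.01, alpha, k) > 0 > rho(r_zero + 0.01, alpha, k)
assert rho(r_min, alpha, k) < 0
```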
steveta_uk:
Good catch. And of course those oscillations are controlled by “Rossby waves” so yet more periodic structure in an otherwise apparently chaotic atmosphere.
There’s something inherently satisfying in contemplating the notion of $10^{44}$ molecules energetically and chaotically colliding with each other on a spinning gravitating sphere self organizing into cool behavior like this. Makes me wonder what the creationists would think about that….
By the way, if you think about this, you’ll maybe see part of why I’m viewing as dubious the assumption of isotropic correlation functions (that is correlation functions that don’t depend on the orientation of the two stations, nor their geographic locations, other than their separation).
steveta_uk
Yep.
Carrick
Yep. In this context, the only utility of thinking about isotropic turbulence is to point out that negative correlations are sometimes *required* for certain things in turbulent flow.
In Bénard cell convection, if the autocorrelation stayed positive, that would mean that if you were standing at one point just a little above the no-penetration lower surface plane and the velocity fluctuation happened to be “up”, then you would expect that the velocity was positive at all points on the plane. But you know that the net flow through the plane is zero. So, you know that in reality, the correlation must be negative somewhere.
So: in a certain number of circumstances, if the spatial correlation is anything other than “white noise”, you must get negative correlations at some point. (This doesn’t mean you always must get them for all things. But they are clearly not impossible, even if they surprise a physician when he first sees them.)
While the bit about turbulent flow and all is interesting, I think a simpler explanation for the weak to very weak negative correlations of many data pairs at distances greater than 1800 miles is that the temperature fields in the US are strongly anisotropic. I think this is best illustrated by temperature zone maps (e.g., a climate zone map). It also explains why the average correlation coefficients slightly increase (though continue to show essentially no correlation) with even greater distance (hint: it’s the coasts vs. the heartland).
Where I’ll quibble with the above analysis is in calling r values greater than 0.5 indicative of strong (rather than moderate) correlation. For me, and what I was taught all those years ago, r values need to be at least 0.7 to be considered indicative of strong correlation. With r values of 0.5, it is pretty hard to use the anomaly from one station to predict with any degree of accuracy the anomaly at its pair. Another thing that needs to be pointed out is that, at any given distance, there is a pretty big range of r values between station pairs, so that it is difficult to simply use distance between stations as a means of predicting how well two stations will correlate.
Are we connecting turbulent flow (jet stream) to negative correlations between station temperature series and if so why do we see it for monthly and not annual temperatures?
Kenneth–
I don’t know for sure it’s the jet stream– but it’s not implausible. I’m just pointing out that the negative correlations are common in turbulent flows. Weather is a manifestation of turbulence, so we shouldn’t be surprised.
Not sure. In turbulent flows, small-scale features are also shorter-lived than large-scale features. So it’s not implausible that temporal averaging smears out fine-scale correlation to some extent. If it were the jet stream: that moves around over the course of a year, so when you average over a year that’s not going to have much effect, and you are left with correlations imposed only by the larger-scale features (like the PDO, ENSO, etc.).
Note that I’ve updated the original post with a link to the data file so folks can do their own analysis. I’ve also added in a graph showing correlations by latitude band for the U.S.
.
j ferguson (and others):
I intended this just to be a simple initial analysis. I agree that there are a lot more things to look at.
.
On the subject of correlations of temperatures not indicating correlation of trends, I agree, though I would note the longer the time frame (e.g. annual, decadal) the more similar the two should be.
bruce
“Elevation differences can cause low correlation.”
Very hard to prove, especially given your repeated mistakes with EC data.
BobN:
I think this depends on whether you are using a single station to predict the temperature at another location, or a network of stations. People are doing the latter here.
Zeke,
Don’t do anything not worthwhile on my account. Mosher has reminded me that there is unlikely to be any value in a rural/urban run, nor altitudes either.
I continue to suspect that it isn’t distance alone that signifies, but where increasing the distance forces the pairs to be located. Obviously the closer the pairs, the more evenly they can be distributed as pairs across the country, but as the distance between them increases fewer will fit just anywhere.
A series of runs which might be informative would be ones where the orientation of the pairs in each run was controlled, say pairs whose orientation was between 350 and 010 degrees north, or 260 to 280, or ??. Obviously this would reduce the number of stations, due to some not finding a pair under that orientation constraint.
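The orientation-constrained runs suggested above would need the bearing of each pair. A minimal sketch of that filter, using hypothetical coordinates (not the USHCN station inventory):

```python
import numpy as np

def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees [0, 360)."""
    la1, la2 = np.radians(lat1), np.radians(lat2)
    dlon = np.radians(lon2 - lon1)
    x = np.sin(dlon) * np.cos(la2)
    y = np.cos(la1) * np.sin(la2) - np.sin(la1) * np.cos(la2) * np.cos(dlon)
    return (np.degrees(np.arctan2(x, y)) + 360) % 360

def in_window(bearing, lo, hi):
    """True if bearing lies in [lo, hi], handling windows that wrap past 360."""
    return (lo <= bearing <= hi) if lo <= hi else (bearing >= lo or bearing <= hi)

# Hypothetical station pair, oriented roughly north-south.
b = initial_bearing(35.0, -90.0, 40.0, -90.5)
print(in_window(b, 350, 10))
```

Applying `in_window` to every pair before binning by distance would implement the 350-010 or 260-280 runs; the wrap-around case matters for the north-south window.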
Zeke, I like the idea of your throwing out there some sidelights from your main analyses. I happen to find these observations of interest. I am not good at deciphering what some others here might be saying about their interests in these observations and whether what they say is from some past food fights or is serious.
On the same note I am reminded that I need to go back and look more closely at some sidelight observations I had in doing breakpoint (BP) calculations on USHCN data a while back. I found some recurring BPs in these adjusted and unadjusted temperature series on a monthly and annual basis, and the most frequently recurring ones were also obvious when looking at the USHCN average series over the entire US. These BPs were not blips of higher-frequency breaks, since I was using minimum BP segments on the order of a decade or so; as such I would classify the breaks as regime changes in the climate. The fact that the BPs showed up in the adjusted series also implies that these BPs are from “natural” causes and are not related to non-climate changes.
What I found of interest was that when the BPs were based on monthly data, one could find the same BP dates for several stations all located within a narrow band of latitude and longitude. As an example, take the BPs around the year 1957, where the monthly data showed BPs centered on start dates in all months of 1957. These month-specific dates were associated with stations in a relatively narrow range of latitudes and longitudes. I have to check this more closely, but as I recall these BP dates progressed station-wise from east to west across the US.
I also want to redo some maps of the US showing the trends for stations. I need to assure myself that the time periods for the stations are reasonably the same.
Kenneth Fritsch (Comment #109665)
February 5th, 2013 at 11:01 am
What if many stations changed the time of observation en masse (at or near the same time) in response to a directive of some type?
LL, I used adjusted USHCN data, and that data is, independent of the homogenization algorithm, adjusted for TOB using documented metadata.
Also to have these changes made in waves across the US from month to month would be indicative of a methodical change that, I think, would be well documented.
Below is a link to a map of the US, showing in shades of blue to red, the linear trends from 1920-2005 for the GHCN mean adjusted monthly station temperatures. While the regional warmer and cooler areas are readily visualized, the intermixing of different trend values in close proximity can also be seen. The trends are in degrees Centigrade per century. All stations used spanned the time period of 1920-2005.
I may repeat this map using a different time period.
http://imageshack.us/a/img152/6410/trends19202005ghcnmonth.png
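For anyone wanting to reproduce such a map, the per-station trend calculation is simple. A sketch on a synthetic series (the series is made up; real GHCN station files and their missing-value conventions are not modeled here):

```python
import numpy as np

def trend_deg_per_century(years, temps):
    """Least-squares linear trend in degrees per century, ignoring NaNs."""
    ok = ~np.isnan(temps)
    slope = np.polyfit(years[ok], temps[ok], 1)[0]  # degrees per year
    return slope * 100

years = np.arange(1920, 2006)
# Synthetic station: a built-in 0.5 degree/century trend plus noise.
rng = np.random.default_rng(2)
temps = 0.005 * (years - 1920) + rng.normal(0, 0.2, years.size)

tr = trend_deg_per_century(years, temps)
print(round(tr, 1))
```

Run over every station spanning 1920-2005 and colored by sign and magnitude, this yields the kind of map linked above.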
This is what I was trying to say (I was too short on time then). Here is a map of mean US temperatures. If you want to find a station to correct another, choose a station within the same color as the station to be corrected. (At least, that’s close enough to the idea I was trying to get across.)
http://serc.carleton.edu/images/eslabs/drought/mean_annual_temp.jpg
Or, you could average 30 Julys and use that for July temp corrections, etc.
Kenneth Fritsch,
On your 1920 to 2005 chart there is something kind of interesting 🙂
“Since its introduction, the program has developed 270,000 management plans that consist of more than 31,000,000 acres (130,000 km²) of private land.” That is a fair amount of reforested land.
Not that it would cause winds or anything, but it looks like it could have some impact on surface temperatures.
The program started in 1941 and most of the acreage is in the rural southeastern US.
The mean annual temperature map also shows why there is a more positive correlation ( per the main post) at the most extreme distances.
FWIW, there is a fair indication that the season makes a big difference. The correlation for temperature is highest in winter and lowest in summer. The dashes are the correlation for diurnal temperature range and the solid line the correlation for precipitation. In winter the temperature correlation is well over 1500 km, and in summer it drops, depending on latitude, to between 1200 and 800 km, so perhaps the GISTEMP 1200 is not such a good limit in the summer at low latitudes.
However, this correlation IS encouraging for polar regions, where we see that the 1200 km range is a good solid estimate all year long between latitudes 60 and 90 N. One may assume that the same could, maybe even should, hold true for the southern polar regions.
Seems to have fallen into the spam bucket, but FWIW there is a very strong change in the correlations with season. In winter the temperature correlation is well over 1500 km, and in summer it drops, depending on latitude to between 1200 and 800 km.
However, this correlation IS encouraging for polar regions, where we see that the 1200 km range is a good solid estimate all year long between latitudes 60 and 90 N.
Hm, interesting Eli.
Like your reference.
Of course the downside is the spacing between stations gets pretty far apart as you approach the poles. An analysis that gets the uncertainty right as you approach the poles would be interesting in that respect.
Nick: (Comment #109570)
This is simply not true.
Anomalization involves 12 constants applied to different portions of a temperature series. Choosing different anomaly time periods will in virtually all cases change the resulting correlation coefficient to some extent.
Eli (#109702) –
As far as I can tell, those correlations are from one land station to another land station. Any studies on correlations between land and sea temperatures (either SST or marine SAT)?
RomanM (Comment #109716)
“This is simply not true.”
It is true. Zeke takes the correlation of two time series of numbers over a period. In doing so he subtracts the mean of each over that period.
In taking anomalies, whether by month or year, he makes exactly the same change to each of those time series terms, year by year. And it makes exactly no difference once the mean has been removed.
Romanm,
Oh well, it’s very late here, and perhaps I should rethink that. It’s true that when you subtract monthly averages to form anomalies, but then subtract an overall mean, there is in effect a mean-free seasonal oscillation which could produce an effective variation in the correlation through its dependence on the base period. So yes, I guess it would depend.
Nick: (Comment #109720)
Or, you might try an example or two. 😉
There usually is a different pattern to the monthly means used to calculate the anomalies depending on the rates of seasonal warming and/or cooling taking place during the specific anomalization period.
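RomanM's claim is easy to check directly: anomalies computed against two different base periods differ by a month-dependent offset, which subtracting a single overall mean does not remove, so the pairwise correlation can shift. A small synthetic demonstration (made-up series, not station data):

```python
import numpy as np

rng = np.random.default_rng(3)
n_years = 60
months = np.tile(np.arange(12), n_years)
t = np.arange(12 * n_years)

def anomalies(series, base_mask):
    """Subtract each calendar month's mean, computed over the base period only."""
    out = series.copy()
    for m in range(12):
        sel = months == m
        out[sel] -= series[sel & base_mask].mean()
    return out

# Two stations sharing weather noise, with seasonal cycles that drift in amplitude,
# so the monthly climatology depends on which years define it.
season = np.sin(2 * np.pi * months / 12)
common = rng.standard_normal(t.size)
s1 = (1 + 0.01 * t / 12) * season + common + 0.5 * rng.standard_normal(t.size)
s2 = (1 - 0.01 * t / 12) * season + common + 0.5 * rng.standard_normal(t.size)

base_a = t < 12 * 30   # first 30 years as baseline
base_b = ~base_a       # last 30 years as baseline
r_a = np.corrcoef(anomalies(s1, base_a), anomalies(s2, base_a))[0, 1]
r_b = np.corrcoef(anomalies(s1, base_b), anomalies(s2, base_b))[0, 1]
print(round(r_a, 3), round(r_b, 3))
```

The two correlations differ, though for realistic station data the effect may well be small, as Carrick suggests below the fold.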
The link below shows the clustering by month of the break dates for the breaks occurring in the year 1957. The breakpoints were calculated for the GHCN US stations for the adjusted monthly mean temperatures. The map also shows that there is a clustering of all the stations in the US with breakpoints in the year 1957.
This phenomenon also occurs with other breakpoints/dates in the US that have a higher number of occurrences, i.e. they tend to cluster into a region of the US.
http://imageshack.us/a/img16/8957/breakpoints1957stationg.png
Re: Kenneth Fritsch (Comment #109723)
February 6th, 2013 at 10:41 am
The breakpoint series seem to end at the borders of Georgia.
In the US map linked below I show the station locations of linear trends for the minimum and maximum monthly adjusted US GHCN temperatures in the same manner as I did for the mean temperatures. I think there is more “mixing” of different trends into nearby stations than I saw in the mean temperatures but I’ll let you all be the judge of that. The period covered is the same as for the mean and was 1920-2005.
http://imageshack.us/a/img401/6410/trends19202005ghcnmonth.png
Layman Lurker (Comment #109725)
I had not noticed but you are right that Georgia looks isolated. I’ll take a look and make sure that Georgia had stations that covered the period 1920-2005.
Harold, Carrick
As said at the link, Eli remembers seeing a couple of other papers on the issue, but lost them in the far past. AFAIK they were all land stations. It would be difficult to do with sea surface temperatures taken from ships, because there are no fixed positions. Possibly you could look at buoy records or stations on small islands.
Kenneth,
Interesting results regarding breakpoints in the homogenized data. What method are you using to detect those breakpoints? I might want to see if I can replicate it and test some things.
.
RomanM,
Fair enough, I was incorrect in asserting that anomalies vs absolutes would not affect the results. I suspect that rerunning the analysis using absolutes wouldn’t change the overall results much, but I’ll give it a shot when I have the time. Unfortunately, it takes about 4-5 hours on my machine to calculate all of the correlations between pairs, as my current method isn’t particularly efficient.
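For what it's worth, when the stations share a common time grid, the full set of pairwise correlations can be computed in one vectorized call rather than a loop over the ~700,000 pairs, which should cut the 4-5 hours to seconds. A sketch with random numbers standing in for the 1218-station anomaly matrix (real USHCN series have gaps that would need pairwise missing-value handling):

```python
import numpy as np

rng = np.random.default_rng(4)
n_stations, n_months = 1218, 1416   # 1895-2012, monthly

# Rows are stations, columns are months (no missing values in this sketch).
anoms = rng.standard_normal((n_stations, n_months))

corr = np.corrcoef(anoms)           # full 1218 x 1218 correlation matrix

# Extract the unique station pairs from the upper triangle.
i, j = np.triu_indices(n_stations, k=1)
pair_corrs = corr[i, j]
print(pair_corrs.size)              # 741153 unique pairs
```

Pairing `pair_corrs` with the corresponding great-circle distances then gives the scatter and the binned means directly.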
Zeke,
It makes a difference, but I think what you did is right. You’re always going to be interested in the correlation of anomalies. Building in the correlation of seasonal averages doesn’t really help you. Deseasonalization is good!
It might be worth checking whether the choice of anomaly base period makes a difference. I can’t imagine it would be much.
Layman Lurker (Comment #109725)
After Layman Lurker pointed out the missing break points in Georgia, I checked my old files and found that the Georgia data at the bottom of an Excel file had been somehow truncated. I could say that is just the breaks, but actually it shows poor practice on my part of archiving R generated data in Excel. The corrected map is linked below. Thanks, Layman Lurker, and I owe you one.
http://imageshack.us/a/img843/8957/breakpoints1957stationg.png
Zeke (Comment #109737)
Zeke, I used the breakpoints function in R in the library (strucchange). The minimum segment length (designated by h in the function) can have an effect on the exact location and number of breakpoints found in a given series. For this analysis I used the default value for h.
The map of station locations with break dates occurring in the same year(s) shows the clustering of those station locations in the US – as I noted above.
http://imageshack.us/a/img831/6226/breakdatesclusterbyyear.png
If I am ever in Chicago I’ll look you up and I’ll come and collect on that debt. 🙂
Kenneth Fritsch,
If I’m reading the documentation for strucchange correctly, it looks for breakpoints within individual series rather than in difference series between individual stations and surrounding stations. If this is true, then couldn’t your regionally-clustered breakpoints (e.g. in the South in ’57) represent actual climatic signals (multidecadal variability) and not local inhomogeneities? In general, I’d expect breakpoints that are strongly spatially correlated to be real climatic signals, unless there is some large network change that occurred at multiple stations at the same time (something, I presume, that would show up in the station metadata and be adjusted for).
Zeke (Comment #109761)
Zeke, I had this discussion with Victor Venema. The breakpoints function in R finds breakpoints in linearly regressed segments. You can use it on a given series, or on the difference series between two series: you do the differencing (or not), and the function finds the breakpoints.
In this case I did breakpoints on the adjusted monthly GHCN temperature series, without differencing, because I was looking for climate-related changes. My point in showing the map is that evidently these climate changes occur regionally in time and not over the entire US, and in fact, when broken down to break dates by month, they appear to change on a sub-regional scale. I intentionally used the default minimum segment in the breakpoints function so that these breakpoints are more indicative of changes that separate rather lengthy periods of time (decades). That is why I termed the breaks as defining regime changes in climate.
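The core idea behind strucchange's breakpoints function, choosing break positions that minimize the residual sum of squares subject to a minimum segment length h, can be illustrated for the simplest case of a single mean-shift break. This Python sketch is only an illustration of the idea, not the strucchange algorithm (which handles multiple breaks via dynamic programming):

```python
import numpy as np

def one_breakpoint(y, h):
    """Best single mean-shift breakpoint by residual sum of squares,
    with each segment at least h points long."""
    best_rss, best_k = np.inf, None
    for k in range(h, y.size - h + 1):
        rss = ((y[:k] - y[:k].mean()) ** 2).sum() + ((y[k:] - y[k:].mean()) ** 2).sum()
        if rss < best_rss:
            best_rss, best_k = rss, k
    return best_k

rng = np.random.default_rng(5)
# Synthetic monthly anomaly series with a regime shift at index 300.
y = np.concatenate([rng.normal(0.0, 0.2, 300), rng.normal(0.6, 0.2, 300)])
k = one_breakpoint(y, h=120)   # minimum segment of roughly a decade of months
print(k)
```

The choice of h matters in the same way Kenneth describes: a large minimum segment suppresses short blips and picks out decadal-scale regime changes.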
What we have here are station series correlations over distances that could perhaps indicate uniformly changing climate, and then, on the other hand, differing trends and breakpoint dates over relatively short distances indicating a less uniformly changing climate.
Kenneth,
Fair enough. As far as I know, the PHA explicitly tries to avoid removing any non-local signals, as the assumption is that any undocumented factor affecting multiple stations in a region at the same time is climatic rather than an inhomogeneity due to station moves, instrument changes, etc.