Today’s “How to teach yourself R” lesson involves calculating annual average temperatures from a string of daily temperatures.
The average temperatures, which I must observe look fairly trendless, are plotted to the right. That said, there may be a trend buried in there. If so, it’s small enough that teasing it out will require statistics.
Now, for those interested in learning R, I will describe what I did in some detail. I will then make a few comments in the final summary.
Background: Recall that, having obtained the large file containing daily high temperatures recorded at VaexJoe, Sweden, I wanted to figure out how to calculate the annual average daily high using functions available through the R project.
Main steps to calculating the average temperatures
It turned out these averages were easy to calculate using the ‘zoo’ package.
The steps I performed to calculate the annual average were:
- Load the zoo library.
- Create a ‘zoo’ class variable with temperatures indexed by date.
- Use ‘aggregate’ to calculate the means as a function of some condition. In my case, the ‘condition’ is the year in which data are collected.
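The three steps above can be sketched on a small scale as follows. This is a minimal illustration using made-up daily values rather than the VaexJoe file; the names `dates`, `temps`, `z_demo`, `year_of` and `annual_demo` are placeholders invented for the example, and it assumes the zoo package is already installed.

```r
# Step 1: load the zoo library (assumes zoo is installed)
library(zoo)

# Made-up stand-in data: two years of daily values, NOT the VaexJoe record
set.seed(1)
dates <- seq(as.Date("2001-01-01"), as.Date("2002-12-31"), by = "day")
temps <- rnorm(length(dates), mean = 100, sd = 50)

# Step 2: create a zoo object with temperatures indexed by date
z_demo <- zoo(temps, order.by = dates)

# Step 3: the grouping 'condition' is the calendar year of each date;
# aggregate applies mean() to the values falling in each year
year_of <- as.numeric(format(time(z_demo), "%Y"))
annual_demo <- aggregate(z_demo, year_of, mean)

print(annual_demo)
```

The result has one value per year, indexed by the year number instead of by date.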
Naturally, after calculating these things, I plotted the results, examined a histogram and looked at some simple statistics.
R project Session Details
I typed the commands that follow the ‘>’ prompt. When ‘R’ has processed my command, it either spits back the ‘>’ prompt or says something and then spits out the ‘>’ prompt. The # marks indicate my comments, which I typed into ‘R’.
>#read in my file. (For details see Read Data into R.)
>vaexJoe<- read.table("file:///Users/lucia/Desktop/global/R_Data_Sets/VaexJoe.txt",sep="\t",header=TRUE);
># view names of the columns in the file
>names(vaexJoe)
[1] "SOUID" "DATE" "TX" "Q_TX" "Temp_9" "DAY"
[7] "Temp_19" "EarlyHalf" "LateHalf" "Record" "Day" "Year"
># recall that “Temp_19” contains the cleaned-up daily temperature data, with “NA” (not available) listed wherever the raw file indicated an error.
> # load the zoo library so I can use it. (Details: Attach an r-project package.)
> library(zoo)
># after doing this, R spits out stuff starting with Attaching package: ‘zoo’
># create a ‘zoo’ object with temperatures in column ‘Temp_19’ and dates starting with January 1, 1918, which is the first date in the data file.
>z <- zooreg(vaexJoe$Temp_19, start = as.Date("1918-01-01"))
># Ask R to provide a summary because I have no confidence in commands I have never used before.
> summary(z)
     Index                  z
 Min.   :1918-01-01   Min.   :-211.0
 1st Qu.:1939-01-01   1st Qu.:  32.0
 Median :1960-01-01   Median : 102.0
 Mean   :1960-01-01   Mean   : 104.5
 3rd Qu.:1980-12-31   3rd Qu.: 177.0
 Max.   :2001-12-31   Max.   : 344.0
                      NA's   :  57.0
># I notice the first and last dates match. The temperature ranges are correct, etc. So, the zoo object seems ok.
># notice there are 57 “NA” values in the full ‘z’ column. Those correspond to the bad data. Previous inspection indicated most occurred during the 20s and one occurred during WWII.
># Calculate means using the “aggregate” function
>aggregate(z, as.numeric(format(time(z), "%Y")), mean)
># R prints out the mean temperatures for the years from 1918 to 2001. I notice “NA” appears in every year with missing data. At some point, I might care about that, but not today.
># Today, I decide to stuff the aggregate data into a variable called “means”, which holds the mean for each year. (Maybe this is a frame or a class? I don’t get R project lingo yet.)
>means<-aggregate(z, as.numeric(format(time(z), "%Y")), mean)
># Eyeball the plot of average temperature vs. time.
>plot(means)
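As for what kind of thing “means” is, and how the histogram and simple statistics might be produced, here is a hedged sketch on made-up data. As far as I can tell from the zoo documentation, aggregate applied to a zoo series hands back another zoo (or zooreg) object, not a data frame, and `coredata` pulls out the plain numbers. The names `z_demo`, `means_demo`, `vals` and `h` are invented for the illustration.

```r
library(zoo)  # assumes the zoo package is installed

# Made-up stand-in data: ten years of daily values, NOT the VaexJoe record
set.seed(7)
dates <- seq(as.Date("1999-01-01"), as.Date("2008-12-31"), by = "day")
z_demo <- zoo(rnorm(length(dates), mean = 105, sd = 60), order.by = dates)

# Annual means, built the same way as 'means' in the session above
means_demo <- aggregate(z_demo, as.numeric(format(time(z_demo), "%Y")), mean)

# class() reports what kind of object the aggregate call returned
print(class(means_demo))

# Strip the year index to get a plain numeric vector
vals <- coredata(means_demo)

# Histogram and simple statistics of the annual values
h <- hist(vals, main = "Annual means (made-up data)", xlab = "Mean daily high")
print(summary(vals))
print(sd(vals, na.rm = TRUE))
```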
So, what did I learn?
Recall, currently, I’m trying to learn how to use R. I succeeded in
- calculating average yearly temperatures from a string of daily temperatures.
- learning how R handles missing data. When even one data point is missing from a year’s worth of data, the aggregate function (using mean with its default settings) returns “NA” for that full year.
- plotting and eyeballing the averaged data. I see that there are a fair number of “NA”s during the 20s and 30s and one during the 40s. I also see that the variability is rather large compared to any trend with time.
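The missing-data behavior noted above comes from `mean` itself, and it can be seen on a tiny made-up example in base R. The values and the names `temps`, `years`, `default_means` and `cleaned_means` are invented for illustration; `tapply` stands in here for the grouping that aggregate does.

```r
# Made-up values for illustration only: one missing reading in the first year
temps <- c(10, NA, 30, 20, 40, 60)
years <- c(2000, 2000, 2000, 2001, 2001, 2001)

# Default behavior: any NA in a group makes that group's mean NA
default_means <- tapply(temps, years, mean)

# With na.rm = TRUE the missing reading is dropped before averaging
cleaned_means <- tapply(temps, years, mean, na.rm = TRUE)

print(default_means)
print(cleaned_means)
```

I believe zoo's aggregate forwards extra arguments to the summary function, so adding `na.rm = TRUE` to the aggregate call would be the analogous change; treat that as an assumption to check against the zoo documentation. Dropping missing days is not automatically harmless either: if the gaps cluster in one season, the annual mean for that year will be biased.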
What about Global Climate Change?
I know the two or three readers who will visit are going to ask “What about ‘the ultimate’ question? Has VaexJoe warmed due to Anthropogenic Global Warming?”
Based on what I’ve done so far, the technically correct answer is “Beats me!”
In fact, we haven’t even learned much about the testable question Tamino posed, which was: Was the average VaexJoe temperature between 1985 and 2001 warmer than the average temperature between 1977 and 1984, and is that difference statistically significant at the 95% confidence level?
For now, we can eyeball the data and decide whether or not the answer seems obvious by simply viewing the data.
I’d suggest the answer is no, the answer isn’t obvious. Any trend is small; the scatter is large.
Still, as I once told my students: when the answer is so obvious you can tell by looking at a graph, you usually don’t need to do any statistical tests to convince anyone the trend exists. (Or… you don’t need to do the statistical tests unless they are mandated by law or corporate policy. )
But in this data set, the trend is small compared to the scatter. So, we can’t feel confident a trend exists without doing some statistical tests, and doing them correctly.
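One possible shape for such a test, applied to the two periods in Tamino’s question, is a two-sample t-test on the annual means. The sketch below runs on made-up numbers, not the VaexJoe values, so its output says nothing about the real record. The choice of a one-sided Welch test is an assumption made for illustration, as are the names `annual`, `early`, `late` and `result`.

```r
# Made-up annual means for illustration only; these are NOT the VaexJoe values
set.seed(42)
years <- 1977:2001
annual <- rnorm(length(years), mean = 105, sd = 8)

# Split into the two periods named in the question
early <- annual[years >= 1977 & years <= 1984]
late  <- annual[years >= 1985 & years <= 2001]

# Welch two-sample t-test: is the later mean greater than the earlier mean?
result <- t.test(late, early, alternative = "greater", conf.level = 0.95)

# A p-value below 0.05 would count as significant at the 95% level
print(result$p.value)
```

The t-test treats each annual mean as an independent draw. If successive years are correlated, the p-value will overstate the significance, which is one reason doing the test “correctly” takes more care than this sketch shows.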