Category Archives: Learning R

Less Sucky Colors In R.

Some officially color-challenged readers and even normally color-sighted readers have complained about colors in my charts generated by R. People suggested some good colors, but what I wanted to find was a way to use R’s built-in color palettes to make non-sucky color schemes for plotting. (Some people suggested hand picking colors, but that’s not useful for quick and dirty blog posts. I want to find something fairly automatic that results in a color/dash system that creates graphs that are readable for most blog visitors.)

Today, I spent some time trying to find ‘simple’ solutions to auto-generate colors.

Here’s a plot generated using the default color scheme using rainbow:

colors=rainbow( numTrace )

As you can see, I alternated types of dashes in addition to using color. I used line width 2 instead of the default 1, which at least makes the lines thicker. But can I see the color for “Dora” against the white background? No way in heck! I can’t even see that color in the legend, much less hope to find it in the graph. (It’s hidden behind other data. But still….)
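
For orientation, here is a minimal sketch of the kind of call that produces a plot like the one above (years, traces and numTrace are assumed names for illustration, not my actual script; traces is a matrix with one reconstruction per column):

numTrace <- 6
colors   <- rainbow(numTrace)
dashes   <- rep(c("solid", "dashed", "dotdash"), length.out = numTrace)  # cycle three dash types
matplot(years, traces, type = "l", col = colors, lty = dashes, lwd = 2,  # lwd = 2 instead of the default 1
        xlab = "Year", ylab = "TSI")
legend("topleft", legend = colnames(traces), col = colors, lty = dashes, lwd = 2)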

To my eyes, visibility is improved by setting the value parameter v to 0.7, leaving the saturation s at 1.

colors=rainbow( numTrace ,s = 1, v = 0.7 )

I know there are other solutions like picking colors individually, which works for final graphs. But it’s a bit fiddly for quick blog posts. That said, just setting “v=0.7” doesn’t solve the problem for me. I still don’t like the fact that ‘Hoyt’ and ‘Krivova’ look very similar to me. The fact that cycling through three dash types resulted in two similar colors being solid and dashed is unfortunate in this graph.

I tried something using the ‘colorspace’ library. It’s a little complicated, but it seems to autogenerate colors that I think are a bit more visible to my eyes. To avoid trying to distinguish red from violet, I create 1 more color than required by my data series and drop the final one. I still use the dashes. I also use a function to detect the luminosity, and if it’s too light, I replace the color with a darkened value. Here’s how it looks.

Anyone who would like to comment on visibility or improvements, let me know. I can only judge by my eyes. With these sorts of things, the more eyes, the better.

Update: Possibly better?

If you want to see the overly complicated code, take a look at: ReadLeif_TSI_fun. The code contains “fiddle factors” and lines of code that help me remember what I need to look up in R. I can clean up after people suggest changes in color schemes.
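
In outline, the approach is roughly this (a minimal sketch with made-up names; rainbow_hcl() is one possible colorspace choice, not necessarily the one in ReadLeif_TSI_fun):

library(colorspace)                               # provides rainbow_hcl()
pickColors <- function(numTrace, lumCut = 180, darkenTo = 0.6) {
  cols <- rainbow_hcl(numTrace + 1)[1:numTrace]   # make one extra color, drop the last (avoids red vs violet)
  rgbVals <- col2rgb(cols)                        # 3 x numTrace matrix of 0-255 values
  lum <- 0.299 * rgbVals["red", ] + 0.587 * rgbVals["green", ] + 0.114 * rgbVals["blue", ]
  tooLight <- lum > lumCut                        # flag colors too light for a white background
  cols[tooLight] <- rgb(rgbVals["red",   tooLight] * darkenTo,
                        rgbVals["green", tooLight] * darkenTo,
                        rgbVals["blue",  tooLight] * darkenTo,
                        maxColorValue = 255)      # replace them with darkened versions
  cols
}
colors <- pickColors(numTrace)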

R question: Inverting Spectrum?

We need a new open thread. So, I figured I’d just post my R question of the day. I’ve persuaded myself that the “noise” in models and earth weather is stationary. Based on that, I’m now treating things as ergodic and averaging over periods. But that leads to a question:

Using “spectrum”, I easily create spectra for “p” batches of “n” month long time series of detrended temperature data. I can then average these spectra over the “p” batches. I can then plot, resulting in crummy looking graphs like these:

(The line is just an eyeball line to see if the decay rate is more or less linear.)

(The curved line happens to be the shape we’d expect if noise was AR1 with phi=0.85, but has no particular significance otherwise.)
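
For orientation, the averaging step looks roughly like this (a minimal sketch; x is the assumed detrended monthly series and n is the batch length, not my exact script):

n <- 120                                         # months per batch (for illustration)
p <- floor(length(x) / n)                        # number of complete batches
specs <- sapply(1:p, function(i) {
  batch <- x[((i - 1) * n + 1):(i * n)]
  spectrum(batch, plot = FALSE)$spec             # periodogram for this batch
})
freq    <- spectrum(x[1:n], plot = FALSE)$freq   # common frequency axis
avgSpec <- rowMeans(specs)                       # average over the p batches
plot(freq, avgSpec, type = "l", log = "y",
     xlab = "Frequency (cycles/month)", ylab = "Averaged spectrum")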

I could, of course, proceed to do all sorts of things in the frequency domain. But my question is:

Is there a simple already existing built in function to transform the spectrum back into the time domain? (One that happens to match the convention used by spectrum with respect to 2’s and pi’s would be nice.)

Now: Open thread. Feel free to comment on the graphs, ask questions, suggest R code, offer advice on using fft to perform the inverse transform, or whatever. All is welcome. But ideally, I am hoping for discussion on which pre-existing R functions are particularly useful for transforming, windowing etc.

Graph with Measurement Uncertainty.

Ron had asked me to consider adding measurement uncertainty to the observed values in the graph earlier this week. It occurred to me that I could estimate the uncertainty due to measurement as follows:

  1. Peruse papers to discover the portion of the uncertainty in the annual average temperature measurement due to lack of station coverage. I found values for GISTemp in Hansen et al. 2010 and for NOAA in Brohan et al. These were 0.05C (a 2s value) and 0.03C (a 1s value) respectively. For Hadley, I read off the ±uncertainty from the annual time series.
  2. I transformed these into standard errors (i.e. 1s values), and then, assuming this error is white, I obtained monthly standard errors by multiplying the error for the annual value by √12. I then generated 120 months of noise with the monthly standard error, added that noise to the corresponding observations, and computed trends for each error-padded observation series. I repeated this 1000 times, then took the standard deviation of the error-padded trends for each observational series (a sketch of this step appears after this list). These were $latex \displaystyle s_o $ = (0.028C/dec, 0.034C/dec, 0.033 C/dec) for GISS, HadCrut and NOAA respectively. For all future calculations, I took the largest of these to use as the uncertainty in measurements: that is I used $latex \displaystyle s_o = 0.034C/dec $
  3. I then estimated the standard error for the reported value of an observation about the model mean to include the following three uncertainties: a) the uncertainty we would expect for “model weather” (i.e. $latex \displaystyle \sigma_m $), b) the uncertainty in our estimate of the model mean based on the average of ‘n’ runs, $latex \displaystyle \sigma_m/\sqrt{n} $, and c) the measurement uncertainty $latex \displaystyle s_o $. This resulted in a total standard error of:

    (1)$latex \displaystyle \sigma_T^2 =\sigma_m^2 \frac{1+n} {n} + s_o^2 $

    So, to be clear, this standard error is supposed to encompass the region in which a trend computed from a single realization of earth data, measured with the amount of error claimed by HadCrut, would fall if a model produces the correct trend and variance for weather. (Some caveats apply. For example: we really don’t know $latex \displaystyle \sigma_m $ with 100% certainty. Accounting for that uncertainty would expand the error bars. I just don’t know by how much.)
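
Here is a minimal sketch of the Monte Carlo in step 2 (the variable names, and using 0.03C as the annual 1s error, are placeholders rather than the exact script):

s_annual  <- 0.03                    # assumed 1s error of an annual mean, in C
s_monthly <- s_annual * sqrt(12)     # implied 1s error of a monthly mean (white noise)
months    <- 1:120                   # Jan 2001 - Dec 2010
obs       <- observed_monthly        # assumed: 120 observed monthly anomalies

trends <- replicate(1000, {
  noisy <- obs + rnorm(120, mean = 0, sd = s_monthly)
  coef(lm(noisy ~ months))[2] * 120  # trend of the error-padded series, C/decade
})
s_o <- sd(trends)                    # spread of trends due to measurement error alone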

As before, I treated the standard error for model weather as known, and multiplied by the critical “z” value for the ±95% confidence intervals. I superimposed these on the previous values computed without including the uncertainty due to measurement error, $latex \displaystyle s_o $. The graph with a start date of 2001 and the new wider uncertainty intervals added on top of the old narrower ones is shown below:

If you squint, you can see there really, really are two sets of horizontal hashes on the error bars.

Now for a bleg: I don’t like showing both the narrower and wider error bars in the same color. Tomorrow, I’m flipping through a book with examples of plots with error bars. Any tips on making plots with error bars would be welcome. I would particularly like verbose examples.
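
In the meantime, one common approach (a minimal sketch with assumed variable names, not my actual script) is to draw the wider intervals first with arrows() and overlay the narrower ones in a darker color:

# x: model index; trend: model mean trends; ciWide, ciNarrow: half-widths of the intervals
plot(x, trend, pch = 19, ylim = range(trend - ciWide, trend + ciWide),
     xlab = "Model", ylab = "Trend (C/decade)")
arrows(x, trend - ciWide,   x, trend + ciWide,   angle = 90, code = 3,
       length = 0.08, col = "grey60", lwd = 2)   # wider: includes measurement error
arrows(x, trend - ciNarrow, x, trend + ciNarrow, angle = 90, code = 3,
       length = 0.05, col = "black",  lwd = 2)   # narrower: model weather only
points(x, trend, pch = 19)                       # redraw the points on top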

Model Z Test: Since 2001

As I have mentioned in the past, I have been screaming at the R programming language recently. But more about why R sometimes makes me scream later. . . Today, I have a graph showing information people might think worth digesting:

I’m not going to try to discuss the implications of this graph, as that discussion would involve discussing a whole bunch of subtopics (which may come up in comments.)

Are you wondering what the graph above shows? Assuming it doesn’t just show mistakes….
I’ll first describe what I did, and then explain what the graph shows:

Roughly, it shows a test to determine whether an observed trend from Jan 2001-Dec 2010 is equal to the sample mean trend computed over “n” runs from an individual model. The particular test used is a type of “z” test.

In a “z” test, we assume we know the standard deviation of 120 month trends we would compute from an infinite number of runs with an identical history of forcings prior to and through the end of any 120 month period. Of course, we don’t really know this standard deviation. Because no modeling group can run an infinite number of runs when projecting into the 21st century, it can’t be computed.

In principle, we can get an estimate by computing the variance in “n” model trends from Jan 2001-Dec 2010. Unfortunately, because the number of model runs is small, this results in a sizable uncertainty in our estimate of the standard deviation. So, to get a better estimate, I computed the variances over nine non-overlapping 120 month periods during the 21st century then averaged those. My estimate of the standard deviation σ of 120 month trends for an individual model is the standard deviation computed by taking the square root of the averaged variance.
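
In code, the pooling looks roughly like this (a minimal sketch; runs is an assumed matrix of monthly anomalies with one column per model run and at least 9*120 rows):

trendPerDecade <- function(y) {
  t <- seq_along(y)
  coef(lm(y ~ t))[2] * 120           # OLS slope converted to C per decade
}
windowVar <- sapply(1:9, function(k) {
  rows   <- ((k - 1) * 120 + 1):(k * 120)            # k-th non-overlapping 120 month window
  trends <- apply(runs[rows, , drop = FALSE], 2, trendPerDecade)
  var(trends)                        # variance across the n runs in this window
})
sigma <- sqrt(mean(windowVar))       # pooled estimate of sigma for 120 month trends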

In my analysis, this σ is assumed to describe the variability in a trend due to “model weather” during any individual realization of 120 month period simulated by a model. (Other variability over time would be due to the deterministic (i.e. climate) response to forcings.)

If a model was perfect in all regards, this value of σ would also describe the variability arising from “weather” during 1 realization of earth weather over any period. (Note: this does not include variability due to variations in forcing over time.)

So, assuming we know the value of σ and that value corresponds to both “model weather” and “earth weather”, the difference between the model mean trend based on “n” runs and the trend observed for the earth will have a standard deviation σT, where σT² = σ²(n+1)/n.

At this point, I made a rather wild assumption: that value of σ contains no uncertainty and that the trends across model runs are normally distributed. I then computed the 95% uncertainty intervals using the ‘z’ value for the normal distribution. These are shown.
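
Putting the previous two paragraphs into code (a minimal sketch; sigma, n, modelMeanTrend and obsTrend are assumed names):

sigma_T <- sigma * sqrt((n + 1) / n)      # sd of (model mean trend minus one realization)
z       <- qnorm(0.975)                   # about 1.96 for a two-sided 95% interval
lower   <- modelMeanTrend - z * sigma_T
upper   <- modelMeanTrend + z * sigma_T
outside <- (obsTrend < lower) | (obsTrend > upper)   # TRUE if the observation falls outside the interval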

So, under the assumptions, we can see that

  • the observed trend computed based on HadCrut is outside the ±95% confidence intervals for 4 models,
  • the observed trend computed based on NOAA/NCDC is outside the ±95% confidence intervals for 3 models,
  • the observed trend computed based on GISTemp is outside the ±95% confidence intervals for 1 model,

(Caution: Before someone says “Wow, 4!” and thinks “binomial theory can be used to estimate the probability of 4 rejections out of 10 if all models are ‘perfect’”… no… don’t do that. The tests aren’t independent. I’ll be discussing that further at some point.)

This is one test I will be discussing further in later posts. I’m also getting to the point of assembling another set of tests that will use time series analyses to estimate the standard deviation of earth trends. So, we’ll be seeing more graphs. But since I got to this one and feel comfortable enough to at least answer the questions that were immediately obvious to me, I thought I’d show it.

More later.

Update: I thought I should display sensitivity to start year by showing results if we begin in 2000 as well.

Attempt to do things in R.

As some of you know, I’m trying to convert to using R as my base plotting tool. It has some advantages over Excel, but I have to admit that I’m not always sure I have done the exact analysis intended. Today’s plot is an attempt to show:

  1. The least squares trend fit computed using Ordinary Least Squares, shown with ±2σ where σ represents the standard error in the mean trend computed using red-noise corrected ordinary least squares.
  2. The best fit trend computed assuming residuals from the trend with time are ARMA(1,1). The illustrated line shows only the trend with its associated ±2σ standard error in the mean trend as spit out by R.

All fits are to GISSTemp data. Before over-interpreting this: I am sufficiently uncertain about the ARMA(1,1) fit, and especially how to interpret the ±2σ uncertainties from the ARMA(1,1) fit, that I am planning to run some Monte Carlo simulations with synthetic data to verify that what I do really, truly does on average reproduce the mean trend from a batch of synthetically generated data and gives appropriate uncertainty intervals.

But.. in the mean time, anyone who wants to read the skanky parts of the R script and tell me if they can tell straight off whether I am using R correctly to fit data under the assumption of ARMA(1,1) and whether I am correctly fishing out the stated uncertainty in the trend, have a look. Let me know! (If possible, for today, please limit advice to using R as a black box rather than suggesting more complicated analyses that I could do after I become an R master.)
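
For reference, here is a minimal sketch of one way to do such a fit with arima() (assumed variable names; not necessarily how the linked script does it):

# giss: vector of monthly GISTemp anomalies (assumed name)
tt  <- seq_along(giss)
fit <- arima(giss, order = c(1, 0, 1), xreg = cbind(tt = tt))   # ARMA(1,1) residuals plus a linear trend
trend    <- coef(fit)["tt"] * 120                  # trend in C per decade
se_trend <- sqrt(fit$var.coef["tt", "tt"]) * 120   # 1 sigma standard error of the trend
c(lower = trend - 2 * se_trend, upper = trend + 2 * se_trend)   # approximate +/- 2 sigma interval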

Script: First bit from Kelly O’Day; last bit by me: ReadGISSTempR_ARIMAFit

Margin Control in R: OMA and MAR and the vanishing axis label.

Yesterday, I was perplexed when the X and Y axis labels in my figures generated with R were vanishing or getting truncated. Not good.

I had some difficulty finding help online for a variety of reasons. One might simply be that many of my difficulties arise because I want to use larger fonts than many people. The specific problem I encountered will tend to happen to R novices who take an existing plotting routine and then try to increase the font size on the title and axis labels. People who like itty-bitty font sizes will rarely encounter this problem.

Two resources helped me understand what was happening. One was Mr. Kelly O’Day’s ppt presentation which gives a good overview of R plotting basics. The other was the Stowers Institute, which provides a very specific discussion of “par()”, “mar” and “oma”, which control margins in R. Today, I’m just going to discuss how inappropriate setting of the “mar” parameter resulted in the disappearance of X and Y axis labels.
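
As a rough illustration of the issue and the fix (a minimal sketch, not the code from either resource): mar sets the margins of an individual plot in lines of text, oma sets the outer margins around the whole figure, and large cex values need more margin lines or the labels get pushed off the device.

par(oma = c(2, 0, 0, 0),            # outer margin: two lines at the bottom for mtext()
    mar = c(6, 6, 4, 2) + 0.1)      # per-plot margins (bottom, left, top, right); default is c(5, 4, 4, 2) + 0.1
plot(1:10, (1:10)^2, main = "Big fonts need big margins",
     xlab = "X label", ylab = "Y label",
     cex.main = 2, cex.lab = 1.8, cex.axis = 1.4)
mtext("outer text sits in the oma region", side = 1, line = 1, outer = TRUE)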

What are “par(), mar and oma”?
Continue reading Margin Control in R: OMA and MAR and the vanishing axis label.

R Plotting Questions

I’ve decided that I’m just going to ask people stupid R questions. Today’s stupid questions will relate to making graphs pretty.

As some of you know, yesterday, I posted figures generated using a script that cannibalized Mr. Kelly O’Day’s handy script for plotting NinoSST’s.

I wanted to discuss a question about the persistence of La Nina, and thought I’d re-use the script. But it occurred to me that the visually challenged (i.e. me) might get headaches trying to squint to read the text in graphs like this one:

Yes. We can all click to embiggen. Oh, I know lots of people think small font looks prettier and somehow ‘more scientific’. But after embiggening, I still have to squint. Plus, when I embiggen, I find it difficult to read the text while also inspecting the figure. So, even if “lots of people” like small “elegant”, “scientific” looking fonts, I don’t.

The desire for larger fonts led me to entering things like ?plot — which didn’t help much– and googling things like “how to change font size in R” etc. In the process, I discovered some commands that let me make changes– but I am dissatisfied with the plot. Since I’m sure at least one of you knows a zillion different ways to control graphs in R, I’ve decided to just show my new plot and ask questions.

Here’s the new plot:

As you can see,

  1. I increased the font size for the title and labels. This was accomplished by setting the cex.main and cex.lab to larger values than Kelly selected. I modified his plotting function, passed a default value of “cex1”, and then set the title and labels to 110% that size:
     
    plot(nino_34$yr_frac, nino_34$ssta, type = "n", axes=F, ann=T, main=title, xlab = x_lab, ylab = y_lab, cex.main=1.10*cex1, cex.lab=1.10*cex1)

    Note that setting “axes=F” caused the default axes to vanish. I would prefer to keep the default axes, which looked nice, but just control the itty-bitty default font. But the tutorial pages I found didn’t do that. They made them vanish and then added them later.

    Note: Solved by guessing… I added cex.axis=cex1 to the plot() command. D’oh. (I wonder why the tutorials did it a different way. Oh well…)

  2. Despite asking for a label on the x-axis using “plot”, there is no label on the “X” axis. None appeared when I wrote the plot command including xlab = “Year”. So clearly, something is interfering with these showing up. Anyone have a tip?
  3. I managed to create custom axes. However, my method is clunky and unsatisfactory. There must be a better way. I used axis(). When I used the default tick marks, the graph looked like s*it. So, I specified tick marks. I picked some. The graph looked like s*it. I fiddled until the graph didn’t look too wretched. The final fiddling involved these two commands:
     
    axis(2, at=yTicks, las=1, cex.axis=cex1, outer=F, line=lineY) # Add Y labels.
    axis(1, at=xTicks, las=1, cex.axis=cex1, outer=F, line=lineX) # Add X labels.
     
    The xTicks and yTicks are the x and y tick marks shown. lineY happens to shift the axis up. lineX happens to not shift at all. Note that the two axes don’t meet; that’s suboptimal, but trust me, the defaults were uglier.
  4. I moved the text around. This was accomplished by tweaking the positions in “mtext”, positioning things relative to various axes:
     
    mtext("Lucia Liljegren – http://rankexploits.com/musings", side =1, line =1, adj = 0, cex = cex1, outer=TRUE)
    mtext(format(Sys.time(), "%m/%d/ %Y"), side =1, line =1, adj = 1, cex = cex1, outer=TRUE)
    mtext(source_note, side =1, line=lineSource, adj=0.07, cex=cex1, outer=F)
     
    I’m happy with this– but if there is a better way, let me know.
  5. I made the text larger:
     
    mtext("Lucia Liljegren – http://rankexploits.com/musings", side =1, line =1, adj = 0, cex = cex1, outer=TRUE)
    mtext(format(Sys.time(), "%m/%d/ %Y"), side =1, line =1, adj = 1, cex = cex1, outer=TRUE)
    mtext(source_note, side =1, line=lineSource, adj=0.07, cex = cex1, outer=F)

    I’m happy with this– but if there is a better way, let me know.

  6. Update: Actually, I’ve concluded I’m even less happy with the graph than I thought. I had manually adjusted the size of the window. The graph above looks not so horrible– but when I closed the window and reran– wow! Horrible. Any tips on how to make graphs look good, while also not having ittsie-bitsie fonts would be welcome.

Any tips on how to make this less ugly? Specifically, I’d like the axes to touch each other, and I’d like to avoid having to do anything remotely complicated when dealing with tick marks. It’s nice to have all this control, but clearly, I don’t want to manually adjust things for every graph I make. It seems to me there must be some easier way. But googling is coming up with tutorials that just don’t solve my particular issues in a clean way.

If you want to look at the ugly script, it’s here: LuciasNino34GraphingRScript

Comparison of variance estimates for trend: Part 2 (Still synthetic data)

This post is a continuation of the discussion in Part 1 and will be equally rambling. There is a bleg at the end: (If you can suggest a test for the equality of two variances in R, I’d appreciate it. 🙂 Update: I found this: “We can use the F test to test for equality in the variances, provided that the two samples are from normal populations.” here. Next rambling post will use that. 🙂 )
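
For reference, the F test lives in var.test(); a minimal sketch with synthetic data:

set.seed(1)
a <- rnorm(50, sd = 1.0)     # synthetic sample 1
b <- rnorm(50, sd = 1.3)     # synthetic sample 2
var.test(a, b)               # F test of H0: the two variances are equal (assumes normality)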

Now, for the ramble:
Continue reading Comparison of variance estimates for trend: Part 2 (Still synthetic data)

Calculating Annual Average Temperatures: Conditional Averages

Annual Averaged Daily High Temp vs Year, VaexJoe

Today’s “How to teach yourself R lesson” involves calculating annual average temperatures from a string of daily temperatures.

The average temperatures, which I must observe look fairly trendless, are plotted to the right. That said, there may be a trend buried in there. If so, it’s sufficiently small to require teasing out using statistics.

Now, for those interested in learning R, I will describe what I did in some detail. I will then make a few comments in the final summary.

Background: Recall: having obtained the large file containing daily high temperatures recorded at VaexJoe, Sweden, I wanted to figure out how to calculate the annual average daily high using functions available through the R project.

Main steps to calculating the average temperatures

It turned out these averages were easy to calculate using the ‘zoo’ package.

The steps I performed to calculate the annual average were:

  1. Load the zoo library.
  2. Create a ‘zoo’ class variable with temperatures indexed by date.
  3. Use ‘aggregate’ to calculate the means as a function of some condition. In my case, the ‘condition’ is the year in which data are collected.

Naturally, after calculating these things, I graphed a chart, examined a histogram and looked at some simple statistics.

R project Session Details

I typed the commands shown in blue. When ‘R’ has processed my command, it either spits back the ‘>’ prompt or says something and then spits out the ‘>’ prompt. Lines starting with # are my comments, which I typed into ‘R’.

>#read in my file. (For details see Read Data into R.)

>vaexJoe<- read.table("file:///Users/lucia/Desktop/global/R_Data_Sets/VaexJoe.txt",sep="\t",header=TRUE);

># view names of files
>names(vaexJoe)
[1] “SOUID” “DATE” “TX” “Q_TX” “Temp_9” “DAY”
[7] “Temp_19” “EarlyHalf” “LateHalf” “Record” “Day” “Year”

># recall that “Temp_19” contains the cleaned up daily temperature data with “NA” (not available) listed wherever the raw file indicated an error.

> # load the zoo library so I can use it. (Details: Attach an r-project package.)
> library(zoo)
># after doing this, R spits out stuff starting with Attaching package: ‘zoo’

># create a ‘zoo’ object with temperatures in column ‘Temp_19’ and dates starting with January 1, 1918, which is the first date in the data file.
>z <- zooreg(vaexJoe$Temp_19, start = as.Date("1918-01-01"))

># Ask R to provide a summary because I have no confidence in commands I have never used before.
> summary(z)

Index z
Min. :1918-01-01 Min. :-211.0
1st Qu.:1939-01-01 1st Qu.: 32.0
Median :1960-01-01 Median : 102.0
Mean :1960-01-01 Mean : 104.5
3rd Qu.:1980-12-31 3rd Qu.: 177.0
Max. :2001-12-31 Max. : 344.0
NA’s : 57.0

># I notice the first and last dates match. The temperature ranges are correct etc. So, the zoo object seems ok.
># notice there are 57 “NA” values in the full ‘z’ column. Those correspond to the bad data. Previous inspection indicated most occurred during the ’20s and one occurred during WWII.

># Calculate means using the “aggregate” function
>aggregate(z, as.numeric(format(time(z), "%Y")), mean)

># R prints out the mean temperatures for the years from 1918 to 2001. I notice “NA” appears in every year with missing data. At some point, I might care about that, but not today.

># Today, I decide to stuff the aggregate data into a variable called “means” which will be calculated for each year. (Maybe this is a frame or a class? I don’t get R project lingo yet.)
>means<-aggregate(z, as.numeric(format(time(z), "%Y")), mean)

># Eyeball the plot of average temperature vs. time.
>plot(means)

Annual Averaged Daily High Temp vs Year, VaexJoe

So, what did I learn?

Recall, currently, I’m trying to learn how to use R. I succeeded in

  1. calculating average yearly temperatures from a string of daily temperatures.
  2. learning how R handles missing data. When even one data point is missing from a year’s worth of data, the aggregate function returns “NA” for the full year’s average (see the sketch after this list).
  3. plotting and eyeballing the averaged data. I see that there are a fair number of “NA”s during the ’20s and ’30s and one during the ’40s. I also see that the variability is rather large compared to any trend with time.
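
If I ever do care about those “NA” years, one option (a minimal sketch, reusing the zoo object z from above) is to skip the missing days when averaging, though whether that is statistically defensible is a separate question:

means_na_ok <- aggregate(z, as.numeric(format(time(z), "%Y")),
                         function(x) mean(x, na.rm = TRUE))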

What about Global Climate Change?

I know the two or three readers that will visit are going to ask “What about ‘the ultimate’ question? Has Vaexjoe warmed due to Anthropogenic Global Warming? ”

Based on what I’ve done so far, the technically correct answer is “Beats me!”

In fact, we haven’t even learned much about the testable question Tamino posed, which was: Was the average Vaxjoe temperature between 1985 and 2001 warmer than the average temperature between 1977 and 1984, and is that difference statistically significant at the 95% confidence level?

For now, we can eyeball the data and decide whether or not the answer seems obvious by simply viewing data.

I’d suggest the answer is no, the answer isn’t obvious. Any trend is small; the scatter is large.

Still, as I once told my students: when the answer is so obvious you can tell by looking at a graph, you usually don’t need to do any statistical tests to convince anyone the trend exists. (Or… you don’t need to do the statistical tests unless they are mandated by law or corporate policy. )

But in this data set, the trend is small compared to the scatter. So, we can’t feel confident a trend exists without doing some statistical tests– and doing them correctly.

How to attach an R-Project Package.

If you use R any length of time, you will soon learn that many functions you want to use exist in extensions called packages.

What is a package?

The R project has a main program that includes the default functions many analysts use to plot, calculate and perform common tasks. It also permits programmers to write special purpose code and distribute it as add-ons to the main code.

I discovered I needed to use “zoo”, a package that will (supposedly) make it simple to perform conditional averages. In my case, I have a series of daily temperature values spanning 83 years. I’d like to calculate yearly averages; eventually, I may wish to calculate seasonal averages etc. I could program a loop, but I knew that it would be easier if someone else had already coded this, so I subscribed to the R-help mailing list and asked politely.

A nice person suggested I use “zoo” but wouldn’t tell me how to attach it. (Email lists… sigh…) However, he did make some suggestions about how I could google to find out how to attach a function. Installing turned out to be trivial; (this is likely why the guy wouldn’t answer my question.)

How I installed the R package “zoo”

In any case, here is how I attached “zoo”.
First, I read in various R documents that most users with internet access can download and install packages by typing install.packages("packageName"), where “packageName” is the name of the package. The name of the package I want is “zoo”.

This advice was followed by a lot of very specific advice for those using Unix and no additional advice for those using Macs or PCs.

I use a Mac, so I decided I must be “most users”. I fired up “R” and typed the command shown in blue below.

>install.packages("zoo")
— Please select a CRAN mirror for use in this session —
trying URL
‘http://cran.cnr.Berkeley.edu/bin/macosx/universal/contrib/2.4/zoo_1.3-0.tgz’
Content type ‘application/x-gzip’ length 675762 bytes
opened URL
========================

The dialog box asked me to select a CRAN mirror and a window showing a list of mirrors popped open.

Install Zoo Package for R project

I’m in Illinois, so I selected “USA(IL)”.

That turned out to be a mistake. R told me that it couldn’t find “zoo” on that mirror. I fiddled around, and finally closed R, then fired it up again. I repeated the previous steps, and this time selected “USA(CA)”.

This time, R was happy and, after a pause, said this:

downloaded 659Kb
The downloaded packages are in
/tmp/RtmpocOIeY/downloaded_packages

I wasn’t quite sure I liked the idea that this is stored in a directory called “tmp”, but there were no further instructions.

How to load the R package “zoo”

Next, I read in the manual discussing packages that, to actually use the functions in a package, you need to load the package. To load “zoo”, I needed to type “library(zoo)”, like this:

> library(zoo)
#http://wiki.r-project.org/rwiki/doku.php?id=getting-started:installation:packages

Attaching package: ‘zoo’
The following object(s) are masked from package:base :
rapply

Typing that seems to have loaded the package; this works provided it’s already installed.

Evidently, now that I’ve installed zoo, any time I want to use zoo, I can type library(zoo), and it will be available in R. (You can also unload packages when you no longer need them. This is done to conserve memory when you need to use several memory-intensive packages.)
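
For example, a one-line sketch of unloading:

detach("package:zoo", unload = TRUE)   # removes zoo from the search path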

So, how do I actually use Zoo?

What can I do with “zoo” to calculate conditional averages? Beats me!

The nice guy suggested I type vignette(“zoo”), vignette(“zoo-quickref”), ?zooreg, ?aggregate.zoo for more information.

I tried vignette(“zoo”). A pdf file opened– that looks promising!

With luck, today, I’ll manage to obtain yearly average maximum temperatures for VaexJoe, Sweden.

Why not use Excel?

If I’d used Excel, I’d have annual averages by now. But my graphics wouldn’t look this nice:

Temp vs Day

(I’ll explain what that is and how I got it later.)

Also, doing other steps in any statistical analysis would begin to be slow and pesky. (Have you ever tried to select a column of 3400 cells in Excel?)

In any case figuring out if the higher temperatures recently experienced in VaexJoe, Sweden are statistically significant is not time critical. (Heck, it’s not critical at all!)

So I think, in the long run, using the problem to learn R so I can apply it to other more important questions is worth the effort.