I’m not going to go through all the math that has been posted elsewhere on unit roots. This will be much more practical. I’ll treat the climate system as a black box that just adds noise to the input signal. I can add more complexity like time constants for a two box model later. As a first pass, I’ll use white noise and see what happens. First I’ll create the synthetic time series A,A1 and A2 with 0, 1 and 2 unit roots respectively. I’m also going to use just the Phillips-Perron test for unit roots. I’m using R for Windows version 2.10.1 and the PP test is from the tseries package. The PP test has a null hypothesis that the series has a unit root. A p value of less than 0.05 means the null hypothesis is rejected (no unit root) with a confidence of greater than 95%.
One can easily create a time series with one unit root by integrating, or cumulatively summing, a white noise series. The code in R for this is:
A<-rnorm (n)
A1<-cumsum (A)
Where n is an integer defining the length of the time series, rnorm is Gaussian noise with mean zero and standard deviation one and cumsum is A1(n) = A(n) + A1(n-1). Similarly, an I(2) time series can be created by cumulatively summing an I(1) series. Differences are created using diff(X). The series are centered to as close to a mean of zero as I can get before summing to minimize trends in the summed series and scaled to similar ranges. The length of each series is 1000 points.

A2 is black, A1 is blue and A is red.
Testing A2, A and A1 for unit roots:
Phillips-Perron Unit Root Test
data: A2
Dickey-Fuller Z(alpha) = 0.0049, Truncation lag parameter = 7, p-value
= 0.99
alternative hypothesis: stationary
Warning message:
In pp.test(A2) : p-value greater than printed p-value
> A2d1<-diff(A2)
>pp.test(A2d1)
Phillips-Perron Unit Root Test
data: A2d1
Dickey-Fuller Z(alpha) = -13.204, Truncation lag parameter = 7, p-value
= 0.3732
alternative hypothesis: stationary
> A2d2<-diff(A2d1)
>pp.test(A2d2)
Phillips-Perron Unit Root Test
data: A2d2
Dickey-Fuller Z(alpha) = -921.257, Truncation lag parameter = 7,
p-value = 0.01
alternative hypothesis: stationary
Warning message:
In pp.test(A2d2) : p-value smaller than printed p-value
Phillips-Perron Unit Root Test
data: A
Dickey-Fuller Z(alpha) = -923.5331, Truncation lag parameter = 7,
p-value = 0.01
alternative hypothesis: stationary
Warning message:
In pp.test(A) : p-value smaller than printed p-value
> pp.test(A1)
Phillips-Perron Unit Root Test
data: A1
Dickey-Fuller Z(alpha) = -13.4424, Truncation lag parameter = 7,
p-value = 0.3599
alternative hypothesis: stationary
> A1d1<-diff(A1)
>pp.test(A1d1)
Phillips-Perron Unit Root Test
data: A1d1
Dickey-Fuller Z(alpha) = -922.0155, Truncation lag parameter = 7,
p-value = 0.01
alternative hypothesis: stationary
Warning message:
In pp.test(A1d1) : p-value smaller than printed p-value
The PP test results are as expected. A2 has to be differenced twice and A1 once before the PP test rejects the unit root hypothesis. But a cumulative sum is a low pass filter. Does the temperature series look like data from a low pass filter? Not really. If the climate is chaotic, then one expects to see variation at all time scales even when the input isn’t changing. I’ll add some noise, a fraction of series A, to A2 and see the effect of the noise and the series length on the test results.
>A2n<-A2+0.1*A

Phillips-Perron Unit Root Test
data: A2n
Dickey-Fuller Z(alpha) = -0.0466, Truncation lag parameter = 7, p-value
= 0.99
alternative hypothesis: stationary
Warning message:
In pp.test(A2n) : p-value greater than printed p-value
The full series with noise still has at least one unit root. But what are the results for a shorter segment?
>pp.test(A2n[1:100])
Phillips-Perron Unit Root Test
data: A2n[1:100]
Dickey-Fuller Z(alpha) = -9.8403, Truncation lag parameter = 3, p-value
= 0.545
alternative hypothesis: stationary
> pp.test(A2[1:100])
Phillips-Perron Unit Root Test
data: A2[1:100]
Dickey-Fuller Z(alpha) = -2.2552, Truncation lag parameter = 3, p-value
= 0.9602
alternative hypothesis: stationary
The first 100 points still fails to reject the presence of a unit root for both the original and the noisy series, but the p-value for the noisy series is less than that for the noise free series. Now I’ll test the first difference.
> A2nd1<-diff(A2n)
Phillips-Perron Unit Root Test
data: A2nd1[1:100]
Dickey-Fuller Z(alpha) = -119.0049, Truncation lag parameter = 3,
p-value = 0.01
alternative hypothesis: stationary
Warning message:
In pp.test(A2nd1[1:100]) : p-value smaller than printed p-value
Phillips-Perron Unit Root Test
data: A2nd1[1:200]
Dickey-Fuller Z(alpha) = -252.2832, Truncation lag parameter = 4,
p-value = 0.01
alternative hypothesis: stationary
Warning message:
In pp.test(A2nd1[1:200]) : p-value smaller than printed p-value
Phillips-Perron Unit Root Test
data: A2nd1[1:500]
Dickey-Fuller Z(alpha) = -722.274, Truncation lag parameter = 5,
p-value = 0.01
alternative hypothesis: stationary
Warning message:
In pp.test(A2nd1[1:500]) : p-value smaller than printed p-value
Phillips-Perron Unit Root Test
data: A2nd1[1:1000]
Dickey-Fuller Z(alpha) = 4430.345, Truncation lag parameter = 7,
p-value = 0.99
alternative hypothesis: stationary
Warning message:
In pp.test(A2nd1[1:1000]) : p-value greater than printed p-value
I need nearly the full series to fail to reject the presence of a second unit root.
To see how things compare, here’s a plot of the first 124 points of the synthetic noisy series in black and the GISS anomaly from 1880-2003, scaled to the same range, in red:

In answer to the title question, we don’t know that the temperature series is I(1) and not I(2) because it’s too short and too noisy. In terms of whether there’s a trend or not, I think there are more suitable tools available based on control chart theory like CUSUM or EWMA charts that are designed to test for the presence of small trends or variations from the mean in short series. That’s another subject, though and besides I don’t have a package to do EWMA control charts.