I’m pretty sure a bot bet this month. This suggests bots can now add even when the script does not contain the sum in the hidden form fields.
My entry table has a flag to indicate a suspected bot. If the script reports that this is your third try at the sum, I enter the bet but flag it as a suspected bot; this is to prevent utter frustration. After all, you are only entering a bet! However, I should note that a known human with a real email address was also flagged as a bot. The difference between the bot and the human? The bot left a one-letter name (N) and no email address. I edited the name to reflect the fact that I suspect it’s a bot. (If you are N, identify yourself further!)
While bots’ money is good, and they are allowed to bet if they are smart enough to get past the entry form, I have tweaked the form to make things a bit harder for this new breed of addition-enabled bots. No, I didn’t add a captcha. Instead:
- Names can no longer contain spaces, are limited to alphanumeric characters and ‘.’, and must contain at least 4 characters. Even bots need to have distinctive names! (I can permit spaces if people really want them. In fact… I’ve already decided to tweak that!)
- After validation using ‘FILTER_VALIDATE_EMAIL’, the email field must contain at least 5 characters.
- Bettors must leave a user agent which I now record. (This is a new check; browsers leave one by default.)
- IPs must validate using FILTER_VALIDATE_IP. (Valid IPs are left by default.)
- I now record the referer. I want to see if bots that eventually get through are spoofing referers. (Many people’s privacy software spoofs referer so I’m not requiring this to be valid.)
- I no longer tell you the answer to the sum if or when you enter the value incorrectly.
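For the curious, the new checks amount to something like the sketch below. This is Python for illustration only: the site itself uses PHP’s filter_var with FILTER_VALIDATE_EMAIL and FILTER_VALIDATE_IP, so the email pattern here is a simplified stand-in, not PHP’s actual rule.

```python
import ipaddress
import re

# Name: alphanumeric plus '.', no spaces, at least 4 characters.
NAME_RE = re.compile(r"^[A-Za-z0-9.]{4,}$")
# Email: simplified stand-in for PHP's FILTER_VALIDATE_EMAIL.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def valid_ip(ip):
    """Rough analogue of FILTER_VALIDATE_IP."""
    try:
        ipaddress.ip_address(ip)
        return True
    except ValueError:
        return False

def valid_entry(name, email, ip, user_agent):
    """True only if the entry passes all the new form checks."""
    return (bool(NAME_RE.match(name))
            and len(email) >= 5 and bool(EMAIL_RE.match(email))
            and valid_ip(ip)
            and bool(user_agent))  # browsers send a user agent by default
```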
For those wondering, here are the bets.
| Name | Prediction (C) | Bet |
| --- | --- | --- |
| Anteros | 0.137 | 5 |
| dallas | 0.1 | 5 |
| Paul Butler | 0.144 | 5 |
| YFNWG | 0.135 | 5 |
| Ray | 0.24 | 5 |
| Jeremy Harvey | 0.198 | 2 |
| Gary Meyers | -0.054 | 3.142 |
| Bob Koss | 0.337 | 5 |
| ErnieP. | 0.112 | 4 |
| SteveF | 0.101 | 5 |
| Greg Meurer | 0.22 | 3 |
| Lance | 0.061 | 5 |
| PaulS | 0.135 | 4 |
| Skeptikal | 0.178 | 4 |
| MikeP | 0.18 | 5 |
| CoRev | 0.12 | 5 |
| diogenes | -0.01 | 2 |
| Pieter | 0.079 | 5 |
| plazaeme | 0.18 | 1 |
| AndrewKennett | 0.1 | 4 |
| Jeff Condon | 0.151 | 5 |
| Don B | 0.138 | 4 |
| mct | 0.017 | 5 |
| sHx | 0.121 | 5 |
| ivp0 | 0.173 | 5 |
| hswiseman | 0.06 | 5 |
| mike worst | 0.05 | 5 |
| Tim W. | 0.22 | 5 |
| Pavel Panenka | 0.321 | 3 |
| Owen | 0.213 | 5 |
| Big Bear | 0.089 | 5 |
| BenjaminG | 0.159 | 5 |
| denny | 0.115 | 3 |
| Peter | 0.31 | 2 |
| Jarmo | 0.132 | 5 |
| Nyq Only | 0.15 | 5 |
| Hal | 0.01 | 5 |
| Boris | 0.201 | 5 |
| Freezedried | 0.15 | 3 |
| Jefff | 0.07 | 4 |
| Arfur Bryant | 0.132 | 5 |
| Paul Ostergaard | 0.05 | 5 |
| Anthony V | 0.14 | 5 |
| Scott Basinger | 0.175 | 1 |
| Kåre Kristiansen | -0.042 | 5 |
| Niels A Nielsen | 0.055 | 5 |
| Robert Leyland | 0.204 | 4 |
| Cassanders | 0.223 | 5 |
| Rick | 0.12 | 4 |
| DocMartyn | 0 | 5 |
| RobB | 0.113 | 5 |
| ob | -0.011 | 1 |
| N (bot?) | 0 | 1 |
| EdS | 0.18 | 5 |
| IainT | 0.65 | 3 |
| Tamara | 0.095 | 5 |
| pdm | 0.121 | 5 |
| MDR | 0.065 | 3 |
| Paul S | 0.28 | 3 |
| Steve Taylor | 0.04 | 4.25 |
| MarcH | 0.123 | 5 |
| John Knapp | 0.117 | 3 |
| Earle Williams | 0.123 | 5 |
| MichaelP | 0.195 | 3 |
| nzgsw | 0.132 | 5 |
| AMac | 0.131 | 3 |
| John Norris | 0.282 | 5 |
Predictions shown in reverse chronological order.
—
(Note: Anteros’s bet was manually entered late. He thought he’d entered at the last minute, then noticed it wasn’t showing. Because he always bets, always does so at the last minute, and mentioned the issue quickly, I entered his bet.)
Lucia please feel free to move this to a thread you consider appropriate.
Motivated by what I’ve seen on Roy Spencer’s site (FBoFW) and the discussion on Zeke’s recent post here I started to take a look at the NOAA ISH data that Roy’s been using. It’s not very fun to download.
I think the hourly data is useful, though I am not entirely sure how it’s going to differ from the data used in most of the indices, which I think use monthly averages that are in turn based on daily data.
Here is a plot of 4 stations near my home turf western PA (US): http://img638.imageshack.us/img638/6899/chart1nk.png
My plan is to do another dozen or so stations nearby and then break and look at comparisons with an index such as BEST where I feel comfortable with the data. One thing I’m interested in is the station breakpoints in BEST, partly because that’s been discussed at CA and Roy’s blog, but only in a very general way.
It’s too early to comment on any results, but it’s interesting to note that the “Venango Regional” station appears to have a breakpoint in 1991-92, which seems to carry through the data gap in 2000-2003.
I am interested in any advice at this point.
Uh oh! Lucia and Anteros in cahoots (late entries, don’cha know) with suspect bot(s)? Now we need safes for our quatloos.
CoRev–
Yep! I hope bots don’t bet next month. But if they do, I’m now recording their IPs, referrers and user agents. I want to know where “N” comes from.
CoRev…just check Anteros’s bet….he forgot the minus sign, loser!…lol
Where is the kind volunteer who usually calculates the mean, median, whatever, of our monthly guesses?
I have the sense, although I haven’t actually done any work to confirm it, that we bettors as a group are guilty of anchoring. That is, we have been too heavily influenced by the most recent anomaly. Has anyone else thought that was the case?
Hi Lucia
I think I may have been the addition-enabled bot 🙂
I was careless when I first entered a bet. I may have used the computer at work or my smartphone and not this computer which I normally use for blogging.
I’m sorry
Don B
I posted this analysis of the March betting in the original thread, but here it is again, in case anyone hasn’t seen it:
MAX 0.650
MIN -0.054
MEAN 0.140
MEDIAN 0.132
STD DEV 0.104
MEAN 1-33 0.143
MEAN 34-66 0.137
MEAN PLUS 1 SD 0.244
MEAN MINUS 1 SD 0.036
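For anyone who wants to reproduce these, the whole set takes only a few lines. This is a sketch: the `predictions` list below is just the first five rows of the table, standing in for the full column of bets.

```python
from statistics import mean, median, stdev

# Placeholder: first five table rows; substitute the full prediction column.
predictions = [0.137, 0.1, 0.144, 0.135, 0.24]

print(f"MAX {max(predictions):.3f}")
print(f"MIN {min(predictions):.3f}")
print(f"MEAN {mean(predictions):.3f}")
print(f"MEDIAN {median(predictions):.3f}")
print(f"STD DEV {stdev(predictions):.3f}")
```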
The above includes Anteros’s bet and the one of 0.65 C, but not the bot!
Don B –
You might be right – it would be interesting to see if the mean of the bets was skewed in the direction of the previous month’s anomaly.
My guess is that, if true, it is not by much – there is also the pull of extrapolating the trend away from the previous anomaly, which can exaggerate in the opposite direction.
Throw in some crazy WAGs, bot-guesses, innovative algorithms and sneaky late bets, and an anchoring effect will be minor amidst the noise.
***
I think Niels A Nielsen should be awarded a Kimful of quatloos for successfully (if inadvertently) impersonating a bot. 🙂
Anteros–
If bettors are rational, the guesses should cluster around last month’s anomaly. That’s just about the best data we have for future guesses.
With luck, no bots will bet next month! (They always bet 0 anyway. I could probably screen them by figuring out if someone entered an honest to goodness ‘0’ vs. an empty field. I thought I did that….. but… hmmm…)
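Distinguishing an honest zero from a blank field is the classic gotcha here: in PHP, `empty('0')` is true, so a naive `empty()` check conflates a deliberate zero bet with nothing entered at all. A sketch of the distinction (in Python, for illustration only):

```python
def parse_prediction(raw):
    """Return a float for a real entry, or None for a blank field."""
    raw = raw.strip()
    if raw == "":
        return None    # nothing entered: no bet (the bot-ish default)
    return float(raw)  # an explicit "0" is a legitimate bet
```

With this, a zero-anomaly bet and an untouched form field behave differently, so the bot screen doesn’t punish honest zeros.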
Niels– if you want to claim it, I’ll attribute that to you! That would mean I don’t have a bot problem. I was very puzzled. But you didn’t leave an email. I can use the one you comment with.
Computers can do math, but they can’t do poetry. Write a limerick with the last word missing, and ask the entering party to supply the word.
One day as the sun was a-setting
For fevered results I was sweating
September’s been posted, no
winner has boasted, so
On October’s temps I’ll be ______.
If you can’t figure something like that out, you don’t belong here. Or maybe you’re a robot.
On a completely unrelated note, as the past open thread is closed:
Hey Lucia, your site is being brought up in comments at Forbes 😀 http://www.forbes.com/sites/warrenmeyer/2012/04/19/a-vivid-reminder-of-how-the-climate-debate-is-broken/?commentId=comment_blogAndPostId/blog/comment/1036-839-918
Guess you’ve really made the big time!
Of course, just have to ignore all the insanity going on there.
Lucia: “I can use the one you comment with.”
Yes, you can.
Lucia –
Are you still sure there are bots that can add up?
Billc,
GSOD is the daily min/max/tavg version of ISH, and is much more manageable to work with. It’s available here: http://berkeleyearth.org/source-files/
One of my current projects with the Berkeley group is comparing their results for CONUS to those of NCDC. It’s an area with a lot of adjustments that have a significant effect on the trend, so using a completely different method of homogenization (e.g. Robert’s scalpel) is a useful test.
Anteros–
If that entry was Niels’s typo, no, I’m not sure. On that script, my ‘diagnosis’ is:
1) They bet the anomaly was zero,
2) It took three tries to guess the sum, and
3) They provided no valid email.
If all three hold, they are probably a bot. But of course, they could just be a typo-prone individual; entering from a very small keyboard might be the cause.
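The diagnosis above is easy to mechanize. A sketch only: the real script is PHP, and the all-three-flags threshold is my own judgment call, not something the site actually runs.

```python
def bot_flags(prediction, tries, has_valid_email):
    """Count the red flags from the diagnosis above."""
    flags = 0
    if prediction == 0:
        flags += 1  # bet the anomaly was exactly zero
    if tries >= 3:
        flags += 1  # took three tries to guess the sum
    if not has_valid_email:
        flags += 1  # provided no valid email
    return flags

def probably_bot(prediction, tries, has_valid_email):
    # All three together -> probably a bot; any one alone may just be
    # a typo-prone human on a very small keyboard.
    return bot_flags(prediction, tries, has_valid_email) == 3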
Ray, thanks.
Zeke,
Do you know how Tavg is calculated for GSOD? Meanwhile I will go read about it.
Zeke – Found this. I would say that the Count field is key.
ftp://ftp.ncdc.noaa.gov/pub/data/gsod/GSOD_DESC.txt
At least the Bot is quite smart, it entered the same prediction I did.
Doc– 0 C is not a ridiculous bet this month.
Lucia, I know that I reached a page where I was notified that you thought I may be a bot. I also remember that I used a comma when I entered the sum, not a point.
I live in a strange European country, DK, where you do that
Writing in a car from my phone
Niels A Nielsen –
Could it be that you were hypnotised by the whirring of the windmills, and your humanness was affected?
At least, when writing in your car you display no hint of botness. Perhaps cars are the answer to windmills?
I’ll put translating “,” to “.” before checking if something is a number in my to do list! It’s pretty easy. I should just do that!
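The tweak itself is a one-liner. Sketched in Python for illustration; on the live form it would be the PHP equivalent (a str_replace before the numeric check).

```python
def parse_decimal(raw):
    """Accept both '0,135' (Danish/European style) and '0.135'."""
    return float(raw.strip().replace(",", "."))
```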
Anteros, ah yes, it must be the windmills. Our landscape is scarred with a lot of those products of human folly, or rather utopianism. Outdated windmills are now considered an environmental problem; the materials are not easily recyclable.
Lucia, yes, and funny that my errors, forgetfulness (not providing an email address) and carelessness got me under suspicion of being nonhuman.
I’m rich, I’m rich!
=============
Zeke (& anyone else interested):
I followed your suggestion (Zeke) to get the GSOD data courtesy of BEST. I also finished my promised download of “another dozen or so stations nearby”.
The linked graph shows my plot of monthly average anomalies I calculated myself from the hourly data, versus the monthly averages given by the BEST GSOD data.txt file, for a couple of my stations (again, centered on western PA, USA).
http://img826.imageshack.us/img826/2041/chart2c.png
After 1995, they start to diverge, with the BEST-GSOD temps rising rapidly with respect to the ISH data. Of course, I could be making mistakes but I am pretty sure I didn’t change anything in my method at 1995. It is interesting to note that this is around the same time as Roy Spencer started noticing differences between his compilation of the ISH data and the CRUTem and USHCN averages.
I don’t imagine that the raw GSOD data from BEST differs from the GSOD data that is online at NCDC, but I have not yet checked out the latter, since the format of the BEST data was something I knew how to work with. I should probably do that soon, but I may finish the rest of the stations compared to BEST before switching gears and doing that.
Again, any suggestions would be fantastic.
Billc
1. Start by asking Roy for his code and his dataset.
2. Do your own download of ISH direct from the source.
3. Look at the BEST common-format source Zeke pointed you at.
4. Then look at the QA’d data from BEST.
Lot of work. But Roy needs to cough up his code and data.
Steve,
I don’t think I explained well enough what I have done so far. Responses to your #s above.
1) Sure, sometime. But I think #2 comes first. Roy’s analysis piqued my curiosity; at least for now I’m not looking to confirm or deny, just poke. I am not saying anything at all about UHI or population adjustments, just looking at raw data.
2) Done, from the NCDC ftp site. That’s the source of my anomalies. I’ve only analyzed a few stations so far, but now I have a method and it isn’t too hard to add stations, since I have the data.
3) Done. That’s the source of the comparison for the plot linked above. I didn’t explain the plot very well. It is:
plotline = {my monthly avg temp calculated from hourly readings} MINUS {GSOD monthly avg. temp (courtesy of best)}
4) Still on the to-do list. BUT – I didn’t really expect to find a discrepancy between #2 and #3. Which makes me want to add:
3A) Check a few stations to make sure NOAA GSOD is the same as BEST GSOD. No reason it shouldn’t be, but need to check.
I agree it would be best if Roy would put it all out there, though I think his analysis should be easy to replicate. Time consuming, but not hard. But there’s plenty to critique in what he did with the population adjustments, I’m not even going there right now.
Salient points from Roy’s posts and comments by you and Zeke:
Roy “I am quite surprised that, even without any adjustments, the ISH data show 20% less U.S. warming than the CRUTem3 data over the 1973-2011 period. ”
Me “Me too, and even more surprised that they apparently differ from GSOD which they should match exactly”.
Zeke “Its also worth pointing out that ISH data has very little quality control or correction for inhomogenities, and it would be interesting replicating your approach with GHCN-Daily or the Berkeley monthly data.”
Me “NOAA claims pretty good QC on ISH data, but there is obviously no homogenization or any sort of adjustment designed to remove any sort of biases whatsoever. Having looked at the errata for ISH that NOAA posts on their website, I am not seeing what would obviously account for my results”.
Steve Mosher “Finally, the 20 day test you use for inclusion of data will cause you problems. In tests of all stations ( daily data ) I’ve found that as you relax the constraint from 100% of data coverage to less than 100% you introduce a cooling bias in trends. This is likely due to the fact that missing data is not uniform with respect to seasonality. have uniform data coverage across seasons is also critical for UHI studies since the effect has a strong seasonal component. That has been shown in the literature repeatedly”
Me “Interesting, but why would having a ’20 days out of a month’ criterion result in a seasonal bias as long as you have every month of the year and the intra-monthly coverage is random? Anyway I think my plot shows a clear seasonal bias and will try to elucidate this next.”
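Steve’s point — that missing data which is not uniform with respect to seasonality biases the result — can be demonstrated with a toy simulation. Entirely synthetic numbers, just to illustrate the mechanism; it proves nothing about ISH or GSOD specifically.

```python
import math
import random

random.seed(0)

# A synthetic year of daily temperatures: seasonal cycle plus noise.
temps = [10 - 15 * math.cos(2 * math.pi * d / 365) + random.gauss(0, 3)
         for d in range(365)]

full_mean = sum(temps) / len(temps)

# Non-random outages: suppose the 30 coldest days are the ones that go
# missing (e.g. winter sensor failures). The surviving mean is biased warm.
surviving = sorted(temps)[30:]
biased_mean = sum(surviving) / len(surviving)

# Random outages of the same size typically barely move the mean, which is
# why purely random intra-month gaps alone wouldn't cause the problem.
random_kept = random.sample(temps, len(temps) - 30)
random_mean = sum(random_kept) / len(random_kept)
```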
OK, I figured out the difference between my averages and the BEST data, and it looks like the error is on my end (but it’s really bizarre, and I wonder if someone can explain why the heck it is the way it is) – see plot:
http://img685.imageshack.us/img685/6104/chart3e.png
The data is for the Bradford Airport (PA) station, for September ’05, the month with the biggest difference between my average and GSOD-via-BEST. The problem is that I was averaging every sample over a 24-hour period rather than accumulating by hour and then averaging (the problem is fixed in the blue line on the plot).
I knew there were different daily count frequencies, but I had no idea they would be concentrated at Tmin (I assumed they would be evenly distributed). The large number of readings near the daily T minimum lowered my average relative to the GSOD data. I should check this for another station over a different time period, but live and learn, I guess. Still, does anyone know why this is?
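The pitfall is worth spelling out with numbers. Using a made-up diurnal cycle (not the actual Bradford readings), averaging every sample equally versus binning by hour first gives noticeably different daily means:

```python
from collections import defaultdict
from statistics import mean
import math

def temp(hour):
    # Made-up diurnal cycle: minimum near 05:00, maximum near 17:00.
    return 10 - 6 * math.cos(2 * math.pi * (hour - 5) / 24)

# One reading per hour, plus extra observations clustered near the daily
# minimum, mimicking the count pattern found in the ISH data.
readings = [(h, temp(h)) for h in range(24)]
readings += [(h, temp(h)) for h in (4, 5, 6) for _ in range(5)]

# Wrong: average every sample equally -> the near-Tmin samples drag it down.
naive = mean(t for _, t in readings)

# Right: accumulate by hour, average within each hour, then average the
# 24 hourly means, so duplicated samples no longer get extra weight.
by_hour = defaultdict(list)
for h, t in readings:
    by_hour[h].append(t)
binned = mean(mean(v) for v in by_hour.values())
```

With these synthetic numbers the naive mean comes out well below the hour-binned mean, which is the same direction of error as in my plot.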
Given that Roy says he sampled at 6-hourly intervals, I guess his analysis might be free of this problem.
Well, moving on: I guess I will just stick to the GSOD data from BEST if I do any more of this.