Blocking TOR (because it’s a nightmare.)

I wanted to post a quick note on blog changes. I’ve decided I can no longer tolerate the periodic hack attempts coming from free or very cheap anonymizing services. These include TOR exit nodes and free anonymous proxies like “hide my ass”.

  1. I am now systematically identifying and blocking TOR exit nodes. The method is not fully automated yet. It involves getting a list of TOR exit nodes currently in operation from http://torstatus.blutmagie.de/ip_list_exit.php/Tor_ip_list_EXIT.csv, identifying which ones I have not yet blocked, and blocking those (a sketch of the loop follows this list). In the next few days I will script-ify this so I can run it as a cron job. Owing to the nature of TOR, all TOR exit nodes will be banned at Cloudflare.
  2. After having banned numerous known SPAM/HACK/NASTY servers and ISPs, I have extended ZBblock to detect extra IPs in the ‘HTTP_X_FORWARDED_FOR’ header variable (also sketched after this list). The additional IPs are run through ZBblock’s list of bad IPs. If an otherwise innocent-looking IP is being used to mask a known-nasty IP, the seemingly innocent IP will be banned too. Also, any Brazilian IP that comes hiding IPs in the ‘HTTP_X_FORWARDED_FOR’ header will be banned. (If you are in Brazil and can think of why this might be a problem, let me know. But I’m tired of seeing Brazilians hiding spammy Chinese IPs. A few other countries will be treated similarly.)
  3. I have been blocking many known anonymous proxy IPs and will begin doing so more systematically. In the past, my script Cloudflare-banned IPs that were caught trying to hack. I normally unban those after 7 days, but my unban script keeps the ban in place if the host name includes a word like “proxy”, “private” or “anonymous”. I will be escalating by visiting http://proxy.org/proxies_sorted2.shtml, which lists proxies by IP, and banning the IPs of the proxies listed on that page. The list seems to change at least daily, so I will be writing a script to read those IPs and setting a cron job to get them all banned (the same fetch/diff/ban loop sketched below).

    I know many people use anonymous proxies at work or on travel. If you must use an anonymous proxy service, please let me know the name of the service so I can make an exception for it. If I know in advance, we can develop a workaround that lets me screen out as many of the stupid, resource-sucking crawlers as possible while still letting through people who really do need to use some sort of proxy. (If you are a stranger using ‘social engineering’ to try to carve out a hole for your SEO company, try not to make the story too ridiculous. Plz.)
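For the curious, the automation for items 1 and 3 amounts to three steps: fetch the list, diff it against the set of IPs already banned, and ban whatever is new. Here is a minimal sketch of that loop in Python (the ban step is a placeholder; I ban through Cloudflare, but the exact call depends on your setup, so it is not shown):

    # Minimal sketch of the fetch/diff/ban loop for TOR exit nodes.
    # ban_ip() is a placeholder for whatever banning mechanism you use
    # (Cloudflare, .htaccess, a firewall rule, ...).
    import urllib.request

    LIST_URL = "http://torstatus.blutmagie.de/ip_list_exit.php/Tor_ip_list_EXIT.csv"
    BANNED_FILE = "banned_ips.txt"

    def load_banned():
        try:
            with open(BANNED_FILE) as f:
                return {line.strip() for line in f if line.strip()}
        except FileNotFoundError:
            return set()

    def ban_ip(ip):
        print("banning", ip)  # placeholder: substitute your real ban call

    def main():
        banned = load_banned()
        current = set(urllib.request.urlopen(LIST_URL).read().decode().split())
        for ip in sorted(current - banned):  # only exit nodes not yet banned
            ban_ip(ip)
        with open(BANNED_FILE, "w") as f:
            f.write("\n".join(sorted(banned | current)))

    if __name__ == "__main__":
        main()

The same loop works for the proxy.org list; only the fetch-and-parse step changes.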
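The check in item 2 is conceptually simple: split the contents of the ‘HTTP_X_FORWARDED_FOR’ header into its component IPs and test each one against the bad-IP list. ZBblock itself is written in PHP; this is just a sketch of the logic in Python, not ZBblock’s actual code:

    # Sketch of the X-Forwarded-For screening idea (not ZBblock's real code).
    # known_bad is the list of known-nasty IPs.
    def forwarded_for_hits(headers, known_bad):
        """Return any known-bad IPs hiding in the X-Forwarded-For header."""
        raw = headers.get("X-Forwarded-For", "")
        hidden = [ip.strip() for ip in raw.split(",") if ip.strip()]
        return [ip for ip in hidden if ip in known_bad]

If that function returns anything, the connecting IP gets banned too: an “innocent” IP carrying a known-nasty IP in its header is suspect.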

For those wondering if banning all these things really helps: Yes. It does.

I started banning TOR exit nodes the day before yesterday. I have seen a substantial drop in hacking attempts — particularly of the “timthumb”, “uploadify” or similar variety. I haven’t computed numbers, but my swag is the error logs are 50% shorter by number of entries and 90% shorter by unique IPs. (Some individual IPs will come in and hammer a while before I can ban them at Cloudflare.)

I am also seeing gaps of as much as 1 hour between errors in the error logs. These tend to log attempts to connect to missing URIs. They fill up with bots trying to hit non-existent pages containing words they guess. The words are usually things like “register”, “login”, “sign_in”. Today’s error logs have the wonderful-to-me feature that more than half the failed IPs were trying to load broken links rather than hack-signature URIs.

Hour long gaps are unprecedented in the past two years. So, banning TOR and free proxies really is helping.

I’ll be automating a lot of this tomorrow. Feel free to pipe in and give advice especially if you can help me find the IPs for free anonymous proxies more efficiently.

I’m now off to buy ingredients to make pie. The Women of the Moose requested pie for the meeting tonight. I think I’ll make apple.

67 thoughts on “Blocking TOR (because it’s a nightmare.)”

  1. Doc– If you were using TOR or a free proxy service, you would know you were using it. The TOR and proxy blocks will prevent people using those from even visiting the site. Well… unless they turn off TOR or their proxy, which– for most– is trivially easy to do. No more difficult than taking off the black ski mask one might wear to hide one’s face from the person running the cash register.

  2. So… that “He’s always watching” black cat is not true? Oh well, good luck with the bad guys.

  3. mwgrant–
    The kitty has caught some image scrapers and a few other things. But he can only catch one mouse at a time. The problem with TOR is that people who use it can constantly change IPs. So they try to hack with one IP. I ban it. Then they change IPs and try a new hack a few minutes later. I ban that IP. Then…. So it really works better to find all the TOR exit node values and ban them all at once.

    I’ve banned ~2,000 TOR exit nodes since Monday evening. 1,000 were banned in the first swoop. I’m now running a script manually and banning any freshly opened ones on the list. I’m down to banning about 25 each time I get around to re-running the script. But I need to automate, so the script just runs once a day, looks for new ones and then bans them (a cron entry like the one sketched at the end of this comment).

    Same with the free proxies. I need to find the IPs and ban them. In the meantime, the cat does catch lots of them. For example, any proxy with ‘.info’ in its host name… banned. But many just aren’t stupid enough to give themselves obvious proxy host names. So reading all the values off that page would be useful. Then… ban!

    (BTW: All those pages have PR language explaining they are a great service. Benefit to mankind and so on. Well… theoretically. But in practice, spammers/hackers/script kiddies etc. really like any zero-cost resource they can use to anonymize their IP to get past blocks.)
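    The automation is just a daily cron entry that runs the diff-and-ban script. Something like this (path and schedule are hypothetical):

        # run the exit-node diff-and-ban script once a day at 04:15
        15 4 * * * python /home/lucia/scripts/ban_tor_exits.py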

  4. Lucia,

    “I’ve banned about ~2,000 TOR exit nodes since Monday evening. 1,000 were banned in the first swoop.”

    Yeah, I’ve noticed over time here how much you, and I presume others running a blog, have to do to maintain a site. I really don’t know if you have the patience of Job, but you’ve definitely got more than I would. I would not hesitate to give kitty open license to scratch some eyes out. Gotta protect the integrity of the process. Now, for some business. Is it too late to arrange a fix for the October GISS number…?

  5. mwgrant–

    Is it too late to arrange a fix for the October GISS number…?

    You mean the value you bet? Waaaaayyyy too late!

  6. MrE–

    Perhaps that site you use is current, I don’t know,

    The TOR list is current. TOR makes it available themselves. They sort of have to, because those using the system need access to the node values. So…. if TOR didn’t make the list available, TOR haters would just download TOR and run TOR nodes to get the numbers!

    The TOR people also post a whole bunch of public relations stuff about how noble and good TOR is and how you shouldn’t block it, suggesting all sorts of alternatives and giving “reasons” why hackers and criminals really wouldn’t use TOR because they have “better” resources.

    Well….sorry. But none of those “better” resources has the features that loads and loads and loads of hackers, crackers, comment spammers and forum spammers want. These are:
    a) absolutely, totally, completely cost free,
    b) easy to download,
    c) easy to use,
    d) no investment in the IP if it ends up in a spam list,
    e) no having your ISP boot you off because of spam complaints, and
    f) somebody else writes, maintains, and hosts the TOR system.

    Of course the thing is infested with spammers, hackers, crackers etc. Not to mention it’s rumored that zillions of pedophiles use it to host and visit the sorts of sites they wish to host and visit.

    So… yeah…. people in Iran and Kenya and blah, blah, blah can use it to get around firewalls or host their own content criticizing the government. But I can’t think of a single reason why I need to let someone use TOR to access this site.

  7. Lucia, it may be some work to do, but what about a static blog hosted on Amazon S3 and Route 53, delegating all the comments to Disqus or Livefyre? All the hassle of filtering, anti-spam and the rest is done by the pros, and the price is lower than other hosted solutions.

  8. Tibor–
    I hate Disqus and Livefyre. Also, I’m not sure I could run the bets from Amazon. I’m not even familiar with Route 53.

    mwgrant

    I mean the number you ‘report’. ;O)

    You’ll have to ask Gavin how to get those numbers fudged on the first announcement. That’s what counts for betting.

  9. diogenes (Comment #106230)

    I wasn’t aware of any numbers until your comment. I checked Spencer early yesterday–so now he has posted. Thanks. As I recall the anomalies differ significantly–off the top of my head I expect UAH around 0.2-0.3 and GISS around 0.5-0.6. (For my bet I used a simple approach based on just the historical data for the preceding month and the month of interest. Complexity did not do much to improve things.)

    Hmmm…I’ll have to go back and see if I also took a shot at UAH for kicks and giggles.

    Regards

  10. Disqus–
    For one thing, they are threaded. For another thing… they are just obnoxious. The Volokh Conspiracy uses them. People in comments have been bitching about everything bad about Disqus since the switch. They are just… ick!!! I’ve hated Disqus since the day they came out and they haven’t gotten any better.

    I Googled a bit about the Amazon stuff… It sounds pretty standard for static content. But I’m mystified how running a blog on it would be easy. It sounds like I need to run a CMS somewhere else, then convert all my stuff into static content, then somehow host the static stuff at Amazon? And… the dynamic content somewhere else? (I guess that’s where Disqus comes in.)

    Making the posts static would mean– among other things– I couldn’t easily include my bets. The bets are fun– I’m not giving those up.

    Also… how do I include Latex? Or all sorts of things?

    Do you host your blog at Amazon? What specifically do you do to deal with these things?

    I’m also a bit concerned that the exact same bots that hit my blog here would hit it there. At Amazon, my site would stay up, but I could end up with a huge $$ bill. So I don’t see how I would be able to avoid having to block them to keep my out-of-pocket costs down. Or is there something I’m missing?

  11. For the static part Octopress is not bad and I’m pretty sure they have a LaTeX module; have a look here.
    For the comment part the only solutions I know are Disqus and Livefyre. I tried the free option of Livefyre and it works; with the premium option you have stats and other things.

  12. Tibor–
    There are no comments at that blog and you have two posts: 1) the Hello post and 2) the one talking about setting it up. Do you have any links to large thriving blogs with discussion using Octopress? Before I consider it, I want to see examples of what I might aim for.

    Any solution involving Disqus is just out of the question. I have never ever ever seen an implementation of Disqus comments that is not revolting.

    Do you have a link to Livefyre? I googled… but it’s not helping.

  13. I’m reading lots of stuff about Livefyre. Many of the “pluses” are irrelevant. (For example: I do not need to coordinate conversations about moderation with my team of moderators.) I don’t really want to do intensive analysis of the comments either. I just like a place for a nice, more or less linear discussion co-located with the blog post.

    Out of curiosity– can site visitors put LaTeX in comments in Livefyre? (Or Disqus? I suspect no… right?) We need LaTeX in comments here.

    Do you have examples of blogs with ongoing conversations in Livefyre? I would want to know how that works out before making a change.

  14. That’s obviously a mockup to show how it works and what it costs. I removed Livefyre because the free option was limited. As with all new technology, there is some latency in adoption. Octopress has a list of blogs using it, but admittedly they are mostly hackers’ blogs. What I mean is that this is the future, because it’s cheaper and you spend less time on monitoring. Concerning bandwidth and all these things, Amazon is really great. And concerning the very problem you talk about in this post, I believe it is the future: more and more big sites outsource their comment management, for all the reasons you give. Static blogging is in fact a reality for many big sites; even WordPress is starting to propose such solutions with W3 Total Cache. What is expensive is the database management, not the content hosting– so the comments, not the articles. It is worth thinking twice about these solutions.
    It’s getting late here, I’m going to sleep.

  15. Tibor–
    Bandwidth is a non-issue. That’s cheap everywhere.

    What is expensive is the database management not the content hosting, so the comments not the articles.

    Yes. But even comments are not that expensive. It’s the bots dashing in and loading everything. But I even mostly have that solved.

    The other problem is things specifically trying to hack/crack. Script injections, dictionary attackers etc. Being static doesn’t necessarily solve this– entities can still try to hack the site. Maybe there is something at Amazon that automatically prevents anything and everything from trying to hack in? (Or maybe I just don’t know how bots try to hack into Amazon yet because I don’t have a system there?)

    I still don’t get precisely what you envision with Octopress. If everything is static at Amazon, what do you do? Create the post at home and then… upload? Or what?

    Also– how would that solve the issue that I would want to turn all the former content into static content and host it? Etc.? (There are ways to do that with the WordPress blog. But clearly, it’s a task that would have to be done.)

    I know everything always looks simple if the idea is just to set something up and put up a few fresh blog posts. But I would need to have some notion about the full path forward before I spent much time converting a whole blog to Amazon. (And really… how would I do the betting?)

    I read this about Livefyre so far: http://www.wpbeginner.com/opinion/reasons-why-we-switched-away-livefyre/ I would really need to see a site that uses Livefyre comments. Because it sounds like it may share many of the features I hate about Disqus. And for me– hosting a blog with a comments system I find seriously unappealing is a non-starter. Disqus is… revolting. I think the only reason people use it is that they want to see comment counts, but they don’t care whether or not anyone can actually have a real conversation.

  16. lucia (Comment #106237),

    The bets are fun– I’m not giving those up.

    Are we going back to betting on UAH now?

  17. Quickly, because I’m tired. By definition a static blog is not hackable as long as it is impossible to write to the disk where the site is stored; you can imagine a big conspiracy, but it is really, really difficult. With Octopress your articles are created on a local machine and uploaded to the cloud– very simple and efficient.
    Concerning the comments, it was a suggestion to use Livefyre or a similar system in order to get rid of the trouble you describe. There are other solutions for static comments, but I’m not into this stuff at the moment; I’m sure I’m going to choose such a solution in the future. There are also open source commenting systems you can run in the cloud, but it is more expensive and you keep the problem of managing the spam and the attacks yourself. What is sure is that decoupling comments and articles is a good policy in any case: your site remains fluid even if you are attacked.
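    For concreteness, the Octopress workflow boils down to a few rake tasks run on your local machine (task names are from the Octopress docs; where the deploy step pushes depends on your configuration):

        rake new_post["My next article"]   # create a new Markdown post locally
        rake generate                      # render the whole site to static HTML
        rake deploy                        # push the generated files to the host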

  18. Skeptical

    Are we going back to betting on UAH now?

    We can. It’s more fun to bet on the first to come out each month!!

    And what is your betting thing, BTW?

    The thing both Skeptical and mwgrant are asking about. I wrote a little script that lets me place a form in the blog post. Then people can bet up to 5 quatloos on whatever we decide to bet on. Most months, we bet on the anomaly for UAH. But we also bet on other things.

  19. Tibor-
    I know a static resource can’t be hacked the same way a script can be hacked. But people can hack into the server and inject stuff into static files– that’s no different at Amazon than at Dreamhost. But you’ve answered my question. We write the blog post at home, then upload. So… however we were to put in LaTeX, we would do that at home… and it would display… somehow.

    Concerning the comments, it was a suggestion to use Livefyre or a similar system in order to get rid of the trouble you describe

    Well…. the things trying to hack in are pretty stupid. I would imagine they would try to hack in at Amazon the same way they try to hack in at Dreamhost. They just dash around guessing URIs and loading wrong address after wrong address after wrong address…..

    What is sure is that decoupling comments and articles is a good policy in any case: your site remains fluid even if you are attacked.

    I guess I’m not seeing the advantage.

  20. A moose woman?
    I kind of prefer pecan pie, but I’m no moose. Moose do seem to like apples, if I remember correctly from my days living in Maine.

  21. I meant that, at first glance, torstatus.blutmagie.de looks like it might not be the originating source of the list.

  22. lucia,

    Thank you for all the work you do keeping this site up. It sounds like it is a PITA.

    It’s funny how the promises of the web– free information– really rest on the concentrated efforts of some people.

    This format is much better than Disqus, Echo, and other clunky comment services.

    Thank you, again.

  23. MrE–
    The name doesn’t sound like the originating source, but they do draw from the originating source. They happen to supply the IPs in a more convenient form than some other sites. There’s another one that gives all the IPs with various codes– but I would have to write a script to read the codes and identify the exit nodes. There is no point in blocking the ‘in’ nodes or the ‘bridge’ nodes.

    Reading Wikipedia, it looks like they get additional information on which permissions the exit nodes have activated. Exit nodes can permit different ports. But from my point of view, if it’s an exit node, I block.
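    For that other list, the script would only need to keep rows whose codes flag them as exit nodes. A rough sketch, assuming a CSV whose rows carry the IP in the first column and the flags in the second (the real file’s column layout should be checked first):

        # Sketch: pull exit-node IPs out of a flagged node list.
        # ASSUMPTION: rows look like "ip,flags,..." with "Exit" in the flags
        # field. Verify against the actual file before relying on this.
        import csv

        def exit_node_ips(path):
            ips = set()
            with open(path, newline="") as f:
                for row in csv.reader(f):
                    if len(row) >= 2 and "Exit" in row[1]:
                        ips.add(row[0].strip())
            return ips

    The ‘in’ and ‘bridge’ nodes simply never make it into the set.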

  24. Lucia,
    It’s different because there is no backdoor like cPanel to access the S3 buckets, but obviously you can always hack a system– for instance, you can buy Amazon…
    Concerning LaTeX, it can be embedded in the Markdown language used by almost all these new systems.
    Decoupling comments from articles is a good policy because your database can be overloaded without any impact on your articles’ accessibility.
    As I told you– but it is much more work and it is more expensive– you can also use an EC2 instance, which can scale up and down, to manage your comments with an open source comment manager.
    Eventually everything depends on your budget, the time you can spend on the system, etc…
    I just wanted to give you some information about hosting sites in the cloud and decoupling comments from articles.
    One last thing to consider is a hybrid system where the comments are stored statically after filtering; the dynamic part consists of gathering comments, filtering them, and posting them after acceptance.
    I think the advantages are summarized here: http://ryanhayes.net/blog/why-you-should-build-your-website-using-static-html-files/
    It’s really something growing at the moment. For instance, the most popular source control management system at the moment is git: it’s essentially static. It’s not just hype; it’s an answer to concerns about cost, efficiency and security.

  25. Tibor

    there is no backdoor as cpanel to access the S3 buckets

    There must be a front door… right? BTW, Dreamhost doesn’t use cPanel.

    Concerning LaTeX, it can be embedded in the Markdown language used by almost all these new systems.

    So, does someone have a stylesheet for it? Do you know where I can get it? If it’s possible with css, I could stop using the plugin here and that would reduce load. So I’d like to know that either way.

    Decoupling comments from articles is a good policy because your database can be overloaded without any impact on your articles’ accessibility.

    I know it achieves that. But I still don’t see how that’s a huge benefit relative to what I currently have. Right now, when my blog goes down, Cloudflare makes most of the current posts available. Also, people can read all the old comments.

    and posting them after acceptance.

    I don’t want anything that involves me moderating. That is much too time consuming. Anthony does it– with several volunteers.

    I laughed when I clicked through to the example blog on GitHub. Go ahead and click:
    http://connatser.github.com/hera/

    Thanks for the information. Right now, it looks like I would need to do some research though to figure out what bits I lose. I still think it looks like I either a) lose betting or b) try to run two sites in one. That is: have static pages at one place, while I host dynamic content somewhere else. Then, I load the dynamic content from the static pages so it displays.

    That ends up being quite a bit of work (and I still need to protect the dynamic site– though it might be more secure because there is less there).

  26. Tibor

    I do not work for Amazon Web Services 🙂

    I didn’t think so. But if you did, I would lecture you on the huge number of spammers and crackers you rent services to. Amazon EC2 is one of the groups that is automatically blocked from the site. It’s a huge nest of hackers/spammers/crackers. Huge.

  27. Tibor–
    Yes. I think it’s funny because I think it’s supposed to be an example of a successful implementation… but it’s dead. So whoever set it up gave up for some reason. Also, the blogger explaining how we should use static pages is still using dynamic pages!

    The markup link here:
    http://johnmacfarlane.net/pandoc/try

    I entered x<sup>2</sup> in HTML. It then displayed x\textsuperscript{2}.

    So, it translates the html markup into latex markup. (Or is it supposed to do something else?)

    If it’s just translating markup, it’s not going to give me what I want in comments. Here’s the issue: Suppose Carrick (Oliver/ SteveF/ Nick Stokes/ Mosher/ TroyCA etc.) wants to put an equation in comments. Each already knows the LaTeX markup– he uses it all the time for papers. He puts it in comments. Now the comments contain the LaTeX markup.

    But we want the comments to display pretty equations.

    Nuts and bolts: If I was hosting comments at Disqus or Livefyre (or similar), how do we get that to display so people can read the equations? (For that matter, how do I get them to display in a static site? Wikipedia converts them all to .jpg’s. If I had to do that, it would likely be an ongoing nuisance when I wanted to write posts.)

    https://github.com/justinvh/Markdown-LaTeX has a link to a python file.
    This seems to explain how– if I had a file at home– I could translate all the LaTeX markup into a static file. Then, presumably, I can upload that to Amazon. Right?

    Or are you aware of how I could use that to let commenters (e.g. Carrick/Oliver/Nick Stokes) include LaTeX in comments? If you know, let me know, because I’m not a computer scientist, and sometimes what people who know these things think is obvious is not at all obvious to me. I need someone to say:

    “To get LaTeX markup to work in comments you will need to do:
    a) first thing
    b) second thing…

    And so on. Just suggesting a resource and assuming I can fill in the dots won’t always work. Because I’ll look at the resource and I won’t see how it gets me to the end point.

    Anyway: I do find the idea of static sites attractive. They can be less expensive and more secure. But it’s not a solution if the reason the resource is less expensive is that I lose functionality– like letting commenters insert equations that display, or letting me have good, tailored spam filtering, or letting me avoid really annoying comment systems that interfere with good conversations (like Disqus), and so on. So, I need to see a path from start to finish that shows how I could have a fully functioning blogging system.

    BTW github is mostly static.

    Isn’t it also mostly repositories?

  28. Wow!!!! My error logs. So pure!

    I’ll upload an “apple-touch-icon-72×72.png” and an “apple-touch-icon-72×72-precomposed.png”. Not sure what to do about a sitemap. Other than that, there would only have been 1 error overnight. NO timthumb. Wow!!!! This is like a dream.

    My only regret about banning TOR is not doing it sooner!!! Oh. Joy!!!

  29. Yes, you are right, it’s not WordPress with all its bells and whistles, but the learning curve is not that steep.
    Concerning Pandoc and other Markdown LaTeX-aware renderers: they do the best they can, and when you say you want HTML output they use what HTML offers. The idea is that you can insert LaTeX formulas in Markdown files and render them to whatever you want– HTML, OpenDocument XML, PDF, etc. (no miracle, just simplification). Look at http://johnmacfarlane.net/pandoc/demos.html, section 17, “TeX math in HTML”.
    So the idea is that you take your LaTeX file and produce a MathML file, which can be rendered in most modern browsers.
    So you type pandoc math.text -s --mathml -o mathMathML.html on your computer and you upload mathMathML.html (you have to install pandoc, indeed– don’t scratch me).
    Now you want your guests to post beautiful comments too (show me an example on your site). For registered users, for instance, you could offer a special comment box allowing them to upload their beautiful equations to your S3 bucket, and you would just post a link to them (I do not think it would be a big deal to set up such a thing). You can generalize this idea and replace your comment system with Twitter plus a smart JavaScript hack that inserts the comments (stored in files given by links in the tweets) below your articles.
    There are plenty of variants on static files, even for comments, and you can leverage the existing free systems, or at least reduce the dynamic part drastically.
    You ironize about GitHub, but if you have a closer look you see that it is used as a bug tracking system, which is technically the same thing as a comment system.
    Anyway, migrating a blog is a huge effort. I just gave you some pointers, and I could give you a hand if you decide to give it a try.
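    To make that recipe concrete: put TeX math in an ordinary Markdown file and let pandoc turn it into standalone HTML with MathML (file names are just examples; pandoc reads $…$ as TeX math by default):

        $ cat math.text
        The area of a circle is $A = \pi r^2$.

        $ pandoc math.text -s --mathml -o mathMathML.html

    The resulting mathMathML.html renders the formula as MathML in most modern browsers.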

  30. Tibor–
    I did a search and found Carrick’s comments and those of others. But I also discovered LaTeX in comments is broken!!! Now I need to fix that. I know it’s fixable– I need to figure out what in the recent update broke it in comments. Grrrr….!

  31. Tibor

    You can generalize this idea and replace your comment system with Twitter plus a smart JavaScript hack that inserts the comments (stored in files given by links in the tweets) below your articles.

    Replace the comments system with Twitter? Wouldn’t that limit comments to 140 characters?

    Anyway, migrating a blog is a huge effort. I just gave you some pointers, and I could give you a hand if you decide to give it a try.

    I’m trying to ascertain four things:
    1) Just how huge the effort of migrating is.
    2) Whether I would like the system after expending the effort.
    3) Whether day to day blogging on that system might require more effort than just figuring out how to ban bots.
    4) Whether I get the functionality that I want and need.

    Oh… and a final:
    If static becomes “the way”, might it not be wiser to just wait? After all, if blogging using static files catches on, it’s quite likely someone will create a local tool I can use to create the blog posts, make each a static file– converting LaTeX as necessary (which is once in a blue moon)– and upload it (along with any images and links to images) in one fell swoop. Preferably, this tool would include whatever templating etc. I might like. Currently, it sounds like I would do all sorts of things piecemeal– create the blog post file. Get it into a template for HTML. Save that. Upload.

    And then I need to figure out how to run the bets….
    It seems like an awful lot of work for an uncertain outcome.

    I think I’d rather just see what happened to latex in comments! 🙂

  32. Yes, no doubt. And if you are satisfied with your system as is, it is better to wait for mature static solutions. Once again, I thought your concerns were mainly about security.
    Have a nice day.

  33. Tibor–
    I hope I’m not coming off as snippy. I just like information before I consider a change. I am concerned about security. But I want security with a system that has the features I want. So if more secure systems with features I want are available, I want to know about them. So… I ask about the features.

    It’s sort of like this: I have a list of functional design criteria. So, all systems have to be evaluated against all the criteria.

  34. Can you translate “coming off as snippy” for me?
    I’m working on something related to all this with a friend during my spare time. All you told me is very valuable for us; I’ll show you what we have as soon as we get something (some weeks or months, I guess). You are completely right in your approach, but I think there are two levels in what you are doing here: casual comments, and collaborative work. Concerning this last aspect, I believe some sort of wiki would be worth considering.
    We also would like to investigate R with MapReduce on Amazon. We have used Mahout for a text mining project and we had the feeling it would be great to have a link between R and Mahout. If you guys have ideas on that I would be very interested too.

  35. Tibor–

    Can you translate “coming off as snippy” for me?

    Being irritable, snapping back just to be contradictory. Sometimes things come off that way on the web and one can’t be sure.

    I think there are two levels in what you are doing here: casual comments, and collaborative work.

    Sort of. I think it’s more that people like to bounce ideas off each other. The sort of ideas discussed sometimes require equations. (Well… anyway, we have physicists, chemists, engineers, oceanographers etc. in comments from time to time. So, everyone periodically likes to show an equation.)

    The comments nearly always need to be longer than tweets. And also, the way Disqus threads comments can be a nightmare.

    Oh– also, I at least don’t want to “lose” my comments. I know people who quit Livefyre complained they lost comments when they decided to go back to the WordPress comments system.

  36. In the last link I gave you, I believe there is a great idea to dig into. Git is a completely decentralized source control management system: there is no central repository; everybody can pull from or push to a local repository. GitHub is based on this principle, and using this very same principle it should be possible to build blogs and wikis and save a version in the cloud, which would be the public one. If you have time, have a look at the Wikipedia entry http://en.wikipedia.org/wiki/Git_(software) – it is not bad.

  37. Sorry to have invaded this post, but I think there is something here that goes beyond the merely technical aspect: collaborative work. One can perfectly well imagine everybody participating in the Blackboard having GitHub pages that allow them to develop threads on particular subjects, which could later be synthesized on the Blackboard. It would be a kind of distributed and collaborative Blackboard, with the editorial policy (in the political/ideological sense) applying to what is eventually posted on the main site.

  38. Nick!
    Yes. Amazingly. HadCrut doesn’t seem to be out for Oct. yet. I hope to get graphs of Had/GISS/NOAA out when the final one comes out. HadCrut tended to be the latest back when they were posting HadCrut3 regularly. That might continue now that they are posting 4. (Or not… we’ll see!)

    Betting is going to go back to UAH because it’s more fun to bet on the ones that come out early. Roy posts quickly.

  39. Nick,
    “GISS has posted very early this month.”
    Maybe at GISS they like to get it out right away when it’s hot. 🙂
    .
    Hummm… I wonder if there might be a correlation between date of GISS post and the previous month’s anomaly?

  40. SteveF–
    I doubt there is a correlation. But who knows? Someone would have had to keep historic data on updates, update dates etc. I haven’t done that.

    My impression is that of the land-based groups, GISTemp tends to post earlier than Hadley did. I can’t set up a simple monitor to watch when NOAA posts because “change detection” watches http: but won’t watch ftp:. I could probably find something– or code something myself– but I don’t think it’s worth the effort.

  41. Lucia,

    I was joking with Nick (who strikes me as something of a ‘denier’ of human frailties among climate scientists 😉 ) about the correlation of anomaly and release date. If it is ‘too low’ does GISS search extra hard for mistakes, delaying the release? If it is high, then maybe GISS doesn’t look quite so hard for mistakes and releases sooner. I was not really suggesting you (or anyone else) actually look for a correlation.

  42. SteveF–

    If it is ‘too low’ does GISS search extra hard for mistakes, delaying the release?

    They might. It might depend on how much human involvement there is each month.

    I was not really suggesting you (or anyone else) actually look for a correlation.

    I didn’t really think you were suggesting it. I just think it would be an interesting thing to look at if we happened to have the data! But… alas… I don’t.

  43. Lucia,
    Off to my birthday dinner (62) with my wife and youngest son (18)… it’s Hell to get old.

  44. SteveF,
    “a ‘denier’ of human frailties among climate scientists”

    Who, me? Perish the thought! I am a tireless investigator, and yes, I have the data!

    For 15 months I have been noting the GISS results, fairly promptly. A list of links here.

    So here is a scatter plot of monthly rise against day of month reported (Australian time). It is pretty scattery. It mainly shows that GISS tends to jump rather decisively one way or the other. These are first reports – they get corrected later, so differencing the current file would give somewhat different numbers.

    But, in the Nate Silver spirit, I apply some stats – a regression. Here is the result – and yes, a negative slope. It looks like GISS might report rises with more alacrity than falls.

    Significant? Not really. The t-value is -1.003 – i.e., 1 sd. But if we keep an eye on them, something may show up.
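    For anyone who wants to run the same check on their own notes, here is a minimal sketch in Python (the numbers are made up; substitute your own 15 months of release days and first-report changes):

        # Sketch of the check: regress the monthly anomaly change against the
        # day of month GISS released it, then look at the slope's t-statistic.
        # DATA BELOW IS HYPOTHETICAL; substitute real release logs.
        from scipy import stats

        release_day  = [12, 10, 14, 9, 11, 13, 10, 15, 9, 12, 11, 10, 13, 9, 14]
        anomaly_rise = [0.05, -0.03, 0.08, 0.11, -0.06, 0.02, 0.09, -0.10,
                        0.04, 0.07, -0.02, 0.12, -0.05, 0.10, 0.01]

        res = stats.linregress(release_day, anomaly_rise)
        t_value = res.slope / res.stderr  # t-statistic for slope != 0
        print(f"slope={res.slope:.4f}, t={t_value:.2f}, p={res.pvalue:.2f}")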

  45. Nick

    For 15 months I have been noting the GISS results, fairly promptly. A list of links

    Cool! And you kept everything in a format that let you do this!

  46. Nick Stokes,
    Thanks for posting the release date-temperature correlation. I take back my suggestion of ‘denial’ of human frailties.
    .
    Noisy? Yup. Is the slope real? Who knows. But I will apply Andy Dessler’s cloud feedback argument: in spite of data that is too noisy to draw any conclusion, I will conclude the release date data are at least consistent with GISS releasing high temperature anomalies sooner than low ones. Which I further note is just what is predicted by my PCSBM (parameterized climate scientist behavior model). 😉

    “We also would like to investigate R with MapReduce on Amazon. We have used Mahout for a text mining project and we had the feeling it would be great to have a link between R and Mahout. If you guys have ideas on that I would be very interested too.”

    Sounds cool. I’ll check out Mahout and then tell you what I think about a link with R. Visit my blog or get my email from lucia.

    I have a few (too many) R projects underway, but getting back into text work would be cool.
    .

Comments are closed.