Joanne Nova down again?

Joanne Nova is down again!

I guess we’ll hear more when she’s back up. But does anyone know if she’s just going to move over to WordPress.com hosting? Or will she continue self-hosting? If she’s self-hosting (as I am), my advice:

  1. Cloudflare.
  2. ZBblock.

Will some real people get blocked? Yep. But the fact is, I see lots and lots and lots of real hack attempts. Lots. I don’t think it’s personal in my case. Still, a hack is a nuisance no matter what the motive.

I would share some of my draconian custom signatures. SkS should use these too. (It might help them keep their forum private, provided John Cook doesn’t just fiddle with the settings to make it public from time to time.)

If anyone knows what happened at JoNova’s let me know. I’m curious.

I really hate all the scrapebots, hackbots, and spambots, and for that matter the stupid SEO bots hammering the site.

28 thoughts on “Joanne Nova down again?”

  1. Maybe Cook is a double agent ;).

    Weak ENSO forecast….another coffin in the nail of the lukewarmers!

  2. Bots are just looking for data. Spam and hack bots are bad but seo bots should be helping people find what they want, in theory.

  3. MrE–

    Search engine bots are mostly ok. (Though not Baidu.)

    But most SEO bots are evil: bandwidth-sucking, robots.txt-disobeying vermin.

  4. Weak ENSO forecast….another coffin in the nail of the lukewarmers!

    Did they tone down the forecast relative to earlier in Sept? (Gotta go look!)

  5. The JoNova site is back up… but as a temporary site.

    It’s only got one blog post, which starts off by saying…

    Sorry about that unexpected break. More information is coming soon, but for the moment, at least we’ve reclaimed the domain name (partly) and right now, hopefully the proper site is being uploaded to a safe point and in the next day or two all the 900 odd posts and 120,000 comments will be restored.

  6. Does anyone know if there’s a reason so many commenters at her site are claiming this was done by global warming advocates? With how often it sounds like her site has gone down, it seems likely to me the security is so lax any number of people may have done it for fun.

  7. I took your advice and have protected my WordPress sites with a number of tools, including ZBblock. But I wonder why you use Cloudflare instead of, say, Incapsula. Zaphod seems to feel like Cloudflare lets too many spammers into their system.

  8. @Brandon Shollenberger (Comment #103948)

    September 24th, 2012 at 9:20 pm

    Does anyone know if there’s a reason so many commenters at her site are claiming this was done by global warming advocates? With how often it sounds like her site has gone down, it seems likely to me the security is so lax any number of people may have done it for fun.

    Just more conspiracy theorists.

  9. coyote–
    First: If you use ZBblock… is your killed_log.txt file someplace accessible on the web? If it is… I want to read it to fill the database I use to create these pages:
    http://bannasties.com/BanNastiesScripts/ShowDetailsIP.php?IP=94.23.54.31

    To answer your question about Cloudflare: I use Cloudflare because a) I learned about it first, b) it’s free, c) it has an API that lets me escalate my ZBblock bans to Cloudflare (a rough sketch of that escalation is at the end of this comment), and d) it works fine for me. I don’t know if Incapsula has an API. Since I’m satisfied with Cloudflare, I’m using it.

    If I understand correctly, Zaphod’s gripe with Cloudflare is irrelevant to my decision to use them.

    I think Zaphod’s gripe with Cloudflare is that Cloudflare will not reveal their customers’ private contact information and underlying server IPs unless the person making the request for that private information presents a warrant. (http://www.zdnet.com/cloudflare-how-we-got-caught-in-lulzsec-cia-crossfire-3040095169/ )

    What that means is that if I were a spammer/hacker/all-around bad guy or something nasty, and you in your capacity as “white hat superman defender of the world” hated me, wanted to identify me, and wanted to perform a counterattack to take me down, I could organize my use of Cloudflare to make it very difficult for you to find out the true underlying IP for my server. This would make it difficult for you, acting in your capacity as “posse”, to launch a counterattack against me. It also means police and other authorities would need a warrant to get information.

    FWIW: It takes some advance planning on the spammer/hacker’s part to get effective protection against counterattacks using Cloudflare. But it can be done, and evidently has been. Note in that story (http://www.zdnet.com/cloudflare-how-we-got-caught-in-lulzsec-cia-crossfire-3040095169/) that a white hat hacker called “The Jester” was trying to DDoS the black hats of LulzSec (who had been DDoSing the CIA, etc.). In addition to using Cloudflare, LulzSec changed hosting companies 7 times in at least 23 days. That means LulzSec had 7 different server IPs in 23 days, and the only way Cloudflare “helped” was by not telling “The Jester” their IP. This prevented “The Jester” from launching a DDoS attack against LulzSec.

    For what it’s worth: I’m not sure a service like Incapsula would reveal private customer data to a self-appointed posse either. I understand Zaphod’s gripe. He would have liked the white hat hacker “The Jester” to have information that permitted him to DDoS the black hats who had been DDoSing the CIA. But I’m not entirely sure I disagree with Cloudflare’s policy of not snitching on customers unless provided a warrant.

    In the end: if Jo Nova wants effective protection against DDoSing, it seems Cloudflare really works for that. They have had black hat and white hat hackers trying to DDoS each other through their system. So, notwithstanding Zaphod’s gripe that Cloudflare would protect me from DDoS attacks even if I were evil, the fact is it will also protect me from DDoS attacks launched by those who are evil.
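
    As for the ZBblock-to-Cloudflare escalation I mentioned above, the idea is simply: read the ZBblock kill log, then push the banned IPs up to Cloudflare so they get stopped at the edge before they ever touch my server. Here is a rough sketch in Python. It is purely illustrative; my real setup differs, and the API action and parameter names are from memory of Cloudflare’s client API docs, so check them before copying anything.

    ```python
    # Sketch: escalate IPs that ZBblock has already banned up to Cloudflare.
    # Illustrative only -- verify the action name and parameters against
    # Cloudflare's own client API documentation before using.
    import re
    import urllib.parse
    import urllib.request

    CF_EMAIL = "me@example.com"   # hypothetical account email
    CF_TOKEN = "my-api-token"     # hypothetical API token

    def banned_ips(kill_log_path):
        """Pull distinct IPv4 addresses out of a ZBblock kill log."""
        ip_pat = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
        with open(kill_log_path, encoding="utf-8", errors="replace") as f:
            return sorted({m.group(0) for line in f for m in ip_pat.finditer(line)})

    def escalate_to_cloudflare(ip):
        """Ask Cloudflare to block this IP at their edge (parameters illustrative)."""
        data = urllib.parse.urlencode({
            "a": "ban",        # 'ban' action per the client API, as I recall it
            "tkn": CF_TOKEN,
            "email": CF_EMAIL,
            "key": ip,         # the IP to block
        }).encode()
        with urllib.request.urlopen("https://www.cloudflare.com/api_json.html", data) as resp:
            return resp.read()

    if __name__ == "__main__":
        for ip in banned_ips("killed_log.txt"):
            escalate_to_cloudflare(ip)
    ```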

  10. bugs–
    SkS people seem to insist their forum was hacked. The evidence suggests that Cook fiddled with the settings and placed all the private stuff online.

    Having looked at the files, my theory is:
    1) Cook screwed up the settings. EVERYTHING was public.
    2) During discussion, someone left a link to a blog or site. (There are plenty in the discussion.)

    Then one of the following happened:

    1) A SkS forum participant clicked the outbound link and left an incoming referrer in someone’s stats package. The blogger read his referrers in a stats package, found the forum and was amazed. Either that blogger or someone s/he talked to decided to copy the whole forum. This is not hacking.

    2) Some curious person has set a bot on SkS for any number of reasons (link monitoring, SEO, blah, blah). This bot could be MJ-12, 80legs or similar, or a custom bot. That bot of some sort found a link to the forum somewhere. Bots love, love, love forums and blogs. Love. It crawled the forum. Whoever set that bot on the site noticed all the crawlable links, visited, and copied. This is not hacking.

    3) A crawling bot ‘clicked’ a link in the private forum and left an incoming referrer at a blog. That blogger followed the link, just as in (1), thought “wow” and copied the forum.

    While all this was happening, various bots that swarm blogs and forums were trying to hack in, just as bots that swarm my site are constantly trying to hack in. After the forum became public, Cook or others looked at the logs. They saw the horrifying number of hack attempts and concluded that the forum was hacked. Oddly enough, the forum might have been both hacked and leaked. These hack bots are generally just trying to get in to steal emails, leave spam or turn your server into a zombie drone. One might have succeeded, and meanwhile someone else might have copied the forum. It is not an either/or thing.

    On the accusation that someone like Cook or Joanne might be jumping to some sort of conspiracy theory: having watched my logs for nearly a year now, I can very well see why someone who looked at logs after a site crashed or data were compromised would suspect a hack. They might suspect concerted action.

    At this point I see a very small number of clearly identifiable attacks. But I used to see lots and lots of them. And I’m not talking about the debatable ones, like people asking for “crossdomain.xml” (a vulnerable resource) for no good reason. I’m not talking about comment spam bots. I’m not talking about the even more ‘interesting’ “referrer spam” (IT guys, you know what I mean…).

    I’m talking about cross-site scripting attacks, RFI attacks, obvious penetration testing (sometimes with a referrer back to a free pen-testing service!), rapid requests ‘hunting’ for vulnerable plugins (and Brandon checked some of these. They were trying to upload not-at-all nice stuff.) Hackers trying to log in using dictionary attacks on the login.php file? Still an hourly event.

    On top of this, there are the ubiquitous SEO bots. Some will defend SEO bots as innocent and just looking for data. But those people aren’t watching my logs, seeing things claiming to be “Majestic” (i.e. MJ-12) or “80legs” trying to load multiple pages a second, sometimes sending suspicious patterns, and so on. UAs can be spoofed, so it may be that the “real” MJ-12 or 80legs are just fine. But bots that don’t tell me which IPs they operate from make it easy for hackers to spoof them, and so those UAs aren’t permitted here.

    Then… there are the Twitter bots, etc. They are often programmed stupidly and are a ridiculous drain. They also make it very difficult to “see” who is hitting the site. Altogether, it is very easy to imagine that a hack that succeeded was targeted. It may not have been, but it’s easy to think it.

    My view is: Unless I can trace who hacked, I will generally assume a successful hack was just one of the many script-kiddies who are trying to make a living by turning my machine into a Zombie drone so they can make money doing… whatever it is they do.

  11. Isn’t there also a way to make your site SEO-bot- and scrape-friendly? That is, so that they can efficiently scan your site, get in and get out fast, and cause fewer problems?

  12. MrE–

    Encouraging bots to go fast is not a “solution”. Trying to go fast is precisely the behavior that causes the site to crash.

    Think about it this way:

    Entity A (e.g. SEO service) makes money by selling information they harvest from entity B (that would be me) to entity C. C has paid A for information. Does A communicate with B? No. Does A make money by being nice to B? No. As far as A is concerned, they will just trample all over B’s property in search of whatever C pays them to get and they sell that to C.

    Suppose in this metaphor you imagine all A is doing is taking photos to sell to C. So, in principle, B’s crops aren’t harmed while A comes through.

    But in practice, to make a lot of money, what A does is find themselves a tank outfitted with multiple cameras looking forward. As they take photos to sell to C, the tank totally tramples the fields leaving B with a trampled mess.

    Letting the tanks go through fast so that A can take pictures quickly and easily does not help B, the person who owns the site. It helps A make more money, but I want to protect my site, not help A make more money.

    Mind you, if an SEO firm approached me and offered to pay me to let them crawl, I would consider it. But lots of these services operate as if, somehow, they get to do whatever they want, and as far as I can tell they take absolutely no pains to avoid harming sites. Lots of site operators observe the same thing. AND we observe this from bots that “insist” their methods are totally nice and polite. Based on complaints at ZBblock’s forum, some of those bot services think they should be the sole arbiter of what sort of behavior is permitted on my server.

  13. As a government employee – this is my job. I shut down the nutter conspiracy websites that think global government controls everything.

  14. Thanks for keeping on top of this; you are a magnificent asset to the climate skeptic community, with your side obsession with hacking and bots, etc. I’d wager it’s similar to her earlier attack: easy to mount, easily foiled, but doing damage to the little guys for no good reason. It’s not like dissenters aren’t given a space there. Hell, Matt and Jon are a great asset: warmists who are happy to argue their point and are allowed to do so profusely. As one of the few places where dynamic conversations happen, it is a great loss, especially for us Aussies. Thanks for keeping an eye on it.

  15. Lucia, I understand your point of view and used to feel as you do, but I look at it a different way. I look at it this way: you (B in your explanation) want an audience (C). You (B) have data, which is your website. The bots (A) are a way of getting that data to your audience (C). If they aggregate data so that people who are only interested in one aspect of your site can find it, then they are helping you connect with an audience, in theory. The problem is if they are inefficient, or if you do not have enough resources such as bandwidth or CPU.

    I used to think as you did, but after reading some articles on SEO and skimming some books about SEO a few months back, this was their explanation: the bots are just data processors that can help websites. The books were both O’Reilly; one was something like SEO Warrior and the other was something like Webbots, Spiders, and Scrapers. They also claimed that if a bot finds the data laid out in the form it expects, it doesn’t keep searching the website and consuming resources.

    I am not speaking from experience so admit in practice this may be completely off base.

  16. MrE, if companies want to do that, they should do so in a reasonable manner. Most of the time, they don’t. It is not appropriate to have a “business relationship” which involves one party harming the other for nothing but convenience.

    It’s incredibly easy to make a well-behaved bot. I did exactly that not too long ago, with no financial benefit. I could probably modify these companies’ bots to make them well-behaved in a couple of days. So what’s their excuse?
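
    For what it’s worth, the skeleton of a well-behaved bot is tiny. Something along these lines (a toy sketch, not the bot I actually wrote; the user agent string and crawl delay are just placeholders): fetch robots.txt, check it before every request, identify yourself honestly, and sleep between requests.

    ```python
    # Toy sketch of a polite crawler: obeys robots.txt, names itself honestly,
    # and rate-limits its own requests. Not the bot I actually wrote.
    import time
    import urllib.request
    import urllib.robotparser

    USER_AGENT = "ExampleTestBot/0.1 (contact: someone@example.com)"  # hypothetical name
    CRAWL_DELAY = 10  # seconds between requests; err on the slow side

    def polite_fetch(urls, robots_url):
        rp = urllib.robotparser.RobotFileParser(robots_url)
        rp.read()
        pages = {}
        for url in urls:
            if not rp.can_fetch(USER_AGENT, url):
                continue  # robots.txt says no; respect it
            req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
            with urllib.request.urlopen(req) as resp:
                pages[url] = resp.read()
            time.sleep(CRAWL_DELAY)  # never hammer the server
        return pages
    ```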

  17. Brandon, The excuse is that it’s just data. If you connect your data to the internet then convenience is just as valid as financial benefit.

  18. A concerted DDoS is difficult to fend off with ZBblock. And you get lots of false positives.

    If you’re not using the services of a meticulously managed cloud space, bots/crawlers are IMNSHO more easily “managed” by leaving tarpits and mouse-traps.

    The tarpits are files that bots will scan but which aren’t part of the proper content of the web site. When a bot steps in the tarpit, it is delivered content: Very. Slowly. Like at about dialup modem speed. This requires the cooperation of at least the HTTP server. The content you feed them can be gigabytes of random data, if they hang around long enough. There is the opportunity to record their IP address and to block it.

    Mouse traps consist of web content, typically embedded images from a real web page, which no human user would load. But a crawler/scraper will. Because crawlers follow all links. Again, a good opportunity to record that IP address and to block it.

    I note that you have a mouse trap on your page; but it’s attractively clickable for humans.
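
    To make both ideas concrete, here is a very rough sketch (Python/Flask, purely illustrative: the trap URL is made up, and a real deployment would do the slow feeding and the blocking in the HTTP server itself rather than in application code).

    ```python
    # Rough sketch of a tarpit plus mouse trap. Purely illustrative; a real
    # deployment would hook the HTTP server rather than a tiny web app.
    import random
    import time
    from flask import Flask, Response, request

    app = Flask(__name__)
    trapped_ips = set()  # in real life, persist this and feed it to your blocker

    @app.route("/old-archive/secret.html")  # hypothetical trap: linked invisibly, disallowed in robots.txt
    def mouse_trap():
        # Only something that follows every link (or ignores robots.txt) lands here.
        trapped_ips.add(request.remote_addr)

        def dribble():
            # Tarpit: feed junk at roughly dial-up speed, for as long as it stays.
            for _ in range(10000):
                time.sleep(1)
                yield bytes(random.getrandbits(8) for _ in range(4096))

        return Response(dribble(), mimetype="application/octet-stream")

    @app.before_request
    def block_trapped():
        # Anything that stepped in the trap earlier is refused outright.
        if request.remote_addr in trapped_ips:
            return Response("Go away.", status=403)
    ```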

  19. MrE, that makes no sense. The excuse for not using a well-behaved bot is “it’s just data”? How does “it’s just data” justify ignoring the guidelines the website sets up for use? How does “it’s just data” justify inundating a website with so many requests other people cannot access it?

    Nobody here is saying bots are inherently bad. The problem is a lot of bots behave badly.

  20. Bernd Felsche
    I’ve never heard of a mouse trap. Is the cat picture the mousetrap? That’s there to catch ‘PicScout’, not bots in general. It has to load for humans.

  21. MrE

    The bots (A) are a way of getting that data to your audience (C).

    I don’t know who you think C is, but consider “C” my audience. There are bots and there are bots. If I think a particular bot is getting data for someone who is actually my audience and the bot is programmed properly I let that bot get the data. BTW: Properly programmed means “so as not to crash my site.”

    then they are helping you

    I think you are misidentifying what various bots do. Some do help. Some do not. For example: PicScout exists to a) find copyright violations and b) report them to someone other than me. While this might be fine in some sense, nothing about it helps me or my audience. SEO bots often exist to help others with their link building. Once again: that might be a perfectly valid thing to do. But none of those visits help me or my audience. Not even “in theory”.

    With respect to my interests, there is absolutely no reason not to block these.

    As both PicScout and SEO bots are often programmed in ways that crash sites, and they don’t help me even in theory, there is no reason not to go medieval on their asses. They are blocked. Period. Moreover, I think everyone who needs to economize on how much $$ they spend running a server should totally and utterly block these things.

    skimming some books about SEO a few months back this was their explanation

    If you read PR about SEO bots written by SEO firms, they will tell you that SEO bots are nice, polite, help your site, yada, yada. That doesn’t mean their claims are generally true. It doesn’t even mean the person who wrote the bot and thinks his bot is nice and polite is correct.

    Some bots are nice and polite. But if you are a blogger, many (in fact most) SEO bots are scum-sucking, CPU-hogging, memory-sucking vermin.

    I mean, look: if I ran a store I might like the “shopping” bots that look at products, keep track of prices and use those to populate their comparison services. That might be a service, because I would then want my site’s cameras, frying pans or tchotchkes to be listed. But I know I don’t sell anything, so I don’t want those bots. And those bots are programmed so stupidly that things with “shop-whatever” in their user agents have appeared, tried to race through, and crashed the site. There is no reason why I should even try to organize my site to help them find what they “want”. Moreover, I have no reason to believe there is any way I can organize the site that will make those stupid things “behave” in a way that doesn’t crash it.

    I ban them. I advise all bloggers to ban them.

    They also claimed that if a bot finds the data laid out in the form it expects, it doesn’t keep searching the website and consuming resources.

    This claim is flat-out B.S. There are zillions of bots. They are all looking for different things. Each “expects” to find different types of things. And some will crash a blog unless you ban them. ZBblock has certain bot services that are banned by default. O’Reilly can say whatever it wants, but no blogger who pays for their own service should let Majestic or 80legs crawl.

    Generally speaking, the default position toward an SEO/reputation/etc. bot should be: ban it until it gives a valid explanation of what it does and why that specifically helps the site it is trawling through. And then only let it crawl if it is “well behaved”, as in, it doesn’t draw too much CPU or memory.
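
    If you want the flavor of what “ban by default” looks like in practice, it is roughly this. This is a sketch only, keyed on user agent; it is not ZBblock’s actual signature syntax, and the fragments listed are just the obvious offenders mentioned in this thread.

    ```python
    # Sketch of default-deny handling for SEO/reputation bots, keyed on UA.
    # Not ZBblock's signature syntax -- just the shape of the rule.
    BANNED_UA_FRAGMENTS = (
        "mj12bot",    # Majestic / MJ-12
        "80legs",
        "picscout",
    )

    def is_banned(user_agent):
        """True if the request should get a 403 before it touches the blog."""
        ua = (user_agent or "").lower()
        if not ua:
            return True  # a blank UA is always a machine in my logs
        return any(frag in ua for frag in BANNED_UA_FRAGMENTS)
    ```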

  22. MrE–
    Here’s a discussion of what shopping bots are good for:

    http://www.pcworld.com/article/34799/article.html

    I have seen instances of (stooopidly programmed) “shopping” bots scouring my site. If you have a “theory” explaining how these bot visits would ‘help’ me communicate with my audience, I’d like to hear it. As far as I’m concerned, I’m doing me, my audience, and the lazy bot owners who don’t want to spend time coding their bot to not be a stooopid, site-crashing trespasser a favor by banning their agents.

  23. The shopping bots would help you if you decided to start selling something. The quatloo currency might throw them for a loop 😉

    How about tools like Fetch as Google? It’s often stated that it helps you prevent hackers by improving your site’s crawlability.

    http://support.google.com/webmasters/bin/answer.py?hl=en&answer=158587

    http://howto.websitespot.com/fetch-as-google-makes-your-life-easier/

    4. Fight the hackers.

    Got someone trying to hack your site for whatever reason? Use Fetch as Google to make sure your page is being seen the right way. Many hackers will sneak inside your coding to insert their own spamming content (like links to porn sites). There is absolutely no way to find this stuff on your own. In fact, the only way to fight the hackers at this game is to let the search engines in.

    Fetch as Google will let you know immediately if there’s been a problem. These issues could go on indefinitely, completely ruining your web presence and searchability (many people have blocks on their accounts preventing them from accessing any inappropriate sites). With one simple check, remove that threat entirely.

  24. MrE–
    “If”. But I don’t sell anything. So I ban them.

    Fetch as Google is not a ‘bot’. It is a Google service; I use it. Googlebot is the bot. I let Googlebot crawl. Google is very good about a) not crawling too fast, b) not crawling stooopid locations, c) obeying robots.txt, and d) giving website owners ways to verify whether something that claims to be Google really is Google (a sketch of that check is at the end of this comment). On top of all this, Google does do me and my audience good. So it is not banned.

    Bear in mind: Google is not an SEO or reputation bot. It’s a search engine. Search engine bots are mostly good. SEO bots? Not so much.

    I use ‘Fetch as Google’ frequently, at Webmaster Tools.
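
    The verification in d) is basically a forward/reverse DNS check that Google documents. If you wanted to script it yourself, it is roughly this (a sketch; in real code you would cache the results and handle the odd corner cases):

    ```python
    # Sketch of the forward/reverse DNS check for verifying that something
    # claiming to be Googlebot really is. Illustrative only.
    import socket

    def is_real_googlebot(ip):
        try:
            host = socket.gethostbyaddr(ip)[0]        # reverse DNS on the visiting IP
        except socket.herror:
            return False
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        try:
            return socket.gethostbyname(host) == ip   # forward-confirm the hostname
        except socket.gaierror:
            return False
    ```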

  25. Lucia,

    A mouse trap is a honeypot that’s tucked away, inconspicuously, in a corner of an ordinary page, ready to trap rodents that step on/in it.

    Pretty much like a honeypot, but most honeypots are conspicuous for sentient humans and set up to look like something sweet for a bot to eat.

  26. Bernd–
    Well…. I do have some hidden links that I use to stop some of the stoooopider scrapers fast. They look like links to archives. Anything that tries to load those gets banned, generally for 7 days. It stops a lot of obviously bad stuff, and so far I have not seen those hidden links catch a real human. (Unfortunately, Bing/Microsoft’s bot is stupid enough to load those. Google is not. Generally, MS’s bot is pretty stupid and rather annoying. You’d think their programmers would try to give their bot some brains… but it’s pretty borderline.)

    The cat at the bottom of the page is actually for PicScout. I think it visits two ways. One way, which it uses for sure, is this: a human who is looking for images to use for some reason visits using PicScout’s image add-on to their browser. When the browser add-on sees an image loaded, it sends a message to PicScout, and then the PicScout bot visits. The PicScout bot visits with no user agent and no referrer and loads the image but nothing else. When that happens, I ban it. That image link is to a script; that’s what I use to immediately detect that it’s a no-user-agent/no-referrer combination loading an image.

    But I think the bot sometimes also visits later, once again with no user agent or referrer, loading an image. I ban it.

    Even though people can spoof UAs, I have never yet seen a person visit with a blank UA. Those are always machines. Mind you, if I were a NASA data page, it could be R fetching data, but I’m not NASA providing data pages. Or, in principle, it could be a person who wrote their own browser or agent to do “something”. In the latter case, I’d tell them: list a user agent. Preferably one that lets the person looking at server logs know it’s you.

    Above, Brandon mentioned he wrote a bot. While he was doing it, he tested it on my site. I told him to give it a name, so when I saw it, I knew it was Brandon’s bot! He was able to make it a nice, polite, robots.txt-reading-and-obeying bot with a decent crawl rate rather quickly; I think it was a matter of hours. And I could tell him if any of his attempts triggered ZBblock, because I could read his agent. I didn’t have to try to memorize his IP range!
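
    For anyone curious, the logic behind those two traps boils down to something like the sketch below. This is the idea only; my real version is a server-side script wired into ZBblock, and the paths here are made up.

    ```python
    # Sketch of the two traps described above: the blank-UA/no-referrer image
    # check and the hidden "archive" links. Paths and details are illustrative.
    def should_ban(user_agent, referrer, requested_path):
        """Decide whether this request trips one of the traps."""
        hidden_traps = {"/archives/2009/index_old.html"}  # hypothetical hidden links

        # Trap 1: a blank user agent AND no referrer loading an image is, in my
        # logs, always a machine -- and it is PicScout's signature move.
        if requested_path.endswith((".jpg", ".png", ".gif")):
            if not user_agent and not referrer:
                return True

        # Trap 2: no human ever sees the hidden archive links, so any request
        # for one earns an automatic ban (about 7 days in my setup).
        return requested_path in hidden_traps
    ```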

Comments are closed.