When inspecting hits to my image files, I noticed the Google image search result for “African man running backwards” pointed to a graph of the hockey stick.

Odd. That someone searching for “African Man Running Backwards” clicked and tried to load the image is odder still.
To visitors from the UK
Apologies to people in the UK who got banned.
Chuck explained to me some idiosyncrasies of how the ISPs in the UK operate, which result in many more people sharing IPs than would happen to anyone in the US. Empirical evidence suggests this is especially true of 1 particular host. Because I didn’t anticipate 6 people using the same IP all connecting to the Blackboard during a 15 minute window, I coded the spam filter to treat what looked like 6 different machines behind one IP as evidence of not only “bot” behavior but “bad-bot” behavior. Banning that IP then proceeded to block lots of people, because many, many, many people are trying to connect through that same one.
Fortunately, when I ban lots of people in one fell swoop, at least one generally complains. Unbanning 1 IP also grants access to everyone all at once. That’s why some of you noticed you were banned and then saw access restored.
Now that I am aware of how IP addresses are assigned in the UK, I am detecting HTTP_X_FORWARDED_FOR to count how many different people these really are. I hope I won’t be banning people in the UK for no reason at all!
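As a rough illustration of that counting step (a sketch only, not the code running on this blog), the idea is to group requests by the IP they present and count how many distinct forwarded chains sit behind each one. The function name and the sample addresses are made up for the example, and the header can be forged by a client, so the count is a hint rather than proof.

```python
from collections import defaultdict


def count_distinct_clients(requests):
    """Count distinct forwarded chains behind each presented IP.

    `requests` is an iterable of (presented_ip, x_forwarded_for) pairs,
    where x_forwarded_for is the raw comma-separated header value.
    The pair layout is an assumption made for this sketch.
    """
    chains = defaultdict(set)
    for presented_ip, forwarded in requests:
        # Normalise spacing so "a, b" and "a,b" count as the same chain.
        key = ",".join(p.strip() for p in forwarded.split(",") if p.strip())
        chains[presented_ip].add(key)
    return {ip: len(keys) for ip, keys in chains.items()}


sample = [
    ("203.0.113.9", "10.0.0.1, 203.0.113.9"),
    ("203.0.113.9", "10.0.0.2,203.0.113.9"),
    ("203.0.113.9", "10.0.0.1,203.0.113.9"),
    ("198.51.100.4", ""),
]
counts = count_distinct_clients(sample)
```

Here the first presented IP hides two different chains, so a filter keyed only on the presented IP would lump two people together.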
I should warn those in the UK: A few of you have been periodically banned because an honest to goodness spammer– and in one case an actual cracker-bot– shared your IP.
The same feature that results in many perfectly innocent people presenting the same IP to my blog also elevates the likelihood that your connection will be shared by a spammer. If someone who shares your IP tries to log in, perform a cross-site scripting attack or anything similar, that IP will be banned. Unfortunately, that will affect you. That means if you are a regular reader, learn how to “guess” my email address so you can email me to let me know. I’ll be happy to unban as soon as I am aware of the problem.
I’m allowed back! Thank you!
RobB–If you’re in the UK, expect to be banned in about 15 minutes and then back on! Chuck emailed me… I’m trying to diagnose. It does have to do with the shared IP.
Lucia –
Could the spammer I share an IP with be one of my flatmates? If so, I have my own sleuthing to do..
Anteros– There is no knowing. But based on what Chuck told me about UK assignment of IP’s from my end, I see many more people sharing IPs.
Now that I record the HTTP_X_FORWARDED_FOR IPs, next time it happens to you I’ll be able to look and see if he shares your true originating IP. Right now, there is a series. People in England share the last one in the line, which is what “presents” itself to Cloudflare. Your flatmate would tend to share the first one in the HTTP_X_FORWARDED_FOR string.
Since the Brits sharing the final IP in the HTTP_X_FORWARDED_FOR string don’t share the second to last, I can tell they are different people. That lets me include a sliding scale for how many images they can see and UAs they can use before I decide “that’s a bot”!
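A minimal sketch of what such a sliding scale might look like, assuming the header is a plain comma-separated list with the originating address first and the presenting address last. The function names, the `base_limit` value and the sample addresses are all hypothetical; the real thresholds are not given here.

```python
def parse_forwarded(header):
    """Split an X-Forwarded-For value into a list of hops, first to last."""
    return [part.strip() for part in header.split(",") if part.strip()]


def allowed_user_agents(headers, base_limit=5):
    """Scale the user-agent allowance by how many distinct people share an IP.

    `headers` holds the forwarded strings seen from ONE presenting IP during
    the time window. Each distinct second-to-last hop is treated as a
    different person, so the allowance grows with that count.
    """
    upstream = set()
    for header in headers:
        chain = parse_forwarded(header)
        if len(chain) >= 2:
            upstream.add(chain[-2])
    # With no usable chain, fall back to the allowance for a single person.
    return base_limit * max(1, len(upstream))


seen = [
    "10.1.1.1, 172.16.0.5, 203.0.113.9",
    "10.1.1.2, 172.16.0.6, 203.0.113.9",
    "10.1.1.1, 172.16.0.5, 203.0.113.9",
]
limit = allowed_user_agents(seen)
```

Two distinct second-to-last hops double the allowance, while a presenting IP with no forwarded chain keeps the single-person limit.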
Anteros,
While you may well have decided to relax your usually high standards when it comes to choosing flatmates, which of course only you could know, it is technically highly unlikely.
That said, if you have fallen amongst thieves, they say the flatmate is always the last to know.
Lucia is actually talking about the structure and some characteristics of BT’s network backbone/core infrastructure in the UK.
I don’t think she’s quite got it down to resolving the individual distinctive ‘hand’ of the keyboard users yet 🙂
Blatted again. Can I blame Anteros’ flatmate? 🙂
‘African Man Running Backwards’? This is very murky ground Lucia. Better check with Conan Doyle, John le Carre and Homeland Security.
Cui–
You and at least 6 others present as the same IP and get blatted merely for looking like 1 person who seems to want to connect from his iPhone, PC, Sun workstation, Mac, and BlackBerry 30 or so times in 15 minutes.
Anteros does not share this IP. Anteros shares an IP with an actual honest to goodness referrer spammer.
Lucia,
I read your other topic from a few days ago but figured this might be a better one to respond in.
You may also be running into poorly configured proxy servers. I don’t mean malicious ones either, just corporate or governmental servers that are supposed to filter content but are set up incorrectly.
If the proxy server is set up to pre-cache 3 links deep with no bandwidth limit on how much to grab at a time (or in total), your own site will get hammered with requests by the proxy server when someone behind the proxy innocently visits your site.
I ended up having to ban a few IPs, not because of what the person was doing, but because of where they worked (the server they were accessing my site through).
As per your previous examples, a bored X-ray tech or Canadian prison guard may have been wanting to read The Blackboard while at work and not realize that the proxy server they are connected to is overloading your website.
Cui –
Did you hear that? I share my IP with a PROPER SPAMMER. No Mickey Mouse too-many-connections-at-a-time for me! 🙂
Well, that’s a relief. I have been banned and unbanned seemingly at random for the last couple of days.
I thought it was something I said (;-)
Arfur Dent –
The thing is, we’ll never know..
Lucia could be dissembling about all the bot-behaviour and just banning us because we offend her Chicagoan sensibilities, or because, well – anything at all!
Paranoid? Moi?
Anteros – I marvel in praise and awe. In this period of austerity PROPER spammers are out of my price range, so I had to settle for a cheap fake spammer from the local poundshop. 🙂
Oh, forgot to mention, I ran into an auto-ban bot script that I modified to use for a problem I keep running into.
It looks like it is something you may not have considered before (instead of reacting to an attack, lure the bot into a trap).
http://www.lunarforums.com/lunarpages_security_center/auto_ip_ban_script_stop_rogue_scanning_and_trap_bad_spidersbots-t43858.0.html
Even if you don’t use it, it may give you some new ideas (as it did for me).
You lure them into a trap… and then what?
As long as they are hitting the site, even if they just hit the honey trap, they use site resources. I do have some honey traps to report bad bots to others.
On the other hand, the Blackboard post that gets linked was one of the more interesting threads we’ve had. In it, the AGW group got totally soaked in the process. Poor them and their jihad against Craig Loehle and their foolishness to pay any mind to anything Tamino rants about.
I seem to be back on in the last few minutes after being banned for about 3 days. In fact I was about to try to email using your old knitting blog email address (the only one I could find!) just now to try to find out what I could possibly have done to deserve it. I haven’t even commented for weeks and never controversially, but do drop in as a reader quite often. I am relieved to find out what it was all about.
Richard J –
It’s disconcerting isn’t it?
P.S. Do you have any dodgy flatmates?
Family members?
Pets?
Maybe you could verify suspect behaviour by making people fill in one of those “prove you’re human” boxes, where they type in a series of characters to prove they are real. You then give those verified real people a cookie and as long as their machine has that cookie, they get access. I certainly wouldn’t mind having to fill in one of those boxes if I showed up as being in a suspect IP range.
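One way the “give verified people a cookie” half of that idea could work is to sign the cookie, so the server can check it later without storing anything. This is only a sketch under assumptions: the secret, the function names and the visitor id format are invented for illustration, and it says nothing about how the human check itself is done.

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical server-side secret, never sent to the browser


def make_cookie(visitor_id, secret=SECRET):
    """Return 'visitor_id.signature' to hand out after the human check passes."""
    sig = hmac.new(secret, visitor_id.encode(), hashlib.sha256).hexdigest()
    return f"{visitor_id}.{sig}"


def cookie_is_valid(cookie, secret=SECRET):
    """Check that the signature matches the visitor id it is attached to."""
    visitor_id, sep, sig = cookie.rpartition(".")
    if not sep:
        return False  # no signature part at all
    expected = hmac.new(secret, visitor_id.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched.
    return hmac.compare_digest(sig.encode(), expected.encode())


cookie = make_cookie("reader-42")
```

A visitor whose cookie passes `cookie_is_valid` could skip the suspect-IP check, though every such check still costs the server a request.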
Oh that’s a relief. I wondered what I’d done.
Stilgar
I have very strong circumstantial evidence that leads me to believe much of the image scraping is run by bots whose intention is to scrape images.
Skeptical–
What you are describing is a captcha. A captcha is useful for comment spam. It’s not really useful for the types of attacks and scraping I’ve been seeing. Also, because of what’s been happening, it is essential that the “solution” reduces server load, and it can’t involve an overwhelming amount of coding and maintenance for me. A captcha solution is not that easy, and would, I believe, increase server load.
Because the hammering had been really bad, to make the method result in a net reduction in server resources, banning is done at Cloudflare. I really don’t have any other effective mechanism that also saves bandwidth. (I’m on shared hosting.)
When banned, you are shown their page. I don’t have much control over what they present.
Lucia,
If it’s only the images they’re scraping, maybe you could reduce the resolution of your images to the lowest possible that still displays reasonably well. This won’t stop them but it should reduce your bandwidth use a bit.
Another option would be to make lots of useless junk images and fill your image folder with them. They might leave you alone if they get lots of junk.
Now for my conspiracy theory… maybe it’s lots of alarmists constantly viewing your pages with the intent of eating up all your bandwidth to get you off the net.
Skeptical–
Bandwidth is not the main issue.
The trap writes the IP address to the .htaccess file and serves up a 403 Forbidden page. Put a transparent .gif at the bottom of the page and link it to the trap. Set your robots.txt file to exclude the trap.
Any bot that uses robots.txt properly will avoid the trap; the ones that ignore it will follow the link into the trap. The trap writes the IP address to a block list in the .htaccess file.
Every now and then you take the list of IPs from the .htaccess file and add it to your server’s iptables blocklist (the server’s own blocklist, which doesn’t even let the IP address read the website) or to Cloudflare if they let you.
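To make the steps concrete, here is a small sketch of the trap side. It is not the linked script: the `/trap/` path, the function name and the block-list location are placeholders, and the `Deny from` line is the older Apache 2.2 style rule (newer Apache versions use `Require not ip` instead).

```python
import ipaddress
from pathlib import Path

# robots.txt content telling well-behaved crawlers to stay out of the trap.
ROBOTS_TXT = "User-agent: *\nDisallow: /trap/\n"


def handle_trap_hit(remote_addr, blocklist_path):
    """Record the visitor's IP in the block list and return the status to serve."""
    # Validate first so junk input is never written into a config file.
    ip = str(ipaddress.ip_address(remote_addr))
    path = Path(blocklist_path)
    existing = path.read_text().splitlines() if path.exists() else []
    rule = f"Deny from {ip}"
    if rule not in existing:  # avoid piling up duplicate rules
        with path.open("a") as fh:
            fh.write(rule + "\n")
    return 403
```

Anything that requests the hidden link after being told not to by robots.txt ends up in the file, which can then be copied into a firewall list by hand.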
Stilgar–
I can’t ban by IP using .htaccess at my blog because I use Cloudflare.
I’m effectively doing the same thing by banning at Cloudflare, not .htaccess. But my method isn’t a honeypot, because that doesn’t catch stuff. When IPs do certain things, I tell Cloudflare to ban them.
The difficulty with the “trap” method is that the worst of the bots don’t fall into the trap. If I put a gif on the page, every single IP hits that. I do, however, have honeytraps on this very page. A few bots fall into those and get reported to Project Honey Pot. But at most 1% of the things trying to crack fall into that pot.
Unfortunately, the “UK” issue would have the same problems with a honeypot as with the cloudflare issue.
How can you get banned with a banning notice which has an IP address completely different from the one which shows up on your router? Also, how can the banning notice show an IP address different from that shown by sites which will pick up your data and display it, when this last gives an address the same as the router?
I don’t understand technically how that can happen.
On the UK, are you saying that the BT network gives the identical IP address to multiple people at the same time? This is not just reassigning the dynamic address? How does that work?
Michel, the British Telecoms (BT) core/backbone in the UK is an ATM network, not an IP network. ATM networks can appear as a ‘cloud’ or black-box setup, where traffic enters or leaves the network without there being any indication of intermediate paths traversed. In a sense, a ‘self-routing’ network for the data being transmitted. (The above is a VERY broad brush.)
So at the edge of the BT network, where it connects to the ‘rest of the world’ at various points, it can happen that the IP address of that edge router or switch gets reported as the IP address of a whole bunch of users on the BT network. i.e. the edge router appears as a proxy server.
In a similar way, all the machines on many home broadband setups will appear to the world as if they all have the dynamic IP address of the broadband modem/router, because of NAT being implemented. The same effect causes problems in Voice over IP networks when a phone or the gateway server is behind such a NAT config.
Chuckles –
Thanks for your explanation.
I too was a bit baffled when Lucia was trying to work out why I was getting banned about 7 times a day. I gave Lucia my IP address, and of course it was completely different from the one that cloudflare said it was banning. I was truly flummoxed.
Now I’m just further aware of my technical ignorance 🙂
Chuckles, thanks for that – I was trying to work out why I was being randomly banned, and had determined that the IP Cloudflare was reporting as mine was somewhere in the path from me to the Blackboard when the problem went away. So thanks for the final jigsaw piece!