Newsblur: A modest proposal

Many readers here have discussed the “C” word. That’s right: Copying and Copyright. We’ve generally discussed it in terms of the IPCC’s claims about ownership of work done by government employees or claims that people cannot quote from IPCC drafts and so forth. Today, I’m presented with a different copying dilemma.

I’m trying to decide just what to think and do about Newsblur. First a little background:

I became aware of Newsblur when a reader was auto-banned from my blog because he had visited this page: http://newsblur.com/reader/page/1100897. While trying to figure out why he was banned, I visited the newsblur.com page and was taken aback when I found what appeared to be a full copy of my front blog page at http://rankexploits.com/musings. By full I mean: full. It looks exactly like the front page of my blog: all images, comments, etc. A quick glance at the html shows it loads my .css files, .js files, images and so on.

I hunted around and discovered that “newsblur” claims to be a “feed reader” of some sort, and it appears they have managed to get people to pay for their service. (Their front page currently indicates they have 348 “premium” customers.) I located the email address of Samuel Clay of Newsblur and expressed some displeasure at discovering what appeared to be a wholesale copy of my blog page hosted at their web site.

During an email exchange I learned Newsblur intends customers to view http://newsblur.com/reader/page/1100897 by loading http://www.newsblur.com/site/1100897/ where the copy of my blog is displayed in what appears to be a frame. Seeing that, my reaction was: “It’s worse than I thought!” Later that evening I set out to write a framebuster to get my content out of their frames. (I appear to have succeeded.)

I could say more about newsblur. But what I would really like to do is to open comments to readers to suggest a policy for Newsblur or any other party who decides to copy my content and display it. My inclination is to do all of the following:

  1. to permit them to show my feed. After all, I permit other feed readers to display that, and it’s fairly accepted that bloggers create feeds with precisely the content they wish to feed. If they can make a better feed reader, more power to them!
  2. to permit newsblur.com to copy and display what they call “original content” under the conditions that a) they enter a licensing agreement with the content provider, with the fee to be negotiated. (I would accept a rather modest fee of $0.01 each time a visitor loads my content at Newsblur instead of here, plus an additional $0.10 each time Newsblur’s spider hits my pages to take their “snapshot”. Co-authors should, of course, get a cut of any licensing fees the Blackboard takes in.) and b) Newsblur switches to an “opt-in” scheme whereby a publisher’s content is only shown after they and Newsblur have negotiated a mutually agreeable arrangement.

Note that I am happy to grant some entities access without requiring a formal license or fee, provided I think whatever copying they do falls under fair use and/or benefits me. For example: I have no objection to google’s cache, which is used differently from Newsblur’s copies. Also, I recognize that some content providers may refuse to grant licenses, as noted by Fred Wilson. I think equal blame falls on those who develop business models that involve using third-party content without giving a thought to how they might pay the content creators.

Obviously, for now, I added Javascript to my pages to interfere with Newsblur’s display of my blog content. The javascript is not Newsblur specific and affects anyone who frames or copies my content. One motive for writing this is that it is possible for third parties to conduct cross-site scripting attacks by framing. For that reason, I prefer to make my default policy to make it difficult for any and all third parties to frame or copy my content wholesale. I will write exceptions for third parties whose purpose for framing or copying I deem “worthy”. (It appears I need to write an exception for Google translate, whose purpose I do deem worthy. I need information to do that properly.)

But I would like readers to discuss the general issue of others copying my content, framing it and so forth. I know many minds can come up with more creative solutions than I can by myself; if we can find solutions that could be win-win for a company like newsblur and the bloggers whose content they wish to display, that would be great.

For the purpose of discussion, you might wish to see how other blogs or sites display at Newsblur. When looking, I suggest you include glancing at Cloudflare’s frame enclosing Hacker News, as exploring that will show how Hacker News interacts with Newsblur. For fun, I suggest you find the orange band, place your mouse over the black “hacker news” text and click. 🙂

========
Tom Fuller suggested I provide code to bust the newsblur frame. This code appears to work. I’m not going to claim it’s efficient. The user must modify it to

  1. Replace my blog url with theirs in this bit:
    var correctLocation ="http://rankexploits.com/musings/"

    This appears in two places because I suck at javascript.

  2. Replace my domain name with theirs in this bit:
    var correctWord2="rankexploits.com"
  3. They could also replace the text in the message. For example, you might not want to tell people that you are “Lucia”. You also might not want to suggest whoever is framing your content is an asshat, but I consider it a nice touch.


<SCRIPT type="text/javascript" >

var topWindow = String(top.location)
var topWord = topWindow.split("/")

var selfWindow = String(self.location)
var selfWord = selfWindow.split("/")

var correctLocation = "http://rankexploits.com/musings/"
var correctWord2 = "rankexploits.com"

if( ( (topWord[2] != correctWord2) || (selfWord[2] != correctWord2) ) && (selfWord[2] != 'translate.googleusercontent.com' ) ) {
document.write("<p><font color='blue'>Hi there!  This message is from Lucia.</font></p><p>What you are reading is not a feed. If it were a feed, this message would not display.</p> <p><i>I</i> don't know precisely where you are reading this content; this script is telling me the self.window is "+ selfWindow +" and the top.window is "+ topWindow + ", but my guess is either:</p><ol><li>an <b>ass-hat</b> third party has copied the html of my blog page, saved it on his server and is displaying it to readers at the domain name of his choice, </li><li>an ass-hat third party is displaying my content by framing it and surrounding it with stuff he wants to promote, or </li><li>someone is doing something I can't even guess at.</li></ol><p>I consider the first two behaviors to be unacceptable use of my content.</p><p> I have decided to insert a script in my content (but not my feed). The script recognizes that my non-feed content is being displayed at a site where I have not authorized display and forwards the reader back to my site. </p><p>If you believe you are viewing this at a site that is not an 'asshat', please copy this message and send it to me. I'll review that and code an exception. I will really need to know the self.window and top.window information.</p> <p>You should be forwarded shortly. If you are not forwarded, please click <a href='" + correctLocation + "'>" + correctLocation + "</a> to read my blog at my site. </p><p>Lucia</p>" )
setTimeout("redirect_after_pause()", 8000)
}

function redirect_after_pause() {
var correctLocation = "http://rankexploits.com/musings/"
top.location = correctLocation
}

</SCRIPT>
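As an aside: since the url appears in two places (see the note above), here is a small helper that derives the bare domain from the blog url, so it only needs editing once. This is a sketch I tested in isolation, not in the full script; the function name is mine, just for illustration.

```javascript
// Sketch: derive the bare domain from the blog url so it only needs
// editing in one place. Splitting "http://rankexploits.com/musings/"
// on "/" yields ["http:", "", "rankexploits.com", "musings", ""],
// so index 2 is the domain -- the same trick the script above uses
// on top.location and self.location.
function domainFromUrl(url) {
  return String(url).split("/")[2];
}

var correctLocation = "http://rankexploits.com/musings/";
var correctWord2 = domainFromUrl(correctLocation); // "rankexploits.com"
```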

81 thoughts on “Newsblur: A modest proposal”

  1. I’m surprised you seem to regard this entity as something you can reason with. I tend to assume that sites like this which are attempting to monetise other people’s successful work are parasites like any other. I’ve observed sites copying my own content in what I assume is an attempt to convince google that their own site is legitimate. I even found a bot-translated copy of my site once!

    You can appeal to their readers – one site a long time ago hotlinked one of my images so I replaced the image… or block them, or ignore them. Blocking won’t get round the sites that read and re-write your content. Copyright obviously protects you, in theory…

    I would suggest a google search takedown request might be one route which has some marginal value.

  2. Sean–
    I take your view to some extent. But newsblur is not an ordinary splog or scraper.

    At the same time I am under the impression that Newsblur wishes to be legitimate. Moreover, I am under the impression they obtained funding. I see an application for funding from Knight: http://newschallenge1.tumblr.com/post/19397311604/newsblur At his blog he writes:

    I’m pleased as punch to announce an investment in NewsBlur by Y Combinator, the investment firm. Over the past two months, we’ve been humbled by the roster of experienced partners giving us candid advice. It’s their tough love that is the catalyst for the next few months of transitioning NewsBlur from side project to world-class news reader. Expect NewsBlur to become simpler and more refined.

    If this thing is going to take off, and their business model involves using bloggers’ content, I think that would be fine provided the content creators get a cut. For this reason, I think it is better to bring up the notion that newsblur should be paying for content rather than merely inserting goatse pictures on their pages (which is technically feasible: my framebuster inserts text at their pages; I could insert images.)

    I also think it would be very, very bad if the bloggers whose content is being used to “fill” the system were unaware of what was happening until after Newsblur was so large that not providing your content to them amounted to cutting off your nose to spite your face. That sort of thing does happen on the web, and it would be a shame if content creators were left out in the cold while someone made a lot of money “wrapping” their content.

    So, I’d rather discuss this than otherwise.
    With respect to discussing this, my main regret is that my readers do not represent the demographic of people who can make the largest difference as entities like newsblur are developed and funded. This is not a blog attracting Venture Capitalists, Copyright gurus, and so on. It’s a climate blog.

    Nevertheless, if people could provide ideas that’s great. ( I’ll admit, right now, I have a pretty low opinion of newsblur, and I don’t mind people pointing out that their business model is one involving quite a bit of ass-hattery.)

  3. You could also use your script as a club, offering to make it available to other content providers. Collective action works! But then, I’m an old commie… and of course you should charge for the script… I’m an old commie, not a dumb one.

  4. Interesting. Found here:
    http://www.ofbrooklyn.com/2010/11/3/hacker-news-effect-post-mortem/

    Framebusting would be used by content owners to prevent 3rd parties (aka ‘non-owners’) from displaying content in a frame. I don’t know for sure what the term “Frame Buster Buster” means. But it sounds suspiciously like a method that would prevent the content owner from busting a frame. Smells not so good to me.

    That said: My frame buster seems to work. It’s easy enough to use.

  5. The script is trivial. But it needs exceptions for google. Alternatively, it needs to simply be targeted at newsblur.

    If people are interested, I’ll write it as a wordpress plugin so bloggers can use it.

  6. They should have asked, which is reason to deny.
    Your commenters have a stake in this, small as it is. 😉
    You have no idea what they might put in an adjacent frame at some later date, denigrating you or your blog.

    Other than the obvious theft of content with no investment, what other benefits might they receive that aren’t apparent?

  7. Scott-

    You have no idea what they might put in an adjacent frame at some later date, denigrating you or your blog.

    I’m not especially worried about that. First, people get to have whatever opinions they have and express them. Second, I think he really wants to provide something he considers a better feed reader. But… well… he ought to be thinking of paying the people whose content he is charging customers to view with a bow slapped on it.

    I should also note that the developer seems to have provided an awful lot of “free stuff” to the world ( most of which would only be of value to other developers. Others would really not give a crap about the stuff. ) This sort of behavior in the developer world sometimes makes them think that if their little enclave gives away lots of stuff for free then… maybe lots of stuff just is free. But this is not necessarily so.
    My impression is he does want to provide numerous services, and his goal is to do something they call “freemium”: the lesser level of service is free; the greater costs money.

    Newsblur also seems to be creating some sort of social network where people ‘share’ stuff (in what way I don’t know). I haven’t joined, so I have no idea what the word “share” means in the context of the thing newsblur calls a “blur blog”. If it means framing my content in more and more and more windows… uhmmm.

    You know, there are just soooooo many bad things about framing.

    Here you can read a quote:
    http://blogs.wsj.com/digits/2009/05/01/controversial-web-framing-makes-a-comeback/

    Web usability expert Jakob Nielsen argues that “frames break the fundamental user model of the web page.” “All of a sudden, you cannot bookmark the current page and return to it (the bookmark points to another version of the frameset), URLs stop working, and printouts become difficult. Even worse, the predictability of user actions goes out the door: who knows what information will appear where when you click on a link?”

    Facebook’s decision to start framing has prompted mixed reviews. When a user clicks to follow the link for an article or Web site that a friend posts to be shared on Facebook, that outside content is still framed within Facebook’s site, and the question of whether the publishers of those stories lose credit for traffic directed there via Facebook comes into play.

    Mahalo founder Jason Calacanis posed the question on Mahalo Answers: “What do you think of Facebook framing (aka hijacking) other websites when you leave their domain?” The top-rated answer by user blikkie complained that the Facebook frame stunted browsing onward from that link, preserving the Facebook URL in the address bar. “Stealing the address bar is a bad enough offense on it’s own, but breaking browsing habits like this is showing contempt for your users in my book,” he wrote.

    Mr. Calacanis responded, “I think I’m not cool with framing of websites in general. As a publisher it’s not a good trend. However, i do see the value in sometimes doing this.”

    Content owners generally lose a lot when content is framed. And it’s many content owners. For example, if you do read “Hacker News” at newsblur and click a link to one of their articles, the article will open in newsblur’s frame. The url appearing in your browser display will be newsblur. If you bookmark, you will bookmark newsblur. In contrast, if you visited Hacker News directly and clicked the link, the link would open the actual article. In this case, actual articles might belong to the Washington Post, http://www.pagerduty.com/jobs/engineering/devops-engineer and so on. You can easily bookmark them.

    To make this a bit more climate blog specific: Suppose I link to Climate Audit. A person reading my article framed at Newsblur clicks that link. Climate Audit is now framed at Newsblur. To bookmark Climate Audit or even learn its url, the user must “do something” to break Climate Audit out of the frame. Because if they don’t “do something”, the default when setting a bookmark is to bookmark newsblur.

    This is not good for me. It’s not good for climate audit. It’s not good for my readers. At least that’s my opinion.

  8. The most disturbing aspect of this to me is every review I’ve read for NewsBlur praises its “innovation” of allowing the reader to see the pages as they originally were. In other words, they’re praising theft.

  9. Brandon–
    I’ve read similar applause, from people who seem to be under the impression that showing the full pages is some sort of technological marvel. Technically, it is not at all difficult to show full pages. You either frame them or copy them…. The former is really annoying; the latter is… well… not supposed to be done without the copyright holder’s permission. (Or under fair use, and it’s difficult to see how what is going on here is fair use.)

    It’s also pretty easy to detect that framing is going on. Framing has existed since… oh… pre-2000. On a mac I point my mouse and Control-click. Then I can see options, including “display frame in window”. If the “Hacker news” page opens up in http://newsblur.com/reader/page/6, that doesn’t look like Hacker News, does it? Hacker news is here:
    http://news.ycombinator.com/news

    I’m also pretty astonished by this:

    “Ycombinator” seems to have invested in newsblur.com. (See comment above.) Meanwhile if you visit http://newsblur.com/site/6/hacker-news and click “Hacker news” in black letters on the orange band, the “frame” will display an empty white area. (At least it does for me.)

    This suggests that Hacker News (on ycombinator.com) is using a framebuster to prevent newsblur.com (and likely everyone) from loading their pages in a frame! You can surf to some of the “Stories” linked at Hacker news. That’s because the stories on Hacker news pages are really just links to other people’s content. Some of those people will use framebusters, some not.

    But it’s pretty funny that, as far as reading the Hacker News feed goes, Newsblur pretty well sucks.

  10. lucia (Comment #101392),

    At the same time I am under the impression that Newsblur wishes to be legitimate. Moreover, I am under the impression they obtained funding.

    I might be a bit cynical, but I see this as an enterprise set up with the sole purpose of obtaining funding. If they really wanted to be legitimate, they wouldn’t be trying to build their business on stolen content.

  11. For what it’s worth, I’d put my money on that being a server thing, before it being a frame buster thing. Something like NewsBlur not working properly could happen for any number of reasons that had nothing to do with intentionally blocking that sort of behavior.

    But it is funny no matter what the reason.

  12. Skeptical, wanting to be legitimate doesn’t mean you know what being legitimate requires. To some people, what that site is doing is perfectly fine, so the business practice lucia is bothered by seems good to them.

    There’s no reason to attribute to malice what can be attributed to any number of other things.

  13. Brandon–
    For lack of any better place to put them quickly, I also put this before the first line in my template

    <?php
    header('X-Frame-Options: DENY');
    ?>

    That busts the frame when they try to frame a file I host. Of course it doesn’t work at all if they frame a file hosted by someone else. (But unless someone can explain to me why it’s fair use, that’s a copyright violation. )

    But now, let’s have some more fun. Here’s a great feed!!

    http://www.newsblur.com/site/1100901/

    Here’s William Briggs:
    http://www.newsblur.com/site/1100903/

    You can read their copy of his site blog at
    http://www.newsblur.com/reader/page/1100903

    Interestingly enough, it appears newsblur.com has fresh copies of the actual page content, but when I click “feed” the most recent date is Dec 16, 2011. So, the bot that refreshes the feed seems somewhat behind the times. If I click “story” I also read the Dec. 16, 2011 article on the mathematics of Santa Delivery.

    I wonder if my visit will trigger newsblur’s bot to visit Matt Briggs’ articles? (I can’t speak for Matt so I can’t begin to guess if he gives a hoot about this. I still think they ought to pay him a penny a view or something like that.)

    I’m just stepping through 1 at a time. I often trigger “fetching feed”.
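    One more note on that X-Frame-Options header above: if editing the theme’s php is awkward, the same header can be sent for everything the server serves. This is a sketch assuming Apache with mod_headers enabled (other servers have equivalents); I haven’t tested it on my own setup:

```apacheconf
# .htaccess sketch: send X-Frame-Options on every response
# (assumes Apache with mod_headers enabled)
Header always set X-Frame-Options "DENY"
```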

  14. To some people, what that site is doing is perfectly fine, so the business practice lucia is bothered by seems good to them.

    Moreover, with copyright it will turn out that some content owners won’t care that their stuff is copied. For example, Matt Briggs’ stuff is up at newsblur. If you tell him it’s copied at http://www.newsblur.com/site/1100903/ his reaction might be “no skin off my nose”. Or he might think he’s willing to let it slide for a while.

    Mind you: the amount of money I’m suggesting they should be paying for the load I estimate my site was experiencing is not much. As far as I can tell, exactly 1 person was viewing at newsblur. Their spider seemed to visit twice a day on the two or three days I saw last week. Assuming their spider visited at that rate for a year, that would be $73.00 for the spider visits, and assuming one person read 1 page a day for a year, that would be another $3.65. So, about $77 a year. That’s it.

    But bear in mind: Because their business model is to put a wrapper around many people’s content, I think they should be offering everyone this much, and they should not copy unless the blogger has opted in. If Matt didn’t want the money, I think Matt should consider advising them to donate the money to a worthy cause.

    But that’s my opinion. Matt may not care.
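    To spell out the fee arithmetic above, here is a throwaway sketch. I keep the sums in whole cents so javascript’s floating point doesn’t muddy things; the variable names are mine, just for illustration.

```javascript
// Sketch of the licensing arithmetic: $0.10 per spider visit (spider
// came about twice a day) plus $0.01 per page view (one reader, one
// page a day). Sums are in whole cents to avoid floating-point dust.
var spiderVisitCents = 2 * 365 * 10;   // 7300 cents = $73.00
var readerViewCents  = 1 * 365 * 1;    //  365 cents =  $3.65
var totalCents = spiderVisitCents + readerViewCents;
var totalDollars = totalCents / 100;   // 76.65, i.e. about $77 a year
```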

  15. lucia, unfortunately, there is nothing which prevents NewsBlur from ignoring that header. That only works because the code was written to respect the header. It could easily have been written otherwise.

    Ignoring it would probably hurt them if they were ever taken to court, but otherwise…

  16. Brandon–
    I hunted around for a range of advice about framebusting on Saturday. My impression was that the header is read by the browser the person surfing is using. Is that mistaken?

    I read numerous bits of information explaining techniques to bust a framebuster. But certainly if
    a) Party A copies Party B’s material without permission
    b) Party A tries to frame their copy.
    c) On finding that Party B inserted even a very simple framebuster that would prevent them from framing the copy, Party A removed or disabled the framebuster and then
    d) Party B sued for copyright violations

    A judge is not going to look favorably on Party A.

    If the copying was fair use, the whole “framebusting” thing wouldn’t turn the copying into a violation. But…. if A couldn’t defend it under fair use and they broke a framebuster to display it… bad. Very bad.

    I can’t help but wonder if the “framebuster buster” he mentioned in his Nov 2011 priority list was essentially: 1) copying text and 2) hosting the copy.

    That would bust the most common framebusters which compare “self” to “top” like this:

    if (top != self) { top.location.replace(self.location.href); }

    When someone copies the html, hosts the copy at “otherdomain.com” and then displays it in a frame at “otherdomain.com”, top==self is true.
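    That is exactly why the framebuster in the post compares hostnames to an allow-list rather than comparing top to self. A stripped-down sketch of that check (the function name is mine, just for illustration):

```javascript
// Sketch: an allow-list check on the hostname. Unlike "top != self",
// this still fires when a copied page is hosted and framed on the
// copier's own domain: there top == self, but the hostname is wrong.
// Splitting a url on "/" puts the hostname at index 2.
function isAuthorizedHost(urlString, allowedHosts) {
  var host = String(urlString).split("/")[2];
  return allowedHosts.indexOf(host) !== -1;
}
```

    In the real script, the url strings come from String(top.location) and String(self.location).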

  17. Lucia, In the scenario you describe, in common with all ‘edgy’ 21st century startup companies, you and your blog are not the customer, you are the product.
    I would tell them to go forth and multiply. Nuke them from orbit if necessary.

  18. lucia, nevermind. You’re right. I mixed it up with the fact NewsBlur could easily strip the header.

  19. you are the product.

    Agreed. And I think they should pay me for my product. If they can find customers who pay them for my product they should certainly enter into an agreement with me wherein they pay me for my product.

  20. Brandon

    lucia, nevermind. You’re right. I mixed it up with the fact NewsBlur could easily strip the header.

    I think the confusing aspect is that in some cases they appear to merely frame. In others they appear to copy, then frame their copy.

    When merely framing, those X-headers should work. But otherwise…. not so much. Interestingly, if they merely framed, the “original feed” wouldn’t go stale.

  21. Thinking of the newsblur “system”, I wonder what is to prevent someone from creating a fake blog, populating it with “links” (all followed), and getting newsblur to create a copy.

    Then someone can link to the newsblur copy creating inbound links google will follow.

    Then…. google will attribute some positive blessing to the links on that newsblur page?

    And for that matter, maybe it’s happening.

    Compare:
    http://www.newsblur.com/site/1100920/
    http://www.newsblur.com/site/1100921/
    http://www.newsblur.com/site/1100923/

    http://www.newsblur.com/site/1100925/
    http://www.newsblur.com/site/1100926/

    You get the picture. Now I’ve linked them.

  22. Their bot is still visiting and made a fresh copy. I added a time stamp to the blog so you can see! Go here:

    http://www.newsblur.com/site/1100897/

    I wrote this on the 11th

    Hello Samuel,

    I have consulted with the 1 reader who subscribes to my blog through that which you represent as a “feed reader”. I use the term “feed” loosely, since copying non-feed pages like http://rankexploits.com/musings, hosting that html on your server and letting people load that html at your site is not providing them with content that could be called “a feed”.

    Based on consultation with that reader and my own interests, I would like it if you would
    a) cease and desist copying the full html posted at my hobby blog at http://rankexploits.com/musings,
    b) desist in displaying the full html I post at my hobby blog http://rankexploits.com/musings,
    and
    c) only provide access to content available at my feed whose address is http://rankexploits.com/musings/feed .

    That is with respect to my site: act like a feed reader by reading feeds. Do not behave like a scraper who copies content on a non-feed uri.

    Please let me know if you have any questions and alert me when you have made the requested change. After that, I will contact the 1 reader who likes to read my site at your blog and tell him it’s now safe to do so.

    Thank you,
    Lucia Liljegren

    (BTW: I will blame typos on having imbibed on Saturday. I had, indeed, imbibed. Nevertheless, I thought it important to tell Samuel to cease and desist.)

  23. I think it’s good sense to prevent unauthorized wholesale duplication of your work, even if you didn’t otherwise care.

    What I would worry about most is what happens if the third party inserts malware into your web pages while luring readers in using your content.

  24. Their page fetcher came around and copied:
    199.15.250.233 – – [13/Aug/2012:14:44:45 -0700] “GET /musings HTTP/1.1” 301 497 “-” “NewsBlur Page Fetcher (1 subscriber) – http://www.newsblur.com (Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_1) AppleWebKit/534.48.3 (KHTML, like Gecko) Version/5.1 Safari/534.48.3)”
    199.15.250.233 – – [13/Aug/2012:14:44:45 -0700] “GET /musings/ HTTP/1.1” 200 10394 “-” “NewsBlur Page Fetcher (1 subscriber) – http://www.newsblur.com (Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_1) AppleWebKit/534.48.3 (KHTML, like Gecko) Version/5.1 Safari/534.48.3)”

    The relative time stamps are a bit confusing because the time stamp in my sidebar appears to be GMT (I think) and is also showing the time in the cloudflare cache so it’s a little stale. (The time stamp I see reads 2:38 am Aug 14. It’s 10:05 pm Aug 13 in Chicago.) My server logs are in California time. The copy of my page at newsblur has a time stamp of 9:36 pm Aug 13.

    http://www.webcitation.org/69twkx2tS So, that time corresponds to the visit in the server logs. (Cloudflare will also cache them!)

  25. Please let us all know if some news “aggregator” does agree to a $0.01 fee per read, as we can all switch to reading your blog via their site, and thus redirect their profits into the correct pockets.

  26. Please let us all know if some news “aggregator” does agree to a $0.01 fee per read, as we can all switch to reading your blog via their site, and thus redirect their profits into the correct pockets.

    Well, that could be a danger for them, couldn’t it?

    Also, I have no objection if the aggregator displays my feed and wouldn’t charge the $0.01 if they displayed that. The feed is the amount of content I have specifically selected to permit aggregators, feed readers etc. to display after copying. It serves to inform readers that something has been published. Moreover, if I chose to do so, the feed can include the entire article. That’s why feeds exists.

    But Newsblur isn’t limiting itself to displaying a feed. In fact, if you look at the server logs, it is crawling the top page of the blog.

    Samuel Clay said he would remove that. He hasn’t yet. I emailed Saturday; maybe he had a busy weekend. I think I’ll remind him. I’m tempted to remind him on twitter. I’m trying to decide if that would be too obnoxious.

  27. Sorry to the people who got autobanned when visiting newsblur…..
    (I first detected the issue when someone got autobanned. I thought I’d pulled out the various trip-ups that get people with “wrong referrers”, but I missed one! I thought I’d looked, too. Arghh!!!)

    Unfortunately, while no one got banned when the message lingered only 2 seconds (which is when I checked), people did begin to get banned when my javascript message lingered 10 seconds, which happened after Newsblur made a fresh copy.

    Anyway, for some readers, visiting their page resulted in a “direct call to something missing in the theme folder”. Which is something that bots do do when penetration testing but people do not ordinarily do.

    I’ve gone through the “killed” logs and I hope I found everyone who was banned for that reason.

    I don’t know whether the most annoying thing was that the operation of his site triggered my “auto-banning” of a reader, or that when I wrote him complaining about his copying, his response was

    Hey Lucia, so hold up, NewsBlur is about the best friend you can get as a publisher. We’re faithfully reproducing your page in a way that other RSS feed readers don’t. For your readers who use an RSS feed reader and miss seeing the original site, NewsBlur is perfect.

    Newsblur does include an RSS reader. But the “original view”, shown by default, is not an RSS reader. And in my estimation, people who read in an RSS reader and “miss seeing the original site” can put their mouse over a link and click. It’s no more difficult than trying to scroll around and then moving your mouse down to the “story” button, clicking that and waiting for the stupid display to “whir” to the right to show a blog post in a frame.

  28. lucia (Comment #101451),

    I don’t know whether the most annoying thing was that the operation of his site triggered my “auto-banning” of a reader, or that when I wrote him complaining about his copying, his response was

    Maybe you should contact his funders and explain to them what they are actually funding. 😛

  29. Just to show that the referrers left when someone visits http://www.newsblur.com/site/1100897/ :

    xxx.xx.xxx.xx - - [14/Aug/2012:09:13:07 -0700] "GET /musings/wp-includes/js/jquery/jquery.js?ver=1.7.1 HTTP/1.1" 304 175 "http://www.newsblur.com/reader/page/1100897" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:14.0) Gecko/20100101 Firefox/14.0.1"
    xxx.xx.xxx.xx - - [14/Aug/2012:09:13:07 -0700] "GET /musings/wp-content/plugins/flexo-archives-widget/flexo-anim.js?ver=2.0 HTTP/1.1" 304 173 "http://www.newsblur.com/reader/page/1100897" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:14.0) Gecko/20100101 Firefox/14.0.1"
    xxx.xx.xxx.xx - - [14/Aug/2012:09:13:07 -0700] "GET /musings/wp-content/plugins/BanNasties/avatars/today_image.php?uri=/GrizeldaPeering.jpg&Days=12225&Post=rankexploits.com/musings/ HTTP/1.1" 200 2448 "http://www.newsblur.com/reader/page/1100897" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:14.0) Gecko/20100101 Firefox/14.0.1"
    xxx.xx.xxx.xx - - [14/Aug/2012:09:13:15 -0700] "GET /musings/ HTTP/1.1" 200 10497 "http://www.newsblur.com/reader/page/1100897" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:14.0) Gecko/20100101 Firefox/14.0.1"
    xxx.xx.xxx.xx - - [14/Aug/2012:09:13:16 -0700] "GET /musings/wp-content/plugins/flexo-archives-widget/flexo-anim.js?ver=2.0 HTTP/1.1" 304 173 "http://rankexploits.com/musings/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:14.0) Gecko/20100101 Firefox/14.0.1"
    xxx.xx.xxx.xx - - [14/Aug/2012:09:13:16 -0700] "GET /musings/wp-includes/js/jquery/jquery.js?ver=1.7.1 HTTP/1.1" 304 175 "http://rankexploits.com/musings/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:14.0) Gecko/20100101 Firefox/14.0.1"

    In contrast, if you arrive after being redirected from http://www.newsblur.com/site/1100897/ , the referrer is:

    108.69.172.56 - - [14/Aug/2012:09:13:15 -0700] "GET /musings/ HTTP/1.1" 200 10497 "http://www.newsblur.com/reader/page/1100897" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:14.0) Gecko/20100101 Firefox/14.0.1"

  30. I have a Google blog (linked). When i looked at my stats a while ago, one of them was just a series of characters – couldn’t figure out what it would be. When I went to it, it was a Japanese site with my content on it in a frame. It wasn’t a big issue – who in Japan would want to read an American local history site? – so I let it go. This was at least six months ago, and I haven’t seen the link since. Sounds similar to what’s happening here.

  31. Well… I tweeted. That seems to have raised the priority of Samuel’s getting around to not showing my blog in “original” view. He coded in a workaround for my address:

    https://github.com/samuelclay/NewsBlur/commit/af9d9bf11fb7c8b76caf8b8c7b7dc5c81f380672

    It looks like I join the NY Times on this list:
    ‘nytimes.com’,
    ‘stackoverflow.com’,
    ‘stackexchange.com’,
    ‘twitter.com’,
    ‘RankExploits’

    But… it appears my system banned his final visit!

  32. Mark–
    That has also happened to me. But I see newsblur as different because:
    1) The purpose at newsblur is for them to show fresh content to people who requested they obtain that content from my blog. It is essentially providing a substitute to the exact same audience.

    My single blog frame buster seems to no longer work. So I’m now blocking anything with a referrer from that particular newsblur blog. I’ll eventually write a script to deal with what I see as the “issue”. But at least now they are not displaying copies and not triggering my banning scripts! (I hope.)
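    Blocking by referrer can be sketched in a few lines of Apache .htaccess. This is a hypothetical sketch (assuming mod_rewrite is enabled on the server), not the actual rule in use:

```
# Hypothetical sketch: refuse requests whose Referer header points at
# the newsblur reader page for this blog (mod_rewrite assumed enabled).
RewriteEngine On
RewriteCond %{HTTP_REFERER} newsblur\.com/reader/page/1100897 [NC]
RewriteRule .* - [F]
```

    The [F] flag returns 403 Forbidden; [NC] makes the referrer match case-insensitive.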

  33. Fergal–
    I think he busted my frame buster and my individual pages still show through…. He is a better programmer than I am. But I think I’ll get this fixed.

  34. Fergal–
    Although my clever scheme to read blogs I don’t like without visiting them needs to be reworked.
    Well…. I did put a workaround to let the French read in Google translate. 🙂

    I posted the url when it looked like that at twitter. It looks like the roughly 3 minutes worth of work required to implement the changes I asked for by email on Saturday became a higher priority after my fairly obnoxious tweet. (I mean, I know he wouldn’t want the world to be seeing the “asshat” message he was hosting on his own server and displaying at his own site.)

    I could have sworn my individual page framebuster used to work though. Oh well… my workaround works. But I need to pass a variable in a query string and read it to make it more convenient for anyone who does like to use newsblur as a feed reader. I just don’t want it to be a mirror or rehoster or whatever might be the correct word. (Well… unless they pay. Which I’m sure they don’t and never will.)
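    For readers curious what such a frame-buster looks like, here is a minimal sketch. This is my reconstruction, not lucia’s actual script; the function name shouldRedirect, the bot-check logic, and the 10-second delay are all illustrative:

```javascript
// Sketch of frame-buster logic: forward the reader back to the original
// site when the page is shown inside a frame, or when the html has been
// copied and is being served from an unauthorized host.
// The decision is a pure function so the rule is easy to test.
function shouldRedirect(selfWin, topWin, hostname, authorizedHost) {
  var framed = (selfWin !== topWin);                       // inside someone else's frame?
  var copied = (hostname.indexOf(authorizedHost) === -1);  // served from a saved copy?
  return framed || copied;
}

// Browser wiring (guarded so the file also loads outside a browser):
if (typeof window !== "undefined") {
  if (shouldRedirect(window.self, window.top,
                     window.location.hostname, "rankexploits.com")) {
    // Delay so the notice stays readable before forwarding the
    // outermost window back to the original site.
    setTimeout(function () {
      window.top.location.href = "http://rankexploits.com/musings/";
    }, 10000);
  }
}
```

    Note that a determined framer can defeat this (for example by stripping the script from the copied html, or sandboxing the frame), which is part of why a coding war is hard to win.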

  35. Lucia, a very interesting question. I work on the principle that a copy should be a “taster” which promotes the original producer and invites someone to get the full meal at the original website.

    I have occasionally posted a full copy without permission:
    1. When the article seemed to have been repressed and it was in the public interest to reproduce
    2. When I am fairly confident the article was written by someone whose views agree with my own with the intention of publicising a point (and even so I only do it very rarely when e.g. I can’t find an email or it is a breaking news story and there isn’t time to get permission … although I tend to send an email afterwards explaining what I’ve done).

    On the RSS feed. Like you, I think that RSS feeds (and press releases) are given with permission for any reasonable use. Indeed, I would go further and say that, as an RSS feed is little more than a title and a brief summary, this is fair use under copyright law and there is no way to stop anyone using an RSS feed or creating the equivalent themselves.

    But, going to the specific example. There are two reasons why this is clearly a breach:
    1. They are using your content as if it were their own. In other words they are stealing your material.
    2. They are not directing traffic to your site. Whether you get advertising or not, at the very least, your site ranking will partly depend on the number of visitors. The point at which “fair-use” becomes “theft” is when they steal your visitors. To be fair-use, you have to be encouraging readers to the site … even if it is a site you don’t like, you have to give a link (or at least tell people how to find it … if like me you object to giving links to certain defamatory sites)

    So, if they reproduce the whole article, or worse, repeat your whole content, so that there is no reason to visit your site … they cannot be encouraging visitors.

    Personally, I’d write them an email invoicing them $400 per article (or find a going rate for journalism)

  36. FWIW it doesn’t pay to get into a coding war with clowns. There are more of them than thee or Eli. OTOH you could try and find a way for them to channel eyes to you, or you could just send them a demand letter, and if they don’t comply go to their ISP.

  37. Eli-
    I agree that there is little point in a coding war. There are some things I can do to control displaying things hosted on my server– but my skills on that are limited.

    I requested Samuel Clay not display copies of my page he was hosting on his machine on Saturday. He wrote back to tell me how wonderful his service was and hoped I’d change my mind. I told him to remove…. crickets…

    I tweeted on Tuesday pointing to the page that now suggested whoever was copying and framing was an asshat. He then did what appears to be 2 minutes worth of work to stop displaying my content.

    OTOH you could try and find a way for them to channel eyes

    Ordinary feeds are structured to channel eyes to me by providing some text– but without all the other stuff someone might want access to.

    Providing a full copy of the html at an alternate site? Not so much.

    and if they don’t comply go to their ISP.

    I think you mean their hosting company. But some people own and operate their own servers so there is no one else to talk to.

    I had started creating a “folder” and was looking into learning how to register posts to create the maximum possible award if I was unable to prevent the wholesale copying. I think Mr. Clay might not be aware that he is vulnerable to some heavy fines.

    Imagine if some clever and litigious bunny decided to find a person or entity to “subscribe” to his blog on newsblur. Then, instead of communicating “cease and desist” to Mr. Clay or Newsblur the first moment he saw copying, the litigious bunny kept a file of screen shots of the top of his blog every day for 30 days. Meanwhile, the litigious bunny took the trouble to register all 30 of these copies. He then hires an attorney to file a copyright suit complaining of 30 individual instances of copying.

    Heck– what if three pissed off bloggers joined forces and all did this?

    What the heck sort of defense is newsblur going to present?

    The lawsuit could throw in all sorts of other crap.

  38. Eli– As I said:

    I tweeted on Tuesday pointing to the page that now suggested whoever was copying and framing was an asshat. He then did what appears to be 2 minutes worth of work to stop displaying my content.

    So, I was able to get him to stop copying on Tuesday after I tweeted. That was much easier than providing chapter and verse in an official DMCA. I could have done a DMCA had he continued to refuse. But, quite honestly, I think modifying my pages so that people loading copied material at newsblur.com saw the message was better.

  39. 199.15.250.233 - - [15/Aug/2012:14:12:10 -0700] "GET /musings HTTP/1.1" 301 497 "-" "NewsBlur Page Fetcher (5 subscribers) - http://www.newsblur.com (Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_1) AppleWebKit/534.48.3 (KHTML, like Gecko) Version/5.1 Safari/534.48.3)"

    Sigh…..

    Eli. It’s baaaacckkkkk! I’m trying to decide if this should involve a tweet or a DMCA using the page you pointed me to. Arghhh!!!!

  40. lucia (Comment #101564),

    Sigh…..

    Maybe you could just ban 199.15.250.233 and see if that slows him down.
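    For what it’s worth, a single-address ban like that is only a few lines of Apache 2.2-style .htaccess (a sketch; the address is the Page Fetcher IP from the log above):

```
# Hypothetical sketch: ban one IP address (Apache 2.2 access-control syntax).
Order Allow,Deny
Allow from all
Deny from 199.15.250.233
```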

  41. Skeptikal–
    I can ban his bot. The difficulty with that approach is that his business model contains a larger problem. His approach is to assume he has permission to copy and display material from lots of individuals. I am just one of them.

    But on general principles, people whose business model it is to copy material and display or use it as part of a product they sell should be getting permission. It should be opt-in, not “wait until they figure out you are copying their stuff and tell them they can opt out. Then, when they ask to ‘opt out’, send them a sales pitch telling them how you are doing them a favor. Then, after they tell you to take it down, do nothing. Then, when someone escalates to twitter, make a coding change that does not actually prevent your bot from copying and displaying, and don’t check whether the material continues to appear.”

    As Eli said: the proper approach is not to try to get into a coding war. (With respect to the framing, the coding war may be the only approach.) But to the extent that this involves copying…. The law does not require me to win a coding war against someone who is certainly a better programmer than I am to protect my copyright.

  42. lucia (Comment #101573),

    But on general principles, people whose business model it is to copy material and display or use it as part of a product they sell should be getting permission.

    I agree, but this guy doesn’t seem to care.

    The law does not require me to win a coding war against someone who is certainly a better programmer than I am to protect my copyright.

    Maybe not, but it’s got to be more fun than going to court. 😛

  43. Maybe not, but it’s got to be more fun than going to court. 😛

    Agreed.

    But an appropriate DMCA sent to his name servers may strike the appropriate amount of fear in him and get him to talk to someone who knows something about copyright law (someone who understands that content creators do have rights) and discuss how he can do this properly.

    The only real question though is whether this guy thinks he falls under the safe harbor of service provider:

    http://www.copyright.gov/title17/92chap5.html#512

  44. lucia (Comment #101577),

    The only real question though is whether this guy thinks he falls under the safe harbor of service provider

    Things can have different meanings in different countries but from an Australian perspective, reading that document I would take a “service provider” to be a company providing an internet service to consumers. It’s giving them safe harbor from liability for whatever gets transmitted over their networks.

  45. Skeptical… Yes. But read this:
    https://www.eff.org/deeplinks/2006/01/google-cache-ruled-fair-use

    January 25, 2006 | By Fred von Lohmann
    Google Cache Ruled Fair Use

    A district court in Nevada has ruled that the Google Cache is a fair use.

    Blake Field, an author and attorney, brought the copyright infringement lawsuit against Google after the search engine automatically copied and cached a story he posted on his website. The district court found that Mr. Field “attempted to manufacture a claim for copyright infringement against Google in hopes of making money from Google’s standard [caching] practice.” Google responded that its Google Cache feature, which allows Google users to link to an archival copy of websites indexed by Google, does not violate copyright law.

    The court granted summary judgment in favor of Google on four independent bases:

    1. Serving a webpage from the Google Cache does not constitute direct infringement, because it results from automated, non-volitional activity by Google servers (Field did not allege infringement on the basis of the making of the initial copy by the Googlebot);
    2. Field’s conduct (failure to set a “no archive” metatag; posting “allow all” robot.txt header) indicated that he impliedly licensed search engines to archive his web page;
    3. The Google Cache is a fair use; and
    4. The Google Cache qualifies for the DMCA’s 512(b) caching “safe harbor” for online service providers.

    The decision is replete with interesting findings that could have important consequences for the search engine industry, the Internet Archive, the Google Library Project lawsuit, RSS republishing, and a host of other online activities.

    I’m reading.

    Note: Differences though:

    (Field did not allege infringement on the basis of the making of the initial copy by the Googlebot)

    I object to newsblur making the initial copy. In Google’s case, they make the initial copy for the purpose of search. Their algorithm later crunches all that information.

    So, it seems to me that once they have the copy, which was made for a valid reason, the question is: can they display it when a user specifically requests that cache?

    But as far as I can see, Newsblur’s purpose in making the copy is precisely to display the copy.

    There are other differences too:
    1) As far as I can tell, Newsblur has not been visiting my robots.txt.

    2) If it had been visiting my robots.txt, it would have discovered it was forbidden. Back in July I began logging these. (Brandon Shollenberger knew I was doing this, and I think he thought my reasons might be a bit… uhhmmmm… nuts. Or at least my treatment of bots was draconian since I was both monitoring and banning the known bad ones.)

    3) As far as I can tell, Newsblur does not post any information that permits me to specifically instruct their robot not to archive. It also doesn’t tell me how to specifically instruct their robot not to crawl or visit. To do that I need to be told a “name” to use in robots.txt and in any metatags. (Example: googlebot.)

    If anyone knows how I can write metatags to specifically state I
    a) permit “good” bots (i.e. google, maybe msn etc.) to archive and
    b) forbid all other bots–especially those with ‘no names’.
    I’d appreciate it.

    I want to be sure what robots “see” is clear.

  46. The Google Cache is a fair use

    I’d certainly agree with that. They don’t cache with the intent of putting it up on their own website for profit.

    The Google Cache qualifies for the DMCA’s 512(b) caching “safe harbor” for online service providers.

    Okay, that one I just don’t get.

    I guess that over there, the term “service providers” must cover more than just the companies providing internet connections.

    Regardless, I still couldn’t imagine Newsblur qualifying for safe harbor. Maybe search engines qualify as they have a legitimate use for the content in providing their search service, but the law wouldn’t cover someone who’s stealing content to put up on their own website.

  47. Skeptical

    (k) Definitions.—

    (1) Service provider. — (A) As used in subsection (a), the term “service provider” means an entity offering the transmission, routing, or providing of connections for digital online communications, between or among points specified by a user, of material of the user’s choosing, without modification to the content of the material as sent or received.

    (B) As used in this section, other than subsection (a), the term “service provider” means a provider of online services or network access, or the operator of facilities therefor, and includes an entity described in subparagraph (A).

    If I have the outline letters and numbers right, the definition in (1)(A) applies to subsection (a) of 512. That is about “(a) Transitory Digital Network Communications.”

    So, those are mostly ISPs.

    The rest of the stuff is more general. So, it could be the host (my host is dreamhost) but it’s broader than that. It could be youtube, a blog, pinterest, a discussion forum and so on. My blog could be a “service provider” providing people the opportunity to discuss…whatever.

    So, in that sense, newsblur could claim to be a “service provider”. Google is a “service provider”. That’s why I’m reading. Just because you are a “service provider” doesn’t mean you can do anything you wish.

  48. Skeptikal:

    Things can have different meanings in different countries but from an Australian perspective, reading that document I would take a “service provider” to be a company providing an internet service to consumers. It’s giving them safe harbor from liability for whatever gets transmitted over their networks.

    I guess that over there, the term “service providers” must cover more than just the companies providing internet connections.

    There is a huge difference between internet service providers and online service providers. An OSP is basically anything that takes input from other sources and then transmits it. That can be something like Google, which crawls the internet and stores information about what it finds, or it can be something like a chat room. The key is that an automated process is used that can take material and make it available without the direct control of the owner. And as for ISPs, they are a subset of OSPs.

    An important point is being an OSP doesn’t inherently qualify one for exemptions. There are a number of additional requirements one must meet to be eligible for “safe harbor” status.

  49. By the way, safe harbor from DMCA requires one register a DMCA Agent with the United States Copyright Office. You can find a directory of such agents online, and the N page is here. It appears NewsBlur hasn’t registered one.

    That means NewsBlur isn’t eligible for any of the limitations on liability established by the DMCA law.

  50. So, in that sense, newsblur could claim to be a “service provider”.

    If it was that easy, then I’m sure Limewire would have made that claim.

    I guess you’ll have to take him to court to find out for sure. 😛

  51. lucia:

    1) As far as I can tell, Newsblur has not been visiting my robots.txt.

    That is a huge no-no.

    2) If it had been visiting my robots.txt, it would have discovered it was forbidden. Back in July I began logging these. (Brandon Shollenberger knew I was doing this, and I think he thought my reasons might be a bit… uhhmmmm… nuts. Or at least my treatment of bots was draconian since I was both monitoring and banning the known bad ones.)

    Logging never bothers me. What I thought was a bit nuts was that reasonable bots who visited your robots file could get treated worse than bots that just ignored the file. That seems backwards.

    If anyone knows how I can write metatags to specifically state I
    a) permit “good” bots (i.e. google, maybe msn etc.) to archive and
    b) forbid all other bots–especially those with ‘no names’.
    I’d appreciate it.

    The robots file is actually sufficient for that. All you need to do is create a whitelist for the “good” bots then tell the rest to bugger off. In addition to your whitelist, just add:

    User-agent: *
    Disallow: /

    And it’s enough for any court. As for metatags, I normally recommend avoiding them unless directory structure issues force you to use them. They’re more work (they do also give you a finer control), but if you want to use them, I can tell you what you need to know.
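    Spelled out, a whitelist-style robots.txt along those lines might look like this (a sketch; the whitelisted bot names are illustrative and should be checked against each crawler’s documentation; an empty Disallow means “no restrictions”):

```
# Whitelisted crawlers: an empty Disallow means "crawl anything".
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow:

# Everyone else: stay out entirely.
User-agent: *
Disallow: /
```

    Well-behaved crawlers match the most specific User-agent group that names them and ignore the catch-all; bots that never fetch robots.txt at all, of course, see none of it.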

  52. Brandon Shollenberger (Comment #101607),

    There is a huge difference between internet service providers and online service providers.

    Thanks Brandon. All it said was “service providers” which in Australia would usually refer to ISPs. You obviously have a more blanket use of the term over there.

  53. Skeptical:

    If it was that easy, then I’m sure Limewire would have made that claim.

    I guess you’ll have to take him to court to find out for sure. 😛

    There’s no doubt NewsBlur is an OSP. However, as I pointed out above, it doesn’t appear to have registered a DMCA Agent with the US Copyright Office, so it’s not eligible for any limitation on liabilities.

    Beyond that, there are a variety of ways in which NewsBlur may fail to meet the “safe harbor” requirements. One requirement I’m curious about is that the material must be transmitted without modification. That isn’t the case with NewsBlur, but I’m curious if the extent of the modification may be limited enough that a court would overlook it.

  54. Brandon Shollenberger (Comment #101608),

    By the way, safe harbor from DMCA requires one register a DMCA Agent with the United States Copyright Office….

    It appears NewsBlur hasn’t registered one.

    There’s your answer, Lucia.

  55. Brandon–

    Logging never bothers me. What I thought was a bit nuts was that reasonable bots who visited your robots file could get treated worse than bots that just ignored the file. That seems backwards.

    But that’s not what I actually do. Bots that I would ban anyway were banned no matter where they visit. So for example mj12 is banned. If it happens to visit robots.txt first, it is banned. If it skips it, it is banned.

    Bots that visit robots.txt and aren’t otherwise banned make it through and get to crawl. If they are not previously “whitelisted” bots, they get logged in “robots_New_Aug.dat”. Some bots make it through. Others… nooooo!

    But it is true that a bad bot that has switched its user agent to evade being banned would be logged when it visits robots.txt and I would recognize that sooner. So… I can ban it!

    Also, I do have some strict rules, and some ‘bots’ I would deem ok get banned (whether or not they visit robots.txt). But I log all the “not major” bots in robots_Minor_Aug.dat. It’s a small list and I can periodically see if I’m banning something I would have approved of. Then I can unban it– and more importantly white list it. If I weren’t logging I would never notice its existence and it would be banned when it started crawling.

    Hhhh… something that

  56. Huh. Editing seems to be disabled. I was going to add something to my last comment, but I guess I’ll just point it out here.

    One of the biggest requirements for “safe harbor” protection is the OSP cannot directly profit off infringing material. That’s a major hurdle for things like LimeWire, and it may be one for NewsBlur as well.

    Thanks Brandon. All it said was “service providers” which in Australia would usually refer to ISPs. You obviously have a more blanket use of the term over there.

    Skeptical, while I’m glad to help, it isn’t just a US thing. It’s global. The issue you’re running into isn’t cultural. It’s technical versus lay. Most people have no reason to talk about OSPs as a category, so they never do. That means they never learn the term, and the only time they hear “service provider” is in reference to ISPs.

    It’s just a case of a group being known by a subset rather than the whole. After all, ISPs are OSPs (but OSPs aren’t necessarily ISPs).

  57. lucia:

    But that’s not what I actually do. Bots that I would ban anyway were banned no matter where they visit. So for example mj12 is banned. If it happens to visit robots.txt first, it is banned. If it skips it, it is banned.

    Ah. I probably misunderstood your description then.

  58. And it’s enough for any court. As for metatags, I normally recommend avoiding them unless directory structure issues force you to use them. They’re more work (they do also give you a finer control), but if you want to use them, I can tell you what you need to know.

    Yes. I’m trying to do belt and suspenders.

    As you know…. I had already set up a dynamic robots.txt and only those bots that I deemed “good” saw anything other than “disallow all”. (For the curious IT guys: DO NOT VISIT MY ROBOTS.TXT with cookies on. Or if you do, turn off cookies. You can’t read that with a browser. No. No no. That is the one exception where a robot can get banned when it wouldn’t have otherwise. This may be nuts… but no…)

    So, in principle, this should be enough. The robot should periodically visit robots.txt. If it did, it should have learned it was excluded. It should not have archived. But now…. I’m going to add the meta-tag. But I’d like to keep the google archive.

  59. I just saw you had sent an e-mail asking about that, and I sent a response telling you what you’d need to add. There are too many lines of communication at times.

    By the way, I typed “googlebot” in my example, but you may have to modify the name. I don’t keep track of what Google names its stuff anymore.

  60. Skeptical

    By the way, safe harbor from DMCA requires one register a DMCA Agent with the United States Copyright Office….

    It appears NewsBlur hasn’t registered one.

    Not only are you required to register one, but it appears you are also required to provide information on your site about how to report a complaint. If a “service provider” wants DMCA protection, they must register a DMCA agent and make it easy to find that agent.

  61. lucia, all the law requires on the website is that one post some basic contact information for the DMCA agent (name, address, phone number and e-mail) in a publicly accessible location. There’s nothing that requires it be easy to find. I imagine a court wouldn’t be impressed if somebody intentionally hid the information so it was hard to find, but I also imagine they’d give quite a bit of leeway to the service provider.

  62. Brandon–
    The statute says it must be available at its site. I guess I consider making the information available at its web site “easy” while not including it there and forcing someone to hunt down the information at the copyright office “hard”. Technically, the copyright office is “publicly accessible”. But it’s a chore for those who don’t spend a lot of time hunting down stuff at the copyright office.

  63. They are required to post the information to their site in addition to registering with the Copyright Office. There just isn’t any specification as to where they have to post it. That means finding the information might require scouring their entire site to figure out if and where they chose to place it.

    That’s why if I want to know if a site has a DMCA agent, I’m probably not going to look on their site. I don’t want to claim they didn’t post one only to find out there was some link somewhere I overlooked. I find it easier to look something up in the Copyright Office’s directory than to try to find it on some sites.

  64. In my first email exchange, I closed with

    If you have a DMAC agent set up and feel I should make a formal DMAC request, please provide me the contanct details.

    (Typo… yes!)

    How does one find out whether a DMCA agent is filed at the copyright office? I did do a site specific google search. But of course, that might not work. 🙂

  65. As an update, when I checked NewsBlur just now, it only displays your feed. I can still access the archived pages by following the pre-existing links, but that’s it.

    How does one find out whether a DMCA agent is filed at the copyright office? I did do a site specific google search. But of course, that might not work. 🙂

    They have to register under their name so I just browse to the appropriate letter then hit Ctrl+F.

  66. lucia:

    Brandon– URL where you look for their name?

    Here. You can find that link if you go to the Copyright Office’s web page covering OSPs. The first time I found it was because I saw the DMCA law required them to keep a directory online, and I decided to look for it. It only took me a minute to find it.

    Granted, I had the advantage of being familiar with the law and using the Copyright Office’s web site a number of times.

  67. By the way, I noticed the messages you’ve added to the pages recently extend to the edges of the page. It looks kind of bleh where the blue columns are on either side of the page.

    Just saying.

  68. Thanks. So if it exists it’s not under anything obvious like “Newsblur” or “Samuel Clay” or anything like that.

  69. A DMCA agent would have to carry the name of what it’s representing, so it not being under NewsBlur or any close variant means it doesn’t exist.

  70. I get your message about ass-hat third parties when attempting to use Pocket (formerly known as ReadItLater) to cache your blog so that I can read it later while offline.

    All that Pocket is trying to do is to download your blog article to make it available for later reading.

  71. Charlie–
    Sorry, but I need your help to fix that.

    The message itself contains the information I need to write a workaround so it doesn’t display and doesn’t autoforward when you are doing something that is not “asshat”. Quickly use your tool to select, copy and paste the message.

    I just did that by loading it. I copied; now I’ll paste:

    Hi there! This message is from Lucia.

    What you are reading is not a feed. If it was a feed, this message would not display.

    I don’t know precisely where you are reading this content, this script is telling me the self.window is http://newsblur.com/reader/page/1100897 and the top.window is http://newsblur.com/reader/page/1100897 but my guess is either

    an ass-hat third party has copied the html of my blog page, saved it on his server and is displaying it to readers at the domain name of his choice,
    an ass-hat third party is displaying my content by framing it and surrounding it with stuff he wants to promote or
    someone is doing something I can’t even guess at.

    I consider both the first two behaviors to be unacceptable use of my content.

    I have decided to insert a script in my content (but not my feed). The script recognizes that my non-feed content is being displayed at a site where I have not authorized display and forwards the reader back to my site.

    If you believe you are viewing this at a site that is not behaving like an ‘asshat’, please copy this message and send to me. I’ll review that and code an exception. I will really need to know the self.window and top.window information.

    You should be forwarded shortly. If you are not forwarded, please click http://rankexploits.com/musings/ to read my blog at my site.

    Lucia

    I do need you to copy the message because I need to know the “self.window” and “top.window” values. Once I know those, I can design a workaround. So copy the message and send it to me. If 10 seconds isn’t long enough for you to do that, I can code it to 15 seconds, 20, whatever. That will give you more time to copy and paste.

    I wouldn’t be surprised if the “http://newsblur.com/reader/page/1100897” changes to begin with “file” when you read it offline. If so, tweaking to not interfere with that would be a snap. But…. I need to know. So if you (or anyone) can do me a favor, that would be great!

    But also, you can avoid the redirect or seeing the message by doing something very simple. Turn off javascript on your tool. You most likely shouldn’t need that when reading offline anyway. (People can’t use Newsblur with javascript turned off.) (Hhmmm… I just read about ReadItLater and it’s not convenient to turn javascript off because you can’t bookmark. So, please, send me the message. Or just tell me if it starts with “file:”.)
    (Maybe I should just code that immediately since I think it’s rather likely it does read “file”!)

  72. Charlie–
    I added something I’m guessing will fix the issue. But if it doesn’t fix it, let me know. Specifically, I think if you are reading online, the address always starts with “http:”. So, pages read offline will not show “http:”. So, the “ass-hat” message is now programmed to only show if the address starts with “http:”.

    Thanks for telling me! ‘Cuz there are an infinite variety of methods to view things and I can’t necessarily foresee them all (and when I’m in a hurry, I don’t even necessarily think of them!)

    I need nice people like you to let me know. (A French reader alerted me to the issues with Google translate– so I could fix that.)

Comments are closed.