A Random Failure

John Cook, proprietor of Skeptical Science, recently asked people to host links to a survey he’s running. lucia has made several posts on this, and I’ve taken a particular interest in one aspect of the survey: its supposed randomness. Participants are told they’re being shown 10 randomly selected abstracts to review. They aren’t. The selection isn’t random.

That’s a strong conclusion. It’s possibly a damning conclusion. As such, I’m going to take a minute to explain where the idea came from. If you aren’t interested, just skip to the end of the post.

Believe it or not, the idea came from Skeptical Science itself. Specifically, it came from something a commenter there (chriskoz) said:

Interestingly, my sample included exactly the same “Agaves” (“Neutral”) paper that Oriolus Traillii mentioned @5, how probable is it? John, please make sure that the survey selects truly random selection for all participants (i.e. check your random generator).

He was referring to a comment ten hours earlier where a user (Oriolus Traillii) mentioned an abstract that appeared in his survey. I thought the fact both of them saw the same abstract was interesting, but I didn’t make much of it until I opened up the survey. To my amazement, I was given the exact same abstract as well.

That seemed incredible. Only a handful of people had discussed their results by this point, and two saw the same abstract I saw. There are over 12,000 abstracts to draw from, and each person is only shown 10. The similarity in our surveys was too much of a coincidence for me. I had to look into things.

Now we’ll fast forward a few days (and skip past several wrong ideas) to reach last night. Last night, I examined the HTTP requests/responses used when communicating with the server hosting the survey. I was immediately struck by an idea I didn’t want to believe. It turns out John Cook used something called a “session ID” to help control what papers were presented to people. That was a huge mistake.  You don’t need to know the details (but go here if you want them). All you need to know is the session ID is stored on your computer. That means you can change it. That means you can change what papers you’re shown. You can do things like pull up a dozen sets of papers then pick one and go back to it.
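
To make the flaw concrete, here is a minimal sketch (not Cook’s actual code; the function name, pool size, and seeding scheme are all assumptions) of what it means for the displayed sample to depend deterministically on a value stored on, and editable from, the visitor’s own computer:

    import random

    def survey_for_session(session_id, n_abstracts=12000, per_survey=10):
        """Hypothetical illustration: derive the 10 displayed 'abstracts'
        from the session ID. Any scheme of this shape lets the visitor,
        who controls the session ID, control which sample they are shown."""
        rng = random.Random(session_id)   # seed a PRNG with the session ID
        return sorted(rng.sample(range(n_abstracts), per_survey))

    # The same session ID always produces the same ten abstracts...
    assert survey_for_session("abc123") == survey_for_session("abc123")

    # ...so a visitor who edits the session cookie can pull up as many
    # candidate surveys as they like and return to whichever one they prefer.
    for sid in ("1", "11", "111"):
        print(sid, survey_for_session(sid))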

I was shocked to find out such an obvious design flaw existed in the survey. But I was also grateful. This provided me an opportunity to try to test the random number generator (RNG) used to pick papers to display to people taking the survey. I spent a couple hours doing so. After the first few minutes, I knew something was wrong. After a few hours, I knew I could prove it. So I made this:

[Figure: RNG_Test, showing how often each abstract appeared across the 16 surveys]

That is 160 numbers corresponding to 160 papers I got from 16 consecutive surveys. As you can see, 2 papers showed up three times in 16 surveys. Another 27 showed up twice. That is clearly not random. On the other hand, that is a lot less random than what most people have likely experienced. The reason for that is I picked session IDs I knew would emphasize the lack of randomness in the RNG. In other words, I exploited a flaw in the RNG.

That flaw is related to a concept called entropy, and it’s really quite simple. The sixteen session IDs I chose to use were very similar. My first session ID was 1. The next was 11. The next 111. So forth and so on. The similarity in these session IDs carried through the RNG and into the results. It shouldn’t have. The fact the RNG let that happen proves the RNG is flawed. If it weren’t, I would have gotten results more like:

[Figure: RNG_Normal, showing the sort of spread a properly random selection would produce]

The difference in these two graphs proves Oriolus Traillii, chriskoz and I did not see the same paper purely by chance. It happened because the “10 random abstracts” shown to us were not randomly chosen. That doesn’t mean the survey is complete garbage, but it does mean the survey cannot be said to be random.  And it does mean any claims based upon the survey being random cannot be supported until this issue is dealt with.
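
For a sense of scale, here is a quick back-of-the-envelope simulation (a sketch, assuming a pool of exactly 12,000 abstracts and sixteen independent, uniformly random surveys) of how much repetition chance alone should produce:

    import random
    from collections import Counter

    N_ABSTRACTS, N_SURVEYS, PER_SURVEY = 12000, 16, 10

    def repeats_in_one_trial(rng):
        # Draw 16 independent surveys of 10 distinct abstracts each and
        # count how many abstracts show up in more than one survey.
        counts = Counter()
        for _ in range(N_SURVEYS):
            counts.update(rng.sample(range(N_ABSTRACTS), PER_SURVEY))
        return sum(1 for c in counts.values() if c >= 2)

    rng = random.Random(0)
    trials = [repeats_in_one_trial(rng) for _ in range(10000)]
    print(sum(trials) / len(trials))   # about 1 repeated abstract on average,
                                       # versus the 29 repeats observed above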

For those interested in more information, I’ll discuss additional details and problems of the RNG in the comments section.

74 thoughts on “A Random Failure”

  1. There is a detail to my test which I left out of the post due to its complexity (I was trying to make a post everyone could follow with ease). The session ID is not the only input for the RNG. There is at least one other input that changes periodically, and the test I performed depends partially upon the other input.

    You can picture it via multiplication. Suppose you use 1, 11, 111, 1111… in simple multiplication. If the number you’re multiplying by is a single digit, say five, your results will have a very strong visual similarity. On the other hand, if the number you’re multiplying by is 25, the visual similarity will be smaller.

    A similar issue arises in the test I performed. How obvious the effect it highlights will be depends on what the other input to the RNG is at any given point. That means you shouldn’t expect dramatic results like I got every time. Sometimes they may just be notable.

    (For people interested in cryptography, this issue is well-known. It’s just a matter of alignment in bits.)

  2. In this post, I focused on the frequency of abstracts showing up, but there are plenty of other things to look at. As such, I’m uploading the underlying data for the first graph of my post. It’s 16 rows of 10 entries each. Each row represents one survey, starting with a session ID of 1 and going to session ID of 1111111111111111. The order used by the server is preserved.

    There are a number of oddities to it, and anyone who likes numbers should try looking at it. The thing I find most interesting is 14/16 rows have the entries in ascending order. 2/16 do not. It is difficult to come up with an explanation for that.

  3. Nice quantification. I did notice, when fiddling around to see if I could use HideMyAss, whether I got repeat surveys, and the effect of turning the browser on and off, that I did get repeat titles. That would have been rather improbable if the random number generator was selecting uniformly over all the IPs.

    Given that blocks of IPs used by *actual humans* are not uniformly distributed, this could have some odd consequences. After all: Dreamhost has a whole block of IPs. Those mostly aren’t used by people. Meanwhile AT&T, Verizon, etc. have blocks. Those are often used by people. Similarly, there are large blocks assigned by continent (ARIN, RIPE, APNIC etc.)

    I wonder how it all pans out in the end? Maybe John Cook has considered this? It would be interesting to know.

  4. Hang on, this proves nothing about how random the survey may or may not be. You’ve proved that similar session ids tend to be shown similar papers. This is irrelevant – what matters is whether users are shown a non-random selection of papers. Unless you also know how the session IDs are allocated to users, you therefore can’t say anything about the survey design.

    Certainly it would be a problem if (say) Lucia posted a link, a hundred people from this site all went there at similar times and got given similar session ids, and so collectively saw a non-random selection of papers. The italicised bit is important.

  5. Hi Brandon,
    If Cook wants to use the results to crowdsource his own rankings, then he owes you a big thanks for this. If he is still trying to prove bias in ideation, then…

  6. Paul: That would be my guess. Cook is not using a single address; the same abstracts keep coming up. From this, I think he could do some work on the responses from specific sites.

  7. Peter Ellis–
    It proves that the system is susceptible to being non-random. That means that John will at least have to explain what he thinks he did to make it random and provide tests to show it was random.

    Since it is, one might want to do a further test. You could do this:

    Visit a site like HideMyAss. Obtain a bunch of proxies. Visit a site using all these different proxies, and load the survey over and over, each time clearing the session cookie so that it gets filled however John fills it. Log the titles for the papers obtained. I think filling out the survey will not be required for the purpose of this test.
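
    A rough sketch of what logging such a test could look like (the survey URL and the HTML selector here are placeholders, and the proxy handling is assumed; this is not a tested scraper):

        import requests
        from bs4 import BeautifulSoup

        SURVEY_URL = "http://example.org/survey"             # placeholder, not the real link
        PROXIES = ["203.0.113.5:8080", "198.51.100.7:3128"]  # hypothetical proxy list

        for proxy in PROXIES:
            session = requests.Session()                     # fresh session = no stored cookie
            resp = session.get(SURVEY_URL,
                               proxies={"http": f"http://{proxy}"})
            soup = BeautifulSoup(resp.text, "html.parser")
            titles = [el.get_text(strip=True)
                      for el in soup.select(".abstract-title")]   # selector is a guess
            print(proxy, titles)                             # log which papers were served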

  8. lucia, thanks. I can’t imagine John Cook has actually thought about this. There’s no reason to use a borked RNG like this.

    Peter Ellis, why would you think “this proves nothing about how random the survey may or may not be”? The amount of entropy is highly dependent upon the form of the RNG. This post indisputably proves something about how random the survey is because it shows entropy is reduced by the RNG being flawed.

    That said, a person can claim the RNG is flawed but that flaw “doesn’t matter.” You can do that with any error anyone points out but doesn’t explicitly quantify the effects of. The question is, is it remotely believable? I don’t think so. I think everyone can make the inductive leap of:

    1. The RNG is flawed.
    2. Out of a handful of initial discussants, three drew the exact same value despite the population size being over 12,000.
    3. The flaw in the RNG caused that.

    It is not a formal proof, but I think it is proof enough for any reasonable individual. If you disagree, try working out the probability of three out of fewer than 10 people drawing the exact same abstract.

    But if you want a real proof, ask John Cook to disclose the exact process used for generating his sample sets. That’d let us confirm/disconfirm what I’ve said very easily.

  9. Has John Cook responded to any of the appraisals of his survey mechanisms at any of the sites, or at his own? Maybe by email?

  10. By the way, the non-randomness of the RNG is visible even if one uses the session IDs set by the server. It’s just not as visible. The approach I’m using here is a shortcut to save time and effort. It is tedious to extract data like this so anything that reduces effort is a welcome relief.

    If someone wants to go scrape ~50 surveys for this data, they can. I’m not interested. I’m content to rely on a small inductive leap, even if it means my proof isn’t a formal proof.

    Though this does remind me of something I wanted to find out. I wanted to know if different IP addresses would get the same surveys if they used the same session IDs. If so, that’d rule out the possibility of user IP addresses being used as input for the RNG.

    If not, and user IP addresses are used by the RNG, my proof becomes a formal proof. After all, IP addresses have a lot of similarity to them.

  11. Paul_K:

    If Cook wants to use the results to crowdsource his own rankings, then he owes you a big thanks for this. If he is still trying to prove bias in ideation, then…

    I’m guessing Cook will argue the non-randomness “doesn’t matter.” So what if people aren’t given random sets of abstracts? That doesn’t mean the ratings they gave were wrong. We can just accept some papers probably get more ratings than others.

    That is, if he ever discusses it at all. He could always just ignore the problem or even secretly fix it.

  12. I also noticed they didn’t appear to be very random. I simulated a random selection of 10000 abstracts using 12300 as the number available for selection. That would be equal to 1000 completed surveys. I also simulated Brandon’s 160 abstracts several times and about half the time there were no duplicates at all.

    Here is roughly what the abstract distribution should look like if Cook gets 1000 completed surveys.
    copies, abstracts
    0, 5391
    1, 4528
    2, 1798
    3, 474
    4, 94
    5, 12
    6, 3
    The number of copies of each abstract is so small you really can’t say much of anything by comparing individual answers to the abstracts. If they reduced the quantity of abstracts to be selected from by 90-95% they might get some decent sample sizes, but then they still have the problem of figuring out how to reliably classify the person answering the survey.

    I really don’t know what they expect to accomplish, but it sure looks like it will turn out to be worthless. Comparing scientist ratings of their own papers to others rating only the abstracts is apples to oranges.
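
    For anyone who wants to reproduce that distribution, here is a minimal sketch of the kind of simulation described above (assuming 12,300 selectable abstracts and 1,000 completed surveys of 10 distinct abstracts each):

        import random
        from collections import Counter

        rng = random.Random(42)
        copies = Counter()
        for _ in range(1000):                              # 1,000 completed surveys
            copies.update(rng.sample(range(12300), 10))    # 10 distinct abstracts each

        dist = Counter(copies.values())                    # abstracts receiving k ratings
        dist[0] = 12300 - len(copies)                      # abstracts never shown at all
        for k in sorted(dist):
            print(k, dist[k])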

  13. “Bwahaha – our fiendish plan is working! We have given the Deniers – grr, hiss, slurp, snarl… – a lure that they cannot resist.”
    “Yes, John! And soon they will all be completely distracted, chasing their own tails looking for more flaws in our, uh, ‘survey’.”
    ‘Survey’ – heh-heh-heh.”
    “Heh-heh-heh. And then…”
    “Yes, Stephan?!”
    “If we can distract them for even a few minutes, their constant efforts to distract the world community will fail, and all of mankind will instantly recognize the threat to all of nature. Their trillions in oil money won’t be used, and the world will stop using carbon! Ha-ha-ha-ha!”
    “But, Stephan?”
    “Yes?”
    “There is still one threat to our plans.”
    “Impossible! What?”
    “McIntyre – grr, hiss, slurp, snarl – is still at large! His blog has not posted on the survey. He could still foil our plans by ignoring them.”
    “We’re doomed! Doomed.
    No – wait! There he is – http://climateaudit.org/2013/05/05/cooks-survey/ – he has been pulled into our trap as well! We rule (=save) the world! Bwahahaha…”

    I’m getting in their next payyyper…

  14. I decided to go ahead and address Peter Ellis’s concern. To do this, I deleted my session ID and loaded a survey. I then repeated this seven more times. That gave me a total of 80 data points. I was surprised to find out how much repetition there already was:

    http://rankexploits.com/musings/wp-content/uploads/2013/05/RNG_Test2-500×207.png

    As you can see, the effect displayed in my post’s graph is not caused by my choice of session IDs. Other things are caused by that though. There are patterns in the data underlying the post’s graph that are not found in the data with random session IDs. That indicates structure is introduced into the “random” results via several different components.

    For documentation purposes, the session ids I was given were:

    7p7h7qk1l77gamr2sfcq4sdtv4
    0ouosthustn5ii01ilsm0fv4e1
    1ptthj75dvdm49fl8cb2g6b4l1
    2jjrta7uqtmqi4jh54n21gvkv7
    35eso8i65c3eadm2skfah7cmn0
    9aq6tg3lj5t0pisbj55ncbj5i1
    hjb8m6p3143osskjvmhs0cqte1
    m99bv05mtov0qejtdh4t1f42b3

  15. Bob Koss, the same abstract cannot be selected twice in the same set. That means a random sample of 10,000 won’t actually represent 1,000 completed surveys. It shouldn’t affect your results too much, but it will have some effect.

  16. Brandon,

    I was aware of that possibility, but given the randomness inherent in any random simulation the error rate for a duplicate within a set is so insignificant in the scheme of things I didn’t feel it worth the effort to bother coding for it.

    The probability of a duplicate abstract within a set is less than 0.004.
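
    That figure is easy to check with a union bound (a sketch, assuming 12,300 selectable abstracts and independent draws):

        from math import comb

        # Union bound on P(any duplicate among 10 independent draws from 12,300):
        print(comb(10, 2) / 12300)   # 45 / 12300 ~= 0.0037, i.e. under 0.004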

  17. Brandon, is it possible that it’s a feature not a bug? Perhaps Cook is only interested in a small sub-set of the abstracts, and has rigged the selection to make sure he gets substantial numbers of ratings for those?

    The fact that the two commentators at SkS, and then you, all got the same abstract (out of 12,000) suggests greater non-randomness than your (nice) experiment demonstrates.

  18. Perhaps Cook is only interested in a small sub-set of the abstracts, and has rigged the selection to make sure he gets substantial numbers of ratings for those?

    If so, his definition of “random” needs to be better explained. Many are likely to assume he means that each paper is equally likely to be presented for evaluation. Of course, ‘uniform distribution’ is not the definition of “random”. But Cook ought to be explaining more fully, especially if he does not mean “each paper is equally likely to be presented”.

    I asked him to better describe what he meant by random and he merely explained how he decided what belonged in the database.

  19. I’m going to make a fool of myself now and attempt to calculate the odds of 3 people getting the same paper.

    Person A picks any 10 papers.

    Person B has C(12000,10) possible combinations of papers. Of these C(11990,10) possible combinations will not have any papers in common with A.
    This means that the odds of person A and B having a paper in common are:

    (C(12000,10)-C(11990,10))/C(12000,10) ~= 0.0083

    With three people it’s way too complicated (and far too many decades since school) to calculate the odds of one or more papers in common between the three people, so I’ll just do it for exactly one paper in common between all 3.

    First, there are C(12000,10) * C(12000,10)/2 possible ways that B and C can select their 10 papers.

    Next, there are C(10,1) * C(11990,9) * C(11981,9) different ways that A, B and C can have exactly one paper in common. This makes the calculation:

    ( C(10,1) * C(11990,9) * C(11981,9) ) / ( C(12000,10) * C(12000,10) / 2 ) ~= 0.000014

    Although the odds will increase slightly when you add in the odds for, say, A and B having 2 papers in common and one of those is also in common with C, it will not make any significant difference.
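
    A quick numerical check of those two figures, using the same counting model as above (this only redoes the arithmetic; it does not vouch for the model itself):

        from math import comb

        total = comb(12000, 10)

        # P(A and B share at least one paper)
        p_pair = 1 - comb(11990, 10) / total
        print(round(p_pair, 4))                          # ~0.0083

        # P(A, B and C share exactly one paper), using the C(12000,10)^2 / 2
        # denominator from the argument above
        p_triple = comb(10, 1) * comb(11990, 9) * comb(11981, 9) / (total * total / 2)
        print(f"{p_triple:.6f}")                         # ~0.000014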

  20. James:

    The fact that the two commentators at SkS, and then you, all got the same abstract (out of 12,000) suggests greater non-randomness than your (nice) experiment demonstrates.

    Definitely. The non-randomness of the RNG is far worse than I’ve shown. I’ve only shown the most obvious aspect of it because I’ve been trying to keep things simple. There’s a lot more we could discuss if people want to get into the math of randomness.

    lucia:

    If so, his definition of “random” needs to be better explained. Many are likely to assume he means that each paper is equally likely to be presented for evaluation. Of course, ‘uniform distribution’ is not the definition of “random”.

    When it comes to computers (and arguably, everything else), there is no such thing as truly random. As such, we don’t use “random” to mean truly random. We use it to mean “random enough we can’t find patterns in it.” I can’t begin to imagine how John Cook could say an RNG with such obvious patterns could be “random.”

    I asked him to better describe what he meant by random and he merely explained how he decided what belonged in the database.

    I wish I knew what he did to make his RNG. If I had code to replicate it, I could create as much data as I want. That’d make testing the limits of its randomness much easier. Plus, seeing the algorithms might let me discern the problems structurally.

    I think I’ll try e-mailing him. The page does say:

    If you have any questions about the survey or encounter any technical problems, you can contact John Cook at j.cook3@uq.edu.au

    I think this qualifies as a technical problem.

  21. Skiphil:

    I am not a scientist but I have commented at CA on what seems to be a fatal ambiguity about Cook’s rating system, and about any similar broad cataloging of diverse scientific papers to judge how many support an “AGW” thesis:

    I honestly don’t see a problem in what you’re referring to. You basically say papers may assume AGW without providing evidence for AGW. That’s a non-issue. Even if a consensus wouldn’t provide support for a belief, we can still observe whether or not there is a consensus. You are wrong to say:

    it can only be scientifically sound to include the papers which actually provide EVIDENCE for “AGW”

    Rather than assume the survey is for one purpose it doesn’t suit, we should assume it’s for a purpose it does suit. Namely, we should assume it has something to do with examining how different people interpret the same abstracts.

  22. It’s possible the design includes a small set of abstracts to be used as controls. Either to identify fake or anomalous ratings, or to act as normalisation or reference points against which to evaluate other ratings (eg holdout tests). For instance he may be using 8 genuinely at random, plus 2 from a standardised set, or something similar. Just because it’s not all random doesn’t invalidate the design without knowing what he is trying to do.

  23. samD, it’s interesting you say that. I just read an e-mail John Cook sent me, and your emphasis on “knowing what he is trying to do” is definitely appropriate. I’ll post about his response once I’ve clarified a couple things with him.

  24. I’m less concerned than some about the randomness or not of Cook’s RNG. What I am concerned about is that I’m an Australian taxpayer forking out hard earned folding to finance this crap.

  25. I decided not to put off sharing John Cook’s response because I think the material I discussed in the next post is important. And since I’ve shared that, I might as well discuss what he’s said about my conclusions in this post. You might remember I posted this earlier:

    There are a number of oddities to it, and anyone who likes numbers should try looking at it. The thing I find most interesting is 14/16 rows have the entries in ascending order. 2/16 do not. It is difficult to come up with an explanation for that.

    John Cook’s e-mail to me said:

    I use an SQL query to randomly select 10 abstracts. I restricted the search to only papers that have received a “self-rating” from the author of the paper (a survey we ran in 2012) and also to make the survey a little easier to stomach for the participant, I restricted the search to abstracts under 1000 characters. Some of the abstracts are mind-boggingly long (which seems to defeat the purpose of having a short summary abstract but I digress). So the SQL query used was this:
    SELECT * FROM papers WHERE Self_Rating > 0 AND Abstract != '' AND LENGTH(Abstract) < 1000 ORDER BY RAND() LIMIT 10

    ORDER BY RAND() is inefficient, but simple. What it does is assign a random value to each item selected, then sort the items based on that random value. That should produce randomly ordered samples every time.

    It doesn’t. Surveys presented to the user are often sorted by order. It’s clearly not just by chance. The odds of randomly arranging ten numbers in ascending order are too low. So how can that happen? If John Cook’s description for how the samples are chosen is correct, there must be code that sometimes sorts samples for no particular reason. Why would there be?
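
    To put a number on “too low” (a sketch, treating each survey’s ordering as uniform over the 10! possible orderings):

        from math import comb, factorial

        p_sorted = 1 / factorial(10)     # chance one survey of 10 comes back ascending
        print(p_sorted)                  # ~2.8e-07

        # chance that 14 or more of 16 independent surveys come back ascending
        p_14_of_16 = sum(comb(16, k) * p_sorted**k * (1 - p_sorted)**(16 - k)
                         for k in range(14, 17))
        print(p_14_of_16)                # effectively zero (~1e-90)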

  26. An additional issue is Cook’s code doesn’t show how the RAND() function is seeded. The seed of an RNG determines its output so this is important. If you use the same seed, you’ll get the same output from RAND() every time. He obviously doesn’t do that, but without knowing what he does do, it’s difficult to say much. Similarly, he told me:

    I didn’t use session ID as a seed. Session Ids were only used to detect if someone was submitting the survey more than once, in which case it showed the same 10 papers. So for someone participating in the surveys only once, Session Id was irrelevant.

    This is quite important. If the session ID one uses is not used as a seed, the test I did should not have produced results any different than if I had used completely random session IDs like I did later (though that image is with a smaller sample size).
    That’s odd as only two of the sixteen series I showed were unordered whereas eight of the last twenty series I’ve tested were unordered. In fact, when I performed that test, I did two additional tests (with 25 and 26 1s). Both resulted in an ordered set. There are a number of other differences between that test’s results and other samples I’ve gathered.

    It’s possible that’s pure chance. It’s also possible there was an alignment like I thought, but that it was caused by whatever else is being used to seed the RNG, not my changing of the session IDs. Whatever the explanation, it’s weird that the one time I performed the test, I got more dramatic results than any other time.

    By the way, I assume Cook misspoke when he said he only used session IDs to “detect if someone was submitting the survey more than once.” Presumably, he meant he used them to detect “if someone was loading the survey more than once.” After all, I didn’t submit surveys in my testing.

    I’m not sure what the purpose of it would be. Was the purpose to prevent people from loading different surveys to cheat? I could sort of get that since some people might not know how to change cookies. The problem is he exacerbated the potential for cheating for anyone who did know how to change cookies. That seems silly.

  27. His selection criteria might severely limit the number of abstracts being used. He might have 12000 abstracts in the database but by restricting the selections to those that are self rated and under 1000 characters a significant portion will never be selected.

    For example, Steve McIntyre and Ross McKitrick’s 2003 article in Energy & Environment would fail the selection criteria as the abstract has 1009 characters. Mann’s 2008 reconstruction would also fail since it has 1121 characters.

    John Cook should provide the results of running this query so as to determine how many abstracts are selectable.

    SELECT COUNT(*) FROM papers WHERE Self_Rating > 0 AND Abstract != '' AND LENGTH(Abstract) < 1000

  28. Brandon…

    Do a little research, why don’t you?

    http://dev.mysql.com/doc/refman/5.7/en/mathematical-functions.html#function_rand

    The MySQL documentation tells you the pattern that should be used to randomly select rows, and it does not explain how the RAND() function is seeded.

    On “detect” versus “load.” That’s stupid semantics. What do you think the difference means? From a programming point of view, he needs to detect if the same person is loading the same survey, so that when it is submitted it is not treated as a separate case. Then when it is submitted (his main concern), the new submission must replace the old one rather than create a new one. Everyone only gets one set of abstracts to do. That’s a basic condition. What difference does it make if he used the word “detect” (his programmer’s point of view) versus “load” (your uninformed user’s point of view)?

    Sheesh.

    Why is everything an underhanded conspiracy with you people?

    I can’t believe you created an entire post and thread on this. What exactly do you think he’d be accomplishing by masterfully non-randomly assigning abstracts? And how do you think he’s deciding what to give to which people?

    I can’t believe that this post and comment thread even exist.

  29. Brandon:
    This is what John Cook told people he did:

    I have compiled a database of around 12,000 papers listed in the ‘Web Of Science’ between 1991 to 2011 matching the topic ‘global warming’ or ‘global climate change’. I am now inviting readers from a diverse range of climate blogs to peruse the abstracts of these climate papers with the purpose of estimating the level of consensus in the literature regarding the proposition that humans are causing global warming. If you’re interested in having your readers participate in this survey, please post the following link to the survey:

    [ Link redacted. ]

    The survey involves rating 10 randomly selected abstracts and is expected to take 15 minutes.

    Participants may sign up to receive the final results of the survey (de-individuated so no individual’s data will be published). No other personal information is required (and email is optional). Participants may elect to discontinue the survey at any point and results are only recorded if the survey is completed. Participant ratings are confidential and all data will be de-individuated in the final results so no individual ratings will be published.

    He is now saying the 10 are not randomly selected from the full database of 12,000. Rather, the full database is trimmed to some unknown smaller number of papers that:
    * were rated by their authors
    * are short.

    These are two different things. As some may recall I specifically asked him:

    “5) Could you better describe the selection process for the papers which you describe here as “random”? (For example: Did you identify all papers published in some selection of journals, number each and then select using a random number generator? )”

    John’s answer to me merely repeated the information about how he’d created the larger database. These rather important steps that affect ‘randomness’ or more specifically, the characteristics of the population of papers being evaluated were not revealed.

  30. Bob Lactena:

    The MySQL documentation tells you the pattern that should be used to randomly select rows, and it does not explain how the RAND() function is seeded.

    What do you mean the “documentation tells you the pattern that should be used”? The documentation doesn’t point to any patterns anyone should use.

    And why would you tell me it doesn’t explain how the RAND() function is seeded? How does a lack of an explanation address anything I’ve said?

    On “detect” versus “load.” That’s stupid semantics. What do you think the difference means? … What difference does it make if he used the word “detect” (his programmer’s point of view) versus “load” (your uninformed user’s point of view)?

    I never contrasted “detect” and “load.” I contrasted “submit” and “load.” Detection is necessary in both.

    Why is everything an underhanded conspiracy with you people?

    What “underhanded conspiracy” do you think I believe exists? I can’t imagine how saying someone did a bad job of writing a program is envisioning a conspiracy.

    What exactly do you think he’d be accomplishing by masterfully non-randomly assigning abstracts?

    If we’re just referring to programming issues, nothing. I generally don’t think making mistakes is done to accomplish things. But if we include Cook’s admission that he picked an arbitrary subset, what he’s accomplishing is an exaggeration of his work.

    And how do you think he’s deciding what to give to which people?

    I don’t think he’s deciding such at all.

    I can’t believe that this post and comment thread even exist.

    Is that why you’ve failed to say a single accurate thing about it?

  31. AnonyMoose:

    RAND() might not be perfectly random — but is it random enough for this purpose?

    It is, if it is used properly. Even a “perfect” function can give undesirable results if you mishandle its input/output.

    lucia:

    John’s answer to me merely repeated the information about how he’d created the larger database. These rather important steps that affect ‘randomness’ or more specifically, the characteristics of the population of papers being evaluated were not revealed.

    To tell the truth, I kept suspecting he might have used just a subset of his database. I never said so because I figured it was just cynicism and there would be some innocent explanation. Maybe I shouldn’t have been so charitable.

    But I didn’t want to believe he’d intentionally tell people untrue things, over and over again.

  32. I’m not sure what the purpose of it would be. Was the purpose to prevent people from loading different surveys to cheat? I could sort of get that since some people might not know how to change cookies. The problem is he exacerbated the potential for cheating for anyone who did know how to change cookies. That seems silly.

    He could both look at session ID and IP. One, the other, or both can slow down cheating. It’s certainly better than doing nothing (which seems to have been the Lewandowsky method.)

    SteveM wrote “It is easy enough to access both blogs using hidemyass.com and then click on their link to the survey.” I tried hidemyass.com’s online proxy and found I could not view the abstracts. I didn’t try to fill out the forms and submit. So I’d assumed that John had coded something to inhibit submission of forms that were framed. If John did that, I also approve of that. (You could submit Lewandowsky surveys from forms framed in HideMyAss or other online proxy tools.)

    Maybe Steve is setting up his browser to go through the proxy IPs from hidemyass and possibly that works. That would not involve framing and would mean that people who are accustomed to using proxies could submit with multiple proxy IPs. Obviously, they could clear cookies and overcome the session IDs too. The question then is: did John think of another fallback? Of course we don’t know … yet!

    To my eye, the two observable things are both great improvements over Lewandowsky even if imperfect. (Nothing can completely prevent cheating for an online survey. )

    That said, John’s description of a database of 12,000, followed by a statement that 10 were selected “randomly”, is at best deceptive. He is now telling us that papers are not selected from the database of 12,000 but rather from those with short abstracts rated by their authors. Assuming that the authors who rated their own papers self-selected to at least some extent, it is a bit of a stretch to simply assume that authors willing to fill out John’s survey are typical of all authors. After all: these are authors who self-selected to fill out John’s survey, and those characteristics could easily make them different from others. Possibly we will learn the self-select rate when John’s paper discussing the review of papers by individual authors comes out.

    But for now: we have learned that, contrary to the impression conveyed by John Cook’s invitation, the 10 papers are not “randomly” selected from a database of 12,000 – at least not if “random” is understood to mean that each paper was equally likely to be selected.

  33. lucia:

    To my eye, the two observable things are both great improvements over Lewandowsky even if imperfect. (Nothing can completely prevent cheating for an online survey. )

    I think the issues with using proxies were inadvertent, but I could be wrong. Either way, I don’t think either of these two things are wrong or bad. I just don’t get the purpose behind the one.

    If you’re worried about people submitting multiple surveys, why use session IDs like this? You could accomplish just as much by checking session IDs when they submit the survey.

    That said, John’s description of a database of 12,000, followed by a statement that 10 were selected “randomly” is at best deceptive.

    I think you are being far too generous. In one sentence, he referred to “around 12,000 papers.” In the very next sentence, he said he was inviting people “to peruse the abstracts of these climate papers.” The only way to interpret that is as referring to the ~12,000 papers.

  34. Brandon

    If you’re worried about people submitting multiple surveys, why use session IDs like this? You could accomplish just as much by checking session IDs when they submit the survey.

    Why? Maybe merely because this is the method he thought up. It was sufficient for his purpose; he coded this method and was done.

    Also, he might not want people to be able to refresh multiple times until they get a survey they prefer.

    In one sentence, he referred to “around 12,000 papers.” In the very next sentence, he said he was inviting people “to peruse the abstracts of these climate papers.” The only way to interpret that is as referring to the ~12,000 papers.

    Actually, I agree. One has to torture textual interpretation to read what he said as suggesting there was a major down-selection step between the 12,000 papers and the 10. Moreover, note that I did ask John for clarification on this in my first reaction to getting the invitation. Also, people immediately began commenting on getting identical papers at SkS; this suggests they read the description exactly as we here did. No ‘clarification’ about downselection was volunteered at that time.

    The only thing that is eliciting ‘clarification’ is your showing that the papers are clearly not selected “randomly” from the batch of 12,000.

  35. The SQL RAND() function, as used in the SQL select statement, is NOT very random between survey participants. To improve the randomness, the SQL select statement RAND() function must include a seed value that is itself based upon some other variable (hopefully random) that *must* be different for each and every participant/session ID etc. Otherwise, the use of unseeded RAND() will produce the very same pseudo-random number series. Relying on an unseeded PRNG in computing is a sure way to get a non-random sequence.

  36. In other words, web server processes are multi-threaded with a single identical thread spawned for each client access. This means that the “state” of the RAND() function is pre-initialized to the same value each and every time a different user accesses the page that generates the SQL call. So most likely, the RAND function is returning the same pseudo random number series for each and every client resulting in very similar if not identical abstract results being returned to all. Since the desire is to return random abstracts to each user access, the RAND function needs to be seeded with a unique value per client access/call so that different pseudo random number series are returned for each client.

  37. patrioticduo,

    To improve the randomness, the SQL select statement RAND() function must include a seed value that is itself based upon some other variable…

    No.

    This means that the “state” of the RAND() function is pre-initialized to the same value each and every time a different user accesses the page that generates the SQL call.

    No.

    You are misunderstanding how it works. Re-read the documentation.

    Hint: The function is seeded “under the hood” by MySQL when the MySQL daemon is first started. How they seed it is undocumented, but common practice is to use the current time in milliseconds. Subsequent calls will produce different results… consistently. If you don’t believe me, create a MySQL database and try the statement SELECT RAND();

  38. Lucia,

    What value would you find in the data from a survey that offers 10 random abstracts from a pool of 12,000 papers, and the population that takes the survey is only 1,000 or 2,000 people? How many survey participants do you think you would need to be able to generate a reasonable analysis of the data using a population of 12,000 papers?

  39. I have deliberately not read any of the other comments in the thread, so as not to bias myself.
    It is not uncommon to have a positive and negative control within a study. You can use a positive/negative control pair which should record outputs of, say, +10 and -10, as a way to filter out faulty responders.
    The number of papers which are clearly ‘right’ and clearly ‘wrong’ will be low.
    A positive control would be something along the lines of ‘Study of thermometer and tree ring record from 1900-2000 in New England shows weak coupling’.
    A negative control would be something along the lines of ‘Alaskan pollen record and Inuit Sexuality’.

    From the response to internal controls one may be able to calibrate the responders responses and get rid of the bots and blockheads.

    Now I can read the actual thread, not biasing my analysis.

  40. Brandon,

    Without going crazy programming everything someone can think of, which behavior do you think will produce more accurate data in a survey…?

    (1) Every visitor gets a new set of abstracts, and can submit multiple copies which will be ignored behind the scenes, or else can be told after spending the time doing a second set and submitting that their submission was rejected.

    (2) Every visitor gets the same set of abstracts, so that if they wish to change or correct their ratings and resubmit, they can do so.

  41. Bob, “Hint: … How they seed it is undocumented, but common practice is to use the current time in milliseconds. Subsequent calls will produce different results… consistently. If you don’t believe me, create a MySQL database and try the statement SELECT RAND();”

    Thanks for that except that you don’t know how the function is being seeded. Therefore, proper use of the function would include using a unique seed for each participant to ENSURE that the returned result is randomized across participants. It is quite common for rand functions to return similar pseudo-random number sets when the programmer relies on the function seeding itself correctly. There are various reasons for this including seeding only once during process startup; in this case the SQL process itself.

    “subsequent calls” will result in random numbers returned that are from the same pseudo random number set. That is, random if multiple calls are made for each participant but not very random at all across multiple participants if only called once per participant.

  42. patrioticduo,

    Again, please read the documentation. The proper application of ORDER BY RAND() LIMIT 100; is clearly documented, with appropriate caveats, and is perfectly applicable in this situation.

    You are tilting at windmills.

  43. patrioticduo, the random number is coming from within the MySQL database server and not the web server. The random seeding happens when the database server is started — such as when the hardware is booted. The numbers will be different for each time RAND() is invoked.

    It is initialized with the time when the server was started, and then a simple pseudorandom calculation is done each time a random number is needed.

    http://stackoverflow.com/questions/15766420/mysql-rand-how-often-can-it-be-used-does-it-use-dev-random
    http://forums.mysql.com/read.php?45,561503,561624#msg-561624

  44. Umm, good discussion on the MySQL DB RAND() function. And we know that Cook uses a MySQL DB how? We don’t, so statements like “Read the documentation” rely on an assumption.

    It is the undocumented assumptions, in the methods of this survey, that this post contributes to challenging.

  45. Kan,
    Good point. It looks like you’d get nonrandom results with a MS SQL Server.

  46. Bob

    Lucia,

    What value would you find in the data from a survey that offers 10 random abstracts from a pool of 12,000 papers, and the population that takes the survey is only 1,000 or 2,000 people?

    I don’t know. My answer could range from “a lot” to “very little” depending on other factors. I have no quibbles with studies trying to determine the average weight of 21 year old American males based on weighing 1,000 American males selected randomly. The average could probably be determined fairly precisely if you weighed even fewer – 200 might be enough. On the other hand, if you screened them saying “We’ll only take the subset who are less than 5’10”,” I would be dubious of the claims. But my being dubious would not spring from our having only weighed 1000 men who are less than 5’10”. It would spring from screening out and creating a non-random sample of men to draw from.

    How many survey participants do you think you would need to be able to generate a reasonable analysis of the data using a population of 12,000 papers?

    Depends what they are trying to report or learn. If you merely want to discover the average Likert rating returned by that survey, I suspect we would need roughly 100 participants rating 10 papers each. If every one got a different paper, that would actually improve our precision.

    In my men’s weights analogy, I figured we could use 200 men. It’s a guess. But I think drawing an unscreened random sample would be more important than increasing the total number of men weighed.

  47. Why wouldn’t Cook have trimmed the abstract database using his criteria of “<1000 words" and "Has Author Self-Assessment" to begin with? Is there some rhetorical value in being able to claim "12,000" abstracts?

    He could publish the final overall frequency distribution of delivered abstracts to allay concerns over the RNG. Assuming it ends up uniform.

  48. Bob Lacatena

    First: It is an oddity of my blog that I discourage anyone from arguing by asking rhetorical questions. It tends to be an utter waste of time since at blogs, people end up not knowing what point anyone is trying to make.

    If your goal is to try to prove some point by asking these, I suggest you simply make your point. Asking rhetorical questions is permitted as a rhetorical device, but you are very strongly encouraged to volunteer your own answers before expecting anyone else to answer your questions. This facilitates conversations.

    Now: I will indulge you with my answers and for the time being I will assume the purpose of your question was to get me to provide my answers and that you were certain my answers are so obviously shared by all that you agree and fully endorse my answers.

    Brandon,

    Without going crazy programming everything someone can think of, which behavior do you think will produce more accurate data in a survey…?

    (1) Every visitor gets a new set of abstracts, and can submit multiple copies which will be ignored behind the scenes, or else can be told after spending the time doing a second set and submitting that their submission was rejected.

    (2) Every visitor gets the same set of abstracts, so that if they wish to change or correct their ratings and resubmit, they can do so.

    Of the possibilities you suggest, I think

    (1a) “Every visitor gets a new set of abstracts, … can be told after spending the time doing a second set and submitting that their submission was rejected” is the best way to get the most accurate data provided that all data from the submission are stored and the researchers explain their reasons for rejecting the data that was submitted when they write up their paper. This method tends to inhibit people from submitting once they know the gig is up.

    (1b) “Every visitor gets a new set of abstracts, and can submit multiple copies ” is the 2nd best way provided that all submissions are stored — as described above.

    (2) “Every visitor gets the same set of abstracts, so that if they wish to change or correct their ratings and resubmit, they can do so.” I assume you mean on reloading that the visitor gets the same abstracts they got before. This is clearly a bad way since part of the survey assumes people don’t come back for do-overs.

    I will now happily assume that the point of your questions was to get me to type up the much longer– and quite obviously correct answer for you, and that you utterly and totally agree with me. (If you don’t agree, I think you may now understand the wisdom of not arguing by asking rhetorical questions.)

    However, I should note that Cook’s survey does none of these things. I loaded a survey and copied the text to mull my answers over. I came back after turning my browser off and I got a different set of 10 questions. So, Cook’s is some other hybrid sort of thing.

    The fact is: I think online surveys open to everyone really don’t work if you are trying to get a scientific answer. All involve self-selection, ‘revoting’, gaming and so on. So I think none of these are good.

  49. patrioticduo:

    The SQL RAND() function, as used in the SQL select statement, is NOT very random between survey participants.

    Be careful when saying this. John Cook provided a SQL query, but that doesn’t mean he provided everything we’d need to know what he did. My guess is he does seed the RAND() function, and I’ve asked him to show what he does to do so.

    Bob Lactena:

    Without going crazy programming everything someone can think of

    Is it the crazy part, or the programming part you’re trying to avoid? I assume it’s the latter because you’ve still not addressed the absurdity of your earlier comment.

    (1) Every visitor gets a new set of abstracts, and can submit multiple copies which will be ignored behind the scenes, or else can be told after spending the time doing a second set and submitting that their submission was rejected.

    (2) Every visitor gets the same set of abstracts, so that if they wish to change or correct their ratings and resubmit, they can do so.

    It seems remarkable you’d claim to know John Cook is taking the last survey submitted by someone. He hasn’t said anything of the sort, so unless you’re in personal contact with him, you couldn’t possibly know it to be true. Given that’s the only reason there could be any theoretical improvement in accuracy in his results…

    On the upside, you’ve certainly given a novel approach to survey taking. Rather than hope people only take the survey once, we can encourage them to take it multiple times to “correct” their answers.

    RoxShox:

    Why wouldn’t Cook have trimmed the abstract database using his criteria of “<1000 words" and "Has Author Self-Assessment" to begin with? Is there some rhetorical value in being able to claim "12,000" abstracts?

    Minor correction: The criterion was <1,000 characters, not words. I don't think I've ever seen an abstract two or more pages long!

    Aside from rhetorical purposes, the only reason I can think of to not trim the database is he'd have to create a new database. He might have done it this way just to save himself time and trouble.

  50. Brandon,
    You should ask him if he is using a MySQL server, or if he is using some other kind of database server. This SQL is only known to work properly with MySQL. And if he is using MySQL, the random seeding is done when the database server is started; the source code is available.

  51. AnonyMoose:

    Brandon,
    You should ask him if he is using a MySQL server, or if he is using some other kind of database server. This SQL is only known to work properly with MySQL.

    I would ask, but since his first response to me, he hasn’t responded again. I already sent two e-mails since his last one. I’d rather not send another without getting a response first. It’d seem like flooding. Plus, he may have decided to stop responding to me. If so, it’d just waste my time.

    And if he is using MySQL, the random seeding is done when the database server is started; the source code is available.

    The very first thing I asked him was about how he seeds his RAND() function. My assumption is he does something, and he just didn’t send it to me. If that’s correct, MySQL’s initialization of the RAND function is meaningless.

    The reason I assumed that is: if he isn’t reseeding the function, why would we get different samples at all?

  52. lucia,

    Depends what they are trying to report or learn.

    Okay, so what did John Cook tell you the purpose of the survey was?

  53. Brandon,

    My assumption is he does something, and he just didn’t send it to me.

    Bad assumption. The only reason to provide an explicit seed in my SQL is if you want to generate a repeatable sequence of random numbers, i.e. to regenerate the exact same sequence of random numbers at a later time, by re-initializing using the same seed.

    But if all you want is a pseudo-random sequence of numbers at any point in time, then you just call RAND() with no seed parameter — ever — and go with the server seeded sequence.

  54. Lucia,

    First: It is an oddity of my blog that I discourage anyone from arguing by asking rhetorical questions.

    So you could tell that it was rhetorical, yet you took the time to answer, which made you think about the problem some more, even if your motivations drove your thought process relentlessly toward the conclusion you wished to reach (funny how often that happens among skeptics).

    The reason I asked a rhetorical question was to point out that when designing computer software of any sort, including an online survey, one needs to make design decisions, and there are factors other than what you might be tempted to start bandying about in a blog comment thread. People here don’t understand how the RAND() function operates, they don’t know the purpose of the survey, they don’t know a lot… but they are off to the races with cries ranging from incompetence to conspiracy (Brandon says he’s not accusing John of conspiracy, and yet he goes on to say “But if we include Cook’s admission that he picked an arbitrary subset, what he’s accomplishing is an exaggeration of his work.” — Admission? Accomplishing? Sounds like accusatory conspiracy talk to me.)

    Just recognize that you have no information, no idea what the purpose of the survey is, you haven’t had to spend time designing, coding and testing the actual programs, and the lot of you are having fun haphazardly criticizing someone without any real foundation, simply because you (apparently) don’t like him.

    That’s all I want you to admit. You don’t like John Cook, so it’s fun to make up stories and make him out to be a villain, based on the most flimsy and ridiculous premise and analysis I’ve seen of late (I won’t say ever, because the blogosphere is full of giant whoppers of flimsy analyses).

  55. Bob Lactena:

    Just recognize that you have no information, no idea what the purpose of the survey is, you haven’t had to spend time designing, coding and testing the actual programs, and the lot of you are having fun haphazardly criticizing someone without any real foundation, simply because you (apparently) don’t like him.

    You have no information to indicate MySQL was used, yet you act as though you know it was. There are dozens of different SQL servers, and RAND() is only atomic in some of them. If I shouldn’t assume Cook did something to seed his function, you shouldn’t assume he used a language in which he wouldn’t need to seed his function.

    (Brandon says he’s not accusing John of conspiracy, and yet he goes on to say “But if we include Cook’s admission that he picked an arbitrary subset, what he’s accomplishing is an exaggeration of his work.” — Admission? Accomplishing? Sounds like accusatory conspiracy talk to me.)

    Sounds like you don’t know what a conspiracy is to me. It reminds me of something.

  56. Bob Lactena’s insistence on discussing MySQL got me curious about something. What PRNG does MySQL use? I’ve spent twenty minutes trying to find an answer to that question without any luck.

    I’m mostly curious because some PRNGs have a flaw where the closer the seeds used are to each other, the more similar the initial sequences will be. And if MySQL uses time as a seed…

  57. MySQL uses a simple linear congruential generator. Those have well known flaws some of which are listed on Wikipedia.

    There are approximately 1.6998883e+34 ways of choosing 10 elements out of a set of 12,000 elements. So, if you draw just five or 10 surveys, the chances of overlap ought to be miniscule.

    If we charitably assume that MySQL’s PRNG can generate about 2^62 distinct values in [0,1), it is clear that the ORDER BY RAND() method cannot ever yield all possible subsets of size 10 of a set of 12,000 elements.

    However, that, in and of itself, does not explain why you got such overlap in such a small number of draws.

    A better way would have been to associate each abstract with a unique ID, select a subset of those IDs in the script using a decent algorithm and a well known PRNG such as the Mersenne Twister, and only retrieve abstracts with those IDs from the database (see the sketch at the end of this comment). This would not only reduce load on the database server, but would also give one confidence that quirks of the database server implementation would not affect the randomization process.

    FWIW, using the private browsing features of Firefox, Opera, and Chromium, I didn’t get any overlap when I followed the survey link from John Cook’s site.
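
    As a rough illustration of that alternative (a sketch only; the id column and the surrounding application code are assumptions, and the WHERE clause is just the one Cook quoted):

        import random

        def pick_abstract_ids(eligible_ids, k=10):
            """Choose k distinct abstract IDs in application code, using Python's
            default Mersenne Twister (auto-seeded from OS entropy), instead of
            relying on ORDER BY RAND() inside the database."""
            return random.sample(list(eligible_ids), k)

        # eligible_ids would come from a query such as:
        #   SELECT id FROM papers
        #   WHERE Self_Rating > 0 AND Abstract != '' AND LENGTH(Abstract) < 1000
        # and the chosen rows would then be fetched with:
        #   SELECT * FROM papers WHERE id IN ( ...the ten chosen ids... )
        print(pick_abstract_ids(range(1, 12001)))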

  58. A. Sinan Unur, I agree with what you say, but we’ve figured out the (at least primary) reason for the oddity I’ve observed. It turns out John Cook was just making things up when he said the abstracts are randomly selected out of 12,000+ papers. In reality, only a small subset of the abstracts are being used.

    There is still at least one open question though. When taking the survey, some samples are sorted while others are not. Why would that happen? It couldn’t happen via the SQL query Cook provided. That means it had to have happened in a different step. That’s weird because why would anyone randomly sort some samples and not others? There’s no reason for it.

    I’ve got no clue on that one, but I’m content to know the random selection process wasn’t actually random. I assumed John Cook was truthful in his descriptions of the survey. That led me to assume the lack of randomness was because of the RNG. Since my assumption about John Cook was false, my assumptions about the RNG were also false. Despite that, my overall conclusion was right: The survey does not give a random selection taken from 12,000+ papers as claimed by John Cook.

    And not that it matters, but I think MySQL’s PRNG generates values in the range of (0,1), not [0,1).

  59. AnonyMoose, thanks for making me take a second look at that comment of yours. I had only looked at the first link in it before. I didn’t realize the second link actually had source code. I could have saved myself some time and trouble if I had.

    I’m kind of amazed to see that PRNG is used by MySQL. That’s a fairly old one, and it is known to be pretty bad. It’s not as bad as RANDU (the worst PRNG ever), but I think it’s the second worst one to receive much use in the last few decades. It is known to have very non-random components.
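
    From memory, the generator in that source is essentially a pair of coupled linear congruential updates along these lines (the constants are my recollection and may be off; the linked code is authoritative). A rough Python transcription shows the same close-seeds-give-close-outputs behavior discussed above:

        # A rough transcription, from memory, of the coupled-LCG structure MySQL's
        # RAND() reportedly uses.  The constants may not exactly match the linked
        # source; treat this as an illustration of the structure, not as MySQL's code.
        MAX_VALUE = 0x3FFFFFFF

        def my_rnd(seed1, seed2):
            seed1 = (seed1 * 3 + seed2) % MAX_VALUE
            seed2 = (seed1 + seed2 + 33) % MAX_VALUE
            return seed1 / MAX_VALUE, (seed1, seed2)

        # With a multiplier of 3, nearby seeds produce nearly identical streams.
        for s in (1, 11, 111):
            state = (s, 0)
            vals = []
            for _ in range(3):
                v, state = my_rnd(*state)
                vals.append(f"{v:.3e}")
            print(s, vals)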

  60. ‘There is a detail to my test which I left out of the post due to its complexity (I was trying to make a post everyone could follow with ease).’

    That’s deceptive and misleading.

    hehe

  61. So’s every classroom textbook & manual in that sense.

    From Knuth’s TeXbook:

    Another noteworthy characteristic of this manual is that it doesn’t always tell the truth. When certain concepts of TeX are introduced informally, general rules will be stated; afterwards you will find that the rules aren’t strictly true. In general, the later chapters contain more reliable information than the earlier ones do. The author feels that this technique of deliberate lying will actually make it easier for you to learn the ideas. Once you understand a simple but false rule, it will not be hard to supplement that rule with its exceptions.

    All descriptions are by their nature both incomplete and not strictly true, but that doesn’t make them misleading. Misleading is when, for example, you say you used 12,000 papers but actually only used 400 in your study (hypothetically, of course).

  62. Carrick, I addressed the issue in the very first comment of the post, made just after the post went up. It is effectively the same as offering a footnote.

    Mosher is just trolling.

  63. Bob Lacatena (Comment #112590)
    May 6th, 2013 at 4:11 pm

    lucia,

    Depends what they are trying to report or learn.

    Okay, so what did John Cook tell you the purpose of the survey was?

    He didn’t tell me what the purpose of the survey was. Like everyone else, I’m going by what the survey claims the purpose is. Evidently that’s to assess the consensus in the literature about AGW.

    I answered your question pointing to a problem that might be analogous. To do that study, we’d probably need about 200 participants. But I gave you the caveat because, like everyone else, I’m not convinced the purpose is to assess the consensus in the literature.

  64. Bob

    So you could tell that it was rhetorical, yet you took the time to answer, which made you think about the problem some more, even if your motivations drove your thought process relentlessly toward the conclusion you wished to reach (funny how often that happens among skeptics).

    I answer rhetorical questions all the time. That said: I have no idea why you think your rhetorical question made me think “more.”
    But my rule holds. You are new, so I am letting you get away with a few of these. But please avoid this in the future. It is not your job to assign people “homework” of “thinking more” while sparing yourself thought.

    (Brandon says he’s not accusing John of conspiracy, and yet he goes on to say “But if we include Cook’s admission that he picked an arbitrary subset, what he’s accomplishing is an exaggeration of his work.” — Admission? Accomplishing? Sounds like accusatory conspiracy talk to me.)

    I think it only sounds like a conspiracy to someone who doesn’t think or who is trying to reinforce their preconception that the discussion is alleging a conspiracy. It is not because:

    a) Something done by one person all by his lonesome is not a conspiracy.
    b) People can accomplish things they didn’t intend to accomplish.

    If either (a) or (b) applies to what Cook did, then it would not be a “conspiracy”. In fact, Brandon seems to be suggesting both do, so that’s two factors that make it “not a conspiracy”.

    Beyond that, “admitting” something has nothing to do with “conspiring”. Admitting is a verb that can be used like this: “Ok. I admit it. I ate the whole box of chocolates!”. No conspiracy implied!! One can admit all sorts of things and never have been involved in any sort of conspiracy. Unless English is not your first language, you ought to know this.

    Just recognize that you have no information, no idea what the purpose of the survey is, you haven’t had to spend time designing, coding and testing the actual programs, and the lot of you are having fun haphazardly criticizing someone without any real foundation, simply because you (apparently) don’t like him.

    Huh? No information? About anything? No idea what the purpose of the survey is? At all?

    That’s ridiculous. We have access to the text of the survey which states a purpose. You too could read it.

    That text also tells people they are going to be presented 10 papers selected from a database– and the wording sure as shooting sounds like it’s supposed to be a database of 12,000 papers. You can go look at the survey and see what it says. So don’t tell us that we have “no information” about these things.

    With regard to the conversation here: It is quite clear that the 10 papers are not selected from a batch of 12,000, and when asked John admitted that was the case. They are selected from a smaller batch of unknown number.

    We have every right to criticize the wording of the survey and note that it is misleading and this right is unaffected by whether or not we like him. If you want to go look at the wording and explain that you think it’s not misleading, have at it. But don’t try to claim we have “no” information. We do.

    That’s all I want you to admit. You don’t like John Cook, so it’s fun to make up stories and make him out to be a villain, based on the most flimsy and ridiculous premise and analysis I’ve seen of late (I won’t say ever, because the blogosphere is full of giant whoppers of flimsy analyses).

    I have no idea why you think Brandon’s analysis is “flimsy”. He:
    1) Checked to see if the 10 papers appeared to be selected randomly from a set of about 12,000 papers. (This is what the survey instructions describe as the selection process.)
    2) He found they did not appear to be selected from a batch that large.
    3) It turns out he was correct. When he asked John Cook, John Cook confirmed the papers were not selected from 12,000.

    Nothing Brandon did in any of this suggested John was a villain.

    I think it’s pretty funny that you’re throwing around accusations like “your motivations drove your thought process relentlessly toward the conclusion you wished to reach” when it’s quite clear that you are throwing around conclusions that (a) you don’t even try to defend and (b) it’s pretty obvious, if you were to exercise the three or four brain cells in your head, that your conclusions cannot be supported by the facts available to all of us.

    If all you want me to admit is I don’t like John Cook: Yes. I don’t like him. I said so on a previous thread. But whether I like him or not, the fact is Brandon’s analysis is not flimsy: the results were confirmed by John Cook. Whether I like John or not, your claim that we have no information is ludicrous. And, at least with regard to your behavior on this thread, you are showing zero evidence of wanting to state a position and defend it with any evidence or facts.

    If you want to defend your point of view, do so. But please: try to avoid doing it by making misstatements of fact or by torturing the meaning of well-established words like “conspiracy”.

  65. Brandon
    Just to be clear: Bob Lacatena above is the commenter and moderator Sphaerica of Skeptical Science, who, for all practical purposes, has probably ‘moderated’ or deleted some of your comments at Skeptical Science and/or Shaping Tomorrow’s World at some point. It is likely that he contributed to some of the coding behind the survey. It is also highly likely that he *knows* the exact internal objectives the survey hopes to accomplish, even as he poses rhetorical questions here.

    Interesting, isn’t it?

  66. “Just recognize that you have no information, no idea what the purpose of the survey is, you haven’t had to spend time designing, coding and testing the actual programs, and the lot of you are having fun haphazardly criticizing someone without any real foundation, simply because you (apparently) don’t like him.”

    Dude, when people reason out and take something apart in the open, it is messy, and this is exactly how it looks. That is still several times better than designing surveys to reflect pre-determined conclusions while appearing as neutral as possible.

    One possible reason for not ‘liking’ Cook is that he just wrote a paper collating blog comments from the very people whom you now say don’t like him.

  67. http://beforeitsnews.com/science-and-technology/2012/09/poptech-more-shenanigans-at-sniptopicalscience-2469252.html

    “The inaptly named man made global warming true believer site ‘skepticalscience’ has been busy with the censors scissors again. I fell foul of their foul policy last year, so I have sympathy with Andrew, who runs the popular technology website and goes under the handle ‘poptech’. Andrew has been doing a sterling job keeping tabs on all the papers in the scientific literature which run counter to the MMGW meme. Something the SKS cheif book cooker John Cook doesn’t like. Here’s Andrew’s account of how a thread trying to deny the existence of these 1000+ scientific papers got out of John Cooks control, and resulted in ‘poptech’ having every comment he’d ever placed at SKS deleted. George Orwell would refer to him as an unperson. Shame on you John Cook.

    Skeptical Science: The Censorship of Poptech

    “The impact of that ban on PopTech was to silence him.” – Sphaerica (Bob Lacatena) [Skeptical Science]

    ——-

    Upshot: SkS, Lewandowsky, and Cook think sceptics are nutty conspiracy theorists not to be listened to by the public; therefore everything sceptics do is interpreted as showing signs of nutty conspiratorial thinking, no further debate required.

    Total confirmation bias, wishful thinking and motivated reasoning at work.

    And yes, all my comments on Dana’s articles at Shaping Tomorrow’s World regarding Lewandowsky were removed.

  68. Shub, interesting. As far as I know, Bob being a moderator for SkS wasn’t known here before you said it. That’s sort of a relevant piece of information.

    Barry Woods, I’d have more sympathy for Poptech if not for him being petty and rude. That detracts from his posts. It makes me wonder what sort of things he said in his deleted comments.

    (Plus, for all his mocking of SkS people over even minor mistakes with Google Scholar, he actually made one of his own.)

  69. Perhaps:

    BUT, as with Thomas Fuller, the mentality was not just to delete some comments, but to retrospectively go back and delete ALL previous comments!! Very odd behaviour.
