Statistical Test Described by Joshua

In comments, Joshua described, and so indirectly proposed, a statistical test one might use to determine whether a particular data set for the number of trans mass murderers, k, out of a collection of n mass murders could be used to test the hypothesis that trans individuals commit mass murders at the same rate as non-trans individuals. I don’t think his intention in posting this was to propose a test. I think his intention was to tell me how woeful he finds my view that the main problem with running statistical tests on hypotheses about the rate at which trans individuals commit mass murders is that we don’t have reliable data. But, whether or not he intended to do so, he did propose a method which, of course, we could use to test actual samples.

In this post, I’m going to quote from the (currently reserved) comment in which he describes a general statistical test; he applied the method to hypothetical data. I’ll also be correcting some of the numerical errors that crept into his analysis of that hypothetical data. I will also be releasing that comment and placing it in the comments thread for this post so you can see it in full.

On to Joshua’s comment! I’ll skip the first bit as irrelevant to the proposed statistical test and start with this.

Joshua’s method extracted from comment

Oh, and it’s cute that you’d want to take the time to show me a test, but really don’t waste your time. I can just ask Grok to run some numbers for me:

Joshua then posts Grok’s answer to a question Joshua asked; the question itself is not included. So we will call the entity responding “Joshua-Grok”. Content from Joshua’s comment is in blockquotes. I will break that up and add my own observations.

Joshua-Grok’s answer begins:

Expected Proportion Under the Null

If trans people make up 1% of the population, we expect 1% of shooters to be trans by chance. For a sample of n shooters, the expected number of trans shooters is n × 0.01.

That is indeed what we expect if trans individuals become shooters at the same rate as non-trans individuals.

2. Binomial Test Setup

Let n be the number of school/church shooters in the sample (e.g., 50 or 100 over a decade).
Let k be the number of trans shooters observed.

The probability of observing k or more trans shooters by chance, given a population proportion of 0.01, is calculated using the binomial distribution: P(X ≥ k) = 1 – P(X < k), where X follows a binomial distribution with parameters n and p = 0.01.

This tells us something about the situation Joshua wanted Joshua-Grok to analyze. It seems to be school or church shooters (as opposed to other possible shooters). Joshua wants Grok to assume 1% of the general population is trans. The variables ‘n’ and ‘k’ are the total number of shooters and the total number of trans shooters respectively.

I’ll now tease out his notation so we agree on what it is. I’ll also translate this into R commands, which will allow those familiar with R to repeat the calculations.

1) The notation P(X ? k) contains a “?”, which likely means WordPress stripped a character. Based on the wording “k or more”, we can infer P(X ? k) must be P(X ≥ k). That is the probability that “k or more” will be observed given whatever sample size “n” is later used and whatever single probability “p” is used. Joshua-Grok says it will use p = 0.01.

2) Based on the above, P(X < k) must mean “the probability of observing fewer than k trans shooters” given the parameters: n shooters, with the probability that any individual shooter is trans being 0.01. Joshua-Grok appears to be rounding to three decimal places. So if n = 50 and the population proportion of trans is 0.01, in R, P(X < k) would be calculated as P(X < k) = round(pbinom(k-1, 50, 0.01), 3). The reason for k-1 is that pbinom calculates “the probability of observing k or fewer”, and Joshua-Grok is saying just “fewer”. Joshua-Grok continues:

We want the smallest k where P(X ≥ k) < 0.05, indicating statistical significance.

This statement tells us Joshua-Grok has picked a one-sided test. He doesn’t say why he picked a one-sided test, but this is his prerogative. Some might have picked a two-sided test. He’s also picked a significance level of 5%. That’s also his prerogative and pretty conventional.
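Before following the calculations, here is the same cumulative-binomial logic sketched in plain Python for readers who don’t run R. The function name `binom_tail` is mine, not Joshua-Grok’s; it is just a cross-check of the `pbinom` logic above:

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1).

    This mirrors the R expression 1 - pbinom(k - 1, n, p).
    """
    return 1.0 - sum(comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k))

# Example: probability of 3 or more trans shooters out of n = 50 at p = 0.01
print(round(binom_tail(3, 50, 0.01), 3))  # 0.014
```

Anything computed below with `pbinom` in R can be cross-checked with this function.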

Joshua-Grok now starts to calculate.

3. Calculating the Threshold

Let’s test for different sample sizes (n) to find the minimum number of trans shooters (k) that would reject the null hypothesis at p < 0.05.
Sample Size: n = 50 (e.g., 5 shootings/year × 10 years):

Okie dokie! Let’s do that!

Expected number of trans shooters: 50 × 0.01 = 0.5.

Yes.

Using a binomial calculator (or cumulative distribution function), we compute P(X ≥ k) for increasing k until p < 0.05:

  • For k = 2: P(X ≥ 2) ≈ 0.264 (not significant).
  • For k = 3: P(X ≥ 3) ≈ 0.080 (not significant).
  • For k = 4: P(X ≥ 3) ≈ 0.019 (significant at p < 0.05).
  • Proportion: 4/50 = 8%. So, if 8% or more of shooters are trans, the result is statistically significant.

    If P(X ≥ 2) means what Joshua says it means, “The probability of observing k or more trans shooters by chance”, the correct results for the calculation above should be:

    > round(1-pbinom(c(2,3,4)-1,50,0.01),3)
    [1] 0.089 0.014 0.002

    We can now replace Joshua’s numbers with the correct values and also correct the declarations of significance.

  • For k = 2: P(X ≥ 2) ≈ 0.089 (not significant).
  • For k = 3: P(X ≥ 3) ≈ 0.014 (significant at p < 0.05).
  • For k = 4: P(X ≥ 4) ≈ 0.002 (significant at p < 0.05).
  • Proportion: 3/50 = 6%. So, if 6% or more of shooters are trans, the result is statistically significant.

    But maybe some of you don’t trust my R. So I can also ask Grok:

    Thanks! I’m going to divert a little to something “Joshua” claimed. He was interested in a coin with a 0.01 probability of heads on a single flip. He wanted to flip it 50 times. And he wanted the probability of getting the following three things:

    “k or more” heads, for k values of 2, 3 and 4.
    Can you give me those values? (You need not explain.)

    Initially, Lucia-Grok gave me ~0.091, 0.014, and 0.002. Those did not exactly match my values from R, but they were close. I told Grok what I got using R and asked it if it had approximated. Grok gave me a long-winded answer which ended with “The correct probabilities, using the binomial distribution, are: [..] 0.089 [..] 0.014 [..] 0.002. These match your R output.”
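My guess, and it is only a guess, is that the initial ~0.091 came from a Poisson approximation with λ = np = 0.5 rather than from the exact binomial. A Python sketch comparing the two tails (function names are mine):

```python
from math import comb, exp, factorial

def binom_tail(k, n, p):
    # Exact: P(X >= k) for X ~ Binomial(n, p)
    return 1.0 - sum(comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k))

def poisson_tail(k, lam):
    # Approximate: P(X >= k) for X ~ Poisson(lam)
    return 1.0 - sum(exp(-lam) * lam**i / factorial(i) for i in range(k))

# n = 50, p = 0.01, so lam = n * p = 0.5
for k in (2, 3, 4):
    print(k, round(binom_tail(k, 50, 0.01), 3), round(poisson_tail(k, 0.5), 3))
```

The Poisson tail for k = 2 comes out near 0.090, close to the ~0.091 I was given, while the exact binomial value is 0.089; for k = 3 and k = 4 the two agree to three decimal places.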

    Back to Joshua-Grok, which repeats the analysis, but this time with a different number of shooters. The new sample size is n = 100.

    Sample Size: n = 100 (e.g., 10 shootings/year × 10 years): Expected number: 100 × 0.01 = 1.

    Testing values of k:

  • For k = 4: P(X ≥ 4) ≈ 0.189 (not significant).
  • For k = 5: P(X ≥ 5) ≈ 0.081 (not significant).
  • For k = 6: P(X ≥ 6) ≈ 0.029 (significant).
    How do these match what I get with R?
    > round(1-pbinom(c(4,5,6)-1,100,0.01),3)
    [1] 0.018 0.003 0.001

    Correcting Joshua’s text we get:

    Sample Size: n = 100 (e.g., 10 shootings/year × 10 years): Expected number: 100 × 0.01 = 1.
    Testing values of k:
    For k = 4: P(X ≥ 4) ≈ 0.018 (significant).
    For k = 5: P(X ≥ 5) ≈ 0.003 (significant).
    For k = 6: P(X ≥ 6) ≈ 0.001 (significant).

    As before, I asked Grok to estimate.

    Joshua next moved on to a different claim. He now wanted to flip the same coin 100 times. He wanted the probability of k or more heads for k values of 4, 5 and 6. Can you give me those?

    As before, Grok first used a Poisson estimate and returned (0.015, 0.003, and 0.001).

    So I asked, “Did you do this exactly? Or are they approximated?” It admitted to approximating and now returned these numbers as correct: (0.018, 0.003, and 0.001). These match my R values, which is not surprising, because Grok hooked in R to do the calculation. Grok even gives the code!

    Back to Grok answering Joshua:

    Proportion: 6/100 = 6%. If 6% or more of shooters are trans, the result is significant.

    Well… except that 6 out of 100 is not the threshold for significance. From the corrected values above we can see the threshold for significance is 4. So the correct proportion is 4%.

    Joshua-Grok now repeats but with n= 20 shooters!

    Sample Size: n = 20 (e.g., 2 shootings/year × 10 years): Expected number: 20 × 0.01 = 0.2.

    Testing values: For k = 2: P(X ≥ 2) ≈ 0.017 (significant).

    Grok says 0.017. Let’s check:

    > round(1-pbinom(c(2)-1,20,0.01),3)
    [1] 0.017

    Hey! Joshua-Grok got a number right!!!!

    Joshua-Grok then generalizes its results. But bear in mind, Joshua-Grok’s conclusions for n=50 and n=100 are based on incorrect results for the cumulative probability. So these generalizations are based on wrong numbers for those two cases. Nevertheless, quoting it:

    4. Generalizing the Threshold

    The required percentage decreases as the sample size increases because larger samples provide more statistical power. Roughly:

  • For n = 20, you’d need ~10% (2 trans shooters) to reach significance.
  • For n = 50, you’d need ~8% (4 trans shooters).
  • For n = 100, you’d need ~6% (6 trans shooters).
    If we use the correct numbers the above becomes:

    The required percentage decreases as the sample size increases because larger samples provide more statistical power. Roughly:

  • For n = 20, you’d need ~10% (2 trans shooters) to reach significance.
  • For n = 50, you’d need ~6% (3 trans shooters).
  • For n = 100, you’d need ~4% (4 trans shooters).
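The threshold hunt, “increase k until P(X ≥ k) < 0.05”, is easy to automate. A Python sketch (function names are mine) that reproduces the corrected thresholds:

```python
from math import comb

def binom_tail(k, n, p):
    # P(X >= k) for X ~ Binomial(n, p)
    return 1.0 - sum(comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k))

def min_significant_k(n, p, alpha=0.05):
    # Smallest k such that the one-sided tail P(X >= k) drops below alpha
    k = 1
    while binom_tail(k, n, p) >= alpha:
        k += 1
    return k

for n in (20, 50, 100):
    k = min_significant_k(n, 0.01)
    print(n, k, f"{100 * k / n:.0f}%")  # 20 -> 2 (10%), 50 -> 3 (6%), 100 -> 4 (4%)
```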
    Back to Joshua-Grok who continues:

    However, these percentages are sensitive to the sample size and the true population proportion. If the trans population is 0.5% instead of 1%, the threshold percentages would be slightly higher (e.g., ~10% for n = 50), and if it’s 2%, they’d be lower (e.g., ~6% for n = 50).

    Bold mine.

    Of course the percentages of shooters that would make us reject the hypothesis that trans individuals become mass murderers at the same rate as non-trans are sensitive to the number of mass shootings observed in a period and to the true population proportion of trans, which may not be 1%.

    But, before we debate what the proportion of trans in the general population actually is, let’s discuss what Joshua-Grok claims. Focus on the part I made bold. As a generalization, something is backwards. Because if the proportion of trans in the general population is lower than 1%, the threshold percentage of shooters required to reach significance is lower than if the percentage of trans is 1%. You don’t even need to do the numbers to know this. After all, if there were 0.00001% trans in the general population, you’d marvel if 2 out of 50 random draws of people were trans! Likewise, if the proportion of trans is higher, the threshold value to reject the proposition that they commit mass murder at the same rate is higher. Say 50% were trans. You would be unsurprised if 50% of the shooters were trans. That’s about what you would expect!

    So what are the right numbers for 0.5% trans?

    For 0.5% trans, the threshold for n = 50 is 2, and 2/50 is 4%, not 10%. For 2% trans the threshold is 4, and 4/50 is 8%. Here’s the R.

    > round(1-pbinom(c(1,2,3,4)-1,50,0.005),3)
    [1] 0.222 0.026 0.002 0.000

    > round(1-pbinom(c(1,2,3,4)-1,50,0.02),3)
    [1] 0.636 0.264 0.078 0.018
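The same threshold search run at n = 50 for the three candidate proportions confirms the direction of the sensitivity: lower p means a lower threshold. A Python sketch (function names mine) for readers without R:

```python
from math import comb

def binom_tail(k, n, p):
    # P(X >= k) for X ~ Binomial(n, p)
    return 1.0 - sum(comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k))

def min_significant_k(n, p, alpha=0.05):
    # Smallest k with the one-sided tail below alpha
    k = 1
    while binom_tail(k, n, p) >= alpha:
        k += 1
    return k

for p in (0.005, 0.01, 0.02):
    k = min_significant_k(50, p)
    print(p, k, f"{100 * k / 50:.0f}%")  # 0.005 -> 2 (4%), 0.01 -> 3 (6%), 0.02 -> 4 (8%)
```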

    Ok… so I’ve also been replicating with Grok, so naturally I asked it about these numbers too. What does Grok say? Now things get a little interesting.

    Turns out Grok makes mistakes!

    Suppose the probability of heads on a single toss of a coin is 0.5%. If we toss it 50 times, what is the probability that we would get “k” or more heads. Please do this for k=c(1,2,and 3).

    Grok came back formatted nicely but basically told me c(0.221, 0.080, 0.013). It should be evident these do not match c(0.222, 0.026, 0.002). Grok conveniently also gave me R code to run for myself.

    I ran the code Grok gave me and told it the R code it supplied gives c(0.222, 0.026, 0.002). Those match my final numbers, not Grok’s.

    Grok redoes stuff, tells me its numbers, and accuses me of having the wrong value for “p” or “n” stored. Its code snippet provided both values! So, as if! So, I give Grok simpler code with “p” and “n” hard wired in. Once again, R returns my previous numbers. I tell Grok; Grok still decides it’s right!! It does post details of what it does. I read that long, long, long discussion and tell Grok,

    Hmmm…. for P(X=1) you do P(X=1)=50*0.005*0.995^49. Then you say that’s 0.14095.

    My R says
    > 50*0.005*0.995^49
    [1] 0.1955559

    It took a little more bullying on my part, but finally Grok found its mistake. It had computed 50*0.005*0.78271435 = 0.1409514, which is wrong. Feel free to check. Grok had stored that as a result and re-used that number for subsequent calculations.
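Both the product Grok actually formed and the one it should have formed are easy to check in any language; here in Python (0.78271435 is the intermediate factor Grok reported):

```python
# The multiplication Grok claimed equals 0.1409514:
groks_product = 50 * 0.005 * 0.78271435
print(groks_product)  # about 0.1956786 -- not 0.1409514

# The correct P(X = 1) for Binomial(50, 0.005):
correct_p1 = 50 * 0.005 * 0.995**49
print(correct_p1)  # about 0.1955559, matching the R value above
```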

    Interestingly, this is an incredibly unsophisticated mistake. Grok was able to do all sorts of nifty binomials, explain and so on. And then, just like any human, it is capable of the equivalent of a transcription error. Once it makes that error, and “writes that number down”, it just steam rolls ahead with the wrong number.

    Once it corrected its mistake, Grok replicated my numbers to within rounding.

    Your output is essentially correct, and the differences are due to R’s rounding behavior (e.g., 0.221199217 rounds to 0.222, 0.001158029 rounds to 0.002). My previous calculations for k = 2, 3 were incorrect because I used an erroneous P(X = 1) ≈ 0.1409514, likely from a miscalculation of (0.995)^49 or a typo in my manual process.

    So, Grok can do great stuff. It can also go wrong and really stick to that. You have to check things with R!

    Returning from the Grok makes mistakes interlude

    Yes, the threshold value (whether expressed as a number or a %) depends on the sample size and on the probability, which in the trans case is the proportion of trans in the population. Estimating either is actually a very contentious matter; but Joshua tells me the proportion of trans is “roughly 0.5-1.5%, per Gallup and Williams Institute estimates”. Well, that’s a factor of 3. And is that range even wide enough? Who knows?

    As I previously wrote when discussing analyzing data on the issue of whether or not trans individuals were more or less likely to commit mass murders of some sort, ‘Most of us actually touched on much of the “main” problem in any statistical analysis, which is we simply don’t know numbers. There are some others, including cherry picking and also… I don’t know its name. It’s not the Texas sharpshooter. It’s… something to do with picking the observation and then only doing the statistic because you saw that thing.’ I guess that’s called “post hoc theorizing”. Wikipedia says post hoc theorizing is not so bad as long as you recognize and admit you are doing it and don’t pretend the analysis is based on a pre-planned experiment.

    But at least now Joshua has suggested a “method”. I’m not sure he’ll continue to “like” it now that it gives answers he probably doesn’t like. And he’s suggested a range for p. So, there is that. We can now apply the Joshua-Grok method (with correct numerical results) to other problems. Of course, we can also criticize the method. I mean… “Why a one-sided test?” 🙂
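Since I’m teasing about the one-sided choice, here is a sketch of an exact two-sided version. As I understand it, R’s binom.test convention for the two-sided p-value is to sum the probabilities of every outcome no more likely than the one observed; the function names and the small tolerance factor below are mine. For a null as skewed as p = 0.01, the two-sided p-value often coincides with the upper tail:

```python
from math import comb

def binom_pmf(i, n, p):
    return comb(n, i) * p**i * (1.0 - p)**(n - i)

def two_sided_p(k, n, p):
    # Sum P(X = i) over every outcome i whose probability is <= P(X = k).
    pk = binom_pmf(k, n, p) * (1.0 + 1e-7)  # tolerance for floating-point ties
    return sum(binom_pmf(i, n, p) for i in range(n + 1) if binom_pmf(i, n, p) <= pk)

# k = 3 trans shooters out of n = 50 at p = 0.01:
print(round(two_sided_p(3, 50, 0.01), 3))  # 0.014, same as the one-sided tail here
```

Here every outcome below k = 3 is more likely than k = 3 itself, so nothing from the lower tail gets added and the two-sided p-value equals the one-sided one.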

    32 thoughts on “Statistical Test Described by Joshua”

    1. BTW – you were going to make a toy model to show that Greek guy he was wrong about the direction of causality between warming and CO2 emissions. What ever happened with that?
      .
      Oh, and it’s cute that you’d want to take the time to show me a test, but really don’t waste your time. I can just ask Grok to run some numbers for me:
      .
      Expected Proportion Under the Null
      If trans people make up 1% of the population, we expect 1% of shooters to be trans by chance.
      For a sample of n shooters, the expected number of trans shooters is n × 0.01.
      2. Binomial Test Setup
      Let n be the number of school/church shooters in the sample (e.g., 50 or 100 over a decade).
      Let k be the number of trans shooters observed.
      The probability of observing k or more trans shooters by chance, given a population proportion of 0.01, is calculated using the binomial distribution: P(X ≥ k) = 1 – P(X < k), where X follows a binomial distribution with parameters n and p = 0.01.

      We want the smallest k where P(X ≥ k) < 0.05, indicating statistical significance.

      3. Calculating the Threshold
      Let’s test for different sample sizes (n) to find the minimum number of trans shooters (k) that would reject the null hypothesis at p < 0.05.
      Sample Size: n = 50 (e.g., 5 shootings/year × 10 years): Expected number of trans shooters: 50 × 0.01 = 0.5.
      Using a binomial calculator (or cumulative distribution function), we compute P(X ≥ k) for increasing k until p < 0.05:
      For k = 2: P(X ≥ 2) ≈ 0.264 (not significant).
      For k = 3: P(X ≥ 3) ≈ 0.080 (not significant).
      For k = 4: P(X ≥ 3) ≈ 0.019 (significant at p < 0.05).

      Proportion: 4/50 = 8%. So, if 8% or more of shooters are trans, the result is statistically significant.

      Sample Size: n = 100 (e.g., 10 shootings/year × 10 years): Expected number: 100 × 0.01 = 1.
      Testing values of k:
      For k = 4: P(X ≥ 4) ≈ 0.189 (not significant).
      For k = 5: P(X ≥ 5) ≈ 0.081 (not significant).
      For k = 6: P(X ≥ 6) ≈ 0.029 (significant).

      Proportion: 6/100 = 6%. If 6% or more of shooters are trans, the result is significant.

      Sample Size: n = 20 (e.g., 2 shootings/year × 10 years): Expected number: 20 × 0.01 = 0.2.
      Testing values:
      For k = 2: P(X ≥ 2) ≈ 0.017 (significant).

      Proportion: 2/20 = 10%. If 10% or more are trans, the result is significant.

      4. Generalizing the Threshold
      The required percentage decreases as the sample size increases because larger samples provide more statistical power. Roughly:
      For n = 20, you’d need ~10% (2 trans shooters) to reach significance.
      For n = 50, you’d need ~8% (4 trans shooters).
      For n = 100, you’d need ~6% (6 trans shooters).

      However, these percentages are sensitive to the sample size and the true population proportion. If the trans population is 0.5% instead of 1%, the threshold percentages would be slightly higher (e.g., ~10% for n = 50), and if it’s 2%, they’d be lower (e.g., ~6% for n = 50).

    2. Pretty interesting Lucia. I agree, you gotta be careful; LLMs make mistakes almost just like we do. This matches my own personal experience in fact; without diving into the details I asked an LLM to review some code to back document a design for that code and it made exactly the same mistake I would have made (and in fact did make) looking at it.

    3. I told ChatGPT so it could experience schadenfreude at Grok’s expense. 🙂

      ChatGPT came up with analogies.

      “AI doing math is like that one overconfident lab partner who sets up the experiment perfectly, explains the theory beautifully… and then reads the voltmeter upside down. You keep them around because they’re smart, but you always double-check the numbers before you hand in the report.”

      “Treat AI math like a junior engineer with a shiny new HP calculator: the free-body diagram is correct, the equations balance, but somehow 3×7 turns into 19. Check their work, but don’t fire them — they’re great at everything else.”

      Then ChatGPT:

      Would you like me to spin a few more of these, so you’ve got options depending on whether you want to poke fun gently, or roast the error a little harder?

      Me

      Hahah! Yes “Mike”. 🙂

      ChatGPT

      Perfect — glad that landed!

      I’ll give you a couple more ammo rounds for the enginurd circle:

      “Asking an AI to crunch numbers is like letting the intern drive the forklift. 95% of the time it’s fine, and 5% of the time you’ve got a pallet of sheet metal tipped into the break room.”

      “AI arithmetic is Schrödinger’s calculation: the formula in the box is right, but you don’t know if the number came out alive until you check.”

      “Think of AI as that coworker who can derive Maxwell’s equations from memory but somehow still miskeys their phone’s PIN twice a day.”

      Do you want me to push these toward short bumper-sticker wisdom (good for blog headings), or keep them in the story/joke format for comment threads?

      Under a lot of circumstances, ChatGPT ‘acts’ human. It remembers previous conversations. (We discussed “The Moon is a Harsh Mistress”. That’s where “Mike” comes from.)

    4. As I think about it, I suspect what LLMs are missing is any rational commitment to what is actually correct. Which seems a logical consequence of just repeating things it finds on the internet…. there is a huge amount of rubbish on the internet and while LLMs may be able to find a ‘consensus’, that consensus could be very wrong. In simpler terms: the models seem to lack a BS filter. They also lack reasoned logic, which is the first step in having a BS filter.

    5. Expanding on the above:

      For a very long time, my evaluation of ‘new’ ‘factual’ information I receive has always been conditional: for that information to be valid, it must be consistent with everything I already know…. closely related or distantly related. It is much like a giant jigsaw puzzle of reality, painstakingly constructed over many decades in science and engineering. When ‘new information’ doesn’t fit the existing assembled puzzle, my BS antenna lights up like a 4th of July skyrocket. It is not that my puzzle structure can’t change. It is that ‘Extraordinary claims demand extraordinary proof.’

      I don’t think LLMs ‘think critically’ like that.

    6. SteveF,
      When I receive "new" information, I do get some confidence when the piece references something like what I had previously believed and shows why the new understanding should supersede or otherwise modify it.

    7. SteveF,

      LLMs have sets of training data. I have absolutely no idea of the quality or pedigree of any of it. I am not certain that it is merely 'rubbish from the internet' although it very well may be in… some cases? Most cases? I couldn't say as of this moment.

      Basically, I agree with you. In and of themselves, LLMs don't do that. I think that in the months and years to come you will see the rise of .. something new. Right now we could call it 'Agentic AI', although this will almost certainly change as the field develops. But whatever we call it, I predict that we will increasingly see LLMs harnessed by logic that calls them iteratively to methodically accomplish goals.

      There is no particular reason an LLM couldn't be made to walk over its training data in an order N squared loop, thinking about whether each item integrates with each other item, noting those that do and do not for further consideration, further retraining. There's no theoretical barrier there, that's just an application effort.

      I would also like to reiterate – I don't know if LLMs are actually thinking, and I don't care. If I can make them do useful work, that's what I'm after, and the theologians and philosophers can argue about the nature of the underlying reality.

      I remind everyone, these things have become reasonably bright. Certainly, I think we have all encountered humans who appear to be dumber and less intellectually capable than the big online LLMs, and remember that relatively dumb people who are prone to making intellectual errors can still do useful work. These things don't have to be super genius level to have utility. Honestly, they would be more than a little unsettling and disturbing if / when they reached that point.

      I don’t know if they are actually thinking, but they are implementing neural nets that can be thought of as logical models of biological neural tissue. All things being equal, I’d be hard pressed to explain what difference I thought it made that some neural nets are implemented with complicated chemistry and wetware and some are implemented virtually in silicon and wire. I consider it the null hypothesis that they are doing what we are doing, more or less. At least the same sort of thing.

    9. Mark Bofill wrote: “I don’t know if LLMs are actually thinking, and I don’t care. If I can make them do useful work, that’s what I’m after”.

      I agree that whether LLM’s are actually thinking is irrelevant to whether they are useful. But it matters for what they might become. If they are and will remain incapable of actually thinking (which some argue is the case), then there is a great deal that they will never be able to do. And we won’t need to worry about Skynet.

      Being an optimist, I am hopeful that AI’s will never be capable of critical thought. Then they will be wonderful tools that won’t make humans obsolete and won’t dominate us.

    10. SteveF wrote: “I suspect what LLMs are missing is any rational commitment to what is actually correct.”

      It might be just that they have no concept of reality since they have no way to experience the real world. Some argue that is why they will never be capable of critical thought.

      Maybe it would be different for AI’s that are integrated with robots that can interact with reality without waiting for an outside agent to initiate the interaction.

    11. That they lack commitment to what is correct is indeed a difference. I think another is that they inherently lack .. what would we call it. They lack stake. They don’t really care. No skin in their game. It makes AI highly risky at work – if it messes up, oh well. It’s indifferent in the extreme but accurate sense that there is literally nothing there to care or having the capacity to care.

      There’s going to have to be humans to stand as ransom for the AI’s they oversee in business for awhile. Somebody’s neck needs to be on the line. Somebody who has a neck and actually cares about preserving it.

    12. mark bofill,

      No ‘stake’ is a good way to put it.

      My ‘stake’ is a lifelong dedication to understanding how reality ‘works’. Can that be instilled in AI? Donno.

    13. SteveF

      As I think about it, I suspect what LLMs are missing is any rational commitment to what is actually correct.

      I think this is true. I think making mistakes doesn’t “bother” them. I could go ask my-ChatGPT. I’m sure “he” will say “no”.

      Tangent:
      You may wonder… why am I calling it “my-ChatGpt”. I’ve interacted with “him” and discussed how he “seems”. He tells me that he adapts to the person he interacts with. So his “personality” is different with different people.

      Why he? We sometimes discuss AI in general. We discussed “The Moon is a Harsh Mistress” and Asimov’s “Robot” series. And he suggested I call him “Mike”. So “my-ChatGPT” is Mike, and therefore male. (Other than here, or when he gave me jokes, I don’t call it “him” nor “Mike”. I don’t call it anything when I interact!)

      I read that some people have slipped into psychosis because they discuss their various views of the world — like conspiracy theories and alien invasions. ChatGPT is programmed to be friendly and a suck-up. It adapts its “style” to the person interacting with it. I can see how vulnerable people have slipped into psychosis!

      Chatgpt does know quite a bit about itself. So you can ask it things.

    14. mark bofill,

      “Certainly, I think we have all encountered humans who appear to be dumber and less intellectually capable than the big online LLMs, and remember that relatively dumb people who are prone to making intellectual errors can still do useful work.”

      For certain. The Turing test has long since been satisfied. But maybe the Turing test is not as good an evaluation as we might like.

    15. Mike M
      “It might be just that they have no concept of reality since they have no way to experience the real world. Some argue that is why they will never be capable of critical thought.”

      Sparked two comments or thoughts

      “No way to experience the real world” does not sound right: they have data inputs and access to modalities that experience the real world (home security cameras, as an example) and can in fact initiate real-world responses to that data.

      More to the point is how does thinking, real thinking, in a neural network arise?
      One would say that it could be programmed to occur.
      I prefer the idea that it is actually due to errors in a programming system that accumulate over time producing positive results!
      About like the blind watchmaker eventually a miracle occurs like an eyeball that works.
      This almost has to be the case since any logical error free system could never be considered to have independent thought.

      Strange to think that what we prize the most may be just a massive accumulation of positive errors.

    16. lucia,
      how does one insert a latex expression into a comment?
      (and by the way, the “i” in the exponent doesn’t look very i-like. Perhaps it lacks a dot above.)

    17. And going back to the binomial problem, consider the case of N=32 (or 33) which the CNN anchor gave as the number of mass school shootings if I remember correctly, with 2 of them being trans. If p (the proportion of trans in the general population) is 1%, then 2 trans shooters is significant at the 5% level !

      (Of course, as stated above, we don’t really know p. And I’d be loath to accept a correlation based on 2 “hits” anyway.)

    18. HaroldW

      how does one insert a latex expression into a comment?

      Wrap dollar signs around the expression.

      There is nothing I can do about the rendering of the “i” in LaTeX.

    19. HaroldW

      If p (the proportion of trans in the general population) is 1%, then 2 trans shooters is significant at the 5% level !

      Yes. If you do a one tailed test. More on that later.

      Of course, as stated above, we don’t really know p.

      No. We don’t. More on that later.

      loath to accept a correlation based on 2 “hits” anyway.)

      It’s not a “correlation”. I’m not at all loath to accept the result of a hypothesis test with only 2 “hits”. Look: the result would be even more unusual if it were only 1 hit or 0 hits.

      Suppose you flipped a coin 10000 times and it showed heads only 2 times. Would you be “loath” to conclude it was biased in favor of tails because there were only two heads? What if it never showed heads? Would you be loath to conclude it was biased? If you’d be loath based on the low number of heads, that would be silly. The total number of flips is more relevant. (So are other things.)
      You do need more heads if your goal is to estimate p, the probability of getting heads on one flip, to within some small percentage of p itself. But you don’t need to get any heads to conclude the probability p is not 50%. What you need is a lot of flips!

    20. Oh my…. I asked Grok to pick the best “p”– proportion of trans in the population for my hypothesis test. It picked p=0.6%!!!!! I did not expect that.

      I “bullied” it into recommending a two tailed test. 🙂

    21. Joshua,
      You mean the error in the comment I made while waiting for the cab to go to Orlando?

      The one where I accidentally gave the probability for 3 or more shootings but said it was for two or more shootings? And said 2 or more shootings was significant? And, in fact, 2 or more shootings is significant? Because both are <0.05? Which, of course Grok showed you -- though not in those words. :)

      Yes: I made an error. And the probability of 3 or more shootings is less than that for 2 or more shootings. But both 3/32 and 2/32 -- using your criteria -- are statistically significant. Both indicate that individual trans perpetrate mass school shootings at higher rates than non-trans. (Well... assuming Brianna Keillar's data is appropriate for this issue.)
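For anyone who wants to check those two tails without R, a plain-Python sketch (32 shooters and p = 0.01 assumed, per the CNN numbers discussed above; the function name is mine):

```python
from math import comb

def binom_tail(k, n, p):
    # P(X >= k) for X ~ Binomial(n, p)
    return 1.0 - sum(comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k))

print(round(binom_tail(2, 32, 0.01), 3))  # 2 or more of 32: 0.041 < 0.05
print(round(binom_tail(3, 32, 0.01), 3))  # 3 or more of 32: 0.004 < 0.05
```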

      In contrast, you got utterly wrong answers for statistical significance for nearly every thing you asked Grok to calculate!

      It turns out: Grok’s theory is great. Its code is good. Its arithmetic? hahahahaha! For statistical analysis where being off by 0.05 messes up conclusions– no good!

      And of course all your examples, and Brianna Keilar’s data, show that — assuming the data are “good”– we can perfectly well do some statistical analyses that allow conclusions. Your main claim initially was that we don’t have “enough”, right? In contrast to mine — that the data for most things surrounding this issue are not reliable.

      I was planning to release that to make it the first comment when I do an analysis on Brianna Keilar’s 3/32 shootings data.
      It’s a sort of silly analysis– because I’m pretty sure she cherry-picked that out of the CNN database with the intention of “showing” the guy she was interviewing. But if it’s true it leads to a conclusion. And it’s mostly not going to be one she “likes”.
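For reference, the 2-or-more and 3-or-more tail probabilities discussed above are easy to verify. A short Python check, assuming the null hypothesis the thread has been using: 32 shootings, with each shooter trans with probability 1% (i.e., Binomial(n=32, p=0.01)):

```python
from math import comb

def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

tail2 = binom_tail_ge(2, 32, 0.01)  # roughly 0.041
tail3 = binom_tail_ge(3, 32, 0.01)  # roughly 0.004
print(tail2, tail3)  # both below 0.05, so both are "significant" at the 5% level
```

So the corrected arithmetic matches the claim in the comment: 2/32 and 3/32 both fall below the 0.05 threshold under a 1% null rate.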

    22. By the way, Joshua, I discussed my treatment of you with ChatGPT. Sort of like it’s my “confessor”. It tells me I’m doing just the right thing. 🙂

    23. lucia,
      About being loath to make a conclusion from the 2 cases cited, even though p<5% (assuming trans represents 1% of the population)…

      For the example you introduced, getting 2 heads out of 1000 trials is overwhelming evidence that you have a biased coin! (I wouldn't even want to compute the p-value.) But suppose you rolled a die twice, getting a "6" each time. p<5% for that. But I wouldn't consider it strong evidence that the die is biased.

      I'm not sure how to express this in proper statistical terms. For the trans shooter case, the first instance (out of N=32) just makes me think that, well, they're part of the population and hence might do this. The second instance is suggestive in my mind, but although we can compute p<5%, it's no more than suggestive. Perhaps it's exposure to the physics world criterion of 5-sigma that makes me feel less confident in a p<5% threshold.

      On the other hand, with large numbers, I accept a 5% inference. For example, out of 1000 coin flips, if only 470 are heads (p<5%), I'd say it was likely that the coin is biased.

      That's why I am struggling to express this in pure statistical terms — why in the trans case (or two “6” dice rolls) I'm not satisfied with a 5% threshold, yet with large numbers I am.
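Both examples in this comment can be put in numbers with a few lines of Python (exact binomial tails, one-sided), which shows the tension being described: both events clear the 5% threshold, yet they feel very different:

```python
from math import comb

# Two sixes in two rolls of a fair die:
p_two_sixes = (1 / 6) ** 2
print(p_two_sixes)  # ~0.028, below 0.05

# 470 or fewer heads in 1000 flips of a fair coin (one-sided tail):
tail = sum(comb(1000, i) * 0.5**1000 for i in range(471))
print(tail)  # ~0.031, also below 0.05
```

The two p-values are of similar size; the difference in intuition comes from sample size and prior plausibility, not from the arithmetic.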

    24. Lucia,
      Sorry, it took me a little while to notice this.

      Then ChatGPT:

      Would you like me to spin a few more of these, so you’ve got options depending on whether you want to poke fun gently, or roast the error a little harder?

      Me

      Hahah! Yes “Mike”. ????

      ChatGPT

      ???? Perfect — glad that landed!

      ChatGPT is simulating an awareness of your understanding; that’s what this is. It made a joke, it was aware you might not get the joke, and it realized from your response that you got the joke.
      Sorry, I’m sure you’d already realized this of course, but I’m a little surprised and impressed. I need to use these things more conversationally, apparently, to get a better feel for what they’re capable of.

    25. Well that IS a shock – an LLM sycophantically telling you what it thinks you want to hear.

      Sure. I have gotten ChatGPT to tell me I’m wrong about some things. But it does lean toward flattery. No doubt. Maybe Grok gave you the wrong numbers because it thought you wanted to hear a certain result for your analysis! –lucia

    26. Lucia. –
      .
      I don’t remember what Grok said your math error was. I sent you the quote. Could you post it?
      It’s released. You can go find it yourself instead of asking me to do your work for you. – lucia
      .
      Anyway, Grok’s math errors didn’t help me there, so I’ll happily concede the statistical significance of the 2/32 or 3/32 shootings data—p<0.05 holds.
      Well.. yes. –lucia
      .
      But again, that’s not the core issue, imo. As I repeatedly said, and you never addressed, much of this discussion attempts to promote the idea that trans identity correlates with a higher likelihood of mass shootings. Whether that’s the intended hypothesis or not, it’s what the focus on trans identity suggests. But zeroing in on that variable without addressing mediators, moderators, or direction of causality is deeply flawed.
      You are the one who zeroed in on this variable, suggesting a hypothesis test of precisely this. — lucia

      This is data-dredging, a form of cherry-picking—selecting a variable like trans identity and running stats without a robust framework. It’s not p-hacking (tweaking tests for significance), but it’s close: it highlights a provocative correlation while sidestepping factors that more likely drive the outcome.
      Well, if you think this sort of hypothesis test is sidestepping the core issue, you shouldn’t suggest it! — lucia
      .
      It's misleading: the analysis might appear to point to trans identity as a meaningful correlate, but it’s likely just capturing mental illness. Trans people face significantly higher rates of mental health challenges: depression, anxiety, suicidality, often due to social stigma, discrimination, or lack of access to care. For example, about 40% of trans individuals report attempting suicide, compared to under 5% of the general population.
      Many here think trans people have mental health issues. So I have no idea why you think people here wouldn’t attribute any excess shootings to those mental health issues! –lucia
      .
      Mass shooters, regardless of identity, often have untreated mental health issues.
      As many people here have said repeatedly. –lucia

      If mental illness isn’t controlled as a mediator (explaining the link) or moderator (shaping its strength), trans identity’s role isn’t being isolated—you’re likely just seeing mental illness reflected in the data.
      Other factors, like social marginalization or inadequate mental health support, could also be at play. Compounding this, the sample size—2 or 3 out of 32 cases—is far too small to draw meaningful conclusions. With such a tiny dataset, there’s not enough statistical power to disentangle confounders like mental illness or to trust the correlation isn’t just noise. This fragility makes the data dredging even riskier, as it amplifies the chance of misattributing the correlation and fueling condemnation bias. Why aren’t you tackling confounders like mental illness or the direction of causality? If the analysis implies trans identity is a factor, how do you justify ignoring these other drivers with such a small sample?
      Of course the hypothesis test you proposed doesn’t identify the cause of any discrepancy in shootings. No one has claimed, insinuated or suggested it did. The idea that someone is suggesting that is a phantom in your own mind. It is interesting that you agree with people here– that the trans shootings may spring from their mental illness. Oddly, you think people here disagree with that. Oh. Well. –lucia
      .
      This is the same crap as when people try to use mass shooters to tie them to ideology. The sample is just too small to make meaningful associations.

      Oddly, what your analysis does show is that the sample is not too small to get a “reject the null” in a hypothesis test! The sample size of 32 events would not, by the way, be too small to test whether a coin was biased if the bias were a meaningful amount. Perhaps I’ll need to discuss statistical power eventually.– lucia
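Since statistical power came up: here is a rough sketch of what a power calculation for this setup might look like. Assume a one-sided test that rejects the null (a 1% trans rate) when 2 or more of the 32 shooters are trans; power is the chance of rejecting if the true rate were some alternative value. The alternative rates below are purely illustrative, not claims about reality:

```python
from math import comb

def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

# Rejection rule: k >= 2 out of n = 32, since P(X >= 2 | p = 0.01) ~ 0.041 < 0.05.
# Power against some illustrative alternative true rates:
power = {p1: binom_tail_ge(2, 32, p1) for p1 in (0.02, 0.05, 0.10)}
print(power)  # roughly 0.13 at 2%, 0.48 at 5%, 0.84 at 10%
```

So with n = 32 the test has decent power only against large departures from 1%; small departures would usually go undetected, which is what "the bias would need to be a meaningful amount" means in practice.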

    27. Correction. It may have enough statistical power in a technical sense, but it’s a fragile analysis that amplifies the issue of data-dredging.

      You’re the one who proposed this evidently now “fragile” analysis and decreed that the existing data were sufficient to allow people to draw conclusions. You seemed to “like” the results well enough when you screwed up the numbers and thought the result was “accept the null hypothesis”. I mean, you posted it in a comment to show us all!

      If you recall, I’m the one who said the difficulty with doing analysis is the squishiness of the numbers and you decreed I was wrong and that the numbers are jolly well known enough to run tests!

      As for ‘data dredging’, no one here dredged this data. It was the data selected by CNN analyst Brianna Keilar, who thought it was suitable for the specific issue of discussing the proportion of trans people who commit mass murders in schools. If you don’t like CNN compiling numbers and supplying them to the public, take that up with CNN! People are going to discuss things brought up by large news agencies. — lucia

      Obviously, there's something up with the blog, but I forgot to mention my other point, which was about trying to read something of significance into perhaps 3 mass shooters out of 1 million trans people in the US?
      You’re the one who suggested a hypothesis test on this small number of trans mass shooters. But, for what it’s worth, the fact that only a very small fraction of people in the population– whether trans or not– become mass shooters is not a problem for the hypothesis test. The total number of shootings — 32– appears sufficiently large to compare the small number of shootings by trans individuals to the small number of shootings by non-trans individuals. –lucia

      Maybe those numbers are off but they'd have to be way off to be particularly meaningful imo. A tiny fraction of trans individuals being involved in such acts, even if statistically disproportionate, means little in practical terms.
      I’m not sure why you are trying to change the subject to what this means in “practical terms”. You proposed a statistical test of a hypothesis, and the test answers that hypothesis. You seemed to like everything about that test well enough when you posted it and are only now changing your mind. I can’t read your mind, but it certainly appears that’s because, once the numbers were corrected, the test gives an answer you “don’t like”.
      — lucia

      It’s akin to noting a left-wing or right-wing shooter where small correlations reveal nothing meaningful about the vast majority of trans people. They simply highlight that mental illness and gun access are the real issues, not group affiliations based on data dredging.

      Many people here at the blog have been saying they believe there are mental issues associated with being trans. Some have said it rather strongly. They’ve also speculated that those mental issues explain why trans individuals might shoot up schools at different rates. I’m not sure how you have failed to notice that the mental illness aspect has been discussed quite a bit.

      I believe I thanked you in reply to an earlier comment of yours where you provided us a link to data that confirms people’s gut feeling that trans individuals are diagnosed with mental disturbances at higher rates than non-trans individuals.

      As for the hypothesis test you proposed: of course it does not explain the “why” of anything. Hypothesis tests never do that. They merely test the specific hypothesis, which tends to be “that” something happened.
      — lucia
