Questions to clarify contribution of spread in population to uncertainty.

In comments at TAV, Jeff Id asked Pat Frank a brilliant question. I’m going to tweak it so I can ask an extended question. Although these are to some extent rhetorical questions, I think these particular questions will help clarify issues relevant to understanding the way in which the spread in temperatures across a sample does and does not affect uncertainty.

Here is my first tweaked question:

Suppose I have 100 identical cups, each filled with the same mass of water but at a different temperature. Each cup has some unknown temperature, but we know the temperatures are normally distributed and that the temperature in each cup is uncorrelated with that of any other cup. I measure the temperature of each cup with identical thermometers with a standard error of 0.005C; measurement errors are normally distributed and uncorrelated.

The measured temperatures are:
{59.99488 49.11746 58.40584 62.37229 50.03863 53.45862 79.07399 49.50469 68.08187 51.61895 60.63287 66.10079 62.45774 64.30218 57.69750 51.20818 67.23122 62.98295 56.96893 65.66524 46.52092 75.34275 65.48765 61.84419 62.59873 60.12299 60.42893 58.13555 75.54401 67.27035 60.98558 61.86077 50.14219 55.92671 55.57226 59.04270 67.43372 63.57600 57.12527 56.23831 64.83390 70.23839 57.13899 70.20660 62.36759 60.71702 66.63313 66.41026 56.19293 68.66530 67.39592 64.41039 83.68627 57.64337 61.74628 56.07459 64.70974 65.87532 60.48483 66.62137 66.67895 61.68526 52.22845 58.36557 64.09209 65.68400 57.14445 58.33700 55.91439 63.95999 71.51489 68.48649 71.55382 56.78854 69.41446 64.69442 59.07615 57.35855 48.09356 69.70662 53.68497 57.93970 65.39663 52.26700 71.14228 66.76215 63.78341 63.32243 62.07550 58.48683 71.70238 69.54953 58.04045 57.40616 60.45930 73.54053 66.96805 55.25662 47.82401 48.73794}
(a) What is your best estimate for the mean temperature of the water in these specific 100 cups?
(b) What is your best estimate for the standard error in the mean temperature in these specific 100 cups?

Water from the 100 cups is combined and mixed in a large thermos. Assume no heat transfer occurs during mixing.

You plan to measure the temperature of the water. Before you measure the temperature:
(a) provide your best estimate for the temperature that will be recorded by your thermometer.
(b) provide your best estimate of the uncertainty in your ability to predict the temperature that will be recorded by your thermometer. (Note: express the uncertainty in the form of standard errors; i.e. 1 σ uncertainties.)
(c) is your answer to (b) closer to 0.001C, 0.005C, or 7 C?
(d) Did the spread in the temperatures in the cups contribute to the estimate of the uncertainty? If yes, did the estimate involve the square root of 100?


Explain your answer.

(Note: the purpose of this question is to determine whether anyone literally believes the spread in the temperature over the cups affects our ability to determine the uncertainty. It is essential that this be a finite sample, and that we focus on the problem involving these 100 cups.)

Second Tweaked Question

Suppose that the 100 cups of water above were randomly selected from a set of 10^4 cups of water, all with equal volumes. The temperatures in the 10^4 cups of water are known to be normally distributed.

The water in the 10^4 cups is then combined; no heat transfer occurs, and you intend to measure the temperature of the batch. Before you measure the temperature:
(a) provide your best estimate for the temperature that will be recorded by your thermometer after the water is mixed.
(b) provide your best estimate of the uncertainty in your ability to predict the temperature that will be recorded by your thermometer. (Note: express the uncertainty in the form of standard errors; i.e. 1 σ uncertainties.)
(c) is your answer to (b) closer to 0.001C, 0.005C, 0.05C, 0.5C, or 7 C?
(d) Did the spread in the temperatures in the cups contribute to the estimate of the uncertainty? If yes, did the estimate involve the square root of 100?

After answering: I’d like you to relate the correct answer and how you obtained it to Pat Frank’s discussion of uncertainty in “case 2” in his first paper.

Other questions
Note that these questions do not incorporate every possible complication involved in computing a monthly mean temperature anomaly from data. The first is a simple case. The second is slightly more complicated. We can move on to more and more complicated problems involving things like correlation in temperatures in the 100 cups or the 10^4 cups etc. to see how they change the answer. But for now, I’d like to see people answer these questions. The reason is: If a method doesn’t work for these simpler cases, it ain’t gonna magically work for more complicated cases.

Comments

  1. In both lists “(c) is your answer to (c)…” should be answer to (b) unless you are going all Hofstadter on us 🙂

  2. I’ll bite.

    a) The mean temperature of the 25 cups is 61.753 C
    b) The standard error of the mean is 0.001 C

    a) The mean temperature of the combined cups is 61.753 C
    b) The predicted uncertainty in the measurement is 1σ = 0.0051
    c) 0.005
    d) no

    Since no heat is assumed to be lost and the mass in each cup is identical, the mean temperature of the cups will be the mean temperature of the cups combined. The predicted uncertainty is the combination of the uncertainty of the mean and the measurement uncertainty, which is obtained by summing the variances and taking the square root. The uncertainty of the mean is much smaller than the measurement uncertainty, so the measurement uncertainty dominates. The spread in temperature has no effect on the uncertainty of the mean. It doesn’t matter whether the cups have different temperatures or all have the same temperature because the precision of the measurement is the same.

    For the second question:

    a) The estimated mean temperature is still the same 61.75 C
    b) The standard error (1σ) = 1.45 C
    c) 0.5
    d) yes and yes

    Pat Frank posits that the uncertainty in the estimate of the temperature is not a function of the number of samples, but that’s obviously wrong. If the entire population was sampled, as in the 25 cup case, we know the mean to within the error of measurement and the standard deviation has no effect. The standard deviation in case 2 has an effect only because we are sampling a subset of the population so we only have an estimate of the mean and that estimate will have an uncertainty determined by the standard deviation of the population. We also only have an estimate of the standard deviation of the population, so that has to be corrected for the 24 degrees of freedom, about a 2% increase. The estimated standard deviation is then divided by the square root of the number of samples to determine the estimated standard deviation of the mean.

    I think. Prediction intervals tend to be larger and I may have missed something there.
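
    A minimal R sketch of the calculation described above, applied to the full list of 100 values from the post (the numbers in this comment assumed 25 cups, a mix-up sorted out a few comments below). The variable names, and the choice to fold the thermos reading’s own 0.005 C error into the prediction uncertainty, are illustrative rather than anything specified in the post:

    temps <- scan(text = "
      59.99488 49.11746 58.40584 62.37229 50.03863 53.45862 79.07399 49.50469 68.08187 51.61895
      60.63287 66.10079 62.45774 64.30218 57.69750 51.20818 67.23122 62.98295 56.96893 65.66524
      46.52092 75.34275 65.48765 61.84419 62.59873 60.12299 60.42893 58.13555 75.54401 67.27035
      60.98558 61.86077 50.14219 55.92671 55.57226 59.04270 67.43372 63.57600 57.12527 56.23831
      64.83390 70.23839 57.13899 70.20660 62.36759 60.71702 66.63313 66.41026 56.19293 68.66530
      67.39592 64.41039 83.68627 57.64337 61.74628 56.07459 64.70974 65.87532 60.48483 66.62137
      66.67895 61.68526 52.22845 58.36557 64.09209 65.68400 57.14445 58.33700 55.91439 63.95999
      71.51489 68.48649 71.55382 56.78854 69.41446 64.69442 59.07615 57.35855 48.09356 69.70662
      53.68497 57.93970 65.39663 52.26700 71.14228 66.76215 63.78341 63.32243 62.07550 58.48683
      71.70238 69.54953 58.04045 57.40616 60.45930 73.54053 66.96805 55.25662 47.82401 48.73794")
    n          <- length(temps)    # 100 cups
    sigma.meas <- 0.005            # stated 1-sigma error of each thermometer reading (C)

    # Question 1: these specific 100 cups
    mean(temps)                    # (a) best estimate of the mean temperature
    sigma.meas / sqrt(n)           # (b) SE of that mean; measurement error only, 0.0005 C

    # Mixing the 100 cups: predicted thermos reading and its 1-sigma prediction uncertainty
    # (uncertainty of the mean combined in quadrature with the 0.005 C error of the single reading)
    sqrt((sigma.meas / sqrt(n))^2 + sigma.meas^2)   # ~0.005 C; the spread across cups plays no role

    # Question 2: the 100 cups as a random sample from 10^4 cups
    s <- sd(temps)                 # sample standard deviation, roughly 7 C
    s / sqrt(n)                    # 1-sigma SE of the estimated population mean, roughly 0.7 C
    qt(pnorm(1), df = n - 1)       # small-sample t correction factor (about 1.005 here, negligible)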

  3. The first thing is that the spread of the data is non-Gaussian.
    Rank data and plot data vs rank number, not a Sigma at all.
    It is quite clear that the lowest 15 or so data points fall on a different line to the majority of the data, whereas the top 10 or so are dodgy.
    Add ranked data points, 1 with 100, 2 with 99, 3 with 98. Then plot these against rank #; we see that the discontinuity is at the top and tail.
    Assume that we have a reading artifact and clip the dataset of the 15 max and 15 min values;
    mean = 61.68961577
    SD = 3.879048031
    S.E.M. = 0.463634918

    Test using mode (50+51/2) = 61.85248

    I would go for this as the mixed temperature.

    I based my S.E.M. on root 70, clipping the top and bottom 15 points.

    This data-set would indicate either an operator error or a systematic error. When measuring fluorescence in 96 well plates we often find that the end columns give higher F.
    I would see if there was any correlation between the position of the cups and the readings; are the ones at the center over-temp and the ones at the edges colder?
    Are we looking at a change in temperature as the readings are made?

    Anyway, the main conclusion is that this data-set is non-Gaussian, but looks Gaussian when clipped.

  4. That’ll teach me to pay more attention. 100 data points, not 25. Also, the data does not appear to be normally distributed according to both the Shapiro-Wilk and Jarque-Bera tests. I think my reasoning is correct, though.

  5. Doc–
    I didn’t test for Gaussian– but I generated with rnorm! Generally, that should give Gaussian data, so this must be one of the false positives where normality is rejected even though the values were drawn from a Gaussian population. (Or rounding made a difference. But it shouldn’t have; the rounding is trivial compared to the sd of the population.)

    Even if it’s not gaussian, you should be able to answer the questions.

    Out of curiosity, why would a non-gaussian distribution indicate operator error? Lots of things in nature aren’t gaussian.

  6. Dewitt–
    The 25/100 problem is obviously mine. I initially wrote the script with 25 cups drawn from a full sample of 100. Then, I wanted to change the 100 to 10^4 but then grabbed a group of 100 to paste in.

  7. Re: lucia (Jul 15 05:45),

    Ignoring the normality issue, some of the estimated errors change in my answer because they will be divided by 10 instead of 5. The 1σ uncertainty in the mean of the 100 cups would be 0.0005 rather than 0.001 for example. Also, with that many degrees of freedom the t statistic correction to the uncertainty becomes negligible as does the contribution to the uncertainty of the measurement of the thermos temperature from the uncertainty of the mean. Still no excuse for my not having noticed the actual number of samples.

  8. DeWitt–
    Yes. Agreed. Your reasoning is correct! Of course, I should have noticed I pasted in 25 instead of 100. It wasn’t meant to be a trick question.

    But the lack of normality is odd….

    Do you use an R package to test for normality? I’m curious to generate 1000 samples of 100 and see how often we reject normality.
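
    A sketch of that simulation, assuming base R’s shapiro.test and a nominal 5% rejection level; the mean and sd passed to rnorm are arbitrary, since the test is location- and scale-invariant:

    set.seed(1)   # for reproducibility
    reject <- replicate(1000, shapiro.test(rnorm(100, mean = 62, sd = 7))$p.value < 0.05)
    mean(reject)  # fraction of the 1000 Gaussian samples rejected; should sit near the nominal 0.05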

  9. “lucia (Comment #79113)

    Out of curiosity, why would a non-Gaussian distribution indicate operator error?”

    When you have a non-Gaussian distribution there are four typical reasons:-
    1) You have a screw-up in preparation. We humans do things linearly and in sequence; it is typical that the assays are performed so that the assay mix goes into column 1 before column 12. We can then have settling between additions, so column 1 gets more assay mix factor ‘Y’ than does column 12.
    All manner of time dependent artifacts creep into work. People SHOULD have a pseudo-randomized layout of their testing matrix, but they very rarely do. Often the control is first, and then increasing levels of ‘X’ are in 2, 3, 4, etc.; they then add test solutions and also read in the SAME order. So they are adding a linear/exponential component to their readings.
    2) You have a time based bias in your reading; typically temperatures change during the course of a reading or your ethanol is evaporating during the read.
    3) You have a truncated distribution because something is running out; most assays are based on [A] + E->[B] and you read [B]. As long as [A] is infinite, no problem. However, [A] is never infinite and normally costs money. Unless you know the dynamic envelope of your assay it is quite easy to use too little [A] and then have a rectangular hyperbola relationship between E and [A] going to [B]. Topping out is very common, so that 50 ×2 = 100, but 500 ×2 = 750 and 750 ×2 = 1000. Unless you do positive controls, you can’t know your envelope. Most people today buy assay kits, and do not know the character of the assay they are working with; they just follow the recipe.
    4) You have a non-Gaussian!
    Typically a mixed population of 2 or more Gaussians. This is very bad. This means that you have to estimate the fraction, means and SD of the two or more populations, which means huge n numbers. So, hide that fact in the text and pretend that it is Gaussian anyway.

    Top and tailing, removing the top/bottom 10 or 15% of the data points, is sometimes used. It makes a lot of sense if you are going to be doing ratios or similar transformations. The graphical rule of thumb is to add 1 to 100, 2 to 99, etc. and then plot that; if you have a flattish line, then you are good.
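
    A small R illustration of point 4 above: a mixed population of two Gaussians that a normality test flags even though each component is Gaussian. The means, SDs and 70:30 split are arbitrary choices for the sketch:

    set.seed(2)
    mixed <- c(rnorm(70, mean = 60, sd = 3), rnorm(30, mean = 75, sd = 3))
    shapiro.test(mixed)$p.value                               # typically tiny: normality rejected
    plot(sort(mixed), 1:100, xlab = "value", ylab = "rank")   # the ranked plot shows the two groups as a kink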

    w.r.t.
    “The 1σ uncertainty in the mean of the 100 cups would be 0.0005 rather than 0.001 for example”

    Where on Earth do you get that from?

    You are aware that this value is less than the temperature gradient within a cup of water? Evaporative cooling will mean that the liquid layer at the top of the cup is colder than the center. Depending on cup geometry, you will have circulation currents where the colder water passes down the cup walls, then to the center of the base, and then up.

    The number you quote is truly meaningless in physical terms. What is the point of making and honing tools that are useless?
    It is rather like basing one’s purchases of gasoline on the reading of the mileometer, the manufacturer’s quoted m.p.g. and your locale’s fraction of urban/highway driving.

    I am asking you in all seriousness, what does this number mean

    “The 1σ uncertainty in the mean .. would be 0.0005 rather”

    and what use can you make of it?

  11. Doc–
    The cups could all be thermoses, as could the final cup.:)

    Anyway, there is a point in exploring where uncertainties come from. Error and uncertainty are introduced from the fact that the assumption of “no heat transfer” is violated. But DeWitt is computing the statistical uncertainty only.

    There is always a point in knowing this. Had it been the dominant one, that’s the one we would use. If it turns out to be tiny relative to the problems in assumptions like “no heat transfer” or “no mixing”, then we’d have to fix those assumptions. But fixing that doesn’t come from statistics, so the way the question is set up, DeWitt’s reasoning is correct. He is limiting his answer to the things that can be answered using statistics.

  12. But does he not get the number based on faith?

    He gets the number from the statement:-

    “measure the temp from identical thermometers with a standard error of 0.005C”

    so he has just reinterpreted a number. So what are the real errors, in real life?

    Add three volumes together, 1 ml buffer, 100 ul DNP-PO4 and, at t=0, 5 ul Alkaline Phosphatase. Measure the formation of DNP in a spectrophotometer, all the way to 100%
    All pipettes give +/- 2.5% errors; run your assays.
    The error from the 5 ul addition will be the only one that counts: you measure a rate; DNP-PO4 is in excess, so +/- 2.5% is nothing; buffer is in excess, so +/- 2.5% is nothing. However, the amount of enzyme is directly proportional to the final reading, so almost all the variance is due to enzyme levels.

  13. I’m going to wade in carefully, because I haven’t had time to do the math. But I think your analogy is flawed.

    To begin with, from a physical perspective the concept of “average temperature” has no physical meaning. Temperatures aren’t additive, hence an arithmetic mean (or any similar arithmetic statistic) has no meaning.

    Temperature is a measure of the relative molecular vibrational energy in a substance. For a given substance, or given mixture, the more energy the higher the temperature. But, when comparing different substances or mixtures that is no longer true. A substance with a lower heat capacity may have a lower energy than a substance at a lower temperature that has a higher heat capacity.

    Hence a sample of air from Colorado Springs, CO (elev 7000′) may have less heat than a sample from Huntsville AL at a lower temperature.

    So a measure of temp is only a proxy for the heat content, not the actual thing and an average temp is merely a proxy for average heat content.

    Just off the top of my head, that seems to fundamentally change the discussion.

  14. DocMatryn:

    4) You have a non-Gaussian!

    Many real-world processes are closer to log-normal than normal. In fact non-Gaussian is the norm in many meteorological measurements.

    (I can show you examples of multi-modal distributions too…these happen when you have very different time scales associated with different processes, a typical situation in climatology)

  15. John Vetterling–
    I’m not going to debate whether average temperature does or does not have physical meaning. It may or may not depending on one’s point of view. The main thing is: it doesn’t matter whether the average has a physical meaning. Averages have meaning even when they have no physical meaning.

    Just off the top of my head, that seems to fundamentally change the discussion.

    No. Because whether average temperature has physical meaning is not relevant to discussing the uncertainty in an average.

  16. Re: lucia (Jul 15 09:31),

    The R package ‘tseries’ has the Jarque-Bera test of skewness and kurtosis; the Shapiro-Wilk test is in base R. There’s also the D’Agostino test of skewness that I haven’t used (‘moments’ package).

    The calls, after loading the packages, are
    library(tseries)    # for jarque.bera.test
    library(moments)    # for agostino.test
    jarque.bera.test(x)
    shapiro.test(x)     # in base R
    agostino.test(x)

    They all return a test statistic and a p value for the null hypothesis of normality.

    However, this link says that normality tests tend to be too sensitive and should be used with caution.

    Millidegree, or possibly sub-millidegree precision and accuracy are needed to measure ocean heat content. Obviously that’s not measuring water in cups, but the stated precision of the measurement was 0.005 degrees so that’s what I used.

  17. As you have posed the question here, I agree with you.

    But if I recall correctly, the original hypothesis was different. So one way of looking at this is that temperature is a proxy for some underlying physical quantity (heat content?). Then we have the uncertainty in our measurement of temperature plus we have the uncertainty in how well temperature represents the actual physical quantity.

    If the uncertainty in the proxy is temperature dependent then it is possible that the uncertainty would increase with the range of temperature. Which I think was the original premise.

    there’s a lot of ifs there.

  18. John–
    Who do you think posed the original hypothesis and why do you think it was posed differently?

    I’m not looking at temperature as a proxy for the heat content. I don’t think Pat Frank’s paper does. And even if someone does, that doesn’t change anything here.

    Which I think was the original premise.

    It’s not the premise of CRU– which Frank comments on. I don’t think it’s mentioned in Frank’s paper. I haven’t mentioned it. So, I don’t know why you think it’s “the original premise”.

  19. Carrick, think on this. One is typically measuring a pseudo-steady state that changes in response to a large number of variables.
    In Lucia’s 100 cups, the rate at which they cool will be a function of the thickness of the cups polystyrene, how it is physically arranged on the table, the local humidity above the cup, the air velocity, and the absolute purity of the water. By the time one has measured the tenth cup, the first has already cooled.
    Take something we know, a 50-mL pipette has a tolerance of ±0.05 mL, pretty good. However, if it is actually used by someone in a time sensitive manner, you will be lucky to get ±0.1 mL.

    “identical thermometers with a standard error of 0.005C”

    When? When they were manufactured, possibly. But after being placed in dry ice/ethanol or into an oil bath in the general lab for a month or so; no.
    You can get precision, but not in the numbers you want. I guarantee that if you checked 100 temperature sensors in the USA, you would find that the vast majority were off spec.

  20. DocMartyn:

    One is typically measuring a pseudo-steady state that changes in response to a large number of variables

    Part of the problem is this assumption (of a pseudo-steady state) is violated over the time scales typical of meteorological measurements. That’s what leads to the log-normal-like behavior in the measurement noise.

    I guarantee that if you checked 100 temperature sensors on the USA, you would find that the vast majority were off spec.

    I would put money this is small compared to other noise sources. Remember that for any instrument, you are subtracting the offset when computing mean temperature, so simple calibration errors don’t have any significant effect.
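
    If the point is that a fixed calibration offset drops out when an instrument’s readings are expressed as anomalies against its own baseline, a small R illustration (the numbers are made up):

    true.temp <- 15 + 10 * sin(2 * pi * (1:120) / 12)   # ten years of monthly "true" temperatures
    measured  <- true.temp + 0.8                        # this sensor reads 0.8 C high, always
    anomaly   <- measured - mean(measured[1:60])        # baseline computed from the same biased sensor
    max(abs(anomaly - (true.temp - mean(true.temp[1:60]))))   # ~0: the constant offset cancels exactly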

  21. Re: DocMartyn (Jul 19 12:17),

    In Lucia’s 100 cups, the rate at which they cool will be a function of the thickness of the cups polystyrene, how it is physically arranged on the table, the local humidity above the cup, the air velocity, and the absolute purity of the water. By the time one has measured the tenth cup, the first has already cooled.

    It’s a gedankenexperiment, not the real world. In a thought experiment you can have things like sealed perfectly insulated containers with identical thermometers. It’s meant to illustrate the point that the uncertainty in measurement is separate from differences in the real quantity measured.

    Or do you think Pat Frank is actually correct?

  22. Doc–
    In the gedanken experiment, there is no heat transfer because the cups are made from super-insulating un-obtainium which was brought to us by super-intelligent aliens from an advanced civilization. There are no temperature gradients inside the cups because the contents were perfectly mixed and allowed to stand in the perfectly insulated cups to further reduce any teensie beensie temperature gradients. The thermometers were also manufactured by super-intelligent leprechauns who know how to make measurement devices that never go out of calibration.

    What we are exploring here is whether the specific issue discussed in Pat’s papers contributes to uncertainty in the mean. We are not currently worrying about other issues nor whether the specified values of the measurement uncertainty are possible with current technology made by humans. The specific level of measurement uncertainty that is achievable in any real experiment depends on details of that experiment. In this gedanken experiment, we get to make cups out of unobtainium.

  23. Dewitt,

    Or do you think Pat Frank is actually correct?

    LOL. But I doubt Doc will appreciate the humor in that question.
    .
    Lucia,
    Yes Pat is very wrong. The weird thing is that even after Jeff Id identified/explained how and where ‘s’ finds its way into the uncertainty estimate of the mean, Pat refuses to budge…. or even really address what Jeff says. Like I said, weird.

  24. “Dewitt,

    Or do you think Pat Frank is actually correct?

    LOL. But I doubt Doc will appreciate the humor in that question.”

    I think he was trying to formalize a method to quantify both error and signal in a complex data-set. I think his original idea, but perhaps not his formalization, has merit.

    Allow me to give you a demonstration.

    Yesterday I examined the levels of an enzyme, Lactate dehydrogenase (LDH), in cells grown in 250 ul of media in 96 wells of a growth plate. I had a total of 8 different plates (cells from a diseased child vs an unaffected fraternal twin).
    The cells had been treated +/- two effectors, in each quadrant of the plate.

    This is what I did.
    First I dispersed the cells using a 100 ul 8-well pipette. Then I removed 50 ul of cells and placed them in a reading plate.

    Then I had to add a total of 50 ul of assay mix. The mix consists of w/v ratios:
    0.6 Lactate: 0.1 NADH: 0.0004 Rezazurin: 0.02364 Tris: 0.3575 HEPES: 0.02922 NaCl: 0.000001 Diaphorase: 0.05 Tween-20: 50 H2O.
    The 100ul of solution is maintained at 37 degrees centigrade and read for 20-30 minutes on a plate reader.

    I could have performed this assay in a number of ways. I could have added each component separately, into each of the 96-wells; but I did not.
    I made a 50 ml stock solution and kept it in an incubator. I then added 50 ul to each row using an 8-channel pipette, using pre-wetted tips.
    I examined the trace and typically used the rate between 6 and 16 minutes, but in one case I used the rate between 8 and 18 minutes.
    I do this because I know that the plate temperature changes during the bench-top additions and takes 4-7 minutes to reach 37, and that the assay is temperature sensitive.
    Finally, I analyzed the data, keeping all of it, but concentrating on 4×15 wells, and not 4×24 wells, as I know that the outer 36 wells give me slight differences in fluorescence levels, compared with the inner 60 wells. Ignoring the outer wells generally changes the SD from 8% (n=24) to 2.4% (n=15).
    Only when I have all my files saved, date stamped, and provisionally analyzed do I ask my technician to unblind me and tell me which plate is which, affected and twin of affected, and which quadrant is which.
    I do this because pipetting error runs at 0.5%, and every other plate produces a rate outside the normal range (which I don’t actually know) and has to be manually eliminated.

    So, all you statisticians.
    Why do I pool all my additions into a 50 ml stock solution, rather than adding them individually to each well?
    Surely, the addition errors of adding 9 fractions of 50 ul, which constitute the assay mixture, would even-out? Surely, making up one stock solution and using that throughout increases the chance of a systematic error?

    So why do I do it?

    Why don’t I add the individual components?

    The assay does not start until everything has been added, so it is not slight differences in starting time that worries me.

    Why do I ALWAYS pool my assay solutions and make sure that I design my experiment so that I add only one volume to one volume?

    According to you statistical analysts, it should really make no difference.

  25. Re: DocMartyn (Jul 20 12:01),

    Why do I ALWAYS pool my assay solutions and make sure that I design my experiment so that I add only one volume to one volume?

    According to you statistical analysts, it should really make no difference.

    Of course it would make a difference. In your experiment, the pipetting error is 1% and if you were to pipette the components separately, the variances would add to increase the overall error. You would have increased uncertainty in both the dilution factor and the component ratios, not to mention homogeneity. That is completely different from lucia’s thought experiment where there is only one measurement per cup with a relative uncertainty of 0.01% in the measurement.

    I’m not a statistician or statistical analyst, I’m an analytical chemist. I’m well aware of propagation of errors in real measurements. I even did thermometer calibrations against an NIST calibrated standard platinum resistance thermometer using a water triple point cell to correct for offset errors. So I know something about millidegree precision temperature measurements too.
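
    A rough R illustration of the “variances add” point for well-to-well comparability, taking the 1% per-step relative error mentioned above and treating it, purely for illustration, as independent between steps:

    set.seed(3)
    rel.err  <- 0.01                                            # assumed per-step 1-sigma relative volume error
    separate <- replicate(9, rnorm(10000, mean = 50, sd = 50 * rel.err))   # 9 components pipetted into each of 10000 wells
    ratio    <- separate[, 1] / separate[, 2]                   # ratio of two components in each well
    sd(ratio) / mean(ratio)                                     # ~sqrt(2) * 1% well-to-well scatter in composition
    # With one pooled stock, the component ratios are identical in every well by construction;
    # any error made in preparing the stock is a shared systematic, not scatter between wells.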

  26. Why do I pool all my additions into a 50 ml stock solution, rather than adding them individually to each well?

    No idea. You have provided a lot of picky details, but have not told us what goal you have for your experiment. That is: I have no idea what observation you wish to report nor do I know what statistics you wish to report about it.

    You’re also discussing things that are familiar to you but not necessarily other people.
    I also don’t know what it means to “dispersed the cells”. Do you mean you suck up 50 ul of stuff from 8 wells and then spit that stuff out onto a plate making 8 little dots? Or did you blend all of this on one plate? Or what?

    You say “I examined the trace and typically used the rate between 6 and 16 minutes, but in one case I used the rate between 8 and 18 minutes.” I have no idea what “examining the trace” means. Did you record some number for something? (A time? What?)

    According to you statistical analysts, it should really make no difference.

    I have no idea whether according to our statistical analysis whatever you do or don’t do should make a difference nor what difference it would make. I can’t even tell if the details you are discussing have something to do with the statistics or not.

    I know that when you were discussing the simpler problem of the cups, you brought up features that have nothing to do with the statistical issue we are focusing on. They have to do with whether or not the uncertainty in the temperature measurements were realistic given current technology, and whether you could envision someone being able to take all the measurements on a time scale in which temperature didn’t change.

    Those are all factors that could affect whether we could measure the temperature in each cup to ±0.05C , but they have nothing to do with estimating the uncertainty in the average given a known uncertainty of ±0.05C.

    That said: though I’m not sure I understand what you are doing, mentioning “twins” makes me suspect you have pairs of samples. So:
    (1a, 1b)
    (2a, 2b)
    (3a, 3b)

    You are trying to set up an experiment where the only difference between the “a” and the “b”s is whatever thing you are trying to study. Other than that you want twins “1” to be as identical as possible. So, by mixing whatever it this “stock solution” is, you are ensuring that even if your mixology was imperfect the solution added to (1a and 1b) is as similar as possible.

    But if this is so, it has nothing to do with the cup problem. We could set up a simpler cup problem for you if you wished. But first: do I get the gist of what you are trying to do? Or have I been so utterly confused that I can’t figure out whether you are discussing a forest or a fruit tree farm based on your discussion of what you did to a bunch of leaves?

  27. Re: lucia (Jul 20 12:50),

    I know what he’s doing and it’s vastly more complicated than measuring temperature. He’s measuring the concentration of an enzyme, lactate dehydrogenase, by measuring the rate of change of fluorescence of NADH (the reduced form of nicotinamide adenine dinucleotide). Kinetic methods are notoriously picky. That’s why you measure a large number of replicates in 96 well plates and throw out a lot of those. Measuring temperature to millidegree precision is trivial by comparison.

  28. not quite DeWitt, I use the plant enzyme diaphorase to reduce nonfluorescent resazurin, with the generated NADH, to highly red fluorescent resorufin.
    This increases the signal to noise by about 3-4 orders of magnitude, given all sorts of crap absorbs in the uv, and we are clear in the red.

  29. DeWitt

    Measuring temperature to millidegree precision is trivial by comparison.

    This makes giving an example based on temperatures more suited for elucidating statistical issues. Whatever “X” is, using simple things lets you discuss individual contributions to error in “X” separately. You can always drill down later. (For example, in the temperature problem above, we can also discuss how we know the uncertainty in an individual measurement, etc.)

  30. Sorry Lucia, I was trying to make a point from the other end, I will reply later (probably tomorrow).

    However, I was responding to this:-

    “Those are all factors that could affect whether we could measure the temperature in each cup to ±0.05C , but they have nothing to do with estimating the uncertainty in the average given a known uncertainty of ±0.05C. ”

    You see, I don’t live in a world where I have “given a known uncertainty of ±0.05C”.
    You see, in my world, I can never believe any known uncertainty. My balance comes with a set of weights. Every 4-6 months I put my powder free gloves on and weigh each one of my (very expensive) weights.
    After this, I place a weighing boat into the balance and dispense ‘known’ amounts of distilled water into the boat. This way I can check the calibration. Normally, I have to do an adjustment, about 2-3%; but I have fixed ones that have been out by 10%.
    I use a neutral density filter to check my spectrophotometer.
    I just invented a method to calibrate fluorescence microscopes.
    I don’t live in your cup world, I have never had an instrument that was ±0.05 in a 0-100 scale.
    I live in a world where we place our thermometer in boiling DDI water and in DDI iced water to make damned sure they read properly. On the whole, alcohol thermometers are pretty crap, at least compared to the good old mercury ones.
    I live in a world where we don’t trust, but we do verify.

  31. Re: DocMartyn (Jul 20 15:12),

    My balance comes with a set of weights. Every 4-6 months I put my powder free gloves on and weigh each one of my (very expensive) weights.

    I doubt that’s often enough. You should be measuring a reference weight daily or weekly and keeping a control chart. Analytical balances have a potential precision of about 1 ppm (0.1 mg resolution and a range of 200 g). So part of your world does contain high precision measurements.

    After this, I place a weighing boat into the balance and dispense ‘known’ amounts of distilled water into the boat. This way I can check the calibration. Normally, I have to do an adjustment, about 2-3%; but I have fixed ones that have been out by 10%.

    A weighing boat, really? What about evaporation? You should be following ISO 8655-6:2002 or some other standard procedure. Gilson has a kit you could use. When you calibrate your volumetric ware, you also didn’t mention buoyancy or temperature correction. The density of the calibration weights is 8 g/cm³ while water has a density of ~1 g/cm³. Depending on the air density, that can make a difference at the parts per thousand level or lower. It probably won’t make a significant difference for microliter pipettes, but it certainly will for a 50 mL volumetric flask. The thermal expansion coefficient of water is ~2E-04/C.

    A delivered volume error of 2-3% is unacceptably bad for an air displacement pipette. For example, the Gilson Pipetman®F 50 μL air displacement pipette specifications give a systematic error of ±0.40 μL while the random error is ≤0.15 μL. If you’re not getting that, then your pipette needs to be repaired or replaced or your technique is really bad. Usually cleaning and greasing the plunger and changing the filter will fix a pipette.
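
    For reference, the usual simplified air-buoyancy correction behind that parts-per-thousand figure, in R; the air density is a typical assumed value, the other densities are those quoted above:

    rho.air    <- 0.0012   # g/cm^3, typical laboratory air
    rho.weight <- 8.0      # g/cm^3, steel calibration weights
    rho.water  <- 1.0      # g/cm^3, the liquid being weighed
    # simplified correction: true mass ~ balance reading * (1 + rho.air * (1/rho.sample - 1/rho.weights))
    1 + rho.air * (1 / rho.water - 1 / rho.weight)   # ~1.00105, i.e. about one part per thousand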

  32. Doc,

    I don’t live in your cup world, I have never had an instrument that was ±0.05 in a 0-100 scale.

    I’ve known scales that can weigh to ±5 lbs over 100 lbs.

    live in a world where we place our thermometer in boiling DDI water and in DDI iced water to make damned sure they read properly.

    I’ve done that for some experiments too. Others, no. I don’t know what the scale is for measuring temperature, but I think room temperature can often be reported to ±1K even though the full “scale” might be thought to be 300K.

    I appreciate that getting low uncertainties in individual measurements can be difficult– particularly if you work with very small quantities of stuff. But that’s not especially important to the topic of the post itself. I appreciate that you might not trust, you want to verify. But I don’t see how that is relevant to the specific issue discussed in the blog post.

    If you want me to understand how your concerns about individual measurements are relevant, you are going to have to say so in a direct fashion instead of explaining how difficult it is to take accurate and precise measurements in your field. Yes. It may well be. So?

  33. Re: DocMartyn (Jul 20 15:12),

    I live in a world where we place our thermometer in boiling DDI water and in DDI iced water to make damned sure they read properly. On the whole, alcohol thermometers are pretty crap, at least compared to the good old mercury ones.

    Alcohol thermometers are indeed not very good. Thread separation is common. The freezing point of DDI water is a good reference point, especially if you have a slurry of crushed ice and water. Boiling water is not. Condensing steam is better, but you still have to correct for barometric pressure. You also have to know whether your thermometer is calibrated for partial or total immersion. Even if it’s calibrated for partial immersion, you still may have to do stem correction for the part of the column that’s not immersed.

  34. I came into this late because I actually spent a lot of time studying Pat’s 2 papers instead of going off half cocked.

    For the analytical chemist bloggers, there is a most important paper, “Evaluation of Lunar Elemental Analyses” by George H Morrison of Cornell, Analytical Chemistry, 43(7), June 1971, starting at p. 23A.

    Few materials were as rare and important as the approx 800 lb of moon rock and soil brought back on Apollo missions. The paper summarises the work of over 30 chosen laboratories using a selection of methods to analyse finally some 80 elements.

    Unfortunately, sometimes the within-lab variance was greater than the between-lab variance. This is relevant to Pat Frank’s observations. Relative standard deviations ranged from 1.8% for Aluminium to 16% for Zirconium, in one table. Pat is saying (unless he contradicts me) that the achievable measurement of station and global temperatures contains errors that have been forgotten, underestimated or combined wrongly into the final estimate.

    Therefore, the glasses of water experiment assumes a lesser importance, because measurement trumps model (although of course, there are some principles that are unchallenged in model work with sufficient caveats).

    One cannot use the moon rock analysis too closely, although it was the best work at the time, because thermometer temperatures have an unrestricted theoretical range, while an exact chemical analysis set has to add to 100%. Therefore, different data distributions are indicated.

    The moon rock paper is a good read because it brings one back to the world of practical achievement as opposed to wishful performance. I’ve advised Pat of another error that seems to have been missed in the global data set, possibly about 10% more to be added to his variance estimates. Unfortunately, we cannot reverse engineer a correction for this one because we lack a link to the time of a past event, but we can now estimate its magnitude since instrument use has changed since mercury days. Because in general reverse engineering uses a lot of subjectivity, it would be wise to look at actual performance rather than theorised performance capability.

  35. Geoff–
    What you observe is not relevant to Pat’s fundamental mistake encapsulated in what he writes in case 2.

    That there might be some unrecognized problems that result in error is true. But this is no excuse for computing the error from recognized factors incorrectly– as Pat does by infecting his entire analysis with the mistake in case 2. The uncertainty due to recognized factors needs to be estimated carefully and, more important, correctly. These are estimated incorrectly in Pat’s paper as a result of his fundamental mistake in case 2, which pervades his analysis. His error is to mistake variance across the sample for uncertainty in our estimate of the mean.

    He makes this mistake both explicitly (by saying so in paper 1), in application (by applying the equations in paper 1 to his results in paper 2) and in interpretation of his result (by– in his paper– interpreting the resulting uncertainties as if they are uncertainties in the mean– not merely the spread of temperatures).

    That he doesn’t see the mistake even more clearly when he switches from Temperature to a general property (the way I discuss it in my post) and that he seems to think my general criticism is somehow a symptom of my worrying about “climatological” significance of temperature shows just how confused he is.

    If you would look at case 2 in his paper 1 and address that instead of speculating that maybe there are other unrecognized errors, please do. Moving on to the possibility that unknown errors might exist is not fruitful because the only thing we can say about those is that they would make it impossible to compute uncertainties. This is clearly not what Pat claims, as he shows uncertainty intervals in his paper and he claims they come from somewhere.

    Until Pat recognizes that his case 2– and by extension case 3b– is a blunder and admits those have affected his final results, there is no point in considering how unrecognized errors might have modified his base results.

  36. This argument is rather like the prediction by modellers that there would be a tropospheric hot spot in the tropical atmosphere if global warming theory was correct. They set up a thought experiment that predicted a rise, but in the cold hard natural world, they found no increase, or only a very small one.

    One of your uncertainties is to use idealised thought experiments that have to be so tightly constrained that you reach a sometimes-used but incorrect variant of the Heisenberg Principle, that the intrusive act of measurement can affect the measurement result.

    I do not yet see the mistake to which you refer. I see instead a great many papers whose accuracy is artificially improved by incorrect use of statistics. I’ve read through tAV as well but I’m afraid I can’t see where you claim Pat went wrong. Any chance you could spell it out in more detail?

  37. Geoff–
    The discussion of where Pat first went wrong is in a different post. It’s here.

    The post you are reading is a question for Pat to permit him to show what answer he would get on this question.

    Of course real experiments have issues not present in Gedanken experiments. Nevertheless, Gedanken experiments are useful for showing statistical concepts. It is not an uncertainty to use an idealized thought experiment to discover how someone like Pat would compute uncertainty intervals in a given, controlled situation where things are known.

  38. Please permit a couple of questions that could be asked of the following multiple-pendulum system.

    http://www.youtube.com/watch?v=yVkdfJ9PkRQ&feature=player_embedded

    (a) Can the position of a single pendulum be estimated at a given time?
    (b) Can the uncertainty of that position be estimated at a given time?
    (c) Can a time series be constructed for the position, with uncertainty bars, of a single pendulum?
    (d) Can there be an estimate of the average position of the sum of all of the pendulums in the group at a given time?
    (e) Can there be an estimate of the uncertainty of that average position, formed into a time line with uncertainty bars?
    (f) Will the uncertainty estimate be constant in case (e)?

    Does this resemble the type of distinction that Pat and Lucia are debating?

  39. Geoff–
    Before we move on to b-c, I need you to clarify what you mean by question (a) because I want to be sure I engage what you mean by “estimate” correctly. It could be “estimate” in the sense of predict or “estimate” in the sense of “by observing or measuring”.

    Do you mean: If I were to freeze the frames in the youtube video and digitize the frame and define a coordinate system with — say x pointing horizontal, y up and z into the image, could I estimate the position of the center of a particular sphere at a particular time?

    The answer to that: I could devise a method to estimate the x and y positions based on observation (i.e. measure it?). I can’t estimate the z. (FWIW, if that’s the question, the answer to (b) is yes. The answer to (c) is yes. The answer to (d) is that I can estimate the average position and/or the sum of all positions. The answer to (e) is that I can find uncertainty intervals for the position of each average computed at any time, and (f) is probably yes, but I’m not sure. I can’t think of why the uncertainty would vary very much in time, but maybe I’m overlooking something subtle.)

    The answer to the unlettered question is: “No, I think it doesn’t resemble the question Pat and I are debating.”

    The question Pat and I are debating is how do we compute (d) the uncertainty in the “average of the position of all pendulums”.

    How we compute (d) would affect our answers to (e).

  40. In the video, the guy starts the motion at time zero. At any time after that it’s relatively easy to estimate the position of an individual pendulum if you assume ideal pendulum behaviour. However, these do not seem to be ideal because near the end of the video, they more or less line up, but not so exactly. So, the condition is like a proxy construction from a measured time period (the length of the video, here). Can we predict this fascinating behaviour into the future? Let’s say yes, for the purposes of the following questions and reserve discussion for the no case.
    You say “I can’t think of why the uncertainty would vary very much in time, but maybe I’m overlooking something subtle.” I’m trying to phrase a way to say that the variance looks low when all of the pendulums are in a line, but higher when they are all over the place. It depends on what you mean by variance, so we are back to what you write “The question Pat and I are debating is how do we compute (d) the uncertainty in the ‘average of the position of all pendulums’.”
    I do not know the answer, but I thought it might be helpful to provide a visual that illustrates the question.

  41. Geoff-

    Can we predict this fascinating behaviour into the future?

    I don’t know how well we could do that. But that has nothing to do with the discussion with Pat which related to claims about uncertainty in measured things.

    I’m trying to phrase a way to say that the variance looks low when all of the pendulums are in a line, but higher when they are all over the place.

    The variance in the distribution of locations is high when the spheres are all over the place. However, the uncertainty in the location of the center of mass of all spheres collectively is not much higher when they are in different locations. These are two different things.

    Pat is calling the variance in the locations the ‘uncertainty’ in the mean position of the centers of mass of the “N” spheres. But the variance in the positions of the centers of mass is not the uncertainty in the mean of the centers of mass. Any engineer who needed to know the position of the center of mass of the entire collection of spheres for an application further down the road and who mistook the variance for the uncertainty would be incompetent.

    Interestingly enough, in your problem, if we have measurements from the image, the variance of the centers of mass of the N spheres is clearly not the uncertainty in the measured position of any individual center of mass. We would determine the measured position by understanding our digitization process, not by computing the variance in the centers of mass of the N spheres.

    So, the variance in the centers of mass of the spheres is barely involved in the calculation of the uncertainty in the center of mass of the spheres. (If we had no measurements, but only knew the experiment was underway, then the variance would be involved in estimating the uncertainty of the positions. But that is in no way analogous to anything being discussed by Pat or me. Pat is reporting uncertainties in things that were observed, not things that were not observed.)
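
    A small R sketch of the distinction drawn above, the spread of the sphere positions versus the uncertainty in the measured position of their collective center of mass; the number of spheres and the 0.01 digitization error are assumed, illustrative values:

    set.seed(4)
    N         <- 16                          # number of spheres (the exact count doesn't matter here)
    true.x    <- runif(N, 0, 1)              # the spheres really are "all over the place"
    digit.err <- 0.01                        # assumed 1-sigma error in reading a position off the image
    meas.x    <- true.x + rnorm(N, sd = digit.err)
    sd(meas.x)                               # spread across spheres: large, set by where the pendulums are
    digit.err / sqrt(N)                      # 1-sigma uncertainty in the measured center of mass: small
    # The spread says the spheres are scattered; it is not the uncertainty in where their
    # collective center of mass sits, which is governed by the measurement error alone.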
