{"id":21160,"date":"2012-10-22T15:12:45","date_gmt":"2012-10-22T21:12:45","guid":{"rendered":"http:\/\/rankexploits.com\/musings\/?p=21160"},"modified":"2012-10-22T15:12:45","modified_gmt":"2012-10-22T21:12:45","slug":"pooled-method-of-testing-a-projection","status":"publish","type":"post","link":"https:\/\/rankexploits.com\/musings\/2012\/pooled-method-of-testing-a-projection\/","title":{"rendered":"Pooled Method of Testing a Projection"},"content":{"rendered":"<p>The purpose of this post is to discuss a method of testing a projection that I think has greater power than testing trends alone. (Yes, Paul_K. We discussed this before. I now think it can be done!! \ud83d\ude42 ) To illustrate, I&#8217;m going to discuss how I might test a hypothetical prediction. Suppose I&#8217;ve been winning 0 quatloos for a long time, and there is no trend in my quatloo winnings. If I do nothing, most people assume I&#8217;m going to continue to win nothing. Below, to the left of the vertical red line, I have plotted my quatloo winnings in each bet (blue circles), along with the mean value and the trend in time (solid blue line):<\/p>\n<p><a href=\"http:\/\/rankexploits.com\/musings\/wp-content\/uploads\/2012\/10\/BareProjection.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/rankexploits.com\/musings\/wp-content\/uploads\/2012\/10\/BareProjection-500x500.png\" alt=\"\" title=\"BareProjection\" width=\"500\" height=\"500\" class=\"aligncenter size-medium wp-image-21190\" srcset=\"https:\/\/rankexploits.com\/musings\/wp-content\/uploads\/2012\/10\/BareProjection-500x500.png 500w, https:\/\/rankexploits.com\/musings\/wp-content\/uploads\/2012\/10\/BareProjection-300x300.png 300w, https:\/\/rankexploits.com\/musings\/wp-content\/uploads\/2012\/10\/BareProjection.png 1008w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/a><br \/>\nThe dashed lines represent the standard error in the mean quatloo winnings per bet.<\/p>\n<p>To the right of the red line is my 
<i>prediction<\/i> of the number of quatloos I will earn in the future if I don&#8217;t come up with some nifty system to improve my betting performance.<\/p>\n<p>As you can see, in this example, I am predicting the future &#8220;Quatloo anomaly&#8221; will be zero. The uncertainty intervals are the uncertainty in the <i>mean over the baseline<\/i> (left) and the uncertainty in the <i>mean over the forecast period<\/i>, which has not yet occurred.  <\/p>\n<p>We&#8217;ll now pretend I concoct a &#8216;system&#8217; that I think will improve my ability to win Quatloos. It will take practice, so I think my winnings will tend to increase at a rate of Q(i)=mi where &#8216;m&#8217; is the trend.  Of course, my null hypothesis remains m=0, and I don&#8217;t really know how large &#8216;m&#8217; is. I merely hope it&#8217;s positive.<\/p>\n<p>I&#8217;m going to use my system for 120 plays. At the end of that time, I want to test the null hypotheses that <\/p>\n<ol>\n<li>I will earn 0 Quatloos each month in the future period. That is, E[Q]=0.<\/li>\n<li>The trend m is zero. That is, m=0.  <\/li>\n<\/ol>\n<p>I can test either using an appropriate &#8216;t&#8217; test. Each would involve finding a &#8216;t&#8217; value for the data in the upcoming period and comparing that to a Student&#8217;s t-distribution. I can call the two t values t<sub>mean<\/sub> and t<sub>trend<\/sub>.  <\/p>\n<p>I now have two possible tests. Strictly speaking, I am testing two separate hypotheses, both of which are related since they amount to &#8220;the system doesn&#8217;t work&#8221;. So, I really only want 1 test. <\/p>\n<p>If I&#8217;m limited to one test or the other, what I&#8217;d really like is to pick <i>the more statistically powerful<\/i> of the two tests.  That is: when applied at the same statistical significance (that is, a pre-defined type I error rate), I&#8217;d like the test that results in the greatest statistical power (which is the same as the lowest type II error rate.)  
<\/p>\n<p>Oddly enough, I also know that I can show the <i>errors<\/i> in both tests are independent.  That is, if the null hypotheses are true and both m=0 and E[Q]=0, then the slight deviations in the actual &#8216;m&#8217; and average quatloo winnings I will observe during the upcoming period are statistically independent. (This was discussed <a href=\"http:\/\/rankexploits.com\/musings\/2011\/whats-uncorrelated-with-what-for-paulk\/\">here<\/a>.)<br \/>\nSo the fact that there are two methods and the errors in the two methods are statistically independent leads me to an interesting situation. <\/p>\n<p>There should be some method of combining the tests to create a pooled test that is more powerful than either alone.  That is, there should be a method of creating a pooled <\/p>\n<p>t<sub>pooled<\/sub>=w<sub>trend<\/sub>t<sub>trend<\/sub>+w<sub>mean<\/sub>t<sub>mean<\/sub><\/p>\n<p>with weights selected such that the standard deviation of the pooled t is 1 and the statistical power of the test is maximized. (Note that the standard deviations of t<sub>trend<\/sub> and t<sub>mean<\/sub> are one by definition. That&#8217;s a requirement for t values.) <\/p>\n<p>So, I concocted the following rule, which seems to work:<\/p>\n<ol>\n<li>Define an effect size for an alternate hypothesis: that is, the hypothesis that might be true. This can be any value other than zero since the magnitude cancels: I pick m<sub>e<\/sub>=0.1 quatloo\/decade. If that hypothesis is true, I find I will win Q=m<sub>e<\/sub>*120 months quatloos.  <\/li>\n<li>Based on data available prior to the forecast period, estimate the &#8216;t<sub>e,m<\/sub>&#8217; and &#8216;t<sub>e,mean<\/sub>&#8217; that would exist <i>if<\/i> this size effect is observed during the upcoming prediction period. To estimate these I need to estimate the variability of 120 month trends given data prior to the forecast period, and I need to estimate the variability in the difference between the mean in the prediction and baseline periods. 
(Those are provided in the figure and are based on the noise prior to the forecast.)  <\/li>\n<li>Compute relative weights for each test variable &#8216;i&#8217; as w<sub>i<\/sub>= t<sub>e,i<\/sub>^2\/ sum(t<sub>e,j<\/sub>^2). (Notice the size effect cancels out when the t&#8217;s are normalized.)<\/li>\n<li>Collect data during the sample period. Afterwards, compute the t values based on the sample data (using normal methods.)  Then create a pooled t using t<sub>pooled<\/sub>=w<sub>trend<\/sub>t<sub>s,trend<\/sub>+w<sub>mean<\/sub>t<sub>s,mean<\/sub>, with the weights rescaled so that the pooled t has a standard deviation of 1. Here the &#8216;s&#8217; denotes sample data. <\/li>\n<\/ol>\n<p>I&#8217;ve done a number of tests and this weighting does appear to <i>always<\/i> result in a more powerful test than <i>either<\/i> the trend or mean tests alone, provided the weights are based on the size effect <i>during the forecast period<\/i> and the individual t-tests properly account for all errors.   For the current case, I collected data on Quatloo winnings and the results are as below:<\/p>\n<p><a href=\"http:\/\/rankexploits.com\/musings\/wp-content\/uploads\/2012\/10\/TestPooled.png\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/rankexploits.com\/musings\/wp-content\/uploads\/2012\/10\/TestPooled-500x500.png\" alt=\"\" title=\"TestPooled\" width=\"500\" height=\"500\" class=\"aligncenter size-medium wp-image-21191\" srcset=\"https:\/\/rankexploits.com\/musings\/wp-content\/uploads\/2012\/10\/TestPooled-500x500.png 500w, https:\/\/rankexploits.com\/musings\/wp-content\/uploads\/2012\/10\/TestPooled-300x300.png 300w, https:\/\/rankexploits.com\/musings\/wp-content\/uploads\/2012\/10\/TestPooled.png 1008w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/a><\/p>\n<p>The actually important part of the graph is the information in the lower left. What I want you to notice is that, based on the synthetic data I created, the &#8216;t&#8217; for the mean anomaly test was 18.44.  The t for the trend test was 12.57.  
I weighted the two t&#8217;s using relative weights of (0.66,0.34)\/sqrt(0.66^2+0.34^2).  This resulted in a t<sub>pooled<\/sub> of 22.13.  The significant feature is that 22 is larger than either 18.4 or 12.6. This isn&#8217;t important for the synthetic data I show, which has so much power it&#8217;s ridiculous. But the feature is very important if we have <i>just barely enough data<\/i> to generate sufficient power to hope for &#8220;fail to rejects&#8221;. <\/p>\n<p> Synthetic tests using 10000 repeat trials indicate that this pooled t has the proper characteristics and is, on average, larger than either of the two other &#8216;t&#8217; values. That means the resulting test has higher power than either the test comparing trends or the test for comparing anomalies alone. Consequently, it is preferred. <\/p>\n<p>So, this is sort of a &#8220;goldilocks&#8221; test! <\/p>\n<p>I know this is sketchy. I&#8217;ll be happy to answer questions. But this post is mostly so I have a place holder to remember what I did. (I uploaded the really badly organized script with embarrassingly incomprehensible notes and a bunch of side tests.) And&#8230; yes&#8230; this can be applied to model tests. I haven&#8217;t done it yet because I hadn&#8217;t verified I knew how to weight the two tests. I did model tests with weights of (0.5, 0.5)\/sqrt(0.5). Now I need to repeat that with the correct weights. \ud83d\ude42  <\/p>\n<p>==========<br \/>\n<b>Note:<\/b> I&#8217;ve discussed this before and <a href=\"http:\/\/rankexploits.com\/musings\/2011\/relative-statistical-power-of-3-tests\/\">Paul_K was dubious.<\/a>  So, this result is something of a turnabout because, while the first test I thought up way back then did not seem to optimize, this one does. (I didn&#8217;t upload that script, so I&#8217;m not sure if I applied the test entirely properly in that post. 
I may have forgotten to account for the &#8216;running start&#8217; aspect of testing the means making the test of the means appear more powerful than it ought to.)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The purpose of this post is to discuss a method of testing a projection that I think has greater power than testing trends alone. (Yes, Paul_K. We discussed this before. I now think it can be done!! \ud83d\ude42 ) To discuss this, I&#8217;m going to discuss how I might test a hypothetical prediction. Suppose I&#8217;ve &hellip; <a href=\"https:\/\/rankexploits.com\/musings\/2012\/pooled-method-of-testing-a-projection\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Pooled Method of Testing a Projection<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-21160","post","type-post","status-publish","format-standard","hentry","category-statistics"],"_links":{"self":[{"href":"https:\/\/rankexploits.com\/musings\/wp-json\/wp\/v2\/posts\/21160","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rankexploits.com\/musings\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rankexploits.com\/musings\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rankexploits.com\/musings\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/rankexploits.com\/musings\/wp-json\/wp\/v2\/comments?post=21160"}],"version-history":[{"count":0,"href":"https:\/\/rankexploits.com\/musings\/wp-json\/wp\/v2\/posts\/21160\/revisions"}],"wp:attachment":[{"href":"https:\/\/rankexploits.com\/musings\/wp-json\/wp\/v2\/media?parent=21160"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rankexploits.com\/musings\/wp-json\/wp\/v2\/categories?post=2116
0"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rankexploits.com\/musings\/wp-json\/wp\/v2\/tags?post=21160"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}