K65419096 I. Stratification II. Multiple Testing Problem and Data Mining

Last post 09-28-2008 6:42 PM by pwestfal. 1 replies.
  • 09-26-2008 10:58 PM

    K65419096 I. Stratification II. Multiple Testing Problem and Data Mining

    I. During Thursday’s class, you discussed stratification of groups. Is this related to quota sampling? If we couldn’t survey every graduate of a graduate school but only a small percentage, and we still wanted to learn about ethnic groups, we would make sure that we “oversampled” ethnic minorities. More importantly, does this tie into the variance reduction example you covered next in the lecture? That is, because the members of each group are more alike with each other than with the population as a whole, will each group have less variance?

    II. There was a good article debunking equity market calendar effects, and I liked the way the authors put it: the probability of the union of a set of events is smaller than (or equal to) the sum of their probabilities. Hence the experiment-wide alpha <= each alpha times # tests. But now anything related to data mining makes me suspicious. Is the familywise error rate appropriately taken into account in this methodology?

    Ref: Greenstone, M. and P. Oyer. “Are there Sectoral Anomalies Too? The Pitfalls of Unreported Multiple Hypothesis Testing and a Simple Solution,” Review of Quantitative Finance and Accounting, 15 (2000): 37-55
  • 09-28-2008 6:42 PM In reply to

    Re: K65419096 I. Stratification II. Multiple Testing Problem and Data Mining

    Anonymous:
    I. During Thursday’s class, you discussed stratification of groups. Is this related to quota sampling? If we couldn’t survey every graduate of a graduate school but only a small percentage, and we still wanted to learn about ethnic groups, we would make sure that we “oversampled” ethnic minorities. More importantly, does this tie into the variance reduction example you covered next in the lecture? That is, because the members of each group are more alike with each other than with the population as a whole, will each group have less variance?

    Sure, stratification reduces variance, but you only see the benefit if you include the stratification variable (ethnic group in your example) in the model.
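
    A minimal simulation sketch of this point, under assumptions that are not from the lecture: a hypothetical population of two equally sized groups with means 50 and 70 and a common within-group standard deviation of 5. The function name `simulate_sampling` and all parameter values are illustrative only. It compares the sampling variance of the mean under simple random sampling and under stratified sampling, and then compares a standard error that ignores the stratum with one that uses it.

```python
import random
import statistics


def simulate_sampling(n_reps=2000, n_per_stratum=20, seed=0):
    """Compare simple random and stratified sampling on a made-up population."""
    rng = random.Random(seed)
    # Hypothetical strata: (mean, standard deviation); the two groups are
    # assumed to be equally large in the population.
    strata = {"A": (50.0, 5.0), "B": (70.0, 5.0)}
    names = list(strata)
    n_total = n_per_stratum * len(names)
    weight = 1 / len(names)  # population share of each stratum

    srs_means, strat_means = [], []
    naive_se, strat_se = [], []
    for _ in range(n_reps):
        # Simple random sample: the group of each unit is left to chance.
        srs = [rng.gauss(*strata[rng.choice(names)]) for _ in range(n_total)]
        srs_means.append(statistics.fmean(srs))

        # Stratified sample: a fixed number of units from each group.
        groups = {
            g: [rng.gauss(*strata[g]) for _ in range(n_per_stratum)]
            for g in names
        }
        pooled = [y for ys in groups.values() for y in ys]
        strat_means.append(statistics.fmean(pooled))

        # Standard error that ignores the stratification variable.
        naive_se.append((statistics.variance(pooled) / n_total) ** 0.5)
        # Standard error that uses the within-stratum variances.
        var_strat = sum(
            weight**2 * statistics.variance(ys) / n_per_stratum
            for ys in groups.values()
        )
        strat_se.append(var_strat**0.5)

    return {
        "var_srs": statistics.variance(srs_means),
        "var_stratified": statistics.variance(strat_means),
        "mean_naive_se": statistics.fmean(naive_se),
        "mean_stratified_se": statistics.fmean(strat_se),
    }


if __name__ == "__main__":
    for name, value in simulate_sampling().items():
        print(f"{name}: {value:.3f}")
```

    In this setup the stratified mean varies less across repeated samples, but the standard error computed without the stratum does not reflect that; only the version that accounts for the stratum does.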

    Seems more like a general question. Please, "dig in".  The specific question is supposed to be detailed and specific.

    100 90 70 40

    II. There was a good article debunking equity market calendar effects, and I liked the way the authors put it: the probability of the union of a set of events is smaller than (or equal to) the sum of their probabilities. Hence the experiment-wide alpha <= each alpha times # tests. But now anything related to data mining makes me suspicious. Is the familywise error rate appropriately taken into account in this methodology?

    Ref: Greenstone, M. and P. Oyer. “Are there Sectoral Anomalies Too?  The Pitfalls of Unreported Multiple Hypothesis Testing and a Simple Solution,” Review of Quantitative Finance and Accounting, 15 (2000): 37-55  

    "Data mining" is completely distinct from "data snooping." Data mining is essentially no different from regression analysis, as the goal is to obtain a predictive model. And with massive amounts of data, you can use hold-out samples to make sure you aren't fooling yourself. The methodology is quite sound, and obviously useful if you want to turn a profit.
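
    A minimal sketch of the hold-out idea, using made-up data: the response and all 50 candidate predictors are pure noise, so any relationship found is an artifact. The names `pearson` and `holdout_check` and the sample sizes are assumptions for illustration. The predictor that looks best in the training half is re-evaluated on the half that was held out.

```python
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def holdout_check(n_obs=200, n_predictors=50, seed=1):
    """Pick the best-looking noise predictor in-sample, then test it out-of-sample."""
    rng = random.Random(seed)
    # Response and predictors are independent noise by construction.
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    predictors = [
        [rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_predictors)
    ]

    half = n_obs // 2  # first half for training, second half held out
    train_corrs = [pearson(x[:half], y[:half]) for x in predictors]
    # Select the predictor with the largest absolute training correlation.
    best = max(range(n_predictors), key=lambda j: abs(train_corrs[j]))
    holdout_corr = pearson(predictors[best][half:], y[half:])
    return best, train_corrs, holdout_corr


if __name__ == "__main__":
    best, train_corrs, holdout_corr = holdout_check()
    print(f"selected predictor: {best}")
    print(f"training correlation: {train_corrs[best]:.3f}")
    print(f"hold-out correlation: {holdout_corr:.3f}")
```

    Because the winner was chosen for its training correlation, that number is biased upward; the hold-out correlation is not affected by the selection and gives the honest check.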

    "Data snooping," on the other hand, refers to the attempt to discover artifacts through excessive and/or unreasonable data manipulation. It implies a perspective on the part of the researcher that is more in line with selfish researcher goals than with goals that are more likely to benefit others. See the article by Edward L. Glaeser that I posted on the web site.

    You might note the obvious connection between "experiment-wide alpha <= each alpha times # tests" and the Bonferroni method that we discussed. Note that the FWER (familywise error rate) is the "experiment-wide alpha." Also, note that the terminology "experiment-wide alpha" is not standard.
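
    A small numerical sketch of that connection. The function names are illustrative, and the choice of 12 tests at alpha = 0.05 is an assumed example, not a figure from the article. The exact FWER formula used here holds only when the tests are independent and all null hypotheses are true; the union bound needs no independence assumption.

```python
def fwer_independent(alpha, n_tests):
    """FWER when all nulls are true and the tests are independent."""
    return 1 - (1 - alpha) ** n_tests


def union_bound(alpha, n_tests):
    """Upper bound on the FWER from the union (Bonferroni) inequality."""
    return min(1.0, alpha * n_tests)


def bonferroni_alpha(target_fwer, n_tests):
    """Per-test level that keeps the FWER at or below target_fwer."""
    return target_fwer / n_tests


if __name__ == "__main__":
    alpha, n_tests = 0.05, 12  # assumed example values
    print(f"FWER, independent tests: {fwer_independent(alpha, n_tests):.4f}")
    print(f"Union bound:             {union_bound(alpha, n_tests):.4f}")
    per_test = bonferroni_alpha(alpha, n_tests)
    print(f"Bonferroni per-test alpha: {per_test:.5f}")
    print(f"FWER at that level:        {fwer_independent(per_test, n_tests):.4f}")
```

    Running each test at alpha divided by the number of tests keeps the bound, and therefore the FWER, at or below alpha.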

    100 90 100 90

    Professor