D66201843 I. Confounder/variance reduction-results II. Data snooping or legitimate research?

Last post 09-28-2008 6:21 PM by pwestfal. 1 replies.
  • 09-26-2008 10:11 AM

    D66201843 I. Confounder/variance reduction-results II. Data snooping or legitimate research?

    I. Specific Question (Interpreting Results-confounding and variance reduction)

    In reviewing Thursday’s lecture and rerunning the SAS code on the datasets, I’m struggling with the interpretation of the SAS results.

    First, we ran analysis on the dataset ancova_confound with three variables: outcome, confound and group.  For the univariate analysis, we obtained:

    Source        DF    Type III SS    Mean Square    F Value    Pr > F
    group          1    5452.345600    5452.345600     208.84    <.0001

    To interpret this, would we say: We find a significant difference in the outcome between group 1 and 2?

     

    Then for the results of the proc glm, we obtained:

    Source        DF    Type III SS    Mean Square    F Value    Pr > F
    group          1       0.119511       0.119511       0.11    0.7384
    confounder     1    2455.264537    2455.264537    2304.38    <.0001

    How do you interpret the first line?  I’m not sure what we’re looking at.

    To interpret the second line, would we say:  We find a significant difference in the outcome between groups 1 and 2 after controlling for the confounding variable (or after controlling the variability between the groups with the covariate)?

     

    Then, we ran analysis on the dataset ancova_reduce with three variables: outcome, pretest and group.  For the univariate analysis, we obtained:

    Source        DF    Type III SS    Mean Square    F Value    Pr > F
    group          1     3.02760000     3.02760000       0.12    0.7283

    To interpret this, would we say: We don’t find a significant difference in the outcome between group 1 and 2?

     

    Then for the results of the proc glm, we obtained:

    Source        DF    Type III SS    Mean Square    F Value    Pr > F
    group          1      26.750838      26.750838      32.50    <.0001
    pretest        1    2365.704678    2365.704678    2874.09    <.0001

    How do you interpret the first and second lines?  For these results I understand we are working with the concept of variance reduction, and maybe the interpretation is that including the pretest in the model along with the grouping results in a reduction in the variance?

     II. General Question   (Data snooping or legitimate research?)

    Based on the research methods classes I have taken, there seems to be a fine line with regards to data snooping and conducting legitimate research and there seems to be a strong ethical component in the discussion of data snooping. 

     

    I think an example of data snooping is where the researcher has a dataset but doesn’t exactly know what he is looking for so he runs analysis on the data, comes up with some significant results and then builds theory from there.  He forms the hypotheses after running analysis. 

    For conducting legitimate research, I think the researcher has theory and an idea of what he thinks about a subject matter, then builds hypotheses, and then runs analysis on the dataset to determine if what he thinks is correct.  I think the researcher might have to run considerable analysis on the data so he covers all angles with respect to the subject matter but I wouldn’t think this would be considered data snooping. 

     

    In both of these cases, do the intentions of the researcher come into play to determine if it is data snooping or legitimate research?  Is it data snooping if the researcher didn’t know he was data snooping?  Is it easy for an experienced editor/reviewer of a journal to spot data snooping?   In one of the research methods classes I took, we had to read the ethical standards of membership for the Academy of Management, and that was the only place I saw the idea of data snooping discussed, but it was vague.  I have also not seen any mention of data snooping in the submission guidelines for journals.  So there don’t seem to be strict rules on this.

     

  • 09-28-2008 6:21 PM In reply to

    Re: D66201843 I. Confounder/variance reduction-results II. Data snooping or legitimate research?

    I. Specific Question (Interpreting Results-confounding and variance reduction)

    Please review the audio; the interpretations are given there.  But as always, a small p-value means there is evidence of an effect.  If the “group” test is significant (p<.05), then we conclude that the difference between the groups is larger than can be explained by chance alone.

     

    If there is a covariate in the model, and if the result is significant, then we conclude that the difference is real after controlling for the covariate (refer back to the minority/nonminority discussion, and how we wish to control for the effect of time (year)).

     

    Any significance for the covariate (either pretest or confounder in those examples) refers to the fact that the covariate is significantly related to the response.  It does not say anything about whether the groups differ.

     

    If the p-value is greater than .05, then we do not find a significant difference between groups.   If there is a covariate in the model, then we say that we do not find a significant difference between the groups after controlling for the covariate.
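
    To make the two patterns concrete, here is a minimal simulation sketch in plain Python. It does not use the course datasets: the sample sizes, effect sizes, seed, and the function names sse, group_f and simulate are all assumptions chosen for illustration. It computes the F statistic for group with and without the covariate, once where the covariate is a confounder (it differs by group and drives the outcome, and there is no true group effect) and once where the covariate is a pretest unrelated to group (there is a true group effect that is hidden by noise until the pretest is included).

```python
import random


def sse(columns, y):
    """Residual sum of squares from a least-squares fit of y on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    M = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
         + [sum(X[i][j] * y[i] for i in range(n))] for j in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    beta = [M[j][p] / M[j][j] for j in range(p)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2 for i in range(n))


def group_f(y, group, covariate=None):
    """F statistic (1 numerator df) for group, optionally adjusted for one covariate."""
    other = [] if covariate is None else [covariate]
    full = sse([group] + other, y)        # model with group
    reduced = sse(other, y)               # same model without group
    df_error = len(y) - (2 + len(other))  # n minus parameters in the full model
    return (reduced - full) / (full / df_error)


def simulate(n_per_group=50, seed=1):
    """Simulated data only; every number here is a made-up assumption."""
    rng = random.Random(seed)
    group = [0.0] * n_per_group + [1.0] * n_per_group
    # Confounding: covariate differs by group and drives the outcome; no group effect.
    conf = [3.0 * g + rng.gauss(0, 1) for g in group]
    y_conf = [2.0 * c + rng.gauss(0, 1) for c in conf]
    # Variance reduction: pretest unrelated to group; true group effect of 2.
    pre = [rng.gauss(50, 10) for _ in group]
    y_red = [p + 2.0 * g + rng.gauss(0, 1) for p, g in zip(pre, group)]
    return {
        "confound_unadjusted": group_f(y_conf, group),
        "confound_adjusted": group_f(y_conf, group, conf),
        "reduce_unadjusted": group_f(y_red, group),
        "reduce_adjusted": group_f(y_red, group, pre),
    }


if __name__ == "__main__":
    for name, f_value in simulate().items():
        print(f"{name:22s} F = {f_value:10.2f}")
```

    In the confounding case the group F should be large on its own and shrink sharply once the covariate is in the model; in the variance-reduction case it should go the other way, because the pretest soaks up error variance and the group effect becomes visible.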

     

     

    You might also look at the "solution" output (requested with the SOLUTION option on the MODEL statement) - it's just like regression.  So if you know how to interpret regression output, then you know how to interpret glm output.
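
    As a small illustration of the regression connection, the sketch below (plain Python, simulated data, hypothetical names) regresses an outcome on a 0/1 group indicator and checks that the slope is exactly the difference between the two group means. Note that SAS codes CLASS variables with a different reference level than the 0/1 dummy assumed here, so the sign and intercept in SAS output can differ even though the fit is the same.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """Least-squares slope from regressing y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def dummy_regression_demo(seed=0):
    """Simulated two-group data; the means and spread are made-up assumptions."""
    rng = random.Random(seed)
    group = [0] * 30 + [1] * 30
    y = [10.0 + 4.0 * g + rng.gauss(0, 2) for g in group]
    mean0 = mean([v for v, g in zip(y, group) if g == 0])
    mean1 = mean([v for v, g in zip(y, group) if g == 1])
    return slope(group, y), mean1 - mean0


if __name__ == "__main__":
    b, diff = dummy_regression_demo()
    print(f"slope on group dummy = {b:.4f}, difference in means = {diff:.4f}")
```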

     

     

    It might not be a bad idea to come by my office sometime and discuss all this.  It's pretty important, as your question suggests.

     

    100 100 100 80

     II. General Question   (Data snooping or legitimate research?)

    All good points.  It would be nice to see some of those ethical guidelines here.  The issue of data snooping really does dovetail with researcher incentives; see the paper that I posted by Edward L. Glaeser.  In light of the data snooping concerns, it becomes especially important to have a solid theory to back up results.  In practice it is difficult to establish whether results are real or just data-snooping artifacts, so replication of studies is essential for establishing the scientific validity of conclusions.
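
    To see why snooping artifacts are hard to tell apart from real results, here is a minimal simulation sketch in plain Python. Everything in it is an assumption made for illustration: each hypothetical "study" tests 20 unrelated outcomes on pure-noise data with no true group difference, using a two-sample z test that treats the variance as known and equal to 1. If the tests are independent, the chance that at least one comes out "significant" at .05 is 1 - 0.95^20, or about 0.64.

```python
import math
import random


def p_value_z(x, y):
    """Two-sided p-value for equal means, assuming known unit variance in both groups."""
    z = (sum(x) / len(x) - sum(y) / len(y)) / math.sqrt(1 / len(x) + 1 / len(y))
    return math.erfc(abs(z) / math.sqrt(2.0))


def snooping_rate(n_studies=500, n_tests=20, n_per_group=20, alpha=0.05, seed=2008):
    """Fraction of null studies in which at least one of n_tests tests has p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        found = False
        for _ in range(n_tests):
            # Both groups come from the same distribution: no real effect exists.
            x = [rng.gauss(0, 1) for _ in range(n_per_group)]
            y = [rng.gauss(0, 1) for _ in range(n_per_group)]
            if p_value_z(x, y) < alpha:
                found = True
        hits += found
    return hits / n_studies


if __name__ == "__main__":
    print(f"share of null studies with a 'significant' finding: {snooping_rate():.2f}")
```

    A researcher who reports only the one test that "worked" would be presenting a result that looks exactly like a real effect, which is why theory stated in advance and replication on new data matter.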


    That having been said, in my experience the bigger problem with students is that they haven’t done enough playing around with data analysis.  As a result, they don’t understand how the software works well enough, they don’t have the experience with data to know how to anticipate how things will look, and they don’t have the breadth to know what questions to ask or what theories to develop.  So, at this stage, I have to exhort you (all of you) to do a lot more playing around with data.  Be curious!  Explore!

    100 90 90 100

    Professor