I. Specific Question (Interpreting Results: Confounding and Variance Reduction)
In reviewing Thursday’s lecture and rerunning the SAS code on the datasets, I’m struggling with the interpretation of the SAS results.
First, we ran an analysis on the dataset ancova_confound with three variables: outcome, confound and group. For the univariate analysis, we obtained:
Source      DF    Type III SS    Mean Square    F Value    Pr > F
group        1    5452.345600    5452.345600     208.84    <.0001
To interpret this, would we say: We find a significant difference in the outcome between groups 1 and 2?
Then, for the PROC GLM results with the confounder added to the model, we obtained:
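As a reading aid, the F value in this one-way table is the ratio of the group mean square to the error mean square. The error row is not shown in the excerpt, but its mean square can be backed out from the two reported numbers (approximately, since F is rounded to two decimals):

```latex
F = \frac{MS_{\text{group}}}{MS_{\text{error}}}, \qquad
MS_{\text{group}} = \frac{SS_{\text{group}}}{df_{\text{group}}} = \frac{5452.3456}{1}
\quad\Longrightarrow\quad
MS_{\text{error}} \approx \frac{5452.3456}{208.84} \approx 26.11
```

A between-group mean square more than 200 times the within-group mean square is what produces p < .0001 here.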
Source      DF    Type III SS    Mean Square    F Value    Pr > F
group        1       0.119511       0.119511       0.11    0.7384
confounder   1    2455.264537    2455.264537    2304.38    <.0001
How do you interpret the first line (group)? I don’t know how to interpret it, and I’m not sure what we’re looking at.
To interpret the second line, would we say: We find a significant difference in the outcome between groups 1 and 2 after controlling for the confounding variable (or after using the covariate to control for the variability between the groups)?
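One way to see what the two rows of the ANCOVA table mean is a small simulation. In a Type III table each row is a partial test: the group row asks whether the groups differ once the covariate is already in the model, and the confounder row asks whether the covariate's slope is nonzero once group is in the model. The sketch below uses hypothetical data (not the ancova_confound dataset) in which group 1 simply has higher confounder values and the outcome depends on the confounder alone. The helpers `rss` and `partial_f` are illustrative names; they compute each F by comparing nested least-squares fits, which for a main-effects model with a two-level group is the same partial test PROC GLM reports as Type III.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def partial_f(X_full, X_reduced, y):
    """F for the columns of X_full that X_reduced omits (a partial, Type III-style test)."""
    rss_full, rss_reduced = rss(X_full, y), rss(X_reduced, y)
    df_effect = X_full.shape[1] - X_reduced.shape[1]
    df_error = len(y) - X_full.shape[1]
    return ((rss_reduced - rss_full) / df_effect) / (rss_full / df_error)

# Hypothetical data: 100 subjects per group, group coded 0/1.
rng = np.random.default_rng(2024)
n = 100
group = np.repeat([0.0, 1.0], n)
# Group 1 has higher confounder values on average (mean 2 vs 0).
confound = rng.normal(loc=2.0 * group, scale=1.0)
# The outcome depends on the confounder only; group has no direct effect.
outcome = 3.0 * confound + rng.normal(scale=1.0, size=2 * n)

ones = np.ones(2 * n)
# One-way model: outcome ~ group (test group against the intercept-only model).
f_group_unadjusted = partial_f(np.column_stack([ones, group]), ones[:, None], outcome)
# ANCOVA: outcome ~ group + confound; each row drops one term from the full model.
X_full = np.column_stack([ones, group, confound])
f_group_adjusted = partial_f(X_full, np.column_stack([ones, confound]), outcome)
f_confound = partial_f(X_full, np.column_stack([ones, group]), outcome)

print(f"group, unadjusted F = {f_group_unadjusted:.2f}")
print(f"group, adjusted F   = {f_group_adjusted:.2f}")
print(f"confound F          = {f_confound:.2f}")
```

With this setup the unadjusted group F comes out large, the adjusted group F small, and the confounder F very large, the same pattern as the output above (the exact values depend on the random seed). Dividing a mean square by its F in the reported output also recovers the ANCOVA error mean square: 2455.264537 / 2304.38 ≈ 1.07, and 0.119511 / 1.07 ≈ 0.11 matches the group row.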
Next, we ran an analysis on the dataset ancova_reduce with three variables: outcome, pretest and group. For the univariate analysis, we obtained:
Source      DF    Type III SS    Mean Square    F Value    Pr > F
group        1     3.02760000     3.02760000       0.12    0.7283
To interpret this, would we say: We don’t find a significant difference in the outcome between groups 1 and 2?
Then, for the PROC GLM results with pretest added to the model, we obtained:
Source      DF    Type III SS    Mean Square    F Value    Pr > F
group        1      26.750838      26.750838      32.50    <.0001
pretest      1    2365.704678    2365.704678    2874.09    <.0001
How do you interpret the first and second lines? I understand these results illustrate the concept of variance reduction, and maybe the interpretation is that including the pretest alongside group reduces the unexplained variance?
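A small simulation can also illustrate variance reduction. The sketch assumes hypothetical data (not the ancova_reduce dataset): group is randomly assigned, so pretest does not differ by group; pretest varies widely between subjects (SD 10); and group adds a small 1-point effect to an outcome that otherwise tracks pretest closely. The helpers `rss` and `partial_test` are illustrative names, and `partial_test` also returns the error mean square so the drop in unexplained variance is visible.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def partial_test(X_full, X_reduced, y):
    """Partial (Type III-style) test of the columns X_reduced omits.

    Returns (F statistic, error mean square of the full model)."""
    rss_full, rss_reduced = rss(X_full, y), rss(X_reduced, y)
    df_effect = X_full.shape[1] - X_reduced.shape[1]
    mse = rss_full / (len(y) - X_full.shape[1])
    return ((rss_reduced - rss_full) / df_effect) / mse, mse

# Hypothetical randomized study: 100 subjects per group, group coded 0/1.
rng = np.random.default_rng(7)
n = 100
group = np.repeat([0.0, 1.0], n)
# Pretest does not depend on group but varies widely across subjects (SD 10).
pretest = rng.normal(loc=50.0, scale=10.0, size=2 * n)
# Outcome tracks pretest closely; group adds a small 1-point effect.
outcome = pretest + 1.0 * group + rng.normal(scale=0.5, size=2 * n)

ones = np.ones(2 * n)
# One-way model: the pretest variation is left in the error term.
f_unadj, mse_unadj = partial_test(np.column_stack([ones, group]), ones[:, None], outcome)
# ANCOVA: pretest absorbs most of that variation, shrinking the error mean square.
X_full = np.column_stack([ones, group, pretest])
f_adj, mse_adj = partial_test(X_full, np.column_stack([ones, pretest]), outcome)

print(f"one-way: group F = {f_unadj:.2f}, error MS = {mse_unadj:.2f}")
print(f"ANCOVA:  group F = {f_adj:.2f}, error MS = {mse_adj:.2f}")
```

Running it should show the error mean square falling from roughly 100 in the one-way model to roughly 0.25 in the ANCOVA, with the group F rising accordingly; the one-way group F is usually unremarkable because a 1-point effect is buried in noise with SD near 10. The reported output fits the same reading: backing out the error mean square as MS / F gives about 3.0276 / 0.12 ≈ 25 in the one-way model (approximate, since F is rounded) versus 2365.704678 / 2874.09 ≈ 0.823 in the ANCOVA, and 26.750838 / 0.823 ≈ 32.5 reproduces the group F. The group SS also changes (3.03 to 26.75) because the ANCOVA tests covariate-adjusted group means rather than raw means.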
II. General Question (Data snooping or legitimate research?)
Based on the research methods classes I have taken, there seems to be a fine line between data snooping and legitimate research, and there seems to be a strong ethical component to the discussion of data snooping.
I think an example of data snooping is when a researcher has a dataset but doesn’t know exactly what he is looking for, so he runs analyses on the data, finds some significant results and then builds theory from there. He forms his hypotheses after running the analyses.
In legitimate research, I think the researcher starts with theory and an idea of what he thinks about the subject matter, then builds hypotheses, and then runs analyses on the dataset to determine whether what he thinks is correct. The researcher might have to run considerable analysis on the data to cover all angles of the subject matter, but I wouldn’t think this would be considered data snooping.
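The reason unplanned analyses tend to turn up significant results can be made concrete with a standard calculation. Assuming, for illustration, k independent tests each run at α = .05 on data containing no real effects, the probability that at least one comes out significant is:

```latex
P(\text{at least one } p < \alpha) = 1 - (1 - \alpha)^{k},
\qquad \alpha = 0.05,\ k = 20:\quad 1 - 0.95^{20} \approx 0.64
```

So twenty unplanned tests on pure noise are more likely than not to yield at least one "finding". Tests run on the same dataset are usually correlated, which changes the exact figure but not the basic inflation.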
In both of these cases, do the intentions of the researcher come into play in determining whether it is data snooping or legitimate research? Is it data snooping if the researcher didn’t know he was data snooping? Is it easy for an experienced journal editor or reviewer to spot data snooping? In one of the research methods classes I took, we had to read the ethical standards of membership for the Academy of Management; that was the only place I saw data snooping discussed, and even there the discussion was vague. I have also not seen any mention of data snooping in journals’ submission guidelines, so there don’t seem to be strict rules on this.