Adjusting for Multiple Comparisons
Frequently studies report results that are not the primary or secondary outcome measures—sometimes because the finding is not anticipated, is unusual or judged to be important by the authors. How should these findings be assessed? A common belief is that if outcomes are not pre-specified, serious attention to them is not warranted. But is this the case? Kenneth J. Rothman in 1990 wrote an article that we feel is very helpful in such situations.
- Rothman points out that making statistical adjustments for multiple comparisons is similar to the problem of statistical significance testing where the investigator uses the P-value to estimate the probability of a study demonstrating an effect size as great or greater than the one found in the study, given that the null hypothesis is true—i.e., that there is truly no difference between the groups being studied (with alpha as the arbitrary cutoff for clinical significance which is frequently set at 5%). Obviously if the risk for rejecting a truly null hypothesis is 5% for every hypothesis examined, then examining multiple hypotheses will generate a larger number of falsely positive statistically significant findings because of the increasing number of hypotheses examined.
- Adjusting for multiple comparisons is thought by many to be desirable because it will result in a smaller probability of erroneously rejecting the null hypothesis. Rothman argues this “pay for peeking” at more data by adjusting P-values with multiple comparisons is unnecessary and can be misleading. Adjusting for multiple comparisons might be paying a penalty for simply appropriately doing more comparisons, and there is no logical reason (or good evidence) for doing statistical adjusting. Rather, the burden is on those who advocate for multiple comparison adjustments to show there is a problem requiring a statistical fix.
- Rothman’s conclusion: It is reasonable to consider each association on its own for the information it conveys—he believes that there is no need for adjusting P-values with multiple comparisons.
Delfini Comment: Reading his paper is a bit difficult, but he make some good points about our not really understanding what chance is all about and that evaluating study outcomes for validity requires critical appraisal for the assessment of bias and other factors as well as the use of statistics for evaluating chance effects.
Rothman KJ. No adjustments are needed for multiple comparisons. Epidemiology. 1990 Jan;1(1):43-6. PubMed PMID: 2081237.