Adjusting for Multiple Comparisons


Frequently, studies report results that are not the primary or secondary outcome measures, sometimes because the finding was not anticipated, is unusual, or is judged by the authors to be important. How should these findings be assessed? A common belief is that outcomes that were not pre-specified do not warrant serious attention. But is this the case? In 1990, Kenneth J. Rothman wrote an article that we find very helpful in such situations.[1]

  • Rothman points out that making statistical adjustments for multiple comparisons is rooted in the logic of statistical significance testing, in which the investigator uses the P-value to estimate the probability of a study demonstrating an effect size as great as or greater than the one found, given that the null hypothesis is true, i.e., that there is truly no difference between the groups being studied (with alpha as the arbitrary cutoff for statistical significance, frequently set at 5%). Obviously, if the risk of rejecting a truly null hypothesis is 5% for every hypothesis examined, then examining multiple hypotheses will generate a larger number of falsely positive, statistically significant findings simply because more hypotheses are examined (see the sketch following this list).
  • Adjusting for multiple comparisons is thought by many to be desirable because it results in a smaller probability of erroneously rejecting the null hypothesis. Rothman argues that this “paying for peeking” at more data by adjusting P-values for multiple comparisons is unnecessary and can be misleading. Adjusting for multiple comparisons may amount to paying a penalty simply for appropriately making more comparisons, and there is no logical reason (or good evidence) for the statistical adjustment. Rather, the burden is on those who advocate adjusting for multiple comparisons to show that there is a problem requiring a statistical fix.
  • Rothman’s conclusion: it is reasonable to consider each association on its own for the information it conveys; he believes there is no need to adjust P-values for multiple comparisons.
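
As a rough illustration of the multiplicity concern described in the first point above (our sketch, not from Rothman’s paper): if each of several independent, truly null hypotheses is tested at an alpha of 0.05, the chance of at least one falsely positive “significant” finding grows quickly with the number of comparisons.

```python
# Sketch: probability of at least one false-positive finding when testing
# n independent, truly null hypotheses, each at alpha = 0.05.
alpha = 0.05
for n in (1, 5, 10, 20):
    p_at_least_one = 1 - (1 - alpha) ** n
    print(f"{n:>2} comparisons: {p_at_least_one:.0%} chance of at least one false positive")
# Output: 5%, 23%, 40%, and 64%, respectively.
```

Rothman’s argument is not that this arithmetic is wrong, but that it does not justify penalizing P-values simply because more comparisons were made.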

Delfini Comment: Reading his paper is a bit difficult, but he makes some good points about how poorly we really understand chance, and about the fact that evaluating study outcomes for validity requires critical appraisal for bias and other factors as well as the use of statistics to evaluate chance effects.

Reference

Rothman KJ. No adjustments are needed for multiple comparisons. Epidemiology. 1990 Jan;1(1):43-6. PubMed PMID: 2081237.

 


Empirical Evidence of Attrition Bias in Clinical Trials


The commentary, “Empirical evidence of attrition bias in clinical trials,” by Jüni et al.[1] is a nice review of what has transpired since 1970, when attrition bias received attention in a critical appraisal of a non-valid trial of extracranial bypass surgery for transient ischemic attack.[2] At about the same time, Bradford Hill coined the phrase “intention-to-treat.” He wrote that excluding patient data after “admission to the treated or control group” may affect the validity of clinical trials and that “unless the losses are very few and therefore unimportant, we may inevitably have to keep such patients in the comparison and thus measure the ‘intention-to-treat’ in a given way, rather than the actual treatment.”[3] The next major development was meta-epidemiological research, which assessed trials for associations between methodological quality and effect size and produced conflicting results regarding the effect of attrition bias on effect size. However, as the commentary points out, the studies assessing attrition bias were flawed.[4,5,6]

Finally, a breakthrough in understanding the distorting effect of losing subjects after randomization came from two authors evaluating attrition bias in oncology trials.[7] The investigators compared the results of their own analyses, which utilized individual patient data and invariably followed the intention-to-treat principle, with the analyses done by the original investigators, which often excluded some or many patients. The results showed that pooled analyses of trials with patient exclusions reported more beneficial effects of the experimental treatment than analyses based on all or most patients who had been randomized. Tierney and Stewart showed that, in most of the meta-analyses they reviewed, analyses based on only “included” patients favored the research treatment (P = 0.03). The commentary gives deserved credit to Tierney and Stewart for their tremendous contribution to critical appraisal and is a very nice, short read.
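
To see why exclusions after randomization can flatter a treatment, here is a minimal, hypothetical sketch (our illustration, not data from Tierney and Stewart): the drug has no true effect, but patients doing poorly on the drug are more likely to drop out, so a completers-only analysis shows a spurious benefit that an intention-to-treat analysis does not.

```python
def improvement_rate(improved, analyzed):
    return improved / analyzed

# Hypothetical trial: 1,000 patients randomized per arm, 40% improve in each
# arm (no true effect). In the drug arm, 200 non-improvers drop out.
randomized = 1000
improved_drug = improved_placebo = 400
dropouts_drug = 200  # all among the non-improvers in the drug arm

# Intention-to-treat: analyze everyone randomized (dropouts counted as not improved).
itt_diff = improvement_rate(improved_drug, randomized) - \
           improvement_rate(improved_placebo, randomized)

# Completers-only: exclude the drug-arm dropouts from the denominator.
completers_diff = improvement_rate(improved_drug, randomized - dropouts_drug) - \
                  improvement_rate(improved_placebo, randomized)

print(f"ITT difference:             {itt_diff:+.0%}")         # +0%
print(f"Completers-only difference: {completers_diff:+.0%}")  # +10%
```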

References

1. Jüni P, Egger M. Commentary: Empirical evidence of attrition bias in clinical trials. Int J Epidemiol. 2005 Feb;34(1):87-8. Epub 2005 Jan 13. Erratum in: Int J Epidemiol. 2006 Dec;35(6):1595. PubMed PMID: 15649954.

2. Fields WS, Maslenikov V, Meyer JS, Hass WK, Remington RD, Macdonald M. Joint study of extracranial arterial occlusion. V. Progress report of prognosis following surgery or nonsurgical treatment for transient cerebral ischemic attacks. PubMed PMID: 5467158.

3. Bradford Hill A. Principles of Medical Statistics, 9th edn. London: The Lancet Limited, 1971.

4. Schulz KF, Chalmers I, Hayes RJ, Altman D. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 1995;273:408–12. PubMed PMID: 7823387.

5. Kjaergard LL, Villumsen J, Gluud C. Reported methodological quality and discrepancies between large and small randomized trials in meta-analyses. Ann Intern Med 2001;135:982–89. PubMed PMID: 11730399.

6. Balk EM, Bonis PA, Moskowitz H, Schmid CH, Ioannidis JP, Wang C, Lau J. Correlation of quality measures with estimates of treatment effect in meta-analyses of randomized controlled trials. JAMA. 2002 Jun 12;287(22):2973-82. PubMed PMID: 12052127.

7. Tierney JF, Stewart LA. Investigating patient exclusion bias in meta-analysis. Int J Epidemiol. 2005 Feb;34(1):79-87. Epub 2004 Nov 23. PubMed PMID: 15561753.


A Caution When Evaluating Systematic Reviews and Meta-analyses


We would like to draw critical appraisers’ attention to an infrequent but important problem encountered in some systematic reviews: the accuracy of standardized mean differences. Meta-analysis of trials that have used different scales to record outcomes of a similar nature requires transforming the data to a uniform scale, the standardized mean difference (SMD). Gøtzsche and colleagues, in a review of 27 meta-analyses utilizing SMDs, found that a high proportion of meta-analyses based on SMDs contained meaningful errors in data extraction and calculation of point estimates.[1] Gøtzsche et al. audited two trials from each review and found that in 17 meta-analyses (63%) there were errors for at least one of the two trials examined. We recommend that critical appraisers be aware of this issue.
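
For readers less familiar with the SMD, here is a minimal sketch of the underlying calculation (Cohen’s d with a pooled standard deviation, using hypothetical numbers): two trials measuring the same kind of outcome on different scales yield nearly identical SMDs, which is what allows them to be pooled.

```python
from math import sqrt

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference (Cohen's d) using a pooled standard deviation."""
    pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

# Hypothetical trials of the same comparison on different pain scales:
# Trial A uses a 0-10 scale, Trial B a 0-100 scale.
print(round(cohens_d(3.0, 2.0, 50, 4.0, 2.2, 50), 2))      # about -0.48
print(round(cohens_d(30.0, 20.0, 80, 40.0, 22.0, 80), 2))  # about -0.48
```

Because every step involves extracting means, standard deviations, and group sizes by hand and transforming them, there are many opportunities for the kinds of errors Gøtzsche and colleagues found.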

1. Gøtzsche PC, Hróbjartsson A, Maric K, Tendal B. Data extraction errors in meta-analyses that use standardized mean differences. JAMA. 2007 Jul 25;298(4):430-7. Erratum in: JAMA. 2007 Nov 21;298(19):2264. PubMed PMID: 17652297.


Improving Results Reporting in Clinical Trials: Case Study—Time-to-Event Analysis and Hazard Ratio Reporting Advice


We frequently see clinical trial abstracts, especially those using time-to-event analyses, that are not well understood by readers. Here is a fictional example for illustrative purposes:

In a 3-year randomized controlled trial (RCT) of drug A versus placebo in women with advanced breast cancer, the investigators presented their abstract results in terms of the relative risk reduction for death (19%) along with the hazard ratio (hazard ratio = 0.76, 95% confidence interval [CI] 0.56 to 0.94, P = 0.04). They also stated that, “This reduction represented a 5-month improvement in median survival (24 months in the drug A group vs. 19 months in the placebo group).” Following this information, the authors stated that the three-year survival probability was 41% in the drug A group versus 32% in the placebo group.
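
For readers who want to unpack the hazard ratio in this fictional abstract, one rough translation is shown below. It assumes proportional hazards hold throughout follow-up, in which case the two arms’ survival probabilities are related by S_drug(t) = S_placebo(t) raised to the power of the hazard ratio; the numbers are taken from the fictional example above.

```python
# Rough translation of a hazard ratio into a survival probability, assuming
# proportional hazards throughout follow-up: S_drug(t) = S_placebo(t) ** HR.
hazard_ratio = 0.76
survival_placebo_3yr = 0.32  # fictional placebo 3-year survival

survival_drug_3yr = survival_placebo_3yr ** hazard_ratio
print(f"Implied 3-year survival with drug A: {survival_drug_3yr:.0%}")  # about 42%
```

Few readers will do this kind of translation, which is part of the problem described next.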

Many readers do not understand hazard ratios and will conclude that a 5-month improvement in median survival is not clinically meaningful. We believe it would have been more useful to present the mortality information (which authors frequently report in the results section, but which is not easily found by many readers).

A much more meaningful abstract statement would go something like this: after 3 years, the overall mortality was 59% in the drug A group compared with 68% in the placebo group, which represents an absolute risk reduction (ARR) of 9%, P = 0.04, and a number needed to treat (NNT) of 11. This information is much more impressive and much more easily understood than a 5-month increase in median survival, and it uses statistics familiar to clinicians.
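
The arithmetic behind that statement is simple, which is part of its appeal. A minimal sketch using the fictional mortality figures above (strictly, 1/0.09 is 11.1, so some would round the NNT up to 12):

```python
# Absolute risk reduction (ARR) and number needed to treat (NNT) from the
# fictional 3-year mortality figures.
mortality_drug = 0.59     # overall mortality, drug A group
mortality_placebo = 0.68  # overall mortality, placebo group

arr = mortality_placebo - mortality_drug  # absolute risk reduction
nnt = 1 / arr                             # number needed to treat

print(f"ARR: {arr:.0%}")   # 9%
print(f"NNT: {nnt:.1f}")   # about 11
```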
