“Reading” a Clinical Trial Won’t Get You There—or Let’s Review (And Apply) Some Basics About Assessing The Validity of Medical Research Studies Claiming Superiority for Efficacy of Therapies
An obvious question raised by the title is, “Get you where?” Well, the answer is, “To where you know it is reasonable to think you can trust the results of the study you have just finished reading.” In this blog, our focus is on how to critically appraise medical research studies which claim superiority for efficacy of a therapy.
Because of Lack of Understanding Medical Science Basics, People May Be Injured or Die
Understanding basic requirements for valid medical science is very important. Numbers below are estimates, but are likely to be close or understated—
- Over 63,000 people with heart disease died after taking encainide or flecainide because many doctors thought taking these drugs “made biological sense,” but did not understand the simple need for reliable clinical trial information to confirm what seemed to “make sense” [Echt 91].
- An estimated 60,000 people in the United States died from a nonsteroidal anti-inflammatory drug despite important benefit and safety information reported in the abstract of the pivotal trial used for FDA approval [Graham].
- In another example, roughly 42,000 women with advanced breast cancer suffered excruciating side effects without any proof of benefit, many of them dying as a result, and at a cost of $3.4 billion dollars [Mello].
- At least 64 deaths out of 751 cases in nearly half the United States were linked to fungal meningitis thought to be caused by a contaminated treatment that is used for back and radicular pain—but there is no reliable scientific evidence of benefit from that treatment [CDC].
In the above instances, these were preventable deaths and harms—from common treatments—which patients might have avoided if their physicians had better understood the importance and methods of evaluating medical science.
Failures to Understand Medical Science Basics
Many health care professionals don’t know how to quickly assess a trial for reliability and clinical usefulness—and yet mastering the basics is not difficult. Over the years, we have given a pre-test of 3 simple questions to more than a thousand physicians, pharmacists and others who have attended our training programs. Approximately 70% fail—”failure” being defined as missing 2 or 3 of the questions.
One pre-test question is designed to see if people recognize the lack of a comparison group in a report of the “effectiveness” of a new treatment. Without a comparison group of people with similar prognostic characteristics who are treated exactly the same except for the intervention under study, you cannot discern cause and effect of an intervention because a difference between groups may explain or affect the results.
A second pre-test question deals with presenting results as relative risk reduction (RRR) without absolute risk reduction (ARR) or event rates in the study groups. A “relative” measure raises the question, “Relative to what?” Is the reported RRR in our test question 60 percent of 100 percent? Or 60 percent of 1 percent?
The last of our pre-test questions assesses attendees’ basic understanding of only one of the two requirements to qualify as an Intention-to-Treat (ITT) analysis. The two requirements are that people should be randomized as analyzed and that all people should be included in the analysis whether they have discontinued, are missing or have crossed over to other treatment arms. The failure rate at knowing this last requirement is very high. (We will add that this last requirement means that a value has to be assigned if one is missing—and so, one of the most important aspects of critically appraising an ITT analysis is the evaluation of the methods for “imputing” missing data.)
By the end of our training programs, success rates have always markedly improved. Others have reported similar findings.
There is a Lot of Science + Much of It May Not Be Reliable
Each week more than 13,000 references are added to the world’s largest library—the National Library of Medicine (NLM). Unfortunately, many of these studies are seriously flawed. One large review of 60,352 studies reported that only 7 percent passed criteria of high quality methods and clinical relevancy [McKibbon]. We and others have estimated that up to (and maybe more than) 90% of the published medical information that health care professionals rely on is flawed [Freedman, Glasziou].
Bias Distorts Results
We cannot know if an intervention is likely to be effective and safe without critically appraising the evidence for validity and clinical usefulness. We need to evaluate the reliability of medical science prior to seriously considering the reported therapeutic results because biases such as lack of or inadequate randomization, lack of successful blinding or other threats to validity—which we will describe below—can distort reported result by up to 50 percent or more [see Risk of Bias References].
Patients Deserve Better
Patients cannot make informed choices regarding various interventions without being provided with quantified projections of benefits and harms from valid science.
Some Simple Steps To Critical Appraisal
Below is a short summary of our simplified approach to critically appraising a randomized superiority clinical trial. Our focus is on “internal validity” which means “closeness to truth” in the context of the study. “External validity” is about the likelihood of reaching truth outside of the study context and requires judgment about issues such as fit with individuals or populations in circumstances other than those in the trial.
You can review and download a wealth of freely available information at our website at www.delfini.org including checklists and tools at http://www.delfini.org/delfiniTools.htm which can provide you with much greater information. Most relevant to this blog is our short critical appraisal checklist which you can download here—http://www.delfini.org/Delfini_Tool_StudyValidity_Short.pdf
The Big Questions
In brief, your overarching questions are these:
- Is reading this study worth my time? If the results are true, would they change my practice? Do they apply to my situation? What is the likely impact to my patients
- Can anything explain the results other than cause and effect? Evaluate the potential for results being distorted by bias (anything other than chance leading away from the truth) or random chance effects.
- Is there any difference between groups other than what is being studied? This is automatically a bias.
- If the study appears to be valid, but attrition is high, sometimes it is worth asking, what conditions would need to be present for attrition to distort the results? Attrition does not always distort results, but may obscure a true difference due to the reduction in sample size.
There are four stages of a clinical trial, and you should ask several key questions when evaluating bias in each of the 4 stages.
- Subject Selection & Treatment Assignment—Evaluation of Selection Bias
Important considerations include how were subjects selected for study, were there enough subjects, how were they assigned to their study groups, and were the groups balanced in terms of prognostic variables?
Your critical appraisal to-do list includes—
a) Checking to see if the randomization sequence was generated in an acceptable manner. (Minimization may be an acceptable alternative.)
b) Determining if the investigators adequately concealed the allocation of subjects to each study group? Meaning, is the method for assigning treatment hidden so that an investigator cannot manipulate the assignment of a subject to a selected study group?
c) Examining the table of baseline characteristics to determine whether randomization was likely to have been successful, i.e., that the groups are balanced in terms of important prognostic variables (e.g., clinical and demographic variables).
- The Intervention & Context—Evaluation of Performance Bias
What is being studied, and what is it being compared to? Was the intervention likely to have been executed successfully? Was blinding likely to have been successful? Was duration reasonable for treatment as well as for follow-up? Was adherence reasonable? What else happened to study subjects in the course of the study such as use of co-interventions? Were there any differences in how subjects in the groups were treated?
Your to-do list includes evaluating:
a) Adequacy of blinding of subjects and all working with subjects and their data—including likely success of blinding;
b) Subjects’ adherence to treatment;
c) Inter-group differences in treatment or care except for the intervention(s) being studied.
- Data Collection & Loss of Data—Evaluation of Attrition Bias
What information was collected, and how was it collected? What data are missing and is it likely that missing data could meaningfully distort the study results?
Your to-do list includes evaluating—
a) Measurement methods (e.g., mechanisms, tools, instruments, means of administration, personnel issues, etc.)
b) Classification and quantification of missing data in each group (e.g., discontinuations due to ADEs, unrelated deaths, protocol violations, loss to follow-up, etc.)
c) Whether missing data are likely to distort the reported results? This is the area that the evidence on the distorting risk of bias provides the least help. And so, again, often it is worthwhile asking, “What conditions would need to be present for attrition to distort the results?”
- Results & Assessing The Differences In The Outcomes Of The Study Groups—Evaluating Assessment Bias
Were outcome measures reasonable, pre-specified and analyzed appropriately? Was reporting selective? How was safety assessed? Remember that models are not truth.
Your to-do list includes evaluating—
a) Whether assessors were blinded.
b) How the effect size was calculated (e.g., absolute risk reduction, relative risk, etc.). You especially want to know benefit or risk with and without treatment.
c) Were confidence intervals included? (You can calculate these yourself online, if you wish. See our web links at our website for suggestions.)
d) For dichotomous variables, was a proper intention-to-treat (ITT) analysis conducted with a reasonable choice for imputing values for missing data?
e) For time-to-event trials, were censoring rules unbiased? Were the number of censored subjects reported?
After you have evaluated a study for bias and chance and have determined that the study is valid, the study results should be evaluated for clinical meaningfulness, (e.g., the amount of clinical benefit and the potential for harm). Clinical outcomes include morbidity; mortality; symptom relief; physical, mental and emotional functioning; and, quality of life—or any surrogate outcomes that have been demonstrated in valid studies to affect a clinical outcome.
It is not difficult to learn how to critically appraise a clinical trial. Health care providers owe it to their patients to gain these skills. Health care professionals cannot rely on abstracts and authors’ conclusions—they must assess studies first for validity and second for clinical usefulness. Authors are often biased, even with the best of intentions. Remember that authors’ conclusions are opinions, not evidence. Authors frequently use misleading terms or draw misleading conclusions. Physicians and others who lack critical appraisal skills are often mislead by authors’ conclusions and summary statements. Critical appraisal knowledge is required to evaluate the validity of a study which must be done prior to seriously considering reported results.
For those who wish to go more deeply, we have books available and do training seminars. See our website at www.delfini.org.
Risk of Bias References
- Juni P, Altman DG, Egger M (2001) Systematic reviews in health care: assessing the quality of controlled clinical trials. BMJ 2001;323: 42-6. PubMed PMID: 11440947.
- Juni P, Witschi A, Bloch R, Egger M. The hazards of scoring the quality of clinical trials for meta-analysis. JAMA. 1999 Sep 15;282( 11): 1054-60. PubMed PMID: 10493204.
- Kjaergard LL, Villumsen J, Gluud C. Reported methodological quality and discrepancies between large and small randomized trials in metaanalyses. Ann Intern Med 2001;135: 982– 89. PMID 11730399.
- Moher D, Pham B, Jones A, Cook DJ, Jadad AR, Moher M, Tugwell P, Klassen TP. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet. 1998 Aug 22;352( 9128): 609-13. PubMed PMID: 9746022.
- Poolman RW, Struijs PA, Krips R, Inger N. Sierevelt IN, et al. (2007) Reporting of outcomes in orthopaedic randomized trials: Does blinding of outcome assessors matter? J Bone Joint Surg Am. 89: 550– 558. PMID 17332104.
- Savovic J, Jones HE, Altman DG, et al. Influence of Reported Study Design Characteristics on Intervention Effect Estimates From Randomized, Controlled Trials. Ann Intern Med. 2012 Sep 4. doi: 10.7326/ 0003-4819-157-6-201209180-00537. [Epub ahead of print] PubMed PMID: 22945832.
- van Tulder MW, Suttorp M, Morton S, et al. Empirical evidence of an association between internal validity and effect size in randomized controlled trials of low-back pain. Spine (Phila Pa 1976). 2009 Jul 15;34( 16): 1685-92. PubMed PMID: 19770609.
- CDC: http://www.cdc.gov/HAI/outbreaks/meningitis.html
- Echt DS, Liebson PR, Mitchell LB, Peters RW, Obias-Manno D, Barker AH, Arensberg D, Baker A, Friedman L, Greene HL, et al. Mortality and morbidity in patients receiving encainide, flecainide, or placebo. The Cardiac Arrhythmia Suppression Trial. N Engl J Med. 1991 Mar 21;324(12):781-8. PubMed PMID: 1900101.
- Freedman, David H. Lies, Damn Lies and Bad Medical Science. The Atlantic. November, 2010. www.theatlantic.com/ magazine/ archive/ 2010/ 11/ lies-damned-lies-and-medical-science/ 8269/, accessed 11/ 07/ 2010.
- Glasziou P. The EBM journal selection process: how to find the 1 in 400 valid and highly relevant new research articles. Evid Based Med. 2006 Aug; 11( 4): 101. PubMed PMID: 17213115.
- Graham Natural News: http://www.naturalnews.com/011401_Dr_David_Graham_the_FDA.html
- McKibbon KA, Wilczynski NL, Haynes RB. What do evidence-based secondary journals tell us about the publication of clinically important articles in primary health care journals? BMC Med. 2004 Sep 6;2: 33. PubMed PMID: 15350200.
- Mello MM, Brennan TA. The controversy over high-dose chemotherapy with autologous bone marrow transplant for breast cancer. Health Aff (Millwood). 2001 Sep-Oct;20(5):101-17. PubMed PMID: 11558695.