Another Study Warns That Evidence From Observational Studies Provides Unreliable Results For Therapies

Status

Another Study Warns That Evidence From Observational Studies Provides Unreliable Results For Therapies

We have previously mentioned the enormous contributions made by John Ioannidis MD in the area of understanding the reliability of medical evidence. [Ioannidis, Delfini Blog, Giannakakis] We want to draw your attention to a recent publication dealing with the risks of relying on observational data for cause and effect conclusions. [Hemkens] In this recent study, Hemkens, Ioannidis and other colleagues assessed differences in mortality effect size reported in observational (routinely collected data [RCD]) studies as compared with results reported in RCTs.

Eligible RCD studies used propensity scores in an effort to address confounding bias in the observational studies. The authors compared the results of RCD and RCTs. The analysis included only RCD studies conducted before any RCT was published on the same topic. They assessed the risk of bias for RCD studies and randomized controlled trials (RCTs) using The Cochrane Collaboration risk of bias tools. The direction of treatment effects, confidence intervals and effect sizes (odds ratios) were compared between RCD studies and RCTs. The relative odds ratios were calculated across all pairs of RCD studies and trials.

The authors found that RCD studies systematically and substantially overestimated mortality benefits of medical treatments compared with subsequent trials investigating the same question. Overall, RCD studies reported significantly more favorable mortality estimates by a relative 31% than subsequent trials (summary relative odds ratio 1.31 (95% confidence interval 1.03 to 1.65; I2 (I square)=0%)).

These authors remind us yet again that If no randomized trials exist, clinicians and other decision-makers should not trust results from observational data from sources such as local or national databases, registries, cohort or case-control studies.

References
Delfini Blog: https://delfini.org/blog/?p=292

Giannakakis IA, Haidich AB, Contopoulos-Ioannidis DG, Papanikolaou GN, Baltogianni MS, Ioannidis JP. Citation of randomized evidence in support of guidelines of therapeutic and preventive interventions. J Clin Epidemiol. 2002 Jun;55(6):545-55. PubMed PMID: 12063096.

Hemkens LG, Contopoulos-Ioannidis DG, Ioannidis JP. Agreement of treatment effects for mortality from routinely collected data and subsequent randomized trials: meta-epidemiological survey. BMJ. 2016 Feb 8;352:i493. doi: 10.1136/bmj.i493. PubMed PMID: 26858277.

Ioannidis JPA. Why Most Published Research Findings are False. PLoS Med 2005; 2(8):696-701 PMID: 16060722

“Reading” a Clinical Trial Won’t Get You There

Status

“Reading” a Clinical Trial Won’t Get You There—or Let’s Review (And Apply) Some Basics About Assessing The Validity of Medical Research Studies Claiming Superiority for Efficacy of Therapies

An obvious question raised by the title is, “Get you where?” Well, the answer is, “To where you know it is reasonable to think you can trust the results of the study you have just finished reading.” In this blog, our focus is on how to critically appraise medical research studies which claim superiority for efficacy of a therapy.

Because of Lack of Understanding Medical Science Basics, People May Be Injured or Die

Understanding basic requirements for valid medical science is very important. Numbers below are estimates, but are likely to be close or understated—

Over 63,000 people with heart disease died after taking encainide or flecainide because many doctors thought taking these drugs “made biological sense,” but did not understand the simple need for reliable clinical trial information to confirm what seemed to “make sense” [Echt 91].
An estimated 60,000 people in the United States died and another 140,000 experienced a heart attack resulting from the use of a nonsteroidal anti-inflammatory drug despite important benefit and safety information reported in the abstract of the pivotal trial used for FDA approval [Graham].
In another example, roughly 42,000 women with advanced breast cancer suffered excruciating side effects without any proof of benefit, many of them dying as a result, and at a cost of $3.4 billion dollars [Mello].
At least 64 deaths out of 751 cases in nearly half the United States were linked to fungal meningitis thought to be caused by a contaminated treatment that is used for back and radicular pain—but there is no reliable scientific evidence of benefit from that treatment [CDC].

In the above instances, these were preventable deaths and harms—from common treatments—which patients might have avoided if their physicians had better understood the importance and methods of evaluating medical science.

Failures to Understand Medical Science Basics

Many health care professionals don’t know how to quickly assess a trial for reliability and clinical usefulness—and yet mastering the basics is not difficult. Over the years, we have given a pre-test of 3 simple questions to more than a thousand physicians, pharmacists and others who have attended our training programs. Approximately 70% fail—”failure” being defined as missing 2 or 3 of the questions.

One pre-test question is designed to see if people recognize the lack of a comparison group in a report of the “effectiveness” of a new treatment. Without a comparison group of people with similar prognostic characteristics who are treated exactly the same except for the intervention under study, you cannot discern cause and effect of an intervention because a difference between groups may explain or affect the results.

A second pre-test question deals with presenting results as relative risk reduction (RRR) without absolute risk reduction (ARR) or event rates in the study groups. A “relative” measure raises the question, “Relative to what?” Is the reported RRR in our test question 60 percent of 100 percent? Or 60 percent of 1 percent?

The last of our pre-test questions assesses attendees’ basic understanding of only one of the two requirements to qualify as an Intention-to-Treat (ITT) analysis. The two requirements are that people should be randomized as analyzed and that all people should be included in the analysis whether they have discontinued, are missing or have crossed over to other treatment arms. The failure rate at knowing this last requirement is very high. (We will add that this last requirement means that a value has to be assigned if one is missing—and so, one of the most important aspects of critically appraising an ITT analysis is the evaluation of the methods for “imputing” missing data.)

By the end of our training programs, success rates have always markedly improved. Others have reported similar findings.

There is a Lot of Science + Much of It May Not Be Reliable
Each week more than 13,000 references are added to the world’s largest library—the National Library of Medicine (NLM). Unfortunately, many of these studies are seriously flawed. One large review of 60,352 studies reported that only 7 percent passed criteria of high quality methods and clinical relevancy [McKibbon]. We and others have estimated that up to (and maybe more than) 90% of the published medical information that health care professionals rely on is flawed [Freedman, Glasziou].

Bias Distorts Results
We cannot know if an intervention is likely to be effective and safe without critically appraising the evidence for validity and clinical usefulness. We need to evaluate the reliability of medical science prior to seriously considering the reported therapeutic results because biases such as lack of or inadequate randomization, lack of successful blinding or other threats to validity—which we will describe below—can distort reported result by up to 50 percent or more [see Risk of Bias References].

Patients Deserve Better
Patients cannot make informed choices regarding various interventions without being provided with quantified projections of benefits and harms from valid science.

Some Simple Steps To Critical Appraisal
Below is a short summary of our simplified approach to critically appraising a randomized superiority clinical trial. Our focus is on “internal validity” which means “closeness to truth” in the context of the study. “External validity” is about the likelihood of reaching truth outside of the study context and requires judgment about issues such as fit with individuals or populations in circumstances other than those in the trial.

You can review and download a wealth of freely available information at our website at www.delfini.org including checklists and tools at http://www.delfini.org/delfiniTools.htm which can provide you with much greater information. Most relevant to this blog is our short critical appraisal checklist which you can download here—http://www.delfini.org/Delfini_Tool_StudyValidity_Short.pdf

The Big Questions
In brief, your overarching questions are these:

Is reading this study worth my time? If the results are true, would they change my practice? Do they apply to my situation? What is the likely impact to my patients
Can anything explain the results other than cause and effect? Evaluate the potential for results being distorted by bias (anything other than chance leading away from the truth) or random chance effects.
Is there any difference between groups other than what is being studied? This is automatically a bias.
If the study appears to be valid, but attrition is high, sometimes it is worth asking, what conditions would need to be present for attrition to distort the results? Attrition does not always distort results, but may obscure a true difference due to the reduction in sample size.

Evaluating Bias

There are four stages of a clinical trial, and you should ask several key questions when evaluating bias in each of the 4 stages.

Subject Selection & Treatment Assignment—Evaluation of Selection Bias

Important considerations include how were subjects selected for study, were there enough subjects, how were they assigned to their study groups, and were the groups balanced in terms of prognostic variables?

Your critical appraisal to-do list includes—

a) Checking to see if the randomization sequence was generated in an acceptable manner. (Minimization may be an acceptable alternative.)

b) Determining if the investigators adequately concealed the allocation of subjects to each study group? Meaning, is the method for assigning treatment hidden so that an investigator cannot manipulate the assignment of a subject to a selected study group?

c) Examining the table of baseline characteristics to determine whether randomization was likely to have been successful, i.e., that the groups are balanced in terms of important prognostic variables (e.g., clinical and demographic variables).

The Intervention & Context—Evaluation of Performance Bias

What is being studied, and what is it being compared to? Was the intervention likely to have been executed successfully? Was blinding likely to have been successful? Was duration reasonable for treatment as well as for follow-up? Was adherence reasonable? What else happened to study subjects in the course of the study such as use of co-interventions? Were there any differences in how subjects in the groups were treated?

Your to-do list includes evaluating:

a) Adequacy of blinding of subjects and all working with subjects and their data—including likely success of blinding;

b) Subjects’ adherence to treatment;

c) Inter-group differences in treatment or care except for the intervention(s) being studied.

Data Collection & Loss of Data—Evaluation of Attrition Bias

What information was collected, and how was it collected? What data are missing and is it likely that missing data could meaningfully distort the study results?

Your to-do list includes evaluating—

a) Measurement methods (e.g., mechanisms, tools, instruments, means of administration, personnel issues, etc.)

b) Classification and quantification of missing data in each group (e.g., discontinuations due to ADEs, unrelated deaths, protocol violations, loss to follow-up, etc.)

c) Whether missing data are likely to distort the reported results? This is the area that the evidence on the distorting risk of bias provides the least help. And so, again, often it is worthwhile asking, “What conditions would need to be present for attrition to distort the results?”

Results & Assessing The Differences In The Outcomes Of The Study Groups—Evaluating Assessment Bias

Were outcome measures reasonable, pre-specified and analyzed appropriately? Was reporting selective? How was safety assessed? Remember that models are not truth.

Your to-do list includes evaluating—

a) Whether assessors were blinded.

b) How the effect size was calculated (e.g., absolute risk reduction, relative risk, etc.). You especially want to know benefit or risk with and without treatment.

c) Were confidence intervals included? (You can calculate these yourself online, if you wish. See our web links at our website for suggestions.)

d) For dichotomous variables, was a proper intention-to-treat (ITT) analysis conducted with a reasonable choice for imputing values for missing data?

e) For time-to-event trials, were censoring rules unbiased? Were the number of censored subjects reported?

After you have evaluated a study for bias and chance and have determined that the study is valid, the study results should be evaluated for clinical meaningfulness, (e.g., the amount of clinical benefit and the potential for harm). Clinical outcomes include morbidity; mortality; symptom relief; physical, mental and emotional functioning; and, quality of life—or any surrogate outcomes that have been demonstrated in valid studies to affect a clinical outcome.

Final Comment

It is not difficult to learn how to critically appraise a clinical trial. Health care providers owe it to their patients to gain these skills. Health care professionals cannot rely on abstracts and authors’ conclusions—they must assess studies first for validity and second for clinical usefulness. Authors are often biased, even with the best of intentions. Remember that authors’ conclusions are opinions, not evidence. Authors frequently use misleading terms or draw misleading conclusions. Physicians and others who lack critical appraisal skills are often mislead by authors’ conclusions and summary statements. Critical appraisal knowledge is required to evaluate the validity of a study which must be done prior to seriously considering reported results.

For those who wish to go more deeply, we have books available and do training seminars. See our website at www.delfini.org.

Risk of Bias References

Juni P, Altman DG, Egger M (2001) Systematic reviews in health care: assessing the quality of controlled clinical trials. BMJ 2001;323: 42-6. PubMed PMID: 11440947.
Juni P, Witschi A, Bloch R, Egger M. The hazards of scoring the quality of clinical trials for meta-analysis. JAMA. 1999 Sep 15;282( 11): 1054-60. PubMed PMID: 10493204.
Kjaergard LL, Villumsen J, Gluud C. Reported methodological quality and discrepancies between large and small randomized trials in metaanalyses. Ann Intern Med 2001;135: 982– 89. PMID 11730399.
Moher D, Pham B, Jones A, Cook DJ, Jadad AR, Moher M, Tugwell P, Klassen TP. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet. 1998 Aug 22;352( 9128): 609-13. PubMed PMID: 9746022.
Poolman RW, Struijs PA, Krips R, Inger N. Sierevelt IN, et al. (2007) Reporting of outcomes in orthopaedic randomized trials: Does blinding of outcome assessors matter? J Bone Joint Surg Am. 89: 550– 558. PMID 17332104.
Savovic J, Jones HE, Altman DG, et al. Influence of Reported Study Design Characteristics on Intervention Effect Estimates From Randomized, Controlled Trials. Ann Intern Med. 2012 Sep 4. doi: 10.7326/ 0003-4819-157-6-201209180-00537. [Epub ahead of print] PubMed PMID: 22945832.
van Tulder MW, Suttorp M, Morton S, et al. Empirical evidence of an association between internal validity and effect size in randomized controlled trials of low-back pain. Spine (Phila Pa 1976). 2009 Jul 15;34( 16): 1685-92. PubMed PMID: 19770609.

Other References

CDC: http://www.cdc.gov/HAI/outbreaks/meningitis.html
Echt DS, Liebson PR, Mitchell LB, Peters RW, Obias-Manno D, Barker AH, Arensberg D, Baker A, Friedman L, Greene HL, et al. Mortality and morbidity in patients receiving encainide, flecainide, or placebo. The Cardiac Arrhythmia Suppression Trial. N Engl J Med. 1991 Mar 21;324(12):781-8. PubMed PMID: 1900101.
Freedman, David H. Lies, Damn Lies and Bad Medical Science. The Atlantic. November, 2010. www.theatlantic.com/ magazine/ archive/ 2010/ 11/ lies-damned-lies-and-medical-science/ 8269/, accessed 11/ 07/ 2010.
Glasziou P. The EBM journal selection process: how to find the 1 in 400 valid and highly relevant new research articles. Evid Based Med. 2006 Aug; 11( 4): 101. PubMed PMID: 17213115.
Graham Natural News: http://www.naturalnews.com/011401_Dr_David_Graham_the_FDA.html
McKibbon KA, Wilczynski NL, Haynes RB. What do evidence-based secondary journals tell us about the publication of clinically important articles in primary health care journals? BMC Med. 2004 Sep 6;2: 33. PubMed PMID: 15350200.
Mello MM, Brennan TA. The controversy over high-dose chemotherapy with autologous bone marrow transplant for breast cancer. Health Aff (Millwood). 2001 Sep-Oct;20(5):101-17. PubMed PMID: 11558695.

Progression Free Survival (PFS) in Oncology Trials

Status

Progression Free Survival (PFS) in Oncology Trials

Progression Free Survival (PFS) continues to be a frequently used endpoint in oncology trials. It is the time from randomization to the first of either objectively measured tumor progression or death from any cause. It is a surrogate outcome because it does not directly assess mortality, morbidity, quality of life, symptom relief or functioning. Even if a valid trial reports a statistically significant improvement in PFS and the reported effect size is large, PFS only provides information about biologic activity of the cancer and tumor burden or tumor response. Even though correlational analysis has shown associations between PFS and overall survival (OS) in some cancers, we believe that extreme caution should be exercised when drawing conclusions about efficacy of a new drug. In other words, PFS evidence alone is insufficient to establish a clinically meaningful benefit for patients or even a reasonable likelihood of net benefit. Many tumors do present a significant clinical burden for patients; however, clinicians frequently mistakenly believe that simply having a reduction in tumor burden equates with clinical benefit and that delaying the growth of a cancer is a clear benefit to patients.

PFS has a number of limitations which increases the risk of biased results and is difficult for readers to interpret. Unlike OS, PFS does not “identify” the time of progression since assessment occurs at scheduled visits and is likely to overestimate time to progression. Also, it is common to stop or add anti-cancer therapies in PFS studies (also a common problem in trials of OS) prior to documentation of tumor progression which may confound outcomes. Further, measurement errors may occur because of complex issues in tumor assessment. Adequate blinding is required to reduce the risk of performance and assessment bias. Other methodological issues include complex calculations to adjust for missed assessments and the need for complete data on adverse events.

Attrition and assessment bias are made even more difficult to assess in oncology trials using time-to-event methodologies. The intention-to-treat principle requires that all randomly assigned patients be observed until they experience the end point or the study ends. Optimal follow-up in PFS trials is to follow each subject to both progression and death.

Delfini Comment

FDA approval based on PFS may result in acceptance of new therapies with greater harms than benefits. The limitations listed above, along with a concern that investigators may be less willing to conduct trials with OS as an endpoint once a drug has been approved, suggest that we should use great caution when considering evidence from studies using PFS as the primary endpoint. We believe that PFS should be thought of as any other surrogate marker—i.e., it represents extremely weak evidence (even in studies judged to be at low risk of bias) unless it is supported by acceptable evidence of improvements in quality of life and overall survival.

When assessing the quality of a trial using PFS, we suggest the following:

Remember that although in some cases PFS appears to be predictive of OS, in many cases it is not.
In many cases, improved PFS is accompanied by unacceptable toxicity and unacceptable changes in quality of life.
Improved PFS results of several months may be due to methodological flaws in the study.
As with any clinical trial, assess the trial reporting PFS for bias such as selection, performance, attrition and assessment bias.
Compare characteristics of losses (e.g., due to withdrawing consent, adverse events, loss to follow-up, protocol violations) between groups and, if possible, between completers and those initially randomized.
Pay special attention to censoring due to loss-to-follow-up. Administrative censoring (censoring of subjects who enter a study late and do not experience an event) may not result in significant bias, but non-administrative censoring (censoring because of loss-to-follow-up or discontinuing) is more likely to pose a threat to validity.

References

Carroll KJ. Analysis of progression-free survival in oncology trials: some common statistical issues. Pharm Stat. 2007 Apr-Jun;6(2):99-113. Review. PubMed PMID: 17243095.

D’Agostino RB Sr. Changing end points in breast-cancer drug approval—the Avastin story. N Engl J Med. 2011 Jul 14;365(2):e2. doi: 10.1056/NEJMp1106984. Epub 2011 Jun 27. PubMed PMID: 21707384.

Fleming TR, Rothmann MD, Lu HL. Issues in using progression-free survival when evaluating oncology products. J Clin Oncol. 2009 Jun 10;27(17):2874-80. doi: 10.1200/JCO.2008.20.4107. Epub 2009 May 4. PubMed PMID: 19414672

Lachin JM. (John M. Lachin, Sc.D., Professor of Biostatistics and Epidemiology, and of Statistics, The George Washington University personal communication)

Lachin JM. Statistical considerations in the intent-to-treat principle. Control Clin Trials. 2000 Jun;21(3):167-89. Erratum in: Control Clin Trials 2000 Oct;21(5):526. PubMed PMID: 10822117.

Network Meta-analyses—More Complex Than Traditional Meta-analyses

Status

Network Meta-analyses—More Complex Than Traditional Meta-analyses

Meta-analyses are important tools for synthesizing evidence from relevant studies. One limitation of traditional meta-analyses is that they can compare only 2 treatments at a time in what is often termed pairwise or direct comparisons. An extension of traditional meta-analysis is the “network meta-analysis” which has been increasingly used—especially with the rise of the comparative effectiveness movement—as a method of assessing the comparative effects of more than two alternative interventions for the same condition that have not been studied in head-to-head trials.

A network meta-analysis synthesizes direct and indirect evidence over the entire network of interventions that have not been directly compared in clinical trials, but have one treatment in common.

Example
A clinical trial reports that for a given condition intervention A results in better outcomes than intervention B. Another trial reports that intervention B is better than intervention C. A network meta-analysis intervention is likely to report that intervention A results in better outcomes than intervention C based on indirect evidence.

Network meta-analyses, also known as “multiple-treatments meta-analyses” or “mixed-treatment comparisons meta-analyses” include both direct and indirect evidence. When both direct and indirect comparisons are used to estimate treatment effects, the comparison is referred to as a “mixed comparison.” The indirect evidence in network meta-analyses is derived from statistical inference which requires many assumptions and modeling. Therefore, critical appraisal of network meta-analyses is more complex than appraisal of traditional meta-analyses.

In all meta-analyses, clinical and methodological differences in studies are likely to be present. Investigators should only include valid trials. Plus they should provide sufficient detail so that readers can assess the quality of meta-analyses. These details include important variables such as PICOTS (population, intervention, comparator, outcomes, timing and study setting) and heterogeneity in any important study performance items or other contextual issues such as important biases, unique care experiences, adherence rates, etc. In addition, the effect sizes in direct comparisons should be compared to the effect sizes in indirect comparisons since indirect comparisons require statistical adjustments. Inconsistency between the direct and indirect comparisons may be due to chance, bias or heterogeneity. Remember, in direct comparisons the data come from the same trial. Indirect comparisons utilize data from separate randomized controlled trials which may vary in both clinical and methodological details.

Estimates of effect in a direct comparison trial may be lower than estimates of effect derived from indirect comparisons. Therefore, evidence from direct comparisons should be weighted more heavily than evidence from indirect comparisons in network meta-analyses. The combination of direct and indirect evidence in mixed treatment comparisons may be more likely to result in distorted estimates of effect size if there is inconsistency between effect sizes of direct and indirect comparisons.

Usually network meta-analyses rank different treatments according to the probability of being the best treatment. Readers should be aware that these rankings may be misleading because differences may be quite small or inaccurate if the quality of the meta-analysis is not high.

Delfini Comment
Network meta-analyses do provide more information about the relative effectiveness of interventions. At this time, we remain a bit cautious about the quality of many network meta-analyses because of the need for statistical adjustments. It should be emphasized that, as of this writing, methodological research has not established a preferred method for conducting network meta-analyses, assessing them for validity or assigning them an evidence grade.

References
Li T, Puhan MA, Vedula SS, Singh S, Dickersin K; Ad Hoc Network Meta-analysis Methods Meeting Working Group. Network meta-analysis-highly attractive but more methodological research is needed. BMC Med. 2011 Jun 27;9:79. doi: 10.1186/1741-7015-9-79. PubMed PMID: 21707969.

Salanti G, Del Giovane C, Chaimani A, Caldwell DM, Higgins JP. Evaluating the quality of evidence from a network meta-analysis. PLoS One. 2014 Jul 3;9(7):e99682. doi: 10.1371/journal.pone.0099682. eCollection 2014. PubMed PMID: 24992266.

Sounding the Alarm (Again) in Oncology

Status

Sounding the Alarm (Again) in Oncology

Five years ago Fojo and Grady sounded the alarm about value in many of the new oncology drugs [1]. They raised the following issues and challenged oncologists and others to get involved in addressing these issues:

There is a great deal of uncertainty and confusion about what constitutes a benefit in cancer therapy; and,
How much should cost factor into these deliberations?

The authors review a number of oncology drug studies reporting increased overall survival (OS) ranging from a median of a few days to a few months with total new drug costs ranging from $15,000 to $90,000 plus. In some cases, there is no increase in OS, but only progression free survival (PFS) which is a weaker outcome measure due to its being prone to tumor assessment biases and is frequently assessed in studies of short duration. Adverse events associated with the new drugs are many and include higher rates of febrile neutropenia, infusion-related reactions, diarrhea, skin toxicity, infections, hypertension and other adverse events.

Fojo and Grady point out that—

“Many Americans would likely not regard a 1.2-month survival advantage as ‘significant’ progress, the much revered P value notwithstanding. But would an individual patient agree? Although we lack the answer to this question, we would suggest that the death of a mother of four at age 37 years would be no less painful were it to occur at age 37 years and 1 month, nor would the passing of a 67-year-old who planned to travel after retiring be any less difficult for the spouse were it to have occurred 1 month later.”

In a recent article [2] (thanks to Dr. Richard Lehman for drawing our attention to this article in his wonderful BMJ blog) Fojo and colleagues again point out that—

Cancer is the number one cause of mortality worldwide, and cancer cases are projected to rise by 75% over the next 2 decades.
Of the 71 therapies for solid tumors receiving FDA approval from 2002 to 2014, only 30 of the 71 approvals (42%) met the American Society of Clinical Oncology Cancer Research Committee’s “low hurdle” criteria for clinically meaningful improvement. Further, the authors tallied results from all the studies and reported very modest collective median gains of 2.5 months for PFS and 2.1 months for OS. Numerous surveys have indicated that patients expect much more.
Expensive therapies are stifling progress by (1) encouraging enormous expenditures of time, money, and resources on marginal therapeutic indications; and, (2) promoting a me-too mentality that is stifling innovation and creativity.

The last bullet needs a little explaining. The authors provide a number of examples of “safe bets” and argue that revenue from such safe and profitable therapies rather than true need has been a driving force for new oncology drugs. The problem is compounded by regulations—e.g., rules which require Medicare to reimburse patients for any drug used in an “anti-cancer chemotherapeutic regimen”—regardless of its incremental benefit over other drugs—as long as the use is “for a medically accepted indication” (commonly interpreted as “approved by the FDA”). This provides guaranteed revenues for me-too drugs irrespective of their marginal benefits. The authors also point out that when prices for drugs of proven efficacy fall below a certain threshold, suppliers often stop producing the drug, causing severe shortages.

What can be done? The authors acknowledge several times in their commentary that the spiraling cost of cancer therapies has no single villain; academia, professional societies, scientific journals, practicing oncologists, regulators, patient advocacy groups and the biopharmaceutical industry—all bear some responsibility. [We would add to this list physicians, P&T committees and any others who are engaged in treatment decisions for patients. Patients are not on this list (yet) because they are unlikely to really know the evidence.] This is like many other situations when many are responsible—often the end result is that “no one” takes responsibility. Fojo et al. close by making several suggestions, among which are—

Academicians must avoid participating in the development of marginal therapies;
Professional societies and scientific journals must raise their standards and not spotlight marginal outcomes;
All of us must also insist on transparency and the sharing of all published data in a timely and enforceable manner;
Actual gains of benefit must be emphasized—not hazard ratios or other measures that force readers to work hard to determine actual outcomes and benefits and risks;
We need cooperative groups with adequate resources to provide leadership to ensure that trials are designed to deliver meaningful outcomes;
We must find a way to avoid paying premium prices for marginal benefits; and,
We must find a way [federal support?] to secure altruistic investment capital.

Delfini Comment
While the authors do not make a suggestion for specific responsibilities or actions on the part of the FDA, they do make a recommendation that an independent entity might create uniform measures of benefits for each FDA-approved drug—e.g., quality-adjusted life-years. We think the FDA could go a long way in improving this situation.

And so, as pointed out by Fojo et al., only small gains have been made in OS over the past 12 years, and costs of oncology drugs have skyrocketed. However, to make matters even worse than portrayed by Fojo et al., many of the oncology drug studies we see have major threats to validity (e.g., selection bias, lack of blinding and other performance biases, attrition and assessment bias, etc.) raising the question, “Does the approximate 2 month gain in median OS represent an overestimate?” Since bias tends to favor the new intervention in clinical trials, the PFS and OS reported in many of the recent oncology trials may be exaggerated or even absent or harms may outweigh benefits. On the other hand, if a study is valid, since a median is a midpoint in a range of results and a patient may achieve better results than indicated by the median, some patients may choose to accept a new therapy. The important thing is that patients are given information on benefits and harms in a way that allows them to have a reasonable understanding of all the issues and make the choices that are right for them.

Resources & References

Resource

The URL for Dr. Lehman’s Blog is—
http://blogs.bmj.com/bmj/category/richard-lehmans-weekly-review-of-medical-journals/
The URL for his original blog entry about this article is—
http://blogs.bmj.com/bmj/2014/11/24/richard-lehmans-journal-review-24-november-2014/

References

Fojo T, Grady C. How much is life worth: cetuximab, non-small cell lung cancer, and the $440 billion question. J Natl Cancer Inst. 2009 Aug 5;101(15):1044-8. Epub 2009 Jun 29. PMID: 19564563
Fojo T, Mailankody S, Lo A. Unintended Consequences of Expensive Cancer Therapeutics-The Pursuit of Marginal Indications and a Me-Too Mentality That Stifles Innovation and Creativity: The John Conley Lecture. JAMA Otolaryngol Head Neck Surg. 2014 Jul 28. doi: 10.1001/jamaoto.2014.1570. [Epub ahead of print] PubMed PMID: 25068501.

Comparative Effectiveness Research (CER), “Big Data” & Causality

Status

Comparative Effectiveness Research (CER), “Big Data” & Causality

For a number of years now, we’ve been concerned that the CER movement and the growing love affair with “big data,” will lead to many erroneous conclusions about cause and effect. We were pleased to see the following blog from Austin Frakt, an editor-in-chief of The Incidental Economist: Contemplating health care with a focus on research, an eye on reform—

Ten impressions of big data: Claims, aspirations, hardly any causal inference

http://theincidentaleconomist.com/wordpress/ten-impressions-of-big-data-claims-aspirations-hardly-any-causal-inference/

Five more big data quotes: The ambitions and challenges

http://theincidentaleconomist.com/wordpress/five-more-big-data-quotes/

Cochrane Risk Of Bias Tool For Non-Randomized Studies

Status

Cochrane Risk Of Bias Tool For Non-Randomized Studies

Like many others, our position is that, with very few exceptions, cause and effect conclusions regarding therapeutic interventions can only be drawn when valid RCT data exists. However, there are uses for observational studies which may be used to answer additional questions, and non-randomized studies (NRS) are often included in systematic reviews.

In September 2014, Cochrane published a tool for assessing bias in NRS for systematic review authors [1]. It may be of interest to our colleagues. The tool is called ACROBAT-NRSI (“A Cochrane Risk Of Bias Assessment Tool for Non-Randomized Studies”) and is designed to assist with evaluating the risk of bias (RoB) in the results of NRS that compare the health effects of two or more interventions.

The tool focuses on internal validity. It covers seven domains through which bias might be introduced into a NRS. The domains provide a framework for considering any type of NRS, and are summarized in the table below, and many of the biases listed here are described and explanations of how they may cause bias are presented in the full document, and you can see our rough summary here: http://www.delfini.org/delfiniClick_Observations.htm#robtable

Response options for each bias include: low risk of bias; moderate risk of bias; serious risk of bias; critical risk of bias; and no information on which to base a judgment.

Details are available in the full document which can be downloaded at—https://sites.google.com/site/riskofbiastool/

Delfini Comment
We again point out that non-randomized studies often report seriously misleading results even when treated and control groups appear similar in prognostic variables and agree with Deeks that, for therapeutic interventions ,“non-randomised studies should only be undertaken when RCTs are infeasible or unethical”[2]—and even then, buyer beware. Studies do not get “validity grace” because of scientific or practical challenges.

Furthermore, we are uncertain that this tool is of great value when assessing NRS. Deeks [2] identified 194 tools that could be or had been used to assess NRS. Do we really need another one? While it’s a good document for background reading, we are more comfortable approaching the problem of observational data by pointing out that, when it comes to efficacy, high quality RCTs have a positive predictive value of about 85% whereas well-done observational trials have a positive predictive value of about 20% [3].

References

Sterne JAC, Higins JPT, Reves BC on behalf of the development group for ACROBAT- NRSI. A Cochrane Risk Of Bias Asesment Tol: for Non-Randomized Studies of Interventions (ACROBAT- NRSI), Version 1.0.0, 24 September 2014. Available from htp:/www.riskofbias.info [accessed 10/11/14.

Deeks JJ, Dinnes J, D’Amico R, Sowden AJ, Sakarovitch C, Song F, Petticrew M, Altman DG; International Stroke Trial Collaborative Group; European Carotid Surgery Trial Collaborative Group. Evaluating non-randomised intervention studies. Health Technol Assess. 2003;7(27):iii-x, 1-173. Review. PubMed PMID: 14499048.

Ioannidis JPA. Why Most Published Research Findings are False. PLoS Med 2005; 2(8):696-701 PMID: 16060722.

Comparison of Risk of Bias Ratings in Clinical Trials—Journal Publications Versus Clinical Study Reports

Status

Comparison of Risk of Bias Ratings in Clinical Trials—Journal Publications Versus Clinical Study Reports

Many critical appraisers assess bias using tools such as the Cochrane risk of bias tool (Higgins 11) or tools freely available from us (http://www.delfini.org/delfiniTools.htm). Internal validity is assessed by evaluating important items such as generation of the randomization sequence, concealment of allocation, blinding, attrition and assessment of results.

Jefferson et al. recently compared the risk of bias in 14 oseltamivir trials using information from previous assessments based on the study publications and the newly acquired, more extensive clinical study reports (CSRs) obtained from the European Medicines Agency (EMA) and the manufacturer, Roche.

Key findings include the following:

Evaluations using more complete information from the CSRs resulted in no difference in the number of previous assessment of “high” risk of bias.
However, over half (55%, 34/62) of the previous “low” risk of bias ratings were reclassified as “high.”
Most of the previous “unclear” risk of bias ratings (67%, 28/32) were changed to “high” risk of bias ratings when CSRs were available.

The authors discuss the idea that the risk of bias tools are important because they facilitate the process of critical appraisal of medical evidence. They also call for greater availability of the CSRs as the basic unit available for critical appraisal.

Delfini Comment

We believe that both sponsors and researchers need to provide more study detail so that critical appraisers can provide more precise ratings of risk of bias. Study publications frequently lack information needed by critical appraisers.

We agree that CSRs should be made available so they can be used to improve their assessments of clinical trials. However, our experience has been the opposite of that experienced by the authors. When companies have invited us to work with them to assess the reliability of their studies and made CSRs available to us, frequently we have found important information not otherwise available in the study publication. When this happens, studies otherwise given a rating at higher risk of bias have often been determined to be at low risk of bias and of high quality.

References

1. Higgins JP, Altman DG, Gøtzsche PC, Jüni P, Moher D, Oxman AD, Savovic J, Schulz KF, Weeks L, Sterne JA; Cochrane Bias Methods Group; Cochrane Statistical Methods Group. The Cochrane Collaboration’s tool for assessing risk of bias in randomised trials. BMJ. 2011 Oct 18;343:d5928. doi: 10.1136/bmj.d5928. PubMed PMID: 22008217.

2. Jefferson T, Jones MA, Doshi P, Del Mar CB, Hama R, Thompson MJ, Onakpoya I, Heneghan CJ. Risk of bias in industry-funded oseltamivir trials: comparison of core reports versus full clinical study reports. BMJ Open. 2014 Sep 30;4(9):e005253. doi: 10.1136/bmjopen-2014-005253. PubMed PMID: 25270852.

Cross-over Designs

Status

Cross-over Designs

Critical appraisal considerations for cross-over trial designs are freely available at our website. See entry at 03/17/2014: http://www.delfini.org/delfiniNew.htm

More on Attrition Bias: Update on Missing Data Points: Difference or No Difference — Does it Matter?

Posted on January 14, 2014 by Sheri Strite, Co-author of Basics for Evaluating Medical Research Studies: A Simplified Approach

Attrition Bias Update 01/14/2014: Missing Data Points: Difference or No Difference — Does it Matter?

A colleague recently wrote us to ask us more about attrition bias. We shared with him that the short answer is that there is less conclusive research on attrition bias than on other key biases. Attrition does not necessarily mean that attrition bias is present and distorting statistically significant results. Attrition may simply result in a smaller sample size which, depending upon how small the remaining population is, may be more prone to chance due to outliers or false non-significant findings due to lack of power.

If randomization successfully results in balanced groups, if blinding is successful including concealed allocation of patients to their study groups, if adherence is high, if protocol deviations are balanced and low, if co-interventions are balanced, if censoring rules are used which are unbiased, and if there are no differences between the groups except for the interventions studied, then it may be reasonable to conclude that attrition bias is not present even if attrition rates are large. Balanced baseline comparisons between completers provides further support for such a conclusion as does comparability in reasons for discontinuation, especially if many categories are reported.

On the other hand, other biases may result in attrition bias. For example, imagine a comparison of an active agent to a placebo in a situation in which blinding is not successful. A physician might encourage his or her patient to drop out of a study if they know the patient is on placebo, resulting in biased attrition that, in sufficient numbers, would potentially distort the results from what they would otherwise have been.

Delfini Group :: Blog

Evidence-based clinical improvement, critical appraisal & medical decision-making help

Tag Archives: study quality

Another Study Warns That Evidence From Observational Studies Provides Unreliable Results For Therapies

Status

“Reading” a Clinical Trial Won’t Get You There

Status

Progression Free Survival (PFS) in Oncology Trials

Status

Network Meta-analyses—More Complex Than Traditional Meta-analyses

Status

Sounding the Alarm (Again) in Oncology

Status

Comparative Effectiveness Research (CER), “Big Data” & Causality

Status

Cochrane Risk Of Bias Tool For Non-Randomized Studies

Status

Comparison of Risk of Bias Ratings in Clinical Trials—Journal Publications Versus Clinical Study Reports

Status

Cross-over Designs

Status

More on Attrition Bias: Update on Missing Data Points: Difference or No Difference — Does it Matter?