Centrum—Spinning the Vitamins?


Scott K. Aberegg, MD, MPH, has written an amusing and interesting blog about a recently published randomized controlled trial (RCT) on vitamins and cancer outcomes[1]. In the blog, he critiques the Physicians’ Health Study II and points out the following:

  • Aberegg wonders why, in a trial of 14,000 people, you would adjust for baseline variables.
  • The lay press reported a statistically significant 8% reduction in cancer among subjects taking Centrum multivitamins; the unadjusted crude log-rank p-value, however, was 0.05, which is not statistically significant.
  • The adjusted p-value was 0.04 for the hazard ratio, which means that the 8% was a relative risk reduction.
  • His own calculations reveal an absolute risk reduction of 1.2% and, in a simple sensitivity analysis (adding 5 cancers and then 10 cancers to the placebo group), the p-value changes to 0.0768 and 0.0967, demonstrating that small changes have a big impact on the p-value.
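
For readers who want to try this kind of check themselves, here is a minimal Python sketch of the arithmetic. The event counts are illustrative assumptions chosen to give roughly a 1.2% absolute difference; they are not taken from the blog or the trial report. The Pearson chi-square test on a 2x2 table (no continuity correction) is a simple stand-in for the log-rank test used in the trial, and the function names are hypothetical.

```python
import math

def risk_reductions(events_treated, n_treated, events_control, n_control):
    """Return (absolute risk reduction, relative risk reduction)."""
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    arr = risk_control - risk_treated
    return arr, arr / risk_control

def chi_square_p(events_treated, n_treated, events_control, n_control):
    """Two-sided p-value from a Pearson chi-square test on a 2x2 table (1 df)."""
    a, b = events_treated, n_treated - events_treated
    c, d = events_control, n_control - events_control
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the chi-square tail area equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

# Illustrative counts only (assumed, not from the source).
N_TREATED, N_CONTROL = 7300, 7300
EVENTS_TREATED, EVENTS_CONTROL = 1287, 1375

arr, rrr = risk_reductions(EVENTS_TREATED, N_TREATED, EVENTS_CONTROL, N_CONTROL)
p_base = chi_square_p(EVENTS_TREATED, N_TREATED, EVENTS_CONTROL, N_CONTROL)
print(f"ARR = {arr:.4f}, RRR = {rrr:.3f}, p = {p_base:.4f}")

# Sensitivity check: move a handful of events into one arm and watch the p-value.
for extra in (5, 10):
    p_treated = chi_square_p(EVENTS_TREATED + extra, N_TREATED, EVENTS_CONTROL, N_CONTROL)
    p_control = chi_square_p(EVENTS_TREATED, N_TREATED, EVENTS_CONTROL + extra, N_CONTROL)
    print(f"+{extra} events in treated arm: p = {p_treated:.4f}; "
          f"+{extra} in control arm: p = {p_control:.4f}")
```

In this sketch, extra events in the arm with fewer events narrow the gap between groups and raise the p-value, while extra events in the other arm lower it. Either way, a change of a few events out of thousands is enough to move a borderline result across the 0.05 line.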

He concludes that, “…without spin, we see that multivitamins (and other supplements) create both expensive urine and expensive studies – and both just go right down the drain.”

A reminder that, if the results had indeed been clinically meaningful, then the next step would be to perform a critical appraisal to determine if the study were valid or not.


[1] http://medicalevidence.blogspot.com/2012/10/a-centrum-day-keeps-cancer-at-bay.html accessed 10/25/12.

[2] Gaziano JM et al. Multivitamins in the Prevention of Cancer in Men: The Physicians’ Health Study II Randomized Controlled Trial. JAMA. 2012;308(18). doi:10.1001/jama.2012.14641.

Interesting Comparative Effectiveness Research (CER) Case Study: “Real World Data” Hypothetical Migraine Case and Lack of PCORI Endorsement


In the October issue of Health Affairs, the journal’s editorial team created a fictional set of clinical trials and observational studies to see what various stakeholders would say about comparative effectiveness evidence of two migraine drugs.[1]

The hypothetical set-up is this:

The newest drug, Hemikrane, is an FDA-approved drug that has recently come on the market. It was reported in clinical trials to reduce both the frequency and the severity of migraine headaches. Hemikrane is taken once a week. The FDA approved Hemikrane based on two randomized, double-blind, controlled clinical trials, each of which had three arms.

  • In one arm, patients who experienced multiple migraine episodes each month took Hemikrane weekly.
  • In another arm, a comparable group of patients received a different migraine drug, Cephalal, a drug which was reported to be effective in earlier, valid studies. It is taken daily.
  • In a third arm, another equivalent group of patients received placebos.

The study was powered to detect a difference between Hemikrane and placebo, if one existed, and to determine whether Hemikrane was at least as effective as Cephalal. Each of the two randomized studies enrolled approximately 2,000 patients and lasted six months. They excluded patients with uncontrolled high blood pressure, diabetes, heart disease, or kidney dysfunction. The patients received their care in a number of academic centers and clinical trial sites. All patients submitted daily diaries, recording their migraine symptoms and any side effects.

Hypothetical Case Study Findings: The trials reported that the patients who took Hemikrane had a clinically significant reduction in the frequency, severity, and duration of headaches compared to placebo, but not to Cephalal.

The trials were not designed to evaluate the comparative safety of the drugs, but there were no safety signals from the Hemikrane patients, although a small number of patients on the drug experienced nausea.

Although the above studies reported efficacy of Hemikrane in a controlled environment with highly selected patients, they did not assess patient experience in a real-world setting. Does once weekly dosing improve adherence in the real world? The monthly cost of Hemikrane to insurers is $200, whereas Cephalal costs insurers $150 per month. (In this hypothetical example, the authors assume that copayments paid by patients are the same for all of these drugs.)

A major philanthropic organization with an interest in advancing treatments for migraine sufferers funded a collaboration among researchers at Harvard; a regional health insurance company, Trident Health; and, Hemikrane’s manufacturer, Aesculapion. The insurance company, Trident Health, provided access to a database of five million people, which included information on medication use, doctor visits, emergency department evaluations and hospitalizations. Using these records, the study identified a cohort of patients with migraine who made frequent visits to doctors or hospital emergency departments. The study compared information about patients receiving Hemikrane with two comparison groups: a group of patients who received the daily prophylactic regimen with Cephalal, and a group of patients receiving no prophylactic therapy.

The investigators attempted to confirm the original randomized trial results by assessing the frequency with which all patients in the study had migraine headaches. Because the database did not contain a diary of daily symptoms, which had been collected in the trials, the researchers substituted as a proxy the amount of medications such as codeine and sumatriptan (Imitrex) that patients had used each month for treatment of acute migraines. The group receiving Hemikrane had lower use of these symptom-oriented medications than those on Cephalal or on no prophylaxis and had fewer emergency department visits than those taking Cephalal or on no prophylaxis.

Although the medication costs were higher for patients taking Hemikrane because of its higher monthly drug cost, the overall episode-of-care costs were lower than for the comparison group taking Cephalal. As hypothesized, the medication adherence was higher in the once-weekly Hemikrane patients than in the daily Cephalal patients (80 percent and 50 percent, respectively, using the metric of medication possession ratio, which is the number of days of medication dispensed as a percentage of 365 days).
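
As a small illustration of the adherence metric described above, the medication possession ratio can be computed from pharmacy fill records. This is a minimal sketch: the function name and the fill data are hypothetical, and real claims analyses usually add rules (for example, for overlapping fills or capping at 100 percent) that are not shown here.

```python
def medication_possession_ratio(days_supplied_per_fill, period_days=365):
    """Days of medication dispensed as a percentage of the observation period."""
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    return 100.0 * sum(days_supplied_per_fill) / period_days

# Hypothetical patient: six 30-day fills during one year.
example_fills = [30, 30, 30, 30, 30, 30]
example_mpr = medication_possession_ratio(example_fills)
print(f"MPR = {example_mpr:.1f}%")
```

A patient with six 30-day fills in a year has medication on hand for 180 of 365 days, or just under 50 percent, which is the level reported for the daily Cephalal group in this hypothetical case.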

The investigators were concerned that the above findings might be due to the unique characteristics of Trident Health’s population of covered patients, regional practice patterns, copayment designs for medications, and/or the study’s analytic approach. They also worried that the results could be confounded by differences in the patients receiving Hemikrane, Cephalal, or no prophylaxis. One possibility, for example, was that patients who experienced the worst migraines might be more inclined to take or be encouraged by their doctors to take the new drug, Hemikrane, since they had failed all previously available therapies. In that case, the results for a truly matched group of patients might have shown even more pronounced benefit for Hemikrane.

To see if the findings could be replicated, the investigators contacted the pharmacy benefit management company, BestScripts, that worked with Trident Health, and asked for access to additional data. A research protocol was developed before any data were examined. Statistical adjustments were also made to balance, as well as possible, the three groups of patients to be studied (those taking Hemikrane, those taking Cephalal, and those not on prophylaxis) using a propensity score method, which included age, sex, number of previous migraine emergency department visits, type and extent of prior medication use and selected comorbidities to estimate the probability of a person’s being in one of the three groups.

The pharmacy benefit manager, BestScripts, had access to data covering more than fifty million lives. The findings in this second, much larger, database corroborated the earlier assessment. The once-weekly prophylactic therapy with Hemikrane clearly reduced the use of medications such as codeine to relieve symptoms, as well as emergency department visits compared to the daily prophylaxis and no prophylaxis groups. Similarly, the Hemikrane group had significantly better medication adherence than the Cephalal group. In addition, BestScripts had data from a subset of employers that collected work loss information about their employees. These data showed that patients on Hemikrane were out of work for fewer days each month than patients taking Cephalal.

In a commentary, Joe Selby, executive director of the Patient-Centered Outcomes Research Institute (PCORI), and colleagues provided a list of problems with these real world studies including threats to validity. They conclude that these hypothetical studies would be unlikely to have been funded or communicated by PCORI.[2]

Below are several of the problems identified by Selby et al.

  • Selection Bias
    • Patients and clinicians may have tried the more familiar, less costly Cephalal first and switched to Hemikrane only if Cephalal failed to relieve symptoms, making the Hemikrane patients a group who, on average, would be more difficult to treat.
    • Those patients who continued using Cephalal may be a selected group who tolerate the treatment well and perceive a benefit.
    • Even if the investigators had conducted the study with only new users, it is plausible that patients prescribed Hemikrane could differ from those prescribed Cephalal. They may be of higher socioeconomic status, have better insurance coverage with lower copayments, have different physicians, or differ in other ways that could affect outcomes.
  • Performance Biases or Other Differences Between Groups are possible.
  • Details of any between-group differences found in these exploratory analyses should have been presented.

Delfini Comment

These two articles are worth reading if you are interested in the difficult area of evaluating observational studies and including them in comparative effectiveness research (CER). We would add that to know if drugs really work, valid RCTs are almost always needed. In this case we don’t know if the studies were valid, because we don’t have enough information about the risk of selection, performance, attrition and assessment bias and other potential methodological problems in the studies. Database studies and other observational studies are likely to have differences in populations, interventions, comparisons, time treated and clinical settings (e.g., prognostic variables of subjects, dosing, co-interventions, other patient choices, bias from lack of blinding) and adjusting for all of these variables and more requires many assumptions. Propensity scores do not reliably adjust for differences. Thus, the risk of bias in the evidence base is unclear.

This case illustrates the difficulty of making coverage decisions for new drugs with some potential advantages for some patients when several studies report benefit compared to placebo, but we already have established treatment agents with safety records. In addition, new drugs frequently are found to cause adverse events over time.

Observational data is frequently very valuable. It can be useful in identifying populations for further study, evaluating the implementation of interventions, generating hypotheses, and identifying current condition scenarios (e.g., who, what, where in QI project work; variation, etc.). It is also useful in providing safety signals and for creating economic projections (e.g., balance sheets, models). In this hypothetical set of studies, however, we have only gray zone evidence about efficacy from both RCTs and observational studies and almost no information about safety.

Much of the October issue of Health Affairs is taken up with other readers’ comments. Those of you interested in the problems with real world data in CER activities will enjoy reading how others reacted to these hypothetical drug studies.


1. Dentzer S; the Editorial Team of Health Affairs. Communicating About Comparative Effectiveness Research: A Health Affairs Symposium On The Issues. Health Aff (Millwood). 2012 Oct;31(10):2183-2187. PubMed PMID: 23048094.

2. Selby JV, Fleurence R, Lauer M, Schneeweiss S. Reviewing Hypothetical Migraine Studies Using Funding Criteria From The Patient-Centered Outcomes Research Institute. Health Aff (Millwood). 2012 Oct;31(10):2193-2199. PubMed PMID: 23048096.

The Elephant is The Evidence—Epidural Steroids


The Elephant is The Evidence—Epidural Steroids: Edited & Updated 1/7/2013

Epidural steroids are commonly used to treat sciatica (pinched spinal nerve) or low back pain.  As of January 7, 2013 at least 40 deaths have been linked to fungal meningitis thought to be caused by contaminated epidural steroids, and 664 cases in 19 states have been identified with a clinical picture consistent with fungal infection [CDC]. Interim data show that all infected patients received injection with preservative-free methylprednisolone acetate (80mg/ml) prepared by New England Compounding Center, located in Framingham, MA. On October 3, 2012, the compounding center ceased all production and initiated recall of all methylprednisolone acetate and other drug products prepared for intrathecal administration.

Thousands of patients receive epidural steroids without significant side effects or problems every week. In this case, patients received steroids that were mixed by a “compounding pharmacy” and contamination of the medication appears to have occurred during manufacture. But let’s consider other patients who received epidural steroids from uncontaminated vials. How much risk and benefit are there with epidural steroids? The real issue is the effectiveness of epidural steroids. Yes, there are risks with epidural steroids beyond contamination—e.g., a type of headache that occurs when the dura (the sac around the spinal cord) is punctured and fluid leaks out. This causes a pressure change in the central nervous system and a headache. Bleeding is also a risk. But people with severe pain from sciatica are frequently willing to take those risks if there are likely to be benefits. But, in fact, for many patients who receive epidural steroids the likelihood of benefit is very low. For example, patients with bone problems (spinal stenosis) rather than lumbar disc disease are less likely to benefit. Patients who have had a long history of sciatica are less likely to benefit.

We don’t know how many of these patients were not likely to benefit from the epidural steroids, but if the infected patients had been advised about the unproven benefits of epidural steroids in certain cases and the known risks, some patients may have chosen to avoid the injections and might be alive today. This is an example of the importance of good information as the basis for decision-making. Basing decisions on poor quality or incomplete information and intervening with unproven yet potentially risky treatments put millions of people at risk every week.

Let’s look at the evidence. Recently, a fairly large, well-conducted RCT published in the British Medical Journal (BMJ) reported that there is no meaningful benefit from epidural steroid injections in patients who have had long-term sciatica (26 to 57 weeks) [Iversen]. As pointed out in an editorial, epidural steroids have been used for more than 50 years to treat low back pain and sciatica and are the most common intervention in pain clinics throughout the world [Cohen]. And yet, despite their widespread use, their efficacy for the treatment of chronic sciatica remains unproven. (We should add here that, many times, lacking good evidence of benefit does not mean a treatment does not work.) Iversen et al. conclude that “Caudal epidural steroid or saline injections are not recommended for chronic lumbar radiculopathy” [Iversen].

Of more than 30 controlled studies evaluating epidural steroid injections, approximately half report some benefit. Systematic reviews also report conflicting results. Reasons for these discrepancies include differences in study quality, treatments, comparisons, co-interventions, study duration and patient selection. Results appear to be better for people with short-term sciatica, but epidural steroids should not be considered curative. In this situation, it is very important that patients understand this fuzzy benefit-to-risk ratio. For many who are completely informed, the decision will be to avoid the risk.

With this recent problem of fungal meningitis from epidural steroids, it is important for patients to be informed about the world of uncertainty that surrounds risk, especially when science tells us that the evidence for benefit is not strong. Since health care professionals frequently act as the eyes of the patient, we must seriously consider, for every intervention we offer, whether benefits clearly outweigh potential harms. We must also help patients understand details regarding the risks and benefits and be supportive when patients are “on the fence” about having a procedure. Remember Vioxx, arthroscopic lavage, vertebroplasty, encainide and flecainide, Darvon and countless other promising new drugs and other interventions? They seemed promising, but harms outweighed benefits for many patients.


1. http://www.cdc.gov/HAI/outbreaks/meningitis.html accessed 12/10/12

2.  Cohen SP. Epidural steroid injections for low back pain. BMJ. 2011 Sep 13;343:d5310. doi: 10.1136/bmj.d5310. PubMed PMID: 21914757.

3.  Iversen T, Solberg TK, Romner B, et al.   Effect of caudal epidural steroid or saline injection  in chronic lumbar radiculopathy: multicentre, blinded, randomised controlled trial. BMJ. 2011 Sep 13;343:d5278. doi: 10.1136/bmj.d5278. PubMed PMID: 21914755.


Early Termination of Clinical Trials—2012 Update


Several years ago we presented the increasing evidence of problems with early termination of clinical trials for benefit after interim analyses.[1] The bottom line is that results are very likely to be distorted because of chance findings.  A useful review of this topic has been recently published.[2] Briefly, this review points out that—

  • Frequently trials stopped early for benefit report results that are not credible, e.g., in one review, relative risk reductions were over 47% in half, over 70% in a quarter. The apparent overestimates were larger in smaller trials.
  • Stopping trials early for apparent benefit is highly likely to systematically overestimate treatment effects.
  • Large overestimates were common when the total number of events was less than 200.
  • Smaller but important overestimates are likely with 200 to 500 events, and trials with over 500 events are likely to show small overestimates.
  • Stopping rules do not appear to ensure protection against distortion of results.
  • Despite the fact that stopped trials may report chance findings that overestimate true effect sizes—especially when based on a small number of events—positive results receive significant attention and can bias clinical practice, clinical guidelines and subsequent systematic reviews.
  • Trials stopped early reduce opportunities to find potential harms.

The authors provide 3 examples to illustrate the above points where harm is likely to have occurred to patients.

Case 1 is the use of preoperative beta blockers in non-cardiac surgery. In 1999, a clinical trial of bisoprolol in patients with vascular disease having non-cardiac surgery, with a planned sample size of 266, stopped early after enrolling 112 patients, with 20 events. Two of 59 patients in the bisoprolol group and 18 of 53 in the control group had experienced a composite endpoint event (cardiac death or myocardial infarction). The authors reported a 91% reduction in relative risk for this endpoint, 95% confidence interval (63% to 98%). In 2002, an ACC/AHA clinical practice guideline recommended perioperative use of beta blockers for this population. In 2008, a systematic review and meta-analysis, including over 12,000 patients having non-cardiac surgery, reported a 35% reduction in the odds of non-fatal myocardial infarction, 95% CI (21% to 46%), a twofold increase in non-fatal strokes, odds ratio 2.1, 95% CI (1.27 to 3.68), and a possible increase in all-cause mortality, odds ratio 1.20, 95% CI (0.95 to 1.51). Despite the results of this good quality systematic review, subsequent guidelines published in 2009 and 2012 continue to recommend beta blockers.
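
To see how little data sits behind the bisoprolol estimate, the relative risk reduction can be recomputed from the raw counts given above (2 of 59 versus 18 of 53). The sketch below uses a crude risk ratio with a standard log-scale (Katz) confidence interval. That method is our assumption, so the output may differ somewhat from the published 91% (63% to 98%), which presumably came from the trial’s own analysis. The function name is hypothetical.

```python
import math

def relative_risk_with_ci(events_t, n_t, events_c, n_c, z=1.96):
    """Crude risk ratio with an approximate 95% CI computed on the log scale."""
    rr = (events_t / n_t) / (events_c / n_c)
    se_log = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    low = math.exp(math.log(rr) - z * se_log)
    high = math.exp(math.log(rr) + z * se_log)
    return rr, low, high

# Counts reported for the 1999 bisoprolol trial in the text above.
rr, rr_low, rr_high = relative_risk_with_ci(2, 59, 18, 53)

# A relative risk reduction is 1 minus the risk ratio, so the CI bounds swap.
rrr = 1 - rr
rrr_low, rrr_high = 1 - rr_high, 1 - rr_low
print(f"RRR = {rrr:.0%} (approximate 95% CI {rrr_low:.0%} to {rrr_high:.0%})")
```

The crude reduction comes out at roughly 90%, with an interval tens of percentage points wide. That is what one would expect when an estimate rests on only 20 events.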

Case 2 is the use of intensive insulin therapy (IIT) in critically ill patients. In 2001, a single center randomized trial of IIT in critically ill patients with raised serum glucose reported a 42% relative risk reduction in mortality, 95% CI (22% to 62%). The authors used a liberal stopping threshold (P=0.01) and took frequent looks at the data, strategies they said were “designed to allow early termination of the study.” Results were rapidly incorporated into guidelines, e.g., American College of Endocrinology practice guidelines, with recommendations for an upper limit of glucose of ≤8.3 mmol/L. A systematic review published in 2008 summarized the results of subsequent studies, which did not confirm lower mortality with IIT and documented an increased risk of hypoglycemia. Later, a good quality SR confirmed these later findings. Nevertheless, some guideline groups continue to advocate limits of ≤8.3 mmol/L. Other guidelines, utilizing the results of more recent studies, recommend a range of 7.8-10 mmol/L.

Case 3 is the use of activated protein C in critically ill patients with sepsis. The original 2001 trial of recombinant human activated protein C (rhAPC) was stopped early after the second interim analysis because of an apparent difference in mortality. In 2004, the Surviving Sepsis Campaign, a global initiative to improve management, recommended use of the drug as part of a “bundle” of interventions in sepsis. A subsequent trial, published in 2005, reinforced previous concerns from studies reporting increased risk of bleeding with rhAPC and raised questions about the apparent mortality reduction in the original study. As of 2007, trials had failed to replicate the favorable results reported in the pivotal Recombinant Human Activated Protein C Worldwide Evaluation in Severe Sepsis (PROWESS) study. Nevertheless, the 2008 iteration of the Surviving Sepsis guidelines and another guideline in 2009 continued to recommend rhAPC. Finally, after further discouraging trial results, Eli Lilly withdrew the drug, activated drotrecogin alfa (Xigris), from the market in 2011.

Key points about trials terminated early for benefit:

  • Truncated trials are likely to overestimate benefits.
  • Results should be confirmed in other studies.
  • Maintain a high level of scepticism regarding the findings of trials stopped early for benefit, particularly when those trials are relatively small and replication is limited or absent.
  • Stopping rules do not protect against overestimation of benefits.
  • Stringent criteria for stopping for benefit would include not stopping before approximately 500 events have accumulated.
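
The overestimation described in these key points can be reproduced with a small simulation. The sketch below is only an illustration under assumed values: true event risks of 20% (control) and 15% (treated), interim looks every 100 patients per arm, and a simple z-test stopping threshold near P=0.01. None of these settings comes from the BMJ review, and the function names are hypothetical.

```python
import math
import random

TRUE_RISK_CONTROL = 0.20            # assumed true event risk, control arm
TRUE_RISK_TREATED = 0.15            # assumed true event risk, treated arm (true RRR = 25%)
LOOKS = [100, 200, 300, 400, 500]   # cumulative patients per arm at each analysis
Z_STOP = 2.58                       # stop early if z exceeds this (about two-sided P < 0.01)

def simulate_trial(rng):
    """Simulate one trial; return (stopped_early, estimated relative risk reduction)."""
    events_c = events_t = n = 0
    risk_c = risk_t = 0.0
    for look_n in LOOKS:
        for _ in range(look_n - n):
            events_c += rng.random() < TRUE_RISK_CONTROL
            events_t += rng.random() < TRUE_RISK_TREATED
        n = look_n
        risk_c, risk_t = events_c / n, events_t / n
        pooled = (events_c + events_t) / (2 * n)
        se = math.sqrt(pooled * (1 - pooled) * 2 / n)
        z = (risk_c - risk_t) / se if se > 0 else 0.0
        # Only interim looks can trigger early stopping for benefit.
        if look_n != LOOKS[-1] and z > Z_STOP:
            return True, 1 - risk_t / risk_c
    return False, (1 - risk_t / risk_c if risk_c > 0 else 0.0)

def run_simulation(n_trials=1000, seed=0):
    rng = random.Random(seed)
    stopped, completed = [], []
    for _ in range(n_trials):
        early, rrr = simulate_trial(rng)
        (stopped if early else completed).append(rrr)
    return stopped, completed

stopped_early, ran_to_end = run_simulation()
true_rrr = 1 - TRUE_RISK_TREATED / TRUE_RISK_CONTROL
mean_stopped = sum(stopped_early) / len(stopped_early)
print(f"True RRR: {true_rrr:.0%}")
print(f"Trials stopped early: {len(stopped_early)}; mean estimated RRR: {mean_stopped:.0%}")
```

Because a trial can only cross the stopping boundary when its observed difference happens to be large, the trials that stop early report, on average, a bigger effect than the true one built into the simulation.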


1. http://www.delfini.org/delfiniClick_PrimaryStudies.htm#truncation

2. Guyatt GH, Briel M, Glasziou P, Bassler D, Montori VM. Problems of stopping trials early. BMJ. 2012 Jun 15;344:e3863. doi: 10.1136/bmj.e3863. PMID:22705814.

CONSORT Update of Abstract Guidelines 2012


We have previously described the rationale and details of The Consort Statement: Consolidated Standards of Reporting Trials (CONSORT).[1] In brief, CONSORT is a checklist, based on evidence, of 25 items that need to be addressed in reports of clinical trials in order to provide readers with a clear picture of study quality and the progress of all participants in the trial, from the time they are randomized until the end of their involvement. The intent is to make the experimental process clear, flawed or not, so that users of the data can more appropriately evaluate the validity and usefulness of the results. A recent BMJ study has assessed the use of CONSORT guidelines for abstracts in five top journals: JAMA, New England Journal of Medicine (NEJM), the British Medical Journal (BMJ), Lancet and the Annals of Internal Medicine. [2]

In this study, the authors checked each journal’s instructions to authors in January 2010 for any reference to the CONSORT for Abstracts guidelines (for example, reference to a publication or link to the relevant section of the CONSORT website). For those journals that mentioned the guidelines in their instructions to authors, they contacted the editor of that journal to ask when the guidance was added, whether the journal enforced the guidelines, and if so, how. They classified journals in three categories: those not mentioning the CONSORT guidelines in their instructions to authors (JAMA and NEJM); those referring to the guidelines in their instructions to authors, but with no specific policy to implement them (BMJ); and those referring to the guidelines in their instructions to authors, with a policy to implement them (Annals of Internal Medicine and the Lancet).

First surprise: JAMA and NEJM don’t even mention CONSORT in their instructions to authors. Second surprise: CONSORT published what evidologists agree to be reasonable abstract requirements in 2008, but only the Annals and Lancet now instruct authors to follow them. The study design was to evaluate the inclusion of the 9 CONSORT items omitted more than 50% of the time from abstracts (details of the trial design, generation of the allocation sequence, concealment of allocation, details of blinding, number randomized and number analyzed in each group, primary outcome results for each group and its effect size, harms data and funding source). The primary outcome was the mean number of CONSORT items reported in selected abstracts, among nine items reported in fewer than 50% of the abstracts published across the five journals in 2006. Overall, for the primary outcome, publication of the CONSORT guidelines did not lead to a significant increase in the level of the mean number of items reported (increase of 0.3035 of nine items, P=0.16) or the trend (increase of 0.0193 items per month, P=0.21). There was a significant increase in the level of the mean number of items reported after the implementation of the CONSORT guidelines (increase of 0.3882 of five items, P=0.0072) and in trends (increase of 0.0288 items per month, P=0.0025).

What follows is not really surprising—

  • After publication of the guidelines in January 2008, the authors identified a significant increase in the reporting of key items in the two journals (Annals of Internal Medicine, and Lancet) that endorsed the guidelines in their instructions to authors and that had an active editorial policy to implement them. At baseline, in January 2006, the mean number of items reported per abstract was 1.52 of nine items, which increased to 2.56 of nine items during the 25 months before the intervention. In December 2009, 23 months after the publication of the guidelines, the mean number of items reported per abstract for the primary outcome in the Annals of Internal Medicine and the Lancet was 5.41 items, which represented a 53% increase compared with the expected level estimated on the basis of pre-intervention trends.
  • The authors observed no significant difference in the one journal (BMJ) that endorsed the guidelines but did not have an active implementation strategy, and in the two journals (JAMA, NEJM) that did not endorse the guidelines in their instructions to authors.
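
The idea of comparing an observed level with the level “expected on the basis of pre-intervention trends” can be sketched with the figures quoted above. The straight-line projection below is our simplification; the study itself used an interrupted time series (segmented regression) analysis, so this is a plausibility check rather than a reconstruction of the authors’ method. The function name is hypothetical.

```python
def projected_level(start, end, months_observed, months_ahead):
    """Extend a straight-line pre-intervention trend forward in time."""
    slope_per_month = (end - start) / months_observed
    return end + slope_per_month * months_ahead

# Figures quoted above for the two journals with active implementation:
# 1.52 items at baseline, 2.56 items after 25 months, then 23 further months.
expected = projected_level(start=1.52, end=2.56, months_observed=25, months_ahead=23)
observed = 5.41
relative_increase = observed / expected - 1
print(f"Expected without intervention: {expected:.2f} items")
print(f"Observed: {observed:.2f} items ({relative_increase:.0%} above expected)")
```

Even this crude projection lands close to the reported increase, which suggests that the published figure reflects a real jump beyond the slow pre-existing improvement in reporting.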

What this study shows is that without actively implemented editorial policies (i.e., requiring the use of CONSORT guidelines), improved reporting does not happen. A rather surprising finding for us was that only two of the five top journals included in this study have active implementation policies (e.g., an email to authors at time of revision that requires revision of the abstract according to CONSORT guidance). We have a long way to go.

More details about CONSORT are available, including a view of the flow diagram, at http://www.consort-statement.org/


1. http://www.delfini.org/delfiniClick_ReportingEvidence.htm#consort

2. Hopewell S, Ravaud P, Baron G, Boutron I. Effect of editors’ implementation of CONSORT on the reporting of abstracts in high impact medical journals: interrupted time series analysis. BMJ 2012;344:e4178.

Are Adaptive Trials Ready For Primetime?


It is well-known that many patients volunteer for clinical trials because they mistakenly believe that the goal of the trial is to improve outcomes for the volunteers. A type of trial that does attempt to improve outcomes for those who enter into the trial late is the adaptive trial. In adaptive trials investigators change the enrollment and treatment procedures as the study gathers data from the trial about treatment efficacy. For example, if a study compares a new drug against a placebo treatment and the drug appears to be working, subjects enrolling later will be more likely to receive it. The idea is that adaptive designs will attract more study volunteers.
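
One way to picture “subjects enrolling later will be more likely to receive it” is a response-adaptive allocation rule. The sketch below uses a Thompson-sampling style rule with Beta distributions, which is just one of several possible adaptive schemes and is our choice for illustration, not a method named in the commentaries. The success rates and function name are hypothetical.

```python
import random

def run_adaptive_trial(n_patients, p_success_drug, p_success_placebo, seed=0):
    """Assign each patient to the arm that currently looks better, with randomness.

    For every new patient, a plausible success rate is drawn for each arm from a
    Beta distribution based on the results so far; the patient gets the arm with
    the higher draw.
    """
    rng = random.Random(seed)
    true_rate = {"drug": p_success_drug, "placebo": p_success_placebo}
    successes = {"drug": 0, "placebo": 0}
    failures = {"drug": 0, "placebo": 0}
    assignments = []
    for _ in range(n_patients):
        draws = {
            arm: rng.betavariate(1 + successes[arm], 1 + failures[arm])
            for arm in true_rate
        }
        arm = max(draws, key=draws.get)
        if rng.random() < true_rate[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        assignments.append(arm)
    return assignments

# Hypothetical trial: the drug truly helps 60% of patients, placebo 30%.
assignments = run_adaptive_trial(400, p_success_drug=0.60, p_success_placebo=0.30)
early_share = assignments[:50].count("drug") / 50
late_share = assignments[-200:].count("drug") / 200
print(f"Share on drug among first 50 enrollees: {early_share:.0%}")
print(f"Share on drug among last 200 enrollees: {late_share:.0%}")
```

The sketch also makes one of the ethical concerns concrete: early enrollees are allocated almost evenly, while later enrollees benefit from what was learned at the early enrollees’ expense.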

As pointed out in a couple of recent commentaries, however, there are many unanswered questions about this type of trial. A major concern is the problem of unblinding that may occur with this design with resulting problems with allocation of patients to groups. Frequent peeks at the data may influence decisions made by monitoring boards, investigators and participants.  Another issue is the unknown ability to replicate adaptive trials.  Finally, there are ethical questions such as the issue of greater risk for early enrollees compared to risk for later enrollees.

For further information see—

1. Adaptive Trials in Clinical Research: Scientific and Ethical Issues to Consider
van der Graaf R, Roes KC, van Delden JJ. Adaptive trials in clinical research: scientific and ethical issues to consider. JAMA. 2012 Jun 13;307(22):2379-80. PubMed PMID: 22692169.

2. Adaptive Clinical Trials: A Partial Remedy for the Therapeutic Misconception?
Meurer WJ, Lewis RJ, Berry DA. Adaptive clinical trials: a partial remedy for the therapeutic misconception? JAMA. 2012 Jun 13;307(22):2377-8. PubMed PMID: 22692168.

Critical Appraisal Matters


Most of us know that there is much variation in healthcare that is not explained by patient preference, differences in disease incidence or resource availability. We think that many of the healthcare quality problems with overuse, underuse, misuse, waste, patient harms and more stem from a broad lack of understanding by healthcare decision-makers about what constitutes solid clinical research.

We think it’s worth visiting (or revisiting) our webpage on “Why Critical Appraisal Matters.”


Dr. John Ioannidis on Clinical Trials Issues, Cost and Inappropriate Care


Since 1949, the NIH has provided a biweekly newsletter for employees of the National Institutes of Health. Mostly the NIH Record announces talks to be given on campus, but it also summarizes some of the talks. In a recent issue, the Record summarized a talk on bias in healthcare trials delivered by Dr. John Ioannidis, director of the Stanford Prevention Research Center. Some of his key points are quite thought-provoking and relate to our huge problem of costly and inappropriate care. Here is some food for thought from Dr. Ioannidis:

  • Most statistically significant findings are not real at all—they’re just false positives
  • Many of these false positives are revealed when larger-scale studies attempt to replicate the findings of smaller studies
  • One of every four such trials is refuted when a larger trial is conducted
  • Journal editorial policies are responsible for much of this trend: editors want to see research that is novel and will have a large impact on the field. This generally means that editors are looking for papers that report very large, statistically significant effects.
  • An important safeguard is “repeatability” of positive findings
  • Individuals with a track record for doing high quality research should be recognized and given priority in publishing.
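
One way to see how the first point can arise is the standard positive-predictive-value calculation for a significant result. The sketch below is an illustration only, not a calculation taken from Dr. Ioannidis's talk; the function name `ppv` and the prior probabilities, power and significance threshold are assumed values chosen for the example.

```python
def ppv(prior, power=0.8, alpha=0.05):
    """Share of statistically significant findings that reflect a real effect.

    prior: assumed fraction of tested hypotheses that are actually true
    power: assumed probability that a study detects a true effect
    alpha: significance threshold (false-positive rate when no effect exists)
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)


# Hypothetical settings: a well-powered study vs. a small, underpowered one,
# both testing hypotheses of which only 1 in 10 is assumed to be true.
for label, power in [("well powered", 0.8), ("underpowered", 0.2)]:
    print(f"{label}: {ppv(prior=0.1, power=power):.2f}")
```

Under these assumed inputs, the underpowered study's significant findings are more likely to be false than true, which is consistent with the observation that larger trials often fail to replicate smaller ones.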

To read the entire entry go to:


Loss to Follow-up Update


Heads up about an important systematic review of the effects of attrition on outcomes of randomized controlled trials (RCTs) that was recently published in the BMJ.[1]


  • Key Question: Would the outcomes of the trial change significantly if all persons had completed the study, and we had complete information on them?
  • Loss to follow-up in RCTs is important because it can bias study results if it disrupts the balance between study groups in key prognostic variables that was established through randomization. If there is no imbalance between and within various study subgroups (i.e., as-randomized groups compared to completers), then loss to follow-up may not present a threat to validity, except in instances in which statistical significance is not reached because of decreased power.

BMJ Study
The aim of this review was to assess the reporting, extent and handling of loss to follow-up and its potential impact on the estimates of the effect of treatment in RCTs. The investigators evaluated 235 RCTs published from 2005 through 2007 in the five general medical journals with the highest impact factors: Annals of Internal Medicine, BMJ, JAMA, Lancet, and New England Journal of Medicine. All eligible studies reported a significant (P<0.05) primary patient-important outcome.

The investigators did several sensitivity analyses to evaluate the effect of varying assumptions about the outcomes of participants lost to follow-up on the estimate of effect for the primary outcome. Their analysis strategies were:

  • None of the participants lost to follow-up had the event
  • All the participants lost to follow-up had the event
  • None of those lost to follow-up in the treatment group had the event and all those lost to follow-up in the control group did (best case scenario)
  • All participants lost to follow-up in the treatment group had the event and none of those in the control group did (worst case scenario)
  • More plausible assumptions using various event rates, which the authors call “the event incidence”: The investigators performed sensitivity analyses using what they considered to be plausible ratios of event rates in the dropouts compared to the completers, using ratios of 1, 1.5, 2, 3.5 in the intervention group compared to the control group (see Appendix 2 at the link at the end of this post below the reference). They chose an upper limit of 5 for the intervention group because it represents the highest ratio reported in the literature.
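
The strategies above can be sketched in a few lines of code. This is a minimal illustration, not the review authors' method: the trial counts are made up, the function names are hypothetical, the effect measure (a risk ratio with a log-scale Wald 95% confidence interval) is our choice, and the "plausible" scenarios assume the ratio is applied to dropouts in the treatment arm while control-arm dropouts have the same event rate as control-arm completers.

```python
import math


def risk_ratio_ci(events_t, n_t, events_c, n_c, z=1.96):
    """Risk ratio (treatment vs. control) with a log-scale Wald 95% CI.

    Assumes both arms have at least one event.
    """
    rr = (events_t / n_t) / (events_c / n_c)
    se = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)


def arm_total(events_fu, n_fu, n_lost, lost_event_rate):
    """Add imputed events for participants lost to follow-up to one arm."""
    return events_fu + lost_event_rate * n_lost, n_fu + n_lost


def sensitivity(trial, ratios=(1, 1.5, 2, 3.5, 5)):
    """Re-estimate the effect under each assumption about those lost."""
    et, nt, lt = trial["treatment"]  # (events, followed up, lost)
    ec, nc, lc = trial["control"]
    rate_t, rate_c = et / nt, ec / nc
    # scenario -> assumed event rate among the lost (treatment, control)
    assumed = {
        "none had event": (0.0, 0.0),
        "all had event": (1.0, 1.0),
        "best case": (0.0, 1.0),
        "worst case": (1.0, 0.0),
    }
    for r in ratios:  # assumption: ratio applies to the treatment arm only
        assumed[f"ratio {r}"] = (min(1.0, r * rate_t), rate_c)
    results = {}
    for name, (p_t, p_c) in assumed.items():
        rr, low, high = risk_ratio_ci(
            *arm_total(et, nt, lt, p_t), *arm_total(ec, nc, lc, p_c)
        )
        results[name] = {"rr": rr, "ci": (low, high),
                         "significant": high < 1 or low > 1}
    return results


# Hypothetical trial: 500 randomized per arm, 20 and 30 lost to follow-up.
example = {"treatment": (40, 480, 20), "control": (65, 470, 30)}
for name, r in sensitivity(example).items():
    print(f"{name:15s} RR={r['rr']:.2f} significant={r['significant']}")
```

With these made-up counts, the result stays significant when none or all of the lost participants are assumed to have had the event and in the best case, but not in the worst case.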

Key Findings

  • Of the 235 eligible studies, 31 (13%) did not report whether or not loss to follow-up occurred.
  • In studies reporting the relevant information, the median percentage of participants lost to follow-up was 6% (interquartile range 2-14%).
  • The method by which loss to follow-up was handled was unclear in 37 studies (19%); the most commonly used method was survival analysis (66, 35%).
  • When the investigators varied assumptions about loss to follow-up, results of 19% of trials were no longer significant if they assumed no participants lost to follow-up had the event of interest, 17% if they assumed that all participants lost to follow-up had the event, and 58% if they assumed a worst case scenario (all participants lost to follow-up in the treatment group and none of those in the control group had the event).
  • Under more plausible assumptions, in which the incidence of events in those lost to follow-up relative to those followed up was higher in the intervention than control group, 0% to 33% of trials lost statistically significant differences in important endpoints, depending upon which plausible assumptions were used (see Appendix 2 at the link at the end of this post below the reference).

When plausible assumptions are made about the outcomes of participants lost to follow-up in RCTs, this study reports that up to a third of positive findings in RCTs lose statistical significance. The authors recommend that authors of individual RCTs and of systematic reviews test their results against various reasonable assumptions (sensitivity analyses). Only when the results are robust with all reasonable assumptions should inferences from those study results be used by readers.

For more information see the Delfini white paper on “missingness” at http://www.delfini.org/Delfini_WhitePaper_MissingData.pdf


1. Akl EA, Briel M, You JJ et al. Potential impact on estimated treatment effects of information lost to follow-up in randomised controlled trials (LOST-IT): systematic review. BMJ 2012;344:e2809 doi: 10.1136/bmj.e2809 (Published 18 May 2012). PMID: 19519891

Article is freely available at—


Supplementary information is available at—


For sensitivity analysis results tables, see Appendix 2 at—


From Richard Lehman’s Blog on Clinical Trial Quality


From Richard Lehman’s Blog JAMA 2 May 2012 Vol 307:

“Here the past and present custodians of this site look at the quality of the trials registered between 2007 and 2010. They ‘are dominated by small trials and contain significant heterogeneity in methodological approaches, including reported use of randomization, blinding, and data monitoring committees.’ In other words, these trials are never going to yield clinically dependable data; most of them are futile, and therefore by definition unethical. Something is terribly wrong with the system which governs clinical trials: it is failing to protect patients and failing to generate useful knowledge. Most of what it produces is not evidence, but rubbish. And with no system in place to compel full disclosure of the data, it is often impossible to tell one from the other.”

For more from Richard Lehman, go to Journal Watch: http://www.cebm.net/index.aspx?o=2320
