## Why Statements About Confidence Intervals Often Result in Confusion Rather Than Confidence

### Status

A recent paper by McCormack and colleagues reminds us that authors may mislead readers by making unwarranted “all-or-none” statements and that readers should be mindful of this and examine confidence intervals carefully.

When examining the results of a valid study, confidence intervals (CIs) provide much more information than p-values. Results are statistically significant if the confidence interval does not cross the line of no difference: zero in the case of outcome measures expressed as percentages, such as absolute risk reduction and relative risk reduction, and 1 in the case of ratios, such as relative risk and odds ratios. In addition to conveying statistical significance, however, confidence intervals also provide a plausible range for the true result, within a stated margin of chance (5 percent in the case of a 95% CI). While the actual calculated outcome (i.e., the point estimate) is the result “most likely to be true” within the confidence interval, having this range enables readers to judge for themselves whether statistically significant results are clinically meaningful.

However, as McCormack points out, authors frequently do not provide a useful interpretation of the confidence intervals, and authors at times report different conclusions from similar data. McCormack presents several cases that illustrate this problem, and the paper is worth reading.

As an illustration, assume two hypothetical studies report very similar results. In the first study of drug A versus drug B, the relative risk for mortality was 0.9, 95% CI (0.80 to 1.05). The authors might state that there was no difference in mortality between the two drugs because the difference is not statistically significant. That statement is misleading, however: the upper confidence limit only slightly crosses the line of no difference, and the interval tells us that a difference might well have been found had more people been studied. A better statement for the first study would include the confidence interval and a neutral interpretation of what the results for mortality might mean. Example—

“The relative risk for overall mortality with drug A compared to drug B was 0.9, 95% CI (0.80 to 1.05). The confidence interval tells us that drug A may reduce mortality by up to a relative 20% (i.e., the relative risk reduction), but may also increase mortality by approximately 5%, compared to drug B.”

In a second study with similar populations and interventions, the relative risk for mortality might be 0.93, 95% CI (0.83 to 0.99). In this case, some authors might state, “Drug A reduces mortality.” A better statement for this second hypothetical study would ensure that the reader knows that the upper confidence limit is close to the line of no difference and that the result is therefore close to non-significance. Example—

“Although the mortality difference is statistically significant, the confidence interval indicates that the relative risk reduction may be as great as 17% but may be as small as 1%.”
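To make the arithmetic behind these interpretations explicit, here is a minimal sketch (our illustration, not from the McCormack paper) that converts a relative risk and its 95% CI into relative-risk-reduction statements:

```python
def rrr_range(rr, ci_low, ci_high):
    """Express a relative risk (RR) and its CI as relative risk reduction (RRR).
    RRR = (1 - RR) x 100; negative values indicate a possible risk increase."""
    return (1 - rr) * 100, (1 - ci_low) * 100, (1 - ci_high) * 100

# Study 1: RR 0.9, 95% CI (0.80 to 1.05) -- not statistically significant
point, best, worst = rrr_range(0.9, 0.80, 1.05)
print(f"RRR point estimate {point:.0f}%; up to {best:.0f}% relative reduction, "
      f"up to {-worst:.0f}% relative increase")

# Study 2: RR 0.93, 95% CI (0.83 to 0.99) -- statistically significant
point, best, worst = rrr_range(0.93, 0.83, 0.99)
print(f"RRR may be as great as {best:.0f}% but as small as {worst:.0f}%")
```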

The Bottom Line

1. Remember that p-values refer only to statistical significance and confidence intervals are needed to evaluate clinical significance.
2. Watch out for statements containing the words “no difference” in the reporting of study results. A finding of no statistically significant difference may be a product of too few people studied (or insufficient time).
3. Watch out for statements implying meaningful differences between groups when one of the confidence intervals approaches the line of no difference.
4. None of this means anything unless the study is valid. Remember that bias tends to favor the intervention under study.

If authors do not provide you with confidence intervals, you may be able to compute them yourself, if they have supplied you with sufficient data, using an online confidence interval calculator. For our favorites, search “confidence intervals” at our web links page: http://www.delfini.org/delfiniWebSources.htm
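If you have the per-group event counts, a rough check is straightforward. Here is a minimal sketch assuming a normal (Wald) approximation for the CI of an absolute risk reduction; dedicated calculators may use more exact methods, and the counts below are hypothetical:

```python
from math import sqrt

def arr_ci(events_ctrl, n_ctrl, events_tx, n_tx, z=1.96):
    """ARR and its approximate 95% CI from raw event counts (Wald method)."""
    p_c, p_t = events_ctrl / n_ctrl, events_tx / n_tx
    arr = p_c - p_t
    se = sqrt(p_c * (1 - p_c) / n_ctrl + p_t * (1 - p_t) / n_tx)
    return arr, arr - z * se, arr + z * se

# Hypothetical counts for illustration only
arr, lo, hi = arr_ci(events_ctrl=100, n_ctrl=1000, events_tx=80, n_tx=1000)
print(f"ARR = {arr:.1%}, 95% CI ({lo:.1%} to {hi:.1%})")
```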

Reference

McCormack J, Vandermeer B, Allan GM. How confidence intervals become confusion intervals. BMC Med Res Methodol. 2013 Oct 31;13(1):134. PubMed PMID: 24172248.

## Time-related Biases Including Immortality Bias

### Status

We were recently asked about the term “immortality bias.” The easiest way to explain immortality bias is to start with an example. Imagine a study of hospitalized COPD patients undertaken to assess the impact of drug A, an inhaled corticosteroid preparation, on survival. In our first example, people are randomized either to receive a prescription for drug A post-discharge or not to receive a prescription. If someone in the drug A group dies prior to filling the prescription, they should be analyzed as randomized and, therefore, counted as a death in the drug A group even though they were never actually exposed to drug A.

Let’s imagine that drug A confers no survival advantage and that mortality for this population is 10 percent. In a study population of 1,000 patients per group, we would expect 100 deaths in each group. Say that 10 people in the drug A group died before they could receive their medication. If we did not analyze those unexposed people as randomized, we would count 90 drug A deaths against 100 comparison group deaths, making it falsely appear that drug A conferred a survival advantage.
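Here is a minimal sketch of that arithmetic (our illustration):

```python
n = 1000
deaths_a, deaths_b = 100, 100   # true deaths per group; drug A has no effect
early_deaths = 10               # drug A deaths occurring before exposure

# Analyzed as randomized (correct): all deaths count in their assigned group
rr_correct = (deaths_a / n) / (deaths_b / n)

# Misclassified: early, unexposed deaths dropped from the drug A group
rr_biased = ((deaths_a - early_deaths) / n) / (deaths_b / n)

print(f"RR analyzed as randomized:    {rr_correct:.2f}")  # 1.00 -- no effect
print(f"RR with early deaths excluded: {rr_biased:.2f}")  # 0.90 -- false benefit
```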

If drug A actually works, the time during which patients are not exposed to the drug works a little against the intervention (oh, yes, and do people actually take their drug?), but since bias tends to favor the intervention, this probably levels the playing field a bit—there is a reason we talk about “closeness to truth” and “estimates of effect.”

“Immortality bias” is a risk in studies in which there is a time period, the “immortal” time (called the “immune” time when the outcome is something other than survival), during which patients in one group cannot experience the event. Setting aside the myriad other biases that can plague observational studies, such as the potential for confounding through choice of treatment, let us compare the randomized controlled trial (RCT) just described to a retrospective cohort study of the same question. In the observational study, we have to pick a time to start observing patients, and, because grouping for analysis is no longer decided by randomization, we also have to make a choice about how patients are grouped.

For our example, let us say we start the clock on recording outcomes (death) at the date of discharge. Patients are then grouped for analysis by whether or not they filled a prescription for drug A within 90 days of discharge. Because “being alive” is a requirement for picking up a prescription, but not for membership in the comparison group, the drug A group potentially receives a “survival advantage” if this bias is not taken into account in some way in the analysis.

In other words, by design, no deaths can occur in the drug A group prior to picking up a prescription. In the comparison group, however, death never gets an opportunity to “take a holiday,” as it were. If you die before getting a prescription, you are automatically counted in the comparison group; if you live and pick up your prescription, you are automatically counted in the drug A group. The outcome of “being alive” is thus a prerequisite for being in the drug A group, and all deaths among people not filling a prescription during that 90-day window are counted in the comparison group. This is yet another example of how groups that differ, or are treated differently, in ways other than what is being studied can bias outcomes.
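A small simulation makes the artifact easy to see. This is our own construction, not data from any study: death is entirely independent of drug A, yet grouping by prescription fill manufactures a survival advantage:

```python
import random
random.seed(42)

n_patients, horizon = 100_000, 365   # outcome: death within one year
deaths = {"filled": 0, "not_filled": 0}
counts = {"filled": 0, "not_filled": 0}

for _ in range(n_patients):
    death_day = random.expovariate(1 / 2000)  # death time, independent of drug A
    fill_day = random.uniform(0, 90)          # intended pharmacy visit
    intends_to_fill = random.random() < 0.5
    # A prescription can only be filled by someone still alive on fill_day.
    group = "filled" if intends_to_fill and death_day > fill_day else "not_filled"
    counts[group] += 1
    deaths[group] += death_day <= horizon

for g in ("filled", "not_filled"):
    print(f"{g}: 1-year mortality {deaths[g] / counts[g]:.1%}")
# The "filled" group looks protected purely because anyone dying early
# is guaranteed to land in the "not_filled" group.
```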

Many readers will recognize the similarity between immortality bias and lead time bias. Lead time bias occurs when earlier detection of a disease through screening makes it appear that screening has conferred a survival advantage, when in fact the “greater length of time survived” is an artifact of the additional time counted between disease identification and the point at which the disease would have been found had no screening taken place.

Another instance in which a time-dependent bias can occur is in oncology studies when intermediate markers (e.g., tumor recurrence) are assessed at the end of follow-up segments using Kaplan-Meier methodology: recurrence may actually have occurred in some subjects at the beginning of a time segment rather than at its end.

It is always good to ask whether, in the course of the study, the passing of time could have had an impact on any outcomes.

Other Examples —

• Might the population under study have significantly changed during the course of the trial?
• Might the time period of the study affect study results (e.g., studying an allergy medication, but not during allergy season)?
• Could awareness of adverse events affect future reporting of adverse events?
• Could test timing or a gap in testing result in misleading outcomes (e.g., in studies comparing one test to another, might discrepancies have arisen in test results if patients’ status changed in between applying the two tests)?

All of these time-dependent biases can distort study results.

## Webinar: “Using Real-World Data & Published Evidence in Pharmacy Quality Improvement Activities”

### Status

On Monday, May 20, 2013, we presented a webinar on “Using Real-World Data & Published Evidence in Pharmacy Quality Improvement Activities” for the member organizations of the Alliance of Community Health Plans (ACHP).

The 80-minute discussion addressed four topic areas, each with unique critical appraisal challenges. The webinar’s goal was to discuss issues that arise when conducting quality improvement efforts using real-world data (such as data from claims, surveys, and observational studies) and other published healthcare evidence.

Key pitfalls were cherry-picked for these four mini-seminars—

• Pitfalls to avoid when using real-world data — dealing with heterogeneity, confounding by indication, and causality.
• Key issues in evaluating oncology studies — outcome issues and focus on how to address large attrition rates.
• Important issues when conducting comparative safety reviews — assessing patterns through use of RCTs, systematic reviews, observational studies and registries.
• Key issues in evaluating studies employing Kaplan-Meier estimates — time-to-event basics with attention to the important problem of censoring.

A recording of the webinar is available at—

https://achp.webex.com/achp/lsr.php?AT=pb&SP=TC&rID=45261732&rKey=1475c8c3abed8061&act=pb

## When Is a Measure of Outcomes Like a Coupon for a Diamond Necklace?

### Status

For those of you who struggle with the fundamental difference between absolute risk reduction (ARR) and relative risk reduction (RRR), and their counterparts, absolute and relative risk increase (ARI/RRI), we have always explained that knowing only the RRR or RRI, without other quantitative information about the frequency of events, is akin to knowing that a store is having a half-off sale—but when you walk in, you find that they aren’t posting the actual price! And so your question is: 50 percent off of what?

You should have the same question whenever you are provided with a relative measure (and if you aren’t told whether a measure is relative or absolute, you are safer assuming it is relative). Below is a link to a great short cartoon that turns the lens a little differently and might help.

However, we will add that, in our opinion, ARR alone isn’t fully informative either, nor are its kin, the number needed to treat (NNT) and, for ARI, the number needed to harm (NNH). An absolute difference of 5 percent may be perceived very differently when “10 people out of a hundred benefit with one intervention compared to 5 with placebo” than when “95 people out of a hundred benefit with one intervention as compared to 90 with placebo.” As a patient, I might be less willing to expose myself to side effects if it is highly likely I will improve without treatment, for example. Providing this full information (for critically appraised studies that are deemed valid) may best equip patients to make choices based on their own needs and requirements, including their values and preferences.

We think that anyone involved in health care decision-making—including the patient—is best helped by knowing the event rates for each of the groups studied, i.e., the numerators and denominators for the outcome of interest by group. These are the four numbers that make up the 2 x 2 table used to calculate many statistics, as in the sketch below.
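As a minimal sketch (our illustration), here is how the two scenarios above work out when computed from per-group event rates:

```python
def summarize(events_tx, events_ctrl, n=100):
    """Absolute difference, relative difference, and NNT from event rates."""
    rate_tx, rate_ctrl = events_tx / n, events_ctrl / n
    abs_diff = rate_tx - rate_ctrl      # absolute difference: 5% in both cases
    rel_diff = abs_diff / rate_ctrl     # relative difference: wildly different
    nnt = 1 / abs_diff                  # patients treated per extra benefit
    return abs_diff, rel_diff, nnt

for tx, ctrl in [(10, 5), (95, 90)]:
    a, r, nnt = summarize(tx, ctrl)
    print(f"{tx}/100 vs {ctrl}/100: absolute +{a:.0%}, relative +{r:.0%}, NNT {nnt:.0f}")
# 10/100 vs 5/100:  absolute +5%, relative +100%, NNT 20
# 95/100 vs 90/100: absolute +5%, relative +6%,   NNT 20
```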

Isn’t it great when learning can be fun too!  Enjoy!

http://www.ibtimes.com/articles/347476/20120531/relative-risk-absolute-comic-health-medical-reporting.htm

### Status

We have a new 1-pager explaining key points about quality-adjusted life years (QALYs) available on our website.  Search “QALY” at our tools page:

## Centrum—Spinning the Vitamins?

### Status

Scott K. Aberegg, MD, MPH, has written an amusing and interesting blog post[1] about a recently published randomized controlled trial (RCT) of vitamins and cancer outcomes.[2] In the post, he critiques the Physicians’ Health Study II and points out the following:

• Aberegg wonders why, in a trial of 14,000 people, you would adjust for baseline variables.
• The lay press reported a statistically significant 8% reduction in cancer among subjects taking Centrum multivitamins; the unadjusted (crude log-rank) p-value, however, was 0.05—not statistically significant.
• The adjusted p-value, 0.04, was for the hazard ratio, which means that the reported 8% was a relative, not an absolute, risk reduction.
• His own calculations reveal an absolute risk reduction of 1.2%, and a simple sensitivity analysis (adding 5 and then 10 cancers to the placebo group) changes the p-value to 0.0768 and 0.0967, demonstrating that small changes in event counts have a big impact on the p-value (see the sketch below).
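Here is a hedged sketch of that kind of sensitivity analysis. The counts are hypothetical, not the trial’s actual data, and we use a simple chi-square test rather than the log-rank test used in the study; the point is only how few shifted events it takes to move a p-value across 0.05:

```python
from scipy.stats import chi2_contingency

def p_value(events_a, events_b, n_per_arm=7000):
    """Two-sided chi-square p-value for a 2x2 table of cancer vs. no cancer."""
    table = [[events_a, n_per_arm - events_a],
             [events_b, n_per_arm - events_b]]
    _, p, _, _ = chi2_contingency(table)
    return p

base_mvi, base_placebo = 1200, 1290   # hypothetical counts, not PHS II data
for shift in (0, 5, 10):              # nudge a handful of events in one arm
    print(f"+{shift} events: p = {p_value(base_mvi + shift, base_placebo):.3f}")
# A borderline p-value drifts above 0.05 after shifting only a few events.
```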

He concludes that, “…without spin, we see that multivitamins (and other supplements) create both expensive urine and expensive studies – and both just go right down the drain.”

A reminder: had the results indeed been clinically meaningful, the next step would be to perform a critical appraisal to determine whether or not the study was valid.

Reference

[1] http://medicalevidence.blogspot.com/2012/10/a-centrum-day-keeps-cancer-at-bay.html accessed 10/25/12.

[2] Gaziano JM, et al. Multivitamins in the Prevention of Cancer in Men: The Physicians’ Health Study II Randomized Controlled Trial. JAMA. 2012;308(18). doi:10.1001/jama.2012.14641.

## Early Termination of Clinical Trials—2012 Update

### Status

Several years ago we presented the increasing evidence of problems with early termination of clinical trials for benefit after interim analyses.[1] The bottom line is that results are very likely to be distorted by chance findings. A useful review of this topic has recently been published.[2] Briefly, this review points out that—

• Trials stopped early for benefit frequently report results that are not credible; in one review, for example, half the trials reported relative risk reductions of more than 47%, and a quarter more than 70%. The apparent overestimates were larger in smaller trials.
• Stopping trials early for apparent benefit is highly likely to systematically overestimate treatment effects.
• Large overestimates were common when the total number of events was less than 200.
• Smaller but important overestimates are likely with 200 to 500 events, and trials with over 500 events are likely to show small overestimates.
• Stopping rules do not appear to ensure protection against distortion of results.
• Despite the fact that stopped trials may report chance findings that overestimate true effect sizes—especially when based on a small number of events—positive results receive significant attention and can bias clinical practice, clinical guidelines and subsequent systematic reviews.
• Trials stopped early reduce opportunities to find potential harms.

The authors provide three examples illustrating the above points, in which harm to patients is likely to have occurred.

Case 1 is the use of preoperative beta blockers in non-cardiac surgery. In 1999, a clinical trial of bisoprolol in patients with vascular disease having non-cardiac surgery, with a planned sample size of 266, was stopped early after enrolling 112 patients—with 20 events. Two of 59 patients in the bisoprolol group and 18 of 53 in the control group had experienced a composite endpoint event (cardiac death or myocardial infarction). The authors reported a 91% reduction in relative risk for this endpoint, 95% CI (63% to 98%). In 2002, an ACC/AHA clinical practice guideline recommended perioperative use of beta blockers for this population. In 2008, a systematic review and meta-analysis including over 12,000 patients having non-cardiac surgery reported a 35% reduction in the odds of non-fatal myocardial infarction, 95% CI (21% to 46%), a twofold increase in non-fatal strokes, odds ratio 2.1, 95% CI (1.27 to 3.68), and a possible increase in all-cause mortality, odds ratio 1.20, 95% CI (0.95 to 1.51). Despite the results of this good quality systematic review, subsequent guidelines published in 2009 and 2012 continued to recommend beta blockers.

Case 2 is the use of intensive insulin therapy (IIT) in critically ill patients. In 2001, a single-center randomized trial of IIT in critically ill patients with raised serum glucose reported a 42% relative risk reduction in mortality, 95% CI (22% to 62%). The authors used a liberal stopping threshold (P=0.01) and took frequent looks at the data, strategies they said were “designed to allow early termination of the study.” Results were rapidly incorporated into guidelines, e.g., American College of Endocrinology practice guidelines, with recommendations for an upper glucose limit of 8.3 mmol/L. A systematic review published in 2008 summarized the results of subsequent studies, which did not confirm lower mortality with IIT and documented an increased risk of hypoglycemia. A later good quality systematic review confirmed these findings. Nevertheless, some guideline groups continue to advocate limits of 8.3 mmol/L, while other guidelines, using the results of more recent studies, recommend a range of 7.8 to 10 mmol/L.

Case 3 is the use of activated protein C in critically ill patients with sepsis. The original 2001 trial of recombinant human activated protein C (rhAPC) was stopped early after the second interim analysis because of an apparent difference in mortality. In 2004, the Surviving Sepsis Campaign, a global initiative to improve management of sepsis, recommended use of the drug as part of a “bundle” of interventions. A subsequent trial, published in 2005, reinforced previous concerns from studies reporting increased risk of bleeding with rhAPC and raised questions about the apparent mortality reduction in the original study. As of 2007, trials had failed to replicate the favorable results reported in the pivotal Recombinant Human Activated Protein C Worldwide Evaluation in Severe Sepsis (PROWESS) study. Nevertheless, the 2008 iteration of the Surviving Sepsis guidelines and another guideline in 2009 continued to recommend rhAPC. Finally, after further discouraging trial results, Eli Lilly withdrew the drug, drotrecogin alfa (activated) (Xigris), from the market in 2011.

Key points about trials terminated early for benefit:

• Truncated trials are likely to overestimate benefits.
• Results should be confirmed in other studies.
• Maintain a high level of scepticism regarding the findings of trials stopped early for benefit, particularly when those trials are relatively small and replication is limited or absent.
• Stopping rules do not protect against overestimation of benefits.
• Stringent criteria for stopping for benefit would include not stopping before approximately 500 events have accumulated.

References

2. Guyatt GH, Briel M, Glasziou P, Bassler D, Montori VM. Problems of stopping trials early. BMJ. 2012 Jun 15;344:e3863. doi: 10.1136/bmj.e3863. PMID:22705814.

## Obtaining Absolute Risk Reduction (ARR) and Number Needed To Treat (NNT) From Relative Risk (RR) and Odds Ratios (OR) Reported in Systematic Reviews

### Status

Background
Estimates of effect in meta-analyses can be expressed as either relative effects or absolute effects. Relative risks (aka risk ratios) and odds ratios are relative measures; absolute risk reduction (aka risk difference) and number needed to treat are absolute measures. When reviewing meta-analyses, readers will almost always see pooled results for dichotomous outcomes presented as relative risks or odds ratios. The reason is that relative risks are considered the most consistent statistic when study results are combined from multiple studies; meta-analysts usually avoid performing meta-analyses using absolute differences for this reason.

Fortunately, we are now seeing more meta-analyses reporting relative risks along with ARR and NNT. The key point is that meta-analyses almost always use relative effect measures (relative risk or odds ratio) and then (hopefully) re-express the results using absolute effect measures (ARR or NNT).

You may see the term “assumed control group risk” or “assumed control risk” (ACR). This frequently refers to the risk in a control group or subgroup of patients in a meta-analysis, but it could also refer to the risk in any group being compared to an intervention group (i.e., patients not receiving the study intervention).

The Cochrane Handbook now recommends that meta-analysts provide a summary table for the main outcome and that the table include the following items—

• The topic, population, intervention and comparison
• The assumed (control) risk and the corresponding risk (i.e., the risk in those receiving the intervention)
• Relative effect statistic (RR or OR)

When RR is provided, ARR can easily be calculated. Odds ratios deal with odds, not probabilities, and therefore cannot be converted to ARR with accuracy: odds express how many people have an outcome relative to how many do not, rather than relative to the whole population. For more on “odds,” see— http://www.delfini.org/page_Glossary.htm#odds
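As a minimal sketch of the distinction (our illustration): odds express events relative to non-events, while probability expresses events relative to the whole group, and the two diverge as events become common:

```python
def prob_to_odds(p):
    return p / (1 - p)

def odds_to_prob(o):
    return o / (1 + o)

for p in (0.01, 0.05, 0.2, 0.5):
    print(f"probability {p:.2f} -> odds {prob_to_odds(p):.3f}")
# At low event rates, odds and probability nearly coincide, which is why
# an OR approximates an RR only when outcomes are rare.
```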

Example 1: Antihypertensive drug therapy compared to control for hypertension in the elderly (60 years or older)

Reference: Musini VM, Tejani AM, Bassett K, Wright JM. Pharmacotherapy for hypertension in the elderly. Cochrane Database Syst Rev. 2009 Oct 7;(4):CD000028. Review. PubMed PMID: 19821263.

Computing ARR and NNT from Relative Risk

• When RR is reported in a meta-analysis, determine (this is a judgment) the assumed control risk (ACR)—i.e., the risk in the group being compared to the new intervention—from the control event rate or other data/source
• Formula: ARR=100 X ACR X (1-RR)

Calculating the ARR and NNT from the Musini Meta-analysis

• In the above meta-analysis of 12 RCTs in elderly patients with moderate hypertension, the RR for overall mortality with treatment compared to no treatment over 4.5 years was 0.90.
• The event rate (ACR) in the control group was 116 per 1000, or 0.116
• ARR=100 X 0.116 X (1-0.90)=100 X 0.116 X 0.10=1.16%
• NNT=100/1.16=87 (rounded up)
• Interpretation: The risk of death with treatment is 90% of the risk in the control group (here, the elderly patients not receiving treatment for hypertension), which translates into 1 to 2 fewer deaths per 100 treated patients over 4.5 years. In other words, you would need to treat 87 elderly hypertensive people at moderate risk with antihypertensives for 4.5 years to prevent one death (see the sketch below).
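A minimal sketch of this calculation (our illustration, using the Musini numbers):

```python
from math import ceil

def arr_nnt_from_rr(rr, acr):
    arr = 100 * acr * (1 - rr)   # ARR as a percentage of patients
    nnt = ceil(100 / arr)        # NNTs are conventionally rounded up
    return arr, nnt

arr, nnt = arr_nnt_from_rr(rr=0.90, acr=0.116)
print(f"ARR = {arr:.2f}%, NNT = {nnt}")   # ARR = 1.16%, NNT = 87
```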

Computing ARR and NNT from Odds Ratios

In some older meta-analyses, you may not be given the assumed control risk (ACR).

Example 2: Oncology Agent

Assume a meta-analysis of an oncology agent reports an estimate of effect (mortality) as an OR of 0.8 over 3 years for a new drug. To do the calculation, an ACR is required. Hopefully this information will be provided in the study; if not, the reader will have to obtain the assumed control risk (ACR) from other studies or another source. Let us assume that the control risk in this example is 0.3.

Formula for converting OR to ARR: first re-express the OR as a risk in the treated group, risk with treatment = (OR X ACR) / (1 - ACR + OR X ACR); then ARR = 100 X (ACR - risk with treatment).

• In this example:
• Risk with treatment = (0.8 X 0.3) / (1 - 0.3 + 0.8 X 0.3) = 0.24/0.94 = 0.255
• ARR = 100 X (0.3 - 0.255)
• ARR = 4.5% (rounded)
• Thus the ARR is approximately 4.5% over 3 years.
• The NNT to benefit one patient over 3 years is 100/4.5 (rounded up), or 23 (see the sketch below).
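A minimal sketch of the odds-ratio conversion (our illustration, using the formula above):

```python
from math import ceil

def arr_nnt_from_or(odds_ratio, acr):
    # Recover the risk in the treated group from the odds ratio, then take
    # the difference from the assumed control risk.
    risk_tx = (odds_ratio * acr) / (1 - acr + odds_ratio * acr)
    arr = 100 * (acr - risk_tx)   # ARR as a percentage
    return arr, ceil(100 / arr)   # NNT rounded up

arr, nnt = arr_nnt_from_or(odds_ratio=0.8, acr=0.3)
print(f"ARR = {arr:.1f}%, NNT = {nnt}")   # ARR = 4.5%, NNT = 23
```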

Because of the limitations of odds ratios described above, it should be noted that when outcomes occur commonly (e.g., >5%), odds ratios may overestimate the effect of a treatment if they are interpreted as if they were relative risks.

For more information see The Cochrane Handbook, Part 2, Chapter 12.5.4 available at http://www.cochrane-handbook.org/

## Critical Appraisal Matters

### Status

Most of us know that there is much variation in healthcare that is not explained by patient preference, differences in disease incidence, or resource availability. We think that many healthcare quality problems (overuse, underuse, misuse, waste, patient harms, and more) stem from a broad lack of understanding among healthcare decision-makers of what constitutes solid clinical research.

We think it’s worth visiting (or revisiting) our webpage on “Why Critical Appraisal Matters.”

http://www.delfini.org/delfiniFactsCriticalAppraisal.htm

## Loss to Follow-up Update

### Status

Heads up about an important systematic review of the effects of attrition on outcomes of randomized controlled trials (RCTs) that was recently published in the BMJ.[1]

Background

• Key Question: Would the outcomes of the trial change significantly if all persons had completed the study, and we had complete information on them?
• Loss to follow-up in RCTs is important because it can bias study results if the balance in key prognostic variables established through randomization is disrupted in ways that would otherwise result in different outcomes. If there is no imbalance between and within the various study subgroups (i.e., the groups as randomized compared to the completers), then loss to follow-up may not present a threat to validity, except in instances in which statistical significance is not reached because of decreased power.

BMJ Study
The aim of this review was to assess the reporting, extent, and handling of loss to follow-up and its potential impact on estimates of treatment effect in RCTs. The investigators evaluated 235 RCTs published from 2005 through 2007 in the five general medical journals with the highest impact factors: Annals of Internal Medicine, BMJ, JAMA, Lancet, and New England Journal of Medicine. All eligible studies reported a statistically significant (P<0.05) primary patient-important outcome.

Methods
The investigators did several sensitivity analyses to evaluate the effect of varying assumptions about the outcomes of participants lost to follow-up on the estimate of effect for the primary outcome. Their analysis strategies, sketched in code after the list below, were—

• None of the participants lost to follow-up had the event
• All the participants lost to follow-up had the event
• None of those lost to follow-up in the treatment group had the event and all those lost to follow-up in the control group did (best case scenario)
• All participants lost to follow-up in the treatment group had the event and none of those in the control group did (worst case scenario)
• More plausible assumptions using various event rates, which the authors call “the event incidence”: the investigators performed sensitivity analyses using what they considered to be plausible ratios of event rates in the dropouts compared to the completers, using ratios of 1, 1.5, 2, 3, and 5 in the intervention group compared to the control group (see Appendix 2 at the link at the end of this post, below the reference). They chose an upper limit of 5 because it represents the highest ratio of events among those lost to follow-up reported in the literature.
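Here is a minimal sketch of these scenarios (hypothetical counts, our construction), using a simple chi-square test:

```python
from scipy.stats import chi2_contingency

def p_value(ev_t, n_t, ev_c, n_c):
    """Two-sided chi-square p-value for events vs. non-events by arm."""
    _, p, _, _ = chi2_contingency([[ev_t, n_t - ev_t], [ev_c, n_c - ev_c]])
    return p

# Hypothetical trial: events among completers, plus participants lost per arm
ev_t, comp_t, lost_t = 60, 450, 50    # treatment arm
ev_c, comp_c, lost_c = 90, 460, 40    # control arm

scenarios = {
    "completers only":     (ev_t, comp_t, ev_c, comp_c),
    "none lost had event": (ev_t, comp_t + lost_t, ev_c, comp_c + lost_c),
    "all lost had event":  (ev_t + lost_t, comp_t + lost_t,
                            ev_c + lost_c, comp_c + lost_c),
    "best case":           (ev_t, comp_t + lost_t,
                            ev_c + lost_c, comp_c + lost_c),
    "worst case":          (ev_t + lost_t, comp_t + lost_t,
                            ev_c, comp_c + lost_c),
}
for name, args in scenarios.items():
    print(f"{name:20s} p = {p_value(*args):.4f}")
# The "plausible ratio" analyses work the same way: set the event rate among
# those lost to a multiple (1x to 5x) of the completers' rate in each arm.
```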

Key Findings

• Of the 235 eligible studies, 31 (13%) did not report whether or not loss to follow-up occurred.
• In studies reporting the relevant information, the median percentage of participants lost to follow-up was 6% (interquartile range 2-14%).
• The method by which loss to follow-up was handled was unclear in 37 studies (19%); the most commonly used method was survival analysis (66 studies, 35%).
• When the investigators varied assumptions about loss to follow-up, results of 19% of trials were no longer significant if they assumed no participants lost to follow-up had the event of interest, 17% if they assumed that all participants lost to follow-up had the event, and 58% if they assumed a worst case scenario (all participants lost to follow-up in the treatment group and none of those in the control group had the event).
• Under more plausible assumptions, in which the incidence of events in those lost to follow-up relative to those followed up was higher in the intervention group than in the control group, 0% to 33% of trials lost statistically significant differences in important endpoints, depending upon which plausible assumptions were used (see Appendix 2 at the link at the end of this post, below the reference).

Summary
When plausible assumptions are made about the outcomes of participants lost to follow-up in RCTs, this study reports that up to a third of positive findings in RCTs lose statistical significance. The authors recommend that authors of individual RCTs and of systematic reviews test their results against various reasonable assumptions (sensitivity analyses). Only when the results are robust under all reasonable assumptions should readers draw inferences from those study results.

For more information see the Delfini white paper on “missingness” at http://www.delfini.org/Delfini_WhitePaper_MissingData.pdf

Reference

1. Akl EA, Briel M, You JJ, et al. Potential impact on estimated treatment effects of information lost to follow-up in randomised controlled trials (LOST-IT): systematic review. BMJ 2012;344:e2809. doi: 10.1136/bmj.e2809 (published 18 May 2012). PMID: 19519891.

Article is freely available at—

http://www.bmj.com/content/344/bmj.e2809

Supplementary information is available at—

http://www.bmj.com/content/suppl/2012/05/18/bmj.e2809.DC1

For sensitivity analysis results tables, see Appendix 2 at—