When Is a Measure of Outcomes Like a Coupon for a Diamond Necklace?



For those of you who struggle with the fundamental difference between absolute risk reduction (ARR) and relative risk reduction (RRR), and their counterparts, absolute and relative risk increase (ARI/RRI), we have always explained that knowing only the RRR or the RRI, without other quantitative information about the frequency of events, is akin to knowing that a store is having a half-off sale—but when you walk in, you find that they aren’t posting the actual prices! And so your question is: 50 percent off of what???

You should have the same question greet you whenever you are provided with a relative measure (and if you aren’t told whether the measure is relative or absolute, you are probably safest assuming that it is relative). Below is a link to a great short cartoon that turns the lens a little differently and might help.

However, we will add that, in our opinion, ARR alone isn’t fully informative either, nor is its kin, the number needed to treat (NNT), or, for ARI, the number needed to harm (NNH). A 5 percentage-point absolute difference may be perceived very differently when “10 people out of a hundred benefit with one intervention compared to 5 with placebo” than when “95 people out of a hundred benefit with one intervention as compared to 90 with placebo.” As a patient, I might be less willing to expose myself to side effects if I am highly likely to improve without treatment, for example. Providing this full information, for critically appraised studies that are deemed valid, may best equip patients to make choices based on their own needs, values and preferences.

We think that anyone involved in health care decision-making—including the patient—is best helped by knowing the event rates for each of the groups studied, i.e., the numerators and denominators for the outcome of interest by group. These are the 4 numbers that make up the 2-by-2 table from which many statistics are calculated.
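Those four numbers are, in fact, sufficient to derive every measure discussed above. Below is a minimal Python sketch using hypothetical counts (not drawn from any particular study):

```python
def effect_measures(events_tx, n_tx, events_ctrl, n_ctrl):
    """Derive common effect measures from the four cells of a 2-by-2 table.

    events_*: participants with the outcome of interest; n_*: group sizes.
    """
    eer = events_tx / n_tx        # experimental event rate
    cer = events_ctrl / n_ctrl    # control event rate
    rr = eer / cer                # relative risk
    rrr = 1 - rr                  # relative risk reduction
    arr = cer - eer               # absolute risk reduction
    nnt = 1 / arr                 # number needed to treat
    return {"EER": eer, "CER": cer, "RR": rr, "RRR": rrr, "ARR": arr, "NNT": nnt}

# Hypothetical trial: 5 events per 100 treated vs. 10 per 100 on placebo.
m = effect_measures(5, 100, 10, 100)
# The RRR is a dramatic 50%, but the ARR is only 5 percentage points and the
# NNT is 20: the "half-off sale" looks very different once the actual prices
# (event rates) are posted.
```

Reporting all of these together, rather than the RRR alone, gives readers the "posted prices" the sale flyer leaves out.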

Isn’t it great when learning can be fun too? Enjoy!



Our Current Thinking About Attrition Bias


Delfini Thoughts on Attrition Bias

Significant attrition, whether due to loss of patients, discontinuation or some other reason, is a reality of many clinical trials. And, of course, the key question in any study is whether attrition significantly distorted the study results. We’ve spent a lot of time researching the evidence-on-the-evidence and have found that many researchers, biostatisticians and others struggle with this area—there appears to be no clear agreement in the clinical research community about how best to address these issues. There is also inconsistent evidence on the effects of attrition on study results.

We, therefore, believe that studies should be evaluated on a case-by-case basis and doing so often requires sleuthing and sifting through clues along with critically thinking through the unique circumstances of the study.

The key question is, “Given that attrition has occurred, are the study results likely to be true?” It is important to look at the contextual elements of the study. These contextual elements may include information about the population characteristics, potential effects of the intervention and comparator, the outcomes studied and whether patterns emerge, timing and setting. It is also important to look at the reasons for discontinuation and loss to follow-up, and to look at what data are missing and why, in order to assess the likely impact on results.

Attrition may or may not affect study outcomes depending, in part, upon the reasons for withdrawals, the censoring rules and the effects of applying those rules, for example. However, differential attrition should be examined especially closely. Unintended differences between groups are more likely when patients have not been allocated to their groups in a concealed fashion, when groups are not balanced at the onset of the study, when the study is not effectively blinded, or when an effect of the treatment itself has caused the attrition.

One piece of the puzzle, at times, may be whether prognostic characteristics remained balanced. Authors could help us all out tremendously by assessing the comparability of baseline characteristics between those randomized and those analyzed. However, an imbalance may be an important clue too, because it might be informative about efficacy or side effects of the agent under study.

In general, we think it is important to attempt to answer the following questions:

Examining the contextual elements of a given study—

  • What could explain the results if it is not the case that the reported findings are true?
  • What conditions would have to be present for an opposing set of results (equivalence or inferiority) to be true instead of the study findings?
  • Were those conditions met?
  • If these conditions were not met, is there any reason to believe that the estimate of effect (size of the difference) between groups is not likely to be true?

Delfini Treatment Messaging Scripts™ Update



Delfini Messaging Scripts are scripts for scripts. Years ago we were asked by a pharmacy consultancy to come up with a method for creating concise evidence-based statements about various therapies. That’s how we came up with our ideas for Messaging Scripts, which are targeted treatment messaging & decision support tools for specific clinical topics. Since working with that group, we have created a template and some sample scripts, which have been favorably received wherever we have shown them. The template is available at the link below, along with several samples. Samples recently updated: ACE Inhibitors, Alendronate, Sciatica (Low Back Pain), Statins (two scripts) and Venous Thromboembolism (VTE) Prevention in Total Hip and Total Knee Replacement.



Centrum—Spinning the Vitamins?



Scott K. Aberegg, MD, MPH, has written an amusing and interesting blog post about a recently published randomized controlled trial (RCT) on vitamins and cancer outcomes.[1] In the blog, he critiques the Physicians’ Health Study II[2] and points out the following:

  • Aberegg wonders why, with a trial of 14,000 people, you would adjust for baseline variables.
  • The lay press reported a statistically significant 8% reduction in total cancers among subjects taking Centrum multivitamins; the unadjusted (crude) log-rank p-value, however, was 0.05—not statistically significant.
  • The adjusted p-value for the hazard ratio was 0.04, which means that the reported 8% was a relative risk reduction.
  • His own calculations reveal an absolute risk reduction of 1.2%. By performing a simple sensitivity analysis (adding 5 and then 10 cancers to the placebo group), the p-value changes to 0.0768 and 0.0967, demonstrating that small changes in event counts have a big impact on the p-value.
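The fragility Aberegg demonstrates can be reproduced in miniature. The sketch below is ours, not a reproduction of his log-rank analysis, and the counts are entirely hypothetical (the actual PHS II event counts are not given here); they were chosen so that a simple chi-square p-value sits near 0.05, after which shifting just a handful of events moves it noticeably:

```python
import math

def chi2_p(a, b, c, d):
    """Two-sided p-value from a Pearson chi-square test (1 df) on a 2x2 table:
    a/b = events/non-events in one group, c/d = the same in the other."""
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2))  # tail probability, chi-square 1 df

# Hypothetical: 1290 cancers among 7290 on vitamins vs. 1380 among 7290 on placebo.
p_before = chi2_p(1290, 6000, 1380, 5910)
# Shift just 5 events into the vitamin arm and the p-value drifts further from 0.05.
p_after = chi2_p(1295, 5995, 1380, 5910)
```

When a result's statistical significance hinges on a handful of events in a trial of thousands, the finding is fragile, which is Aberegg's point.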

He concludes that, “…without spin, we see that multivitamins (and other supplements) create both expensive urine and expensive studies – and both just go right down the drain.”

A reminder that, if the results had indeed been clinically meaningful, then the next step would be to perform a critical appraisal to determine if the study were valid or not.


[1] http://medicalevidence.blogspot.com/2012/10/a-centrum-day-keeps-cancer-at-bay.html accessed 10/25/12.

[2] Gaziano JM et al. Multivitamins in the Prevention of Cancer in Men: The Physicians’ Health Study II Randomized Controlled Trial. JAMA. 2012;308(18). doi:10.1001/jama.2012.14641.


Early Termination of Clinical Trials—2012 Update



Several years ago we presented the increasing evidence of problems with early termination of clinical trials for benefit after interim analyses.[1] The bottom line is that results are very likely to be distorted because of chance findings.  A useful review of this topic has been recently published.[2] Briefly, this review points out that—

  • Trials stopped early for benefit frequently report results that are not credible; e.g., in one review, relative risk reductions exceeded 47% in half of the trials and 70% in a quarter. The apparent overestimates were larger in smaller trials.
  • Stopping trials early for apparent benefit is highly likely to systematically overestimate treatment effects.
  • Large overestimates were common when the total number of events was less than 200.
  • Smaller but important overestimates are likely with 200 to 500 events, and trials with over 500 events are likely to show small overestimates.
  • Stopping rules do not appear to ensure protection against distortion of results.
  • Despite the fact that stopped trials may report chance findings that overestimate true effect sizes—especially when based on a small number of events—positive results receive significant attention and can bias clinical practice, clinical guidelines and subsequent systematic reviews.
  • Trials stopped early reduce opportunities to find potential harms.

The authors provide 3 examples to illustrate the above points where harm is likely to have occurred to patients.

Case 1 is the use of preoperative beta blockers in non-cardiac surgery. In 1999, a clinical trial of bisoprolol in patients with vascular disease having non-cardiac surgery, with a planned sample size of 266, was stopped early after enrolling 112 patients—with 20 events. Two of 59 patients in the bisoprolol group and 18 of 53 in the control group had experienced a composite endpoint event (cardiac death or myocardial infarction). The authors reported a 91% reduction in relative risk for this endpoint, 95% confidence interval (63% to 98%). In 2002, an ACC/AHA clinical practice guideline recommended perioperative use of beta blockers for this population. In 2008, a systematic review and meta-analysis, including over 12,000 patients having non-cardiac surgery, reported a 35% reduction in the odds of non-fatal myocardial infarction, 95% CI (21% to 46%), a twofold increase in non-fatal strokes, odds ratio 2.01, 95% CI (1.27 to 3.68), and a possible increase in all-cause mortality, odds ratio 1.20, 95% CI (0.95 to 1.51). Despite the results of this good-quality systematic review, subsequent guidelines published in 2009 and 2012 continued to recommend beta blockers.

Case 2 is the use of intensive insulin therapy (IIT) in critically ill patients. In 2001, a single-center randomized trial of IIT in critically ill patients with raised serum glucose reported a 42% relative risk reduction in mortality, 95% CI (22% to 62%). The authors used a liberal stopping threshold (P=0.01) and took frequent looks at the data, strategies they said were “designed to allow early termination of the study.” Results were rapidly incorporated into guidelines, e.g., American College of Endocrinology practice guidelines, with recommendations for an upper glucose limit of ≤8.3 mmol/L. A systematic review published in 2008 summarized the results of subsequent studies, which did not confirm lower mortality with IIT and documented an increased risk of hypoglycemia. A later good-quality systematic review confirmed these findings. Nevertheless, some guideline groups continue to advocate limits of ≤8.3 mmol/L, while other guidelines, drawing on the results of more recent studies, recommend a range of 7.8 to 10 mmol/L.

Case 3 is the use of activated protein C in critically ill patients with sepsis. The original 2001 trial of recombinant human activated protein C (rhAPC) was stopped early after the second interim analysis because of an apparent difference in mortality. In 2004, the Surviving Sepsis Campaign, a global initiative to improve sepsis management, recommended use of the drug as part of a “bundle” of interventions in sepsis. A subsequent trial, published in 2005, reinforced previous concerns from studies reporting increased risk of bleeding with rhAPC and raised questions about the apparent mortality reduction in the original study. As of 2007, trials had failed to replicate the favorable results reported in the pivotal Recombinant Human Activated Protein C Worldwide Evaluation in Severe Sepsis (PROWESS) study. Nevertheless, the 2008 iteration of the Surviving Sepsis guidelines and another guideline in 2009 continued to recommend rhAPC. Finally, after further discouraging trial results, Eli Lilly withdrew the drug, drotrecogin alfa (activated) (Xigris), from the market in 2011.

Key points about trials terminated early for benefit:

  • Truncated trials are likely to overestimate benefits.
  • Results should be confirmed in other studies.
  • Maintain a high level of scepticism regarding the findings of trials stopped early for benefit, particularly when those trials are relatively small and replication is limited or absent.
  • Stopping rules do not protect against overestimation of benefits.
  • Stringent criteria for stopping for benefit would include not stopping before approximately 500 events have accumulated.
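A small Monte Carlo sketch makes the overestimation concrete. This is our own illustration, not a reproduction of any cited analysis, and all parameters are hypothetical: trials with a true 25% relative risk reduction take interim looks and stop the moment a naive z-test favors treatment; the apparent RRR among the stopped trials is then inflated.

```python
import random

def simulate_truncated_trials(n_trials=2000, looks=(50, 100, 150, 200),
                              p_ctrl=0.40, p_tx=0.30, z_stop=1.96, seed=7):
    """Simulate trials with interim analyses; return observed relative risk
    reductions for trials stopped early for benefit vs. trials run to the end."""
    rng = random.Random(seed)
    stopped, completed = [], []
    for _ in range(n_trials):
        ev_c = ev_t = enrolled = 0
        for look in looks:
            # Enroll patients up to this look's per-arm sample size.
            ev_c += sum(rng.random() < p_ctrl for _ in range(look - enrolled))
            ev_t += sum(rng.random() < p_tx for _ in range(look - enrolled))
            enrolled = look
            rc, rt = ev_c / look, ev_t / look
            pooled = (ev_c + ev_t) / (2 * look)
            se = (2 * pooled * (1 - pooled) / look) ** 0.5
            z = (rc - rt) / se if se > 0 else 0.0
            rrr = 1 - rt / rc if rc > 0 else 0.0
            if z > z_stop and look < looks[-1]:
                stopped.append(rrr)  # stopped early for apparent benefit
                break
        else:
            completed.append(rrr)    # ran to the final analysis
    return stopped, completed

stopped, completed = simulate_truncated_trials()
mean_stopped = sum(stopped) / len(stopped)
mean_completed = sum(completed) / len(completed)
# The true RRR built into the simulation is 25%; trials stopped at an interim
# look report a substantially larger apparent RRR on average.
```

The smaller the number of events at the time of stopping, the larger the difference a trial must show to cross the stopping threshold, which is exactly why small truncated trials produce the largest overestimates.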


1. http://www.delfini.org/delfiniClick_PrimaryStudies.htm#truncation

2. Guyatt GH, Briel M, Glasziou P, Bassler D, Montori VM. Problems of stopping trials early. BMJ. 2012 Jun 15;344:e3863. doi: 10.1136/bmj.e3863. PMID:22705814.


NNT from RR and OR


Obtaining Absolute Risk Reduction (ARR) and Number Needed To Treat (NNT) From Relative Risk (RR) and Odds Ratios (OR) Reported in Systematic Reviews

Estimates of effect in meta-analyses can be expressed as either relative effects or absolute effects. Relative risks (aka risk ratios) and odds ratios are relative measures. Absolute risk reduction (aka risk difference) and number needed to treat are absolute measures. When reviewing meta-analyses of dichotomous outcomes, readers will almost always see the differences between groups presented as relative risks or odds ratios, because relative effects are considered to be the most consistent statistic when results are combined from multiple studies. For this reason, meta-analysts usually avoid performing meta-analyses using absolute differences.

Fortunately we are now seeing more meta-analyses reporting both the relative risks along with ARR and NNT. The key point is that meta-analyses almost always use relative effect measures (relative risk or odds ratio) and then (hopefully) re-express the results using absolute effect measures (ARR or NNT).

You may see the term “assumed control group risk” or “assumed control risk” (ACR). This frequently refers to the risk in a control group or a subgroup of patients in a meta-analysis, but it could also refer to the risk in any group being compared to an intervention group (i.e., patients not receiving the study intervention).

The Cochrane Handbook now recommends that meta-analysts provide a summary table for the main outcome and that the table include the following items—

  • The topic, population, intervention and comparison
  • The assumed risk (i.e., the risk in the comparison group) and the corresponding risk (i.e., the risk in those receiving the intervention)
  • Relative effect statistic (RR or OR)

When RR is provided, ARR can easily be calculated. Odds ratios deal with odds, not probabilities, and therefore cannot be converted to ARR with accuracy without additional information, because odds express only the ratio of those with an outcome to those without it, not the proportion of a population affected. For more on “odds,” see— http://www.delfini.org/page_Glossary.htm#odds

Example 1: Antihypertensive drug therapy compared with control for hypertension in the elderly (60 years or older)

Reference: Musini VM, Tejani AM, Bassett K, Wright JM. Pharmacotherapy for hypertension in the elderly. Cochrane Database Syst Rev. 2009 Oct 7;(4):CD000028. Review. PubMed PMID: 19821263.

Computing ARR and NNT from Relative Risk

  • When RR is reported in a meta-analysis, determine (this is a judgment) the assumed control risk (ACR), i.e., the risk in the group being compared to the new intervention, from the control event rate or another data source.
  • Formula: ARR=100 X ACR X (1-RR)

Calculating the ARR and NNT from the Musini Meta-analysis

  • In the above meta-analysis of 12 RCTs in elderly patients with moderate hypertension, the RR for overall mortality with treatment compared to no treatment over 4.5 years was 0.90.
  • The event rate (ACR) in the control group was 116 per 1000, or 0.116.
  • ARR=100 X 0.116 X (1-0.90)=100 X 0.116 X 0.10=1.16%
  • NNT=100/1.16=87
  • Interpretation: The risk of death with treatment is 90% of the risk in the control group (in this case, elderly patients not receiving treatment for hypertension), which translates into 1 to 2 fewer deaths per 100 patients treated over 4.5 years. In other words, you would need to treat 87 elderly hypertensive people at moderate risk with antihypertensives for 4.5 years to prevent one death.
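The arithmetic above can be wrapped in a small helper. This is a sketch using the Musini numbers; only the rounding convention (NNT rounded up to a whole person) is our addition:

```python
import math

def arr_nnt_from_rr(acr, rr):
    """ARR (in %) and NNT from a relative risk and an assumed control risk (ACR)."""
    arr = 100 * acr * (1 - rr)   # ARR = 100 x ACR x (1 - RR)
    nnt = math.ceil(100 / arr)   # round up: you cannot treat a fraction of a person
    return arr, nnt

# Musini meta-analysis: ACR = 116 deaths per 1000 (0.116), RR = 0.90 over 4.5 years.
arr, nnt = arr_nnt_from_rr(0.116, 0.90)  # arr = 1.16 (%), nnt = 87
```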

Computing ARR and NNT from Odds Ratios

In some older meta-analyses you may not be given the assumed control risk (ACR).

Example 2: Oncology Agent

Assume a meta-analysis on an oncology agent reports an estimate of effect (mortality) as an OR of 0.8 over 3 years for a new drug. In order to do the calculation, an ACR is required. Hopefully this information will be provided in the study. If not, the reader will have to obtain it from other studies or another source. Let’s assume that the control risk in this example is 0.3.

Formula for converting OR to ARR (per The Cochrane Handbook): first convert the OR back to a risk in the intervention group, then subtract that risk from the ACR:

Risk with intervention = (OR X ACR) / (1 - ACR + OR X ACR)
ARR = 100 X (ACR - risk with intervention)

  • In this example: risk with intervention = (0.8 X 0.3) / (1 - 0.3 + 0.8 X 0.3)
  • Risk with intervention = 0.24 / 0.94 = 0.255
  • ARR = 100 X (0.3 - 0.255) = 4.5%
  • Thus the ARR is approximately 4.5% over 3 years.
  • The NNT to benefit one patient over 3 years is 100/4.5, which rounds up to 23.
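One way to sketch this conversion in code, using the Cochrane Handbook's approach of converting the odds ratio back to a risk in the intervention group via the assumed control risk (the oncology numbers are the hypothetical ones from the example):

```python
import math

def arr_nnt_from_or(acr, odds_ratio):
    """ARR (in %) and NNT from an odds ratio and an assumed control risk (ACR),
    per the Cochrane Handbook conversion."""
    # Convert the odds ratio back to a risk (probability) in the intervention group.
    risk_tx = (odds_ratio * acr) / (1 - acr + odds_ratio * acr)
    arr = 100 * (acr - risk_tx)
    nnt = math.ceil(100 / arr)   # round up to a whole person
    return arr, nnt

# Oncology example: assumed control risk 0.3, OR 0.8 over 3 years.
arr, nnt = arr_nnt_from_or(0.3, 0.8)  # arr is about 4.5 (%), nnt = 23
```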

Because of the limitations of odds ratios described above, it should be noted that when outcomes are common (e.g., occurring in more than 5% of patients), odds ratios may overestimate the effect of a treatment.

For more information see The Cochrane Handbook, Part 2, Chapter 12.5.4 available at http://www.cochrane-handbook.org/


Critical Appraisal Matters



Most of us know that there is much variation in healthcare that is not explained by patient preference, differences in disease incidence or resource availability. We think that many healthcare quality problems (overuse, underuse, misuse, waste, patient harms and more) stem from a broad lack of understanding among healthcare decision-makers about what constitutes solid clinical research.

We think it’s worth visiting (or revisiting) our webpage on “Why Critical Appraisal Matters.”



Loss to Follow-up Update


Heads up about an important systematic review of the effects of attrition on outcomes of randomized controlled trials (RCTs) that was recently published in the BMJ.[1]


  • Key Question: Would the outcomes of the trial change significantly if all persons had completed the study, and we had complete information on them?
  • Loss to follow-up in RCTs is important because it can bias study results if the balance between study groups that was established through randomization is disrupted in key prognostic variables that would otherwise result in different outcomes.  If there is no imbalance between and within various study subgroups (i.e., as randomized groups compared to completers), then loss to follow-up may not present a threat to validity, except in instances in which statistical significance is not reached because of decreased power.

BMJ Study
The aim of this review was to assess the reporting, extent and handling of loss to follow-up and its potential impact on the estimates of the effect of treatment in RCTs. The investigators evaluated 235 RCTs published from 2005 through 2007 in the five general medical journals with the highest impact factors: Annals of Internal Medicine, BMJ, JAMA, Lancet, and New England Journal of Medicine. All eligible studies reported a significant (P<0.05) primary patient-important outcome.

The investigators did several sensitivity analyses to evaluate the effect of varying assumptions about the outcomes of participants lost to follow-up on the estimate of effect for the primary outcome. Their analysis strategies were—

  • None of the participants lost to follow-up had the event
  • All the participants lost to follow-up had the event
  • None of those lost to follow-up in the treatment group had the event and all those lost to follow-up in the control group did (best case scenario)
  • All participants lost to follow-up in the treatment group had the event and none of those in the control group did (worst case scenario)
  • More plausible assumptions using various event rates, which the authors call “the event incidence”: the investigators performed sensitivity analyses using what they considered to be plausible ratios of event rates in the dropouts compared to the completers, using ratios of 1, 1.5, 2, 3 and 5 in the intervention group relative to the control group (see Appendix 2 at the link at the end of this post below the reference). They chose an upper limit of 5 because it represents the highest ratio reported in the literature.
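The four extreme strategies above are easy to operationalize. The sketch below applies them to a hypothetical trial (all counts invented for illustration, not taken from the review):

```python
def ltfu_scenarios(ev_tx, n_tx, lost_tx, ev_ctrl, n_ctrl, lost_ctrl):
    """Risk difference (control minus treatment) under extreme assumptions
    about participants lost to follow-up.

    ev_*: events among completers; n_*: randomized; lost_*: lost to follow-up."""
    def risk_diff(extra_tx, extra_ctrl):
        return (ev_ctrl + extra_ctrl) / n_ctrl - (ev_tx + extra_tx) / n_tx
    return {
        "none lost had the event": risk_diff(0, 0),
        "all lost had the event": risk_diff(lost_tx, lost_ctrl),
        "best case": risk_diff(0, lost_ctrl),   # only control losses had the event
        "worst case": risk_diff(lost_tx, 0),    # only treatment losses had the event
    }

# Hypothetical trial: 200 randomized per arm; 30 events among 180 completers and
# 20 lost on treatment vs. 45 events among 185 completers and 15 lost on control.
s = ltfu_scenarios(30, 200, 20, 45, 200, 15)
# The apparent benefit shrinks, grows, or reverses depending on the assumption.
```

Checking how far the estimate swings between these bounds, and under the more plausible intermediate ratios, is exactly the kind of sensitivity analysis the review recommends.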

Key Findings

  • Of the 235 eligible studies, 31 (13%) did not report whether or not loss to follow-up occurred.
  • In studies reporting the relevant information, the median percentage of participants lost to follow-up was 6% (interquartile range 2-14%).
  • The method by which loss to follow-up was handled was unclear in 37 studies (19%); the most commonly used method was survival analysis (66, 35%).
  • When the investigators varied assumptions about loss to follow-up, results of 19% of trials were no longer significant if they assumed no participants lost to follow-up had the event of interest, 17% if they assumed that all participants lost to follow-up had the event, and 58% if they assumed a worst case scenario (all participants lost to follow-up in the treatment group and none of those in the control group had the event).
  • Under more plausible assumptions, in which the incidence of events in those lost to follow-up relative to those followed-up was higher in the intervention than control group, 0% to 33% of trials—depending upon which plausible assumptions were used (see Appendix 2 at the link at the end of this post below the reference)— lost statistically significant differences in important endpoints.

When plausible assumptions are made about the outcomes of participants lost to follow-up in RCTs, this study reports, up to a third of positive findings in RCTs lose statistical significance. The authors recommend that authors of individual RCTs and of systematic reviews test their results against various reasonable assumptions (sensitivity analyses). Only when the results are robust under all reasonable assumptions should readers draw inferences from those study results.

For more information see the Delfini white paper on “missingness” at http://www.delfini.org/Delfini_WhitePaper_MissingData.pdf


1. Akl EA, Briel M, You JJ, et al. Potential impact on estimated treatment effects of information lost to follow-up in randomised controlled trials (LOST-IT): systematic review. BMJ 2012;344:e2809. doi: 10.1136/bmj.e2809 (Published 18 May 2012).

Article is freely available at—


Supplementary information is available at—


For sensitivity analysis results tables, see Appendix 2 at—



Adjusting for Multiple Comparisons


Adjusting for Multiple Comparisons

Frequently, studies report results that are not the primary or secondary outcome measures—sometimes because a finding was not anticipated, is unusual, or is judged to be important by the authors. How should these findings be assessed? A common belief is that if outcomes are not pre-specified, serious attention to them is not warranted. But is this the case? Kenneth J. Rothman wrote an article in 1990 that we feel is very helpful in such situations.[1]

  • Rothman points out that adjusting for multiple comparisons is tied to the logic of statistical significance testing, in which the investigator uses the P-value to estimate the probability of observing an effect size as great as or greater than the one found in the study, given that the null hypothesis is true—i.e., that there is truly no difference between the groups being studied (with alpha, frequently set at 5%, as the arbitrary cutoff for statistical significance). Obviously, if the risk of rejecting a truly null hypothesis is 5% for every hypothesis examined, then examining multiple hypotheses will generate a larger number of falsely positive statistically significant findings simply because of the increasing number of hypotheses examined.
  • Adjusting for multiple comparisons is thought by many to be desirable because it results in a smaller probability of erroneously rejecting the null hypothesis. Rothman argues that this “paying for peeking” at more data by adjusting P-values is unnecessary and can be misleading: adjusting for multiple comparisons may amount to paying a penalty simply for appropriately doing more comparisons, and there is no logical reason (or good evidence) for statistical adjustment. Rather, the burden is on those who advocate multiple comparison adjustments to show that there is a problem requiring a statistical fix.
  • Rothman’s conclusion: it is reasonable to consider each association on its own for the information it conveys; he believes there is no need to adjust P-values for multiple comparisons.
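The multiplicity arithmetic behind the concern Rothman pushes back against is easy to verify: for k independent tests of true null hypotheses at alpha = 0.05, the chance of at least one falsely "significant" result is 1 − 0.95^k. A minimal sketch:

```python
alpha = 0.05
for k in (1, 5, 10, 20):
    # Probability of at least one falsely "significant" result among k
    # independent tests of true null hypotheses.
    fwer = 1 - (1 - alpha) ** k
    print(f"{k:2d} tests: chance of at least one false positive = {fwer:.2f}")
```

The rate climbs from 5% for one test to roughly 40% for ten; Rothman's argument is not that this arithmetic is wrong, but that mechanically penalizing P-values for it is the wrong remedy.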

Delfini Comment: Reading his paper is a bit difficult, but he makes some good points about our not really understanding what chance is all about, and about evaluating study outcomes for validity requiring critical appraisal for the assessment of bias and other factors, as well as the use of statistics for evaluating chance effects.


1. Rothman KJ. No adjustments are needed for multiple comparisons. Epidemiology. 1990 Jan;1(1):43-6. PubMed PMID: 2081237.

