## Why Statements About Confidence Intervals Often Result in Confusion Rather Than Confidence

A recent paper by McCormack reminds us that authors may mislead readers by making unwarranted “all-or-none” statements and that readers should be mindful of this and carefully examine confidence intervals.

When examining results of a valid study, confidence intervals (CIs) provide much more information than p-values. The results are statistically significant if a confidence interval does not include the line of no difference (zero for measures expressed as differences or percentages, such as absolute risk reduction and relative risk reduction, and 1 for ratios, such as relative risk and odds ratios). However, in addition to providing information about statistical significance, confidence intervals also provide a plausible range for possibly true results within a margin of chance (5 percent in the case of a 95% CI). While the actual calculated outcome (i.e., the point estimate) is “the most likely to be true” result within the confidence interval, having this range enables readers to judge, in their opinion, if statistically significant results are clinically meaningful.

However, as McCormack points out, authors frequently do not provide useful interpretation of the confidence intervals, and authors at times report different conclusions from similar data. McCormack presents several cases that illustrate this problem, and this paper is worth reading.

As an illustration, assume two hypothetical studies report very similar results. In the first study of drug A versus drug B, the relative risk for mortality was 0.9, 95% CI (0.80 to 1.05). The authors might state that there was no difference in mortality between the two drugs because the difference is not statistically significant. However, the upper confidence limit is close to the line of no difference, so the confidence interval tells us that a difference might have been found if more people had been studied. The “no difference” statement is therefore misleading. A better statement for the first study would include the confidence interval and a neutral interpretation of what the results for mortality might mean. Example:

“The relative risk for overall mortality with drug A compared to drug B was 0.9, 95% CI (0.80 to 1.05). The confidence interval tells us that drug A may reduce mortality by up to a relative 20% (i.e., the relative risk reduction), but may increase mortality, compared to drug B, by approximately 5%.”

In a second study with similar populations and interventions, the relative risk for mortality might be 0.93, 95% CI (0.83 to 0.99). In this case, some authors might state, “Drug A reduces mortality.” A better statement for this second hypothetical study would ensure that the reader knows that the upper confidence limit is close to the line of no difference and, therefore, that the result is close to non-significance. Example:

“Although the mortality difference is statistically significant, the confidence interval indicates that the relative risk reduction may be as great as 17% but may be as small as 1%.”
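The two example statements above can be reproduced with a little arithmetic. The following is a minimal sketch, not a statistical package: the function name `describe_relative_risk_ci` and its output keys are made up for illustration. It converts a relative risk and its confidence limits into relative risk reductions and checks whether the interval excludes 1.

```python
def describe_relative_risk_ci(point, lower, upper):
    """Summarize a relative risk (RR) and its 95% CI as relative risk changes.

    Hypothetical helper for illustration only.
    """
    # A ratio measure is statistically significant when the CI excludes 1.
    significant = not (lower <= 1.0 <= upper)
    # Relative risk reduction (RRR) = 1 - RR, shown here in percent.
    # A negative RRR means a relative risk increase.
    return {
        "significant": significant,
        "rrr_point_pct": round((1 - point) * 100, 1),
        "rrr_best_case_pct": round((1 - lower) * 100, 1),
        "rrr_worst_case_pct": round((1 - upper) * 100, 1),
    }


# The two hypothetical studies from the text.
study_1 = describe_relative_risk_ci(0.90, 0.80, 1.05)
study_2 = describe_relative_risk_ci(0.93, 0.83, 0.99)

print(study_1)  # not significant: RRR ranges from 20% down to a 5% increase
print(study_2)  # significant: RRR ranges from 17% down to 1%
```

Reading the best case and worst case side by side is what allows a neutral statement in place of “no difference” or “reduces mortality.”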

The Bottom Line

1. Remember that p-values refer only to statistical significance and confidence intervals are needed to evaluate clinical significance.
2. Watch out for statements containing the words “no difference” in the reporting of study results. A finding of no statistically significant difference may be a product of too few people studied (or insufficient time).
3. Watch out for statements implying meaningful differences between groups when one of the confidence limits approaches the line of no difference.
4. None of this means anything unless the study is valid. Remember that bias tends to favor the intervention under study.

If authors do not provide you with confidence intervals, you may be able to compute them yourself, if they have supplied you with sufficient data, using an online confidence interval calculator. For our favorites, search “confidence intervals” at our web links page: http://www.delfini.org/delfiniWebSources.htm
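If you prefer to compute the interval yourself rather than use an online calculator, the following minimal sketch shows one common approach for a relative risk: the log (Katz) method, which is a large-sample approximation. The function name and the event counts are hypothetical and chosen only for illustration; other methods exist and may be preferable with small numbers of events.

```python
import math


def relative_risk_ci(events_a, total_a, events_b, total_b, z=1.96):
    """Relative risk of group A vs. group B with an approximate 95% CI.

    Uses the log method: the CI is built on ln(RR) and then exponentiated.
    z=1.96 corresponds to a 95% interval.
    """
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    rr = risk_a / risk_b
    # Standard error of ln(RR) from the 2x2 table counts.
    se_log_rr = math.sqrt(
        1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 90 deaths among 1,000 vs. 100 deaths among 1,000.
rr, lower, upper = relative_risk_ci(90, 1000, 100, 1000)
print(f"RR {rr:.2f}, 95% CI ({lower:.2f} to {upper:.2f})")
```

With these made-up counts the interval runs from roughly 0.69 to 1.18, so it includes 1 and the difference is not statistically significant.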

Reference

McCormack J, Vandermeer B, Allan GM. How confidence intervals become confusion intervals. BMC Med Res Methodol. 2013 Oct 31;13(1):134. [Epub ahead of print] PubMed PMID: 24172248.

## Biostatistical Help for Critical Appraisers

Book Recommendation: Biostatistics for Dummies by John C. Pezzullo, PhD

We highly recommend this book.  In short—

• An excellent resource
• Useful to critical appraisers because it can help us understand why certain common statistical tests are used in studies
• Provides a needed resource for answering questions about various tests
• Written in a clear style with the goal of making difficult information accessible and understandable
• Friendly style due to author’s wit and charm, and the reassurance he provides along the way

Read our full review here. Go to Amazon page and full customer reviews here.

## Time-related Biases

Time-related Biases Including Immortality Bias

We were recently asked about the term “immortality bias.” The easiest way to explain immortality bias is to start with an example. Imagine a study of hospitalized COPD patients undertaken to assess the impact of drug A, an inhaled corticosteroid preparation, on survival. In our first example, people are randomized either to receive a prescription for drug A post-discharge or not to receive a prescription. If someone in the drug A group dies prior to filling their prescription, they should be analyzed as randomized and, therefore, counted as a death in the drug A group even though they were never actually exposed to drug A.

Let’s imagine that drug A confers no survival advantage and that mortality for this population is 10 percent.  In a study population of 1,000 patients in each group, we would expect 100 deaths in each group. Let us say that 10 people in the drug A group died before they could receive their medication. If we did not analyze the unexposed people who died in group A as randomized, that would be 90 drug A deaths as compared to 100 comparison group deaths—making it falsely appear that drug A resulted in a survival advantage.
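The arithmetic in this example can be written out as a short sketch. The variable names are ours. The biased analysis below keeps the drug A denominator at 1,000, which is an assumption, since the example only gives death counts.

```python
patients_per_group = 1000
deaths_per_group = 100        # 10 percent mortality, no true drug effect
deaths_before_exposure = 10   # drug A patients who died before receiving the drug

# Analyzed as randomized: every randomized patient and every death is counted.
risk_drug_a = deaths_per_group / patients_per_group
risk_comparison = deaths_per_group / patients_per_group
rr_as_randomized = risk_drug_a / risk_comparison

# Biased analysis: deaths before exposure are dropped from the drug A group.
# (Assumption: the denominator is left at 1,000.)
biased_deaths_drug_a = deaths_per_group - deaths_before_exposure
biased_risk_drug_a = biased_deaths_drug_a / patients_per_group
rr_biased = biased_risk_drug_a / risk_comparison

print(f"Relative risk, analyzed as randomized: {rr_as_randomized:.2f}")
print(f"Relative risk, early deaths dropped:   {rr_biased:.2f}")
```

The second figure suggests a 10 percent relative reduction in mortality for a drug that, by construction, does nothing.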

If drug A actually works, the time that patients are not exposed to the drug works a little against the intervention (oh, yes, and do people actually take their drug?), but as bias tends to favor the intervention, this probably evens up the playing field a bit—there is a reason why we talk about “closeness to truth” and “estimates of effect.”

“Immortality bias” is a risk in studies when there is a time period (the “immortal” time, or the “immune” time when the outcome is other than survival) in which patients in one group cannot experience an event. To illustrate this, let us compare the randomized controlled trial (RCT) that we just described to a retrospective cohort study designed to study the same thing, setting aside the myriad other biases that can plague observational studies, such as the potential for confounding through choice of treatment. In the observational study, we have to pick a time to start observing patients, and it is no longer randomly decided how patients are grouped for analysis, so we have to make a choice about that too.

For our example, let us say we are going to start the clock on recording outcomes (death) beginning at the date of discharge. Patients are then grouped for analysis by whether or not they filled a prescription for drug A within 90 days of discharge. Because “being alive” is a requirement for picking up a prescription, but not for being in the comparison group, the drug A group potentially receives a “survival advantage” if this bias isn’t taken into account in some way in the analysis.

In other words, by design, no deaths can occur in the drug A group prior to picking up a prescription. However, in the comparison group, death never gets an opportunity to “take a holiday” as it were. If you die before getting a prescription, you are automatically counted in the comparison group. If you live and pick up your prescription, you are automatically counted in the drug A group. So the outcome of “being alive” is a prerequisite to being in the drug A group. Therefore, all deaths that occur within that 90-day window before a prescription is filled are counted in the comparison group. This is yet another example of how groups that differ, or that are treated differently, in ways other than what is being studied can bias outcomes.
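A small simulation can make this concrete. The sketch below is illustrative only and rests on assumptions that are not part of the example above: a constant death hazard giving 10 percent one-year mortality, half of patients intending to fill a prescription, fill dates spread evenly over the 90 days, and one year of follow-up. Drug A has no effect on survival in the simulation.

```python
import math
import random


def simulate_immortal_time_bias(n_patients=100_000, seed=0):
    """One-year mortality by group when grouping requires surviving to the fill date."""
    rng = random.Random(seed)
    daily_hazard = -math.log(0.9) / 365  # 10% one-year mortality for everyone
    counts = {"drug_a": [0, 0], "comparison": [0, 0]}  # [patients, deaths]

    for _ in range(n_patients):
        # Time of death in days after discharge; the drug has no effect on it.
        death_day = rng.expovariate(daily_hazard)
        intends_to_fill = rng.random() < 0.5
        fill_day = rng.uniform(0, 90)

        # A patient must be alive on the fill date to land in the drug A group.
        filled = intends_to_fill and fill_day < death_day
        group = "drug_a" if filled else "comparison"

        counts[group][0] += 1
        if death_day <= 365:
            counts[group][1] += 1

    return {g: deaths / patients for g, (patients, deaths) in counts.items()}


mortality = simulate_immortal_time_bias()
print(mortality)
```

Even though the simulated drug does nothing, the drug A group shows lower one-year mortality than the comparison group, because would-be drug A patients who die before filling are counted as comparison-group deaths.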

Many readers will recognize the similarity between immortality bias and lead time bias. Lead time bias occurs when earlier detection of a disease, because of screening, makes it appear that the screening has conferred a survival advantage—when, in fact, the “greater length of time survived” is really an artifact resulting from the additional time counted between disease identification and when it would have been found if no screening had taken place.
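Lead time bias can be shown with a single hypothetical patient. In this sketch all ages are invented for illustration, and screening is assumed to have no effect on the age at death.

```python
# Hypothetical patient timeline (ages in years); all values are illustrative.
age_detectable_by_screening = 60
age_at_symptomatic_diagnosis = 64
age_at_death = 68  # assumed unchanged by screening

# Survival is counted from the date the disease is identified.
survival_with_screening = age_at_death - age_detectable_by_screening
survival_without_screening = age_at_death - age_at_symptomatic_diagnosis
lead_time = age_at_symptomatic_diagnosis - age_detectable_by_screening

# The apparent survival gain equals the lead time; life was not extended.
apparent_gain = survival_with_screening - survival_without_screening

print(survival_with_screening, survival_without_screening, apparent_gain)
```

The screened patient appears to survive twice as long after diagnosis, yet dies at the same age.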

Another instance where a time-dependent bias can occur is in oncology studies when intermediate markers (e.g., tumor recurrence) are assessed at the end of follow-up segments using Kaplan-Meier methodology. Recurrence may have occurred in some subjects at the beginning of the time segment rather than at the end of a time segment.

It is always good to ask whether the passing of time during the course of the study could have affected any outcomes.

Other Examples —

• Might the population under study have significantly changed during the course of the trial?
• Might the time period of the study affect study results (e.g., studying an allergy medication, but not during allergy season)?
• Could awareness of adverse events affect future reporting of adverse events?
• Could test timing or a gap in testing result in misleading outcomes (e.g., in studies comparing one test to another, might discrepancies have arisen in test results if patients’ status changed in between applying the two tests)?

All of these time-dependent biases can distort study results.

## Patient Years

What are Patient-Years?

A participant at one of our recent conferences asked a good question—“What are patient-years?”

“Person-years” (or patient-years) is a statistic used in expressing incidence rates: the number of events is divided by the total time that subjects were observed. In many studies, the length of exposure to the treatment is different for different subjects, and the patient-year statistic is one way of dealing with this issue.

The calculation of events per patient-year(s) is the number of incident cases divided by the amount of person-time at risk. When all patients are followed for the same length of time, the patient-years (denominator) can be calculated by multiplying the number of patients in the group by the number of years that patients are in the study. When follow-up differs between patients, add up each patient's individual time in the study instead. Then divide the number of events (numerator) by the denominator.

• Example: 100 patients are followed for 2 years. In this case, there are 200 patient-years of follow-up.
• If there were 8 myocardial infarctions in the group, the rate would be 8 MIs per 200 patient years or 4 MIs per 100 patient-years.

The rate can be expressed in various ways, e.g., per 100, 1,000, 100,000, or 1 million patient-years. In some cases, authors report the average follow-up period as the mean and others use the median, which may result in some variation in results between studies.

Another example: Assume a study of four patients followed for 1, 2, 3 and 4 years, respectively, with one event occurring at 1 year, one event at 4 years, and no events at years 2 and 3. This information can be expressed as 2 events per 10 patient-years (1+2+3+4=10), or an event rate of 0.2 per patient-year.
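Both examples can be checked with a few lines of code. This is a minimal sketch with a hypothetical function name. For the second example it assumes four patients followed for 1, 2, 3 and 4 years, which is the reading under which the follow-up times sum to 10.

```python
def rate_per_patient_years(events, follow_up_years, per=100):
    """Events per `per` patient-years.

    `follow_up_years` holds one entry per patient: that patient's years at risk.
    """
    patient_years = sum(follow_up_years)  # denominator: total person-time
    return events * per / patient_years


# 100 patients each followed for 2 years (200 patient-years), 8 MIs.
mi_rate = rate_per_patient_years(8, [2] * 100, per=100)

# Four patients followed for 1, 2, 3 and 4 years (10 patient-years), 2 events.
small_study_rate = rate_per_patient_years(2, [1, 2, 3, 4], per=1)

print(f"{mi_rate} MIs per 100 patient-years")
print(f"{small_study_rate} events per patient-year")
```

Summing individual follow-up times, rather than multiplying a head count by the study length, is what lets the calculation handle patients who were observed for different periods.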

An important issue is that frequently the timeframe for observation in studies reporting patient-years does not match the timeframe stated in the study. Brian Alper of Dynamed explains it this way: “If I observed a million people for 5 minutes each and nobody died, any conclusion about mortality over 1 year would be meaningless. This problem occurs whether or not we translate our outcome into a patient-years measure. The key in critical appraisal is to catch the discrepancy between timeframe of observation and timeframe of conclusion and not let the use of ‘patient-years’ mistranslate between the two or represent an inappropriate extrapolation.”[1]

References

1. Personal communication 9/3/13 with Brian S. Alper, MD, MSPH, FAAFP, Editor-in-Chief, DynaMed, Medical Director, EBSCO Information Services.

## Can Clinical Guidelines be Trusted?

In a recent BMJ article, “Why we can’t trust clinical guidelines,” Jeanne Lenzer raises a number of concerns regarding clinical guidelines [1]. She begins by summarizing the conflict between 1990 guidelines recommending steroids for acute spinal injury versus 2013 clinical recommendations against using steroids in acute spinal injury. She then asks, “Why do processes intended to prevent or reduce bias fail?”

Her proposed answers to this question include the following—

• Many doctors follow guidelines, even if not convinced about the recommendations, because they fear professional censure and possible harm to their careers. Supporting this, she cites a poll of over 1,000 neurosurgeons which showed that:
    • Only 11% believed the treatment was safe and effective.
    • Only 6% thought it should be a standard of care.
    • Yet when asked if they would continue prescribing the treatment, 60% said that they would. Many cited a fear of malpractice if they failed to follow “a standard of care.” (Note: the standard of care changed in March 2013 when the Congress of Neurological Surgeons stated there was no high quality evidence to support the recommendation.)
• Clinical guideline chairs and participants frequently have financial conflicts.
    • The Cochrane reviewer for the 1990 guideline she references had strong ties to industry.

Delfini Comment

• Fear-based Decision-making by Physicians

We believe this is a reality. In our work with administrative law judges, we have been told that if you “run with the pack,” you better be right, and if you “run outside the pack,” you really better be right. And what happens in court is not necessarily true or just. The solution is better recommendations constructed from individualized, thoughtful decisions based on valid critically appraised evidence found to be clinically useful, patient preferences and other factors. The important starting place is effective critical appraisal of the evidence.

• Financial Conflicts of Interest & Industry Influence

It is certainly true that money can sway decisions, be it coming from industry support or potential for income. However, we think that most doctors want to do their best for patients and try to make decisions or provide recommendations with the patient’s best interest in mind. Therefore, we think this latter issue may be more complex and strongly affected in both instances by the large number of physicians and others involved in health care decision-making who 1) do not understand that many research studies are not valid or are not reported sufficiently to tell; and, 2) lack the skills to be able to differentiate reliable studies from those which may not be reliable.

When it comes to industry support, one of the variables traveling with money includes greater exposure to information through data or contacts with experts supporting that manufacturer’s products. We suspect that industry influence may be less due to financial incentives than this exposure coupled with lack of critical appraisal understanding. As such, we wrote a Letter to the Editor describing our theory that the major problem of low quality guidelines might stem from physicians’ and others’ lack of competency in evaluating the quality of the evidence. Our response is reproduced here.

Delfini BMJ Rapid Response [2]:

We (Delfini) believe that we have some unique insight into how ties to industry may result in advocacy for a particular intervention due to our extensive experience training health care professionals and students in critical appraisal of the medical literature. We think it is very possible that the outcomes Lenzer describes are less due to financial influence than are due to lack of knowledge. The vast majority of physicians and other health care professionals do not have even rudimentary skills in identifying science that is at high to medium risk of bias or understand when results may have a high likelihood of being due to chance. Having ties to industry would likely result in greater exposure to science supporting a particular intervention.

Without the ability to evaluate the quality of the science, we think it is likely that individuals would be swayed and/or convinced by that science. The remedy for this and for other problems with the quality of clinical guidelines is ensuring that all guideline development members have basic critical appraisal skills and there is enough transparency in guidelines so that appraisal of a guideline and the studies utilized can easily be accomplished.

References

1. Lenzer J. Why we can’t trust clinical guidelines. BMJ 2013; 346:f3830

2. Strite SA, Stuart M. BMJ Rapid Response: Why we can’t trust clinical guidelines. BMJ 2013;346:f3830; http://www.bmj.com/content/346/bmj.f3830/rr/651876

## Webinar: “Using Real-World Data & Published Evidence in Pharmacy Quality Improvement Activities”

On Monday, May 20, 2013, we presented a webinar on “Using Real-World Data & Published Evidence in Pharmacy Quality Improvement Activities” for the member organizations of the Alliance of Community Health Plans (ACHP).

The 80-minute discussion addressed four topic areas, all of which have unique critical appraisal challenges. Webinar goals were to discuss issues that arise when conducting quality improvement efforts using real world data, such as data from claims, surveys and observational studies and other published healthcare evidence.

Key pitfalls were cherry picked for these four mini-seminars—

• Pitfalls to avoid when using real-world data, dealing with heterogeneity, confounding-by-indication and causality.
• Key issues in evaluating oncology studies — outcome issues and focus on how to address large attrition rates.
• Important issues when conducting comparative safety reviews — assessing patterns through use of RCTs, systematic reviews, observational studies and registries.
• Key issues in evaluating studies employing Kaplan-Meier estimates — time-to-event basics with attention to the important problem of censoring.

A recording of the webinar is available at—

https://achp.webex.com/achp/lsr.php?AT=pb&SP=TC&rID=45261732&rKey=1475c8c3abed8061&act=pb

## Review of Endocrinology Guidelines

Decision-makers frequently rely on the body of pertinent research when making clinical management decisions. The goal is to critically appraise and synthesize the evidence before making recommendations, developing protocols and making other decisions. Serious attention should be paid to the validity of the primary studies to determine reliability before accepting them into the review. Using the Assessment of Multiple Systematic Reviews (AMSTAR) tool, Brito and colleagues have described the rigor of systematic reviews (SRs) cited from 2006 until January 2012 in support of the clinical practice guidelines put forth by the Endocrine Society [1].

The authors included 69 of 2,817 studies. These 69 SRs had a mean AMSTAR score of 6.4 (standard deviation, 2.5) of a maximum score of 11, with scores improving over time. Thirty-five percent of the included SRs were of low quality (methodological AMSTAR score of 1 or 2 out of 5) and were cited in 24 different recommendations. These low-quality SRs were the main evidentiary support for five recommendations, of which only one acknowledged the quality of the SRs.

The authors conclude that few recommendations in the field of endocrinology are supported by reliable SRs and that the quality of the endocrinology SRs is suboptimal and is currently not being addressed by guideline developers. SRs should reliably represent the body of relevant evidence. The authors urge authors and journal editors to pay attention to bias and adequate reporting.

Delfini note: Once again we see a review of guideline work which suggests using caution in accepting clinical recommendations without critical appraisal of the evidence and knowing the strength of the evidence supporting clinical recommendations.

1. Brito JP, Tsapas A, Griebeler ML, Wang Z, Prutsky GJ, Domecq JP, Murad MH, Montori VM. Systematic reviews supporting practice guideline recommendations lack protection against bias. J Clin Epidemiol. 2013 Jun;66(6):633-8. doi: 10.1016/j.jclinepi.2013.01.008. Epub 2013 Mar 16. PubMed PMID: 23510557.

## Review of Bias In Diabetes Randomized Controlled Trials

Healthcare professionals must evaluate the internal validity of randomized controlled trials (RCTs) as a first step in the process of considering the application of clinical findings (results) for particular patients. Bias has been repeatedly shown to increase the likelihood of distorted study results, frequently favoring the intervention.

Readers may be interested in a new systematic review of randomized trials of quality improvement (QI) interventions in diabetes care. Risk of bias (low, unclear or high) was assessed in 142 trials using the Cochrane Risk of Bias Tool. Overall, 69 trials (49%) had at least one out of seven domains with high risk of bias. Inadequate reporting frequently hampered the risk of bias assessment: the method of producing the allocation sequence was unclear in 82 trials (58%) and allocation concealment was unclear in 78 trials (55%). There were no significant reductions over time in the proportion of studies at high risk of bias, nor improvements in the adequacy of reporting of risk of bias domains. The authors conclude that these trials have serious limitations that put the findings in question and therefore inhibit evidence-based QI. There is a need to limit the potential for bias when conducting QI trials and to improve the quality of reporting of QI trials so that stakeholders have adequate evidence for implementation. The full study is freely available at:

http://bmjopen.bmj.com/content/3/4/e002727.long

Ivers NM, Tricco AC, Taljaard M, Halperin I, Turner L, Moher D, Grimshaw JM. Quality improvement needed in quality improvement randomised trials: systematic review of interventions to improve care in diabetes. BMJ Open. 2013 Apr 9;3(4):e002727. doi: 10.1136/bmjopen-2013-002727. PubMed PMID: 23576000.

## G-I-N Webinar: Guideline Development & Evidence-based Quality Improvement

Guidelines International Network Webinar: How to Develop Guidelines Within the Context of a Clinical Quality Improvement Program

Thanks to the Guidelines International Network, a webinar we did for them is available online.  To access the recording and slide show presentation, go to—

http://www.g-i-n.net/activities/g-i-n-na/g-i-n-na-events-activities/webinar-series/delfini

For information about the case study we showcased for our presentation, go to—

http://www.delfini.org/Showcase_Project_VTE.htm

## Critical Appraisal Tool for Clinical Guidelines & Other Secondary Sources

Everything citing medical science should be appraised for validity and clinical usefulness. That includes clinical guidelines and other secondary sources. Our tool for evaluating these resources— the Delfini QI Project Appraisal Tool—has been updated and is available in the Delfini Tools & Educational Library at www.delfini.org.  For quick access to the PDF version, go to—

http://www.delfini.org/delfiniNew.htm