Evidence-based Medicine Click: EBM Commentaries by Mike Stuart MD and Sheri Strite

A Cool Click for Evidence-based Medicine (EBM) and Evidence-based Practice (EBP) Commentaries & Health Care Quality Improvement Nibblets

The EBM Information Quest: Is it true? Is it useful? Is it usable?™


Validity Detectives: Michael E. Stuart MD, President & Medical Director; Sheri Ann Strite, Managing Director & Principal


Volume — Quality of Evidence:
Observational Studies

02/28/2016: Another Study Warns That Evidence From Observational Studies Provides Unreliable Results For Therapies


Go to DelfiniClick™ for all volumes.

Comparative Effectiveness Research (CER), “Big Data” & Causality

For a number of years now, we've been concerned that the CER movement and the growing love affair with "big data" will lead to many erroneous conclusions about cause and effect. We were pleased to see the following posts from Austin Frakt, an editor-in-chief of The Incidental Economist (Contemplating health care with a focus on research, an eye on reform):

Ten impressions of big data: Claims, aspirations, hardly any causal inference



Five more big data quotes: The ambitions and challenges



Another Study Warns That Evidence From Observational Studies Provides Unreliable Results For Therapies

We have previously mentioned the enormous contributions made by John Ioannidis MD in the area of understanding the reliability of medical evidence. [Ioannidis, Delfini Blog, Giannakakis] We want to draw your attention to a recent publication dealing with the risks of relying on observational data for cause and effect conclusions. [Hemkens] In this recent study, Hemkens, Ioannidis and other colleagues assessed differences in mortality effect size reported in observational (routinely collected data [RCD]) studies as compared with results reported in RCTs.

Eligible RCD studies used propensity scores in an effort to address confounding bias. The analysis included only RCD studies conducted before any RCT was published on the same topic. The authors assessed the risk of bias in both the RCD studies and the RCTs using the Cochrane Collaboration risk of bias tools, then compared the direction of treatment effects, confidence intervals, and effect sizes (odds ratios) between RCD studies and RCTs. Relative odds ratios were calculated across all pairs of RCD studies and trials.

The authors found that RCD studies systematically and substantially overestimated mortality benefits of medical treatments compared with subsequent trials investigating the same question. Overall, RCD studies reported mortality estimates that were more favorable by a relative 31% than subsequent trials (summary relative odds ratio 1.31; 95% confidence interval 1.03 to 1.65; I² = 0%).
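For readers who want to see the arithmetic behind a summary relative odds ratio, here is a minimal sketch of how one can be formed from study pairs. The odds ratios, the sign convention, and the unweighted geometric-mean pooling are all illustrative assumptions, not the Hemkens data or their meta-analytic method:

```python
import math

def relative_odds_ratio(or_rct, or_rcd):
    # Convention (an assumption here): values above 1 mean the observational
    # (RCD) study reported a more favorable (smaller) mortality odds ratio
    # than the subsequent RCT on the same question.
    return or_rct / or_rcd

# Hypothetical (RCT OR, RCD OR) pairs -- not the actual Hemkens data.
pairs = [(0.85, 0.60), (0.95, 0.70), (0.60, 0.55)]
rors = [relative_odds_ratio(rct, rcd) for rct, rcd in pairs]

# Crude unweighted pooling: geometric mean of the relative odds ratios.
# (A real meta-analysis would weight each pair by inverse variance.)
summary_ror = math.exp(sum(math.log(r) for r in rors) / len(rors))
```

A summary value above 1, like the 1.31 reported by Hemkens and colleagues, indicates that the observational estimates were systematically more favorable than the trial estimates.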

These authors remind us yet again that, if no randomized trials exist, clinicians and other decision-makers should not trust results from observational data drawn from sources such as local or national databases, registries, cohort studies, or case-control studies.

Delfini Blog: http://delfini.org/blog/?p=292

Giannakakis IA, Haidich AB, Contopoulos-Ioannidis DG, Papanikolaou GN, Baltogianni MS, Ioannidis JP. Citation of randomized evidence in support of guidelines of therapeutic and preventive interventions. J Clin Epidemiol. 2002 Jun;55(6):545-55. PubMed PMID: 12063096.

Hemkens LG, Contopoulos-Ioannidis DG, Ioannidis JP. Agreement of treatment effects for mortality from routinely collected data and subsequent randomized trials: meta-epidemiological survey. BMJ. 2016 Feb 8;352:i493. doi: 10.1136/bmj.i493. PubMed PMID: 26858277.

Ioannidis JPA. Why Most Published Research Findings Are False. PLoS Med 2005;2(8):696-701. PMID: 16060722.


Cochrane Risk Of Bias Tool For Non-Randomized Studies

Like many others, we hold that, with very few exceptions, cause-and-effect conclusions regarding therapeutic interventions can be drawn only when valid RCT data exist. Observational studies, however, can answer other kinds of questions, and non-randomized studies (NRS) are often included in systematic reviews.

In September 2014, Cochrane published a tool for assessing bias in NRS for systematic review authors [1]. It may be of interest to our colleagues. The tool is called ACROBAT-NRSI (“A Cochrane Risk Of Bias Assessment Tool for Non-Randomized Studies”) and is designed to assist with evaluating the risk of bias (RoB) in the results of NRS that compare the health effects of two or more interventions.

The tool focuses on internal validity and covers seven domains through which bias might be introduced into an NRS. The domains provide a framework for considering any type of NRS and are summarized below; the full document describes the listed biases and explains how each may distort results.


Domains and related terms (summarized from the full document):

Pre-intervention (assessment process differs from that for RCTs):

  • Bias due to confounding (related terms: selection bias; allocation bias; case-mix bias; channelling bias)
  • Bias in selection of participants into the study (inception bias; lead-time bias; immortal time bias)

At intervention (assessment process has overlap with that for RCTs):

  • Bias in measurement of interventions (misclassification bias; information bias; recall bias; measurement bias; observer bias)

Post-intervention (assessment process has overlap with that for RCTs):

  • Bias due to departures from intended interventions (performance bias; time-varying confounding)
  • Bias due to missing data (attrition bias; selection bias, as the term is sometimes used in relation to observational studies)
  • Bias in measurement of outcomes (detection bias; recall bias; information bias; misclassification bias; observer bias; measurement bias)
  • Bias in selection of the reported result (outcome reporting bias; analysis reporting bias)

Response options for each bias include: low risk of bias; moderate risk of bias; serious risk of bias; critical risk of bias; and no information on which to base a judgment.

Details are available in the full document, which can be downloaded at https://sites.google.com/site/riskofbiastool/

Delfini Comment
We again point out that non-randomized studies often report seriously misleading results even when treated and control groups appear similar in prognostic variables. We agree with Deeks that, for therapeutic interventions, “non-randomised studies should only be undertaken when RCTs are infeasible or unethical” [2]—and even then, buyer beware. Studies do not get "validity grace" because of scientific or practical challenges.

Furthermore, we are uncertain that this tool adds great value when assessing NRS. Deeks [2] identified 194 tools that could be or had been used to assess NRS; do we really need another one? While the document is good background reading, we are more comfortable approaching the problem of observational data by pointing out that, when it comes to efficacy, high-quality RCTs have a positive predictive value of about 85%, whereas well-done observational studies have a positive predictive value of about 20% [3].
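The roughly 85% versus 20% figures trace back to the positive predictive value formula in Ioannidis's 2005 paper, which combines the significance level, study power, the prior odds R that a probed relationship is true, and a bias term u. A sketch, using parameter values in the spirit of that paper's illustrative table:

```python
def ppv(alpha, power, r, bias=0.0):
    """Positive predictive value of a 'significant' finding (after Ioannidis 2005).

    alpha: significance level; power: 1 - beta; r: prior odds that the
    probed relationship is true; bias: fraction u of analyses that would
    report a positive result purely because of bias.
    """
    beta = 1.0 - power
    numerator = power * r + bias * beta * r
    denominator = r + alpha - beta * r + bias - bias * alpha + bias * beta * r
    return numerator / denominator

# Adequately powered RCT, little bias (r = 1:1, u = 0.10): about 0.85
rct_ppv = ppv(alpha=0.05, power=0.80, r=1.0, bias=0.10)

# Exploratory epidemiological study (r = 1:10, u = 0.30): about 0.20
obs_ppv = ppv(alpha=0.05, power=0.80, r=0.1, bias=0.30)
```

The specific r and u values are assumptions chosen to reproduce the two rounded figures quoted above; shifting them moves the PPVs accordingly.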


  1. Sterne JAC, Higgins JPT, Reeves BC on behalf of the development group for ACROBAT-NRSI. A Cochrane Risk Of Bias Assessment Tool: for Non-Randomized Studies of Interventions (ACROBAT-NRSI), Version 1.0.0, 24 September 2014. Available from http://www.riskofbias.info [accessed 10/11/14].
  2. Deeks JJ, Dinnes J, D'Amico R, Sowden AJ, Sakarovitch C, Song F, Petticrew M,  Altman DG; International Stroke Trial Collaborative Group; European Carotid Surgery Trial Collaborative Group. Evaluating non-randomised intervention studies. Health Technol Assess. 2003;7(27):iii-x, 1-173. Review. PubMed PMID: 14499048.
  3. Ioannidis JPA. Why Most Published Research Findings Are False. PLoS Med 2005;2(8):696-701. PMID: 16060722.


Interesting Comparative Effectiveness Research (CER) Case Study: "Real World Data" Hypothetical Migraine Case and Lack of PCORI Endorsement

In the October issue of Health Affairs, the journal’s editorial team created a fictional set of clinical trials and observational studies to see what various stakeholders would say about comparative effectiveness evidence of two migraine drugs.[1]

The hypothetical set-up is this:

The newest drug, Hemikrane, is an FDA-approved drug that has recently come on the market. It was reported in clinical trials to reduce both the frequency and the severity of migraine headaches. Hemikrane is taken once a week. The FDA approved Hemikrane based on two randomized, double-blind, controlled clinical trials, each of which had three arms.

  • In one arm, patients who experienced multiple migraine episodes each month took Hemikrane weekly.
  • In another arm, a comparable group of patients received a different migraine drug, Cephalal, a drug which was reported to be effective in earlier, valid studies. It is taken daily.
  • In a third arm, another equivalent group of patients received placebos.

The study was powered to find a difference between Hemikrane and placebo if there was one and if it were at least as effective as Cephalal. Each of the two randomized studies enrolled approximately 2,000 patients and lasted six months. They excluded patients with uncontrolled high blood pressure, diabetes, heart disease, or kidney dysfunction. The patients received their care in a number of academic centers and clinical trial sites. All patients submitted daily diaries, recording their migraine symptoms and any side effects.

Hypothetical Case Study Findings: The trials reported that the patients who took Hemikrane had a clinically significant reduction in the frequency, severity, and duration of headaches compared to placebo, but not to Cephalal.

The trials were not designed to evaluate the comparative safety of the drugs, but there were no safety signals from the Hemikrane patients, although a small number of patients on the drug experienced nausea.

Although the above studies reported efficacy of Hemikrane in a controlled environment with highly selected patients, they did not assess patient experience in a real-world setting. Does once weekly dosing improve adherence in the real world? The monthly cost of Hemikrane to insurers is $200, whereas Cephalal costs insurers $150 per month. (In this hypothetical example, the authors assume that copayments paid by patients are the same for all of these drugs.)

A major philanthropic organization with an interest in advancing treatments for migraine sufferers funded a collaboration among researchers at Harvard; a regional health insurance company, Trident Health; and Hemikrane’s manufacturer, Aesculapion. Trident Health provided access to a database of five million people, which included information on medication use, doctor visits, emergency department evaluations, and hospitalizations. Using these records, the study identified a cohort of patients with migraine who made frequent visits to doctors or hospital emergency departments. The study compared information about patients receiving Hemikrane with two comparison groups: a group of patients who received the daily prophylactic regimen with Cephalal, and a group of patients receiving no prophylactic therapy.

The investigators attempted to confirm the original randomized trial results by assessing the frequency with which all patients in the study had migraine headaches. Because the database did not contain a diary of daily symptoms, which had been collected in the trials, the researchers substituted as a proxy the amount of medications such as codeine and sumatriptan (Imitrex) that patients had used each month for treatment of acute migraines. The group receiving Hemikrane had lower use of these symptom-oriented medications than those on Cephalal or on no prophylaxis and had fewer emergency department visits than those taking Cephalal or on no prophylaxis.

Although the medication costs were higher for patients taking Hemikrane because of its higher monthly drug cost, the overall episode-of-care costs were lower than for the comparison group taking Cephalal. As hypothesized, the medication adherence was higher in the once-weekly Hemikrane patients than in the daily Cephalal patients (80 percent and 50 percent, respectively, using the metric of medication possession ratio, which is the number of days of medication dispensed as a percentage of 365 days).
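The medication possession ratio described above is simple arithmetic; a minimal sketch (capping the ratio at 1.0 for overlapping refills is a common convention, assumed here):

```python
def medication_possession_ratio(days_supplied, days_in_period=365):
    # Days of medication dispensed as a fraction of the observation period,
    # capped at 1.0 when overlapping refills exceed the period.
    return min(days_supplied / days_in_period, 1.0)

# For example, 292 days of medication dispensed over a 365-day year
# yields 0.80, the adherence quoted for the once-weekly Hemikrane group.
hemikrane_mpr = medication_possession_ratio(292)
```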

The investigators were concerned that the above findings might be due to the unique characteristics of Trident Health’s population of covered patients, regional practice patterns, copayment designs for medications, and/or the study’s analytic approach. They also worried that the results could be confounded by differences in the patients receiving Hemikrane, Cephalal, or no prophylaxis. One possibility, for example, was that patients who experienced the worst migraines might be more inclined to take or be encouraged by their doctors to take the new drug, Hemikrane, since they had failed all previously available therapies. In that case, the results for a truly matched group of patients might have shown even more pronounced benefit for Hemikrane.

To see if the findings could be replicated, the investigators contacted BestScripts, the pharmacy benefit management company that worked with Trident Health, and asked for access to additional data. A research protocol was developed before any data were examined. Statistical adjustments were made to balance the three groups to be studied as well as possible (those taking Hemikrane, those taking Cephalal, and those on no prophylaxis) using a propensity score method, which used age, sex, number of previous migraine emergency department visits, type and extent of prior medication use, and selected comorbidities to estimate the probability of a person’s being in one of the three groups.
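As a rough illustration of how a propensity score is used once scores exist, here is a sketch of stratification on the score: patients are sorted into equal-sized score strata, and treated-versus-control outcome differences are averaged with stratum-size weights. The cohort, the precomputed scores, and the five-stratum choice are all hypothetical; real analyses estimate the scores with a model such as logistic regression and often match or weight rather than stratify:

```python
from statistics import mean

def stratified_difference(patients, n_strata=5):
    # patients: (propensity_score, treated_flag, outcome) tuples.
    pts = sorted(patients)          # order by propensity score
    size = len(pts) // n_strata
    diffs, weights = [], []
    for i in range(n_strata):
        lo = i * size
        hi = (i + 1) * size if i < n_strata - 1 else len(pts)
        stratum = pts[lo:hi]
        treated = [y for _, t, y in stratum if t]
        control = [y for _, t, y in stratum if not t]
        if treated and control:     # skip strata missing one arm
            diffs.append(mean(treated) - mean(control))
            weights.append(len(stratum))
    # stratum-size-weighted average of within-stratum differences
    return sum(d * w for d, w in zip(diffs, weights)) / sum(weights)

# Hypothetical cohort: within every score band the treated outcome mean
# is exactly 0.1 higher than control, so the estimate should be ~0.1.
patients = []
for i in range(5):
    base = 0.10 + 0.15 * i
    patients += [(base, 1, 0.6), (base + 0.01, 1, 0.6),
                 (base + 0.02, 0, 0.5), (base + 0.03, 0, 0.5)]
effect = stratified_difference(patients)
```

Note that the whole construction balances only the variables that went into the score; that limitation is exactly the concern raised in the commentary below.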

The pharmacy benefit manager, BestScripts, had access to data covering more than fifty million lives. The findings in this second, much larger, database corroborated the earlier assessment. The once-weekly prophylactic therapy with Hemikrane clearly reduced the use of medications such as codeine to relieve symptoms, as well as emergency department visits compared to the daily prophylaxis and no prophylaxis groups. Similarly, the Hemikrane group had significantly better medication adherence than the Cephalal group. In addition, BestScripts had data from a subset of employers that collected work loss information about their employees. These data showed that patients on Hemikrane were out of work for fewer days each month than patients taking Cephalal.

In a commentary, Joe Selby, executive director of the Patient-Centered Outcomes Research Institute (PCORI), and colleagues provided a list of problems with these real-world studies, including threats to validity, and concluded that these hypothetical studies would be unlikely to have been funded or communicated by PCORI.[2]

Below are several of the problems identified by Selby et al.

  • Selection Bias
    • Patients and clinicians may have tried the more familiar, less costly Cephalal first and switched to Hemikrane only if Cephalal failed to relieve symptoms, making the Hemikrane patients a group who, on average, would be more difficult to treat.
    • Those patients who continued using Cephalal may be a selected group who tolerate the treatment well and perceived a benefit.
    • Even if the investigators had conducted the study with only new users, it is plausible that patients prescribed Hemikrane could differ from those prescribed Cephalal. They may be of higher socioeconomic status, have better insurance coverage with lower copayments, have different physicians, or differ in other ways that could affect outcomes.
  • Performance biases or other differences between groups are possible.
  • Details of any between-group differences found in these exploratory analyses should have been presented.

Delfini Comment

These two articles are worth reading if you are interested in the difficult area of evaluating observational studies and including them in comparative effectiveness research (CER). We would add that to know if drugs really work, valid RCTs are almost always needed. In this case we don’t know if the studies were valid, because we don’t have enough information about the risk of selection, performance, attrition and assessment bias and other potential methodological problems in the studies. Database studies and other observational studies are likely to have differences in populations, interventions, comparisons, time treated and clinical settings (e.g., prognostic variables of subjects, dosing, co-interventions, other patient choices, bias from lack of blinding) and adjusting for all of these variables and more requires many assumptions. Propensity scores do not reliably adjust for differences. Thus, the risk of bias in the evidence base is unclear.

This case illustrates the difficulty of making coverage decisions for new drugs that may offer advantages for some patients when studies report benefit compared with placebo but established treatments with known safety records already exist. In addition, new drugs frequently are found to cause adverse events over time.

Observational data is frequently very valuable. It can be useful in identifying populations for further study, evaluating the implementation of interventions, generating hypotheses, and identifying current condition scenarios (e.g., who, what, where in QI project work; variation, etc.). It is also useful in providing safety signals and for creating economic projections (e.g., balance sheets, models). In this hypothetical set of studies, however, we have only gray zone evidence about efficacy from both RCTs and observational studies and almost no information about safety.

Much of the October issue of Health Affairs is taken up with other readers’ comments. Those of you interested in the problems with real world data in CER activities will enjoy reading how others reacted to these hypothetical drug studies.


1. Dentzer S; the Editorial Team of Health Affairs. Communicating About Comparative Effectiveness Research: A Health Affairs Symposium On The Issues. Health Aff (Millwood). 2012 Oct;31(10):2183-2187. PubMed PMID: 23048094.

2. Selby JV, Fleurence R, Lauer M, Schneeweiss S. Reviewing Hypothetical Migraine Studies Using Funding Criteria From The Patient-Centered Outcomes Research Institute. Health Aff (Millwood). 2012 Oct;31(10):2193-2199. PubMed PMID: 23048096.


Comparative Effectiveness Research (CER) Warning—Using Observational Studies to Draw Conclusions About Effectiveness May Give You The Wrong Answer
Case Study: Losartan


This week we saw five CER studies—all observational. Can we trust the results of these studies? The following is a case study that helps answer that question:

Numerous clinical trials have reported decreased mortality in heart failure patients treated with angiotensin II receptor blockers (ARBs), but no head-to-head randomized trials have compared individual ARBs. In 2007, an administrative database study comparing various ARBs concluded that, “elderly patients with heart failure who were prescribed losartan had worse survival rates compared with those prescribed other commonly used ARBs.”[1] This study used hospital discharge data and information from physician claims and pharmacy databases to construct an observational study. The information on prescriptions included type of drug, dose category, frequency, and duration. The authors used several methods to estimate adherence.

Unadjusted mortality for users of each ARB was calculated by using Kaplan-Meier curves. To account for differences in follow-up and to control for differences among patient characteristics, a multivariable Cox proportional hazards model was used.

The main outcome was time to all-cause death in patients with heart failure who were prescribed losartan, valsartan, irbesartan, candesartan or telmisartan. Losartan was the most frequently prescribed ARB (61% of patients). Other ARBs included irbesartan (14%), valsartan (13%), candesartan (10%) and telmisartan (2%). In this scenario, losartan loses. Using losartan as the reference, adjusted hazard ratios (HRs) for mortality among the 6876 patients were 0.63 (95% confidence interval [CI] 0.51 to 0.79) for patients who filled a prescription for valsartan, 0.65 (95% CI 0.53 to 0.79) for irbesartan, and 0.71 (95% CI 0.57 to 0.90) for candesartan. Compared with losartan, adjusted HR for patients prescribed telmisartan was 0.92 (95% CI 0.55 to 1.54). Being at or above the target dose was a predictor of survival (adjusted HR 0.72, 95% CI 0.63 to 0.83).
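The unadjusted Kaplan-Meier survival estimates mentioned above come from the product-limit method, which is straightforward to compute by hand. A minimal sketch on made-up follow-up data (this is the textbook estimator, not a reconstruction of the study's analysis):

```python
def kaplan_meier(data):
    # data: (time, event) pairs; event 1 = death, 0 = censored.
    data = sorted(data)
    survival, curve = 1.0, []
    seen = set()
    for t, _ in data:
        if t in seen:
            continue
        seen.add(t)
        deaths = sum(1 for time, ev in data if time == t and ev)
        at_risk = sum(1 for time, ev in data if time >= t)
        if deaths:
            survival *= 1 - deaths / at_risk   # product-limit step
            curve.append((t, survival))
    return curve

# Made-up follow-up data: deaths at t = 1, 3, 4; censoring at t = 2, 5.
curve = kaplan_meier([(1, 1), (2, 0), (3, 1), (4, 1), (5, 0)])
```

Censored patients leave the risk set without forcing a drop in the curve, which is why censoring assumptions matter so much in registry data.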

The authors of this observational study point out that head-to-head comparisons are unlikely to be undertaken in trial settings because of the enormous size and expense that such comparative trials of survival would entail. They state that their results represent the best available evidence that some ARBs may be more effective in increasing the survival rate than others and that their results should be useful to guide clinicians in their choice of drugs to treat patients with heart failure.

In 2011, a retrospective analysis of the Swedish Heart Failure Registry reported a survival benefit of candesartan over losartan in patients with heart failure (HF) at 1 and 5 years.[2] Survival by ARB agent was analyzed by Kaplan-Meier estimates and predictors of survival were determined by univariate and multivariate proportional hazard regression models, with and without adjustment for propensity scores and interactions. Stratified analyses and quantification of residual confounding analyses were also performed. In this scenario, losartan loses again. One-year survival was 90% (95% confidence interval [CI] 89% to 91%) for patients receiving candesartan and 83% (95% CI 81% to 84%) for patients receiving losartan, and 5-year survival was 61% (95% CI 54% to 68%) and 44% (95% CI 41% to 48%), respectively (log-rank P<.001). In multivariate analysis with adjustment for propensity scores, the hazard ratio for mortality for losartan compared with candesartan was 1.43 (95% CI 1.23 to 1.65, P<.001). The results persisted in stratified analyses.

But wait!

In March 2012, a nationwide Danish registry–based cohort study, linking individual-level information on patients aged 45 years and older, reported all-cause mortality in users of losartan and candesartan.[3] Cox proportional hazards regression was used to compare outcomes. In 4,397 users of losartan, 1,212 deaths occurred during 11,347 person-years of follow-up (unadjusted incidence rate [IR]/100 person-years, 10.7; 95% CI 10.1 to 11.3) compared with 330 deaths during 3,675 person-years among 2,082 users of candesartan (unadjusted IR/100 person-years, 9.0; 95% CI 8.1 to 10.0). Compared with candesartan, losartan was not associated with increased all-cause mortality (adjusted hazard ratio [HR] 1.10; 95% CI 0.9 to 1.25) or cardiovascular mortality (adjusted HR 1.14; 95% CI 0.96 to 1.36). Compared with high doses of candesartan (16-32 mg), low-dose (12.5 mg) and medium-dose losartan (50 mg) were associated with increased mortality (HR 2.79; 95% CI 2.19 to 3.55 and HR 1.39; 95% CI 1.11 to 1.73, respectively), but use of high-dose losartan (100 mg) was similar in risk (HR 0.71; 95% CI 0.50 to 1.00).
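The unadjusted incidence rates quoted above can be reproduced directly from the reported counts; a quick check:

```python
def incidence_rate_per_100py(events, person_years):
    # unadjusted incidence rate per 100 person-years of follow-up
    return 100 * events / person_years

# Counts from the Danish registry study [3]
losartan_ir = incidence_rate_per_100py(1212, 11347)    # ~10.7 per 100 py
candesartan_ir = incidence_rate_per_100py(330, 3675)   # ~9.0 per 100 py
```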

Another small cohort study found no difference in all-cause mortality between 4 different ARBs, including candesartan and losartan.[4] Can we tell who is the winner and who is the loser? It is impossible to know. Different results are likely to be due to different populations (different co-morbidities/prognostic variables), dosages of ARBs, co-interventions, analytic methods, etc. Svanström et al point out that, unlike the study by Eklind-Cervenka, they were able to include a wide range of comorbidities (including noncardiovascular disease), co-medications and health status markers in order to better account for baseline treatment group differences with respect to frailty and general health. As an alternative explanation they state that, given that their findings stem from observational data, their results could be due to unmeasured confounding because of frailty (e.g., patients with frailty and advanced heart failure tolerating only low doses of losartan and because of the severity of heart failure being more likely to die than patients who tolerate high candesartan doses). The higher average relative dose among candesartan users may have led to an overestimation of the overall comparative effectiveness of candesartan.

Our position is that, without randomization, investigators cannot be sure that their adjustments (e.g., use of propensity scoring and modeling) will eliminate selection bias. Adjustments can account only for factors that can be measured, that have been measured, and only as well as the instruments can measure them. Other problems in observational studies, such as differences in drug dosages and other care experiences, cannot be reliably adjusted for (performance and assessment bias).
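The point that adjustment handles only measured confounders can be made concrete with a small deterministic example. Below, the true treatment effect is zero (mortality depends only on an unmeasured factor u), yet adjusting for the measured factor x still produces an apparent mortality benefit, because treatment assignment tracks u. All counts are invented for illustration:

```python
# Each row: (x_measured, u_unmeasured, treated, n_patients, deaths).
# Mortality depends only on u (10% when u=0, 30% when u=1); the drug
# itself does nothing, but treated patients tend to have u=0.
rows = [
    (0, 0, 1, 80, 8),  (0, 0, 0, 20, 2),
    (0, 1, 1, 20, 6),  (0, 1, 0, 80, 24),
    (1, 0, 1, 60, 6),  (1, 0, 0, 40, 4),
    (1, 1, 1, 40, 12), (1, 1, 0, 60, 18),
]

def adjusted_risk_difference(rows, adjust_for):
    # Stratify on the adjustment variable(s) and average the
    # within-stratum treated-minus-control mortality differences.
    strata = sorted({tuple(r[i] for i in adjust_for) for r in rows})
    diffs, weights = [], []
    for s in strata:
        sub = [r for r in rows if tuple(r[i] for i in adjust_for) == s]
        def risk(flag):
            n = sum(r[3] for r in sub if r[2] == flag)
            d = sum(r[4] for r in sub if r[2] == flag)
            return d / n
        diffs.append(risk(1) - risk(0))
        weights.append(sum(r[3] for r in sub))
    return sum(d * w for d, w in zip(diffs, weights)) / sum(weights)

# Adjusting for measured x alone: apparent benefit (a negative difference)
biased = adjusted_risk_difference(rows, adjust_for=[0])
# Adjusting for both x and the (in practice unobservable) u: truth, zero
truth = adjusted_risk_difference(rows, adjust_for=[0, 1])
```

No amount of modeling on x alone recovers the null effect here; only knowing u (or randomizing, which balances u in expectation) does.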

Get ready for more observational studies claiming to show comparative differences between interventions. But remember, even the best observational studies may have only about a 20% chance of telling you the truth.[5]


1. Hudson M, Humphries K, Tu JV, Behlouli H, Sheppard R, Pilote L. Angiotensin II receptor blockers for the treatment of heart failure: a class effect? Pharmacotherapy. 2007 Apr;27(4):526-34. PubMed PMID: 17381379.

2. Eklind-Cervenka M, Benson L, Dahlström U, Edner M, Rosenqvist M, Lund LH. Association of candesartan vs losartan with all-cause mortality in patients with heart failure. JAMA. 2011 Jan 12;305(2):175-82. PubMed PMID: 21224459.

3. Svanström H, Pasternak B, Hviid A. Association of treatment with losartan vs candesartan and mortality among patients with heart failure. JAMA. 2012 Apr 11;307(14):1506-12. PubMed PMID: 22496265.

4. Desai RJ, Ashton CM, Deswal A, et al. Comparative effectiveness of individual angiotensin receptor blockers on risk of mortality in patients with chronic heart failure [published online ahead of print July 22, 2011]. Pharmacoepidemiol Drug Saf. doi: 10.1002/pds.2175.

5. Ioannidis JPA. Why Most Published Research Findings Are False. PLoS Med 2005;2(8):696-701. PMID: 16060722.


Cause & Effect Conclusions from Observations

We recently reviewed agreement between RCTs and observational studies dealing with treatment, having worked with so many folks who still believe that observational studies are sufficient for drawing reliable conclusions about effectiveness. This topic has become hot because of the hormone replacement therapy (HRT) issue: in multiple well-conducted observational studies, authors reported approximately a 40% risk reduction in coronary artery disease with HRT in secondary prevention, yet 5 valid RCTs showed no benefit.

Just to be clear, our take on this is that although an observational study may report outcomes similar to those of an RCT, you are going out on a limb if you conclude that any study type other than an RCT can establish cause and effect.

Here is what we found in our review, taking a historical perspective:

• In 1977 Chalmers et al. compared 6 RCTs with 8 observational studies of anticoagulants in acute MI and reported that the observational studies all showed larger benefit than did the RCTs (N Engl J Med. 1977;297:1091-6). He concluded that there are systematic biases in observational studies.

• In 1983 Chalmers reported that in 160 studies of 6 treatments in cardiology, outcomes in the intervention group were better than control in 60% of RCTs and 93% of observational studies. He again concluded that there are systematic biases in observational studies (N Engl J Med. 1983;309:1358-61).

• However, in 1998, Britton et al. concluded that observational studies were not biased, after reporting that outcomes in 7 of 8 observational studies were similar to outcomes in RCTs (Health Technol Assess. 1998;2:1-124).

• In 2000 Guyatt et al. reviewed 13 RCTs and 17 observational studies of pregnancy in adolescence and reported that 6 of 8 outcomes of observational studies showed significant benefit, but that none of the RCTs showed significant benefit. Guyatt’s conclusion was that treatment decisions should be based on observational studies ONLY when RCTs are unavailable and ONLY with careful consideration of possible biases (J Clin Epidemiol. 2000;53:167-174).

• However, using a similar approach to the one used above, two authors (Benson and Concato) concluded in 2000 that observational studies usually provide valid information. In response to these authors (and other authors who conclude that cause-effect conclusions can be drawn from observational studies), an editorial (Ioannidis JP, Haidich A, and Lau J. N Engl J Med. 2000;342:879-880) stated that:

  • Benson, Concato, and colleagues are still dealing with only a very small portion of randomized and observational research.
  • Their sampling failed to capture some prodigious discrepancies between the two methods. Interventions such as beta carotene and tocopherol, which have brought fame to observational epidemiologists, crashed when they were tested in rigorous randomized controlled trials.
  • Given the hundreds of thousands of trials and observational studies that have been conducted and are still being conducted, the number of topics studied in the two reports is limited and subject to strong selection biases.
  • We should not abandon plans for RCTs in favor of quick and dirty observational designs.

Bottom Lines

  • Randomization is the only effective means of controlling for known and unknown confounders.
  • Even with RCTs, threats to validity remain—for example:
    • Trials with inadequate or unclear concealment of allocation show more beneficial effects than adequately concealed trials;
    • Open trials tend to show more beneficial effects than double-blind trials (Health Technology Assessment 2003; Vol.7: No.1).
  • Be cautious about accepting the reported treatment effects from observational studies, because even well-conducted observational studies may not provide valid evidence owing to the bias inherent in observational designs.
    • Even valid observational studies may provide misleading results, even going so far as to suggest benefit when the actual result is harm, as we have seen with HRT.
    • Even if observational studies are subsequently found to have “agreed” with valid RCTs, they may overestimate treatment effects when compared with RCTs.
  • Always look for RCTs and always look at the methods.
    • RCTs without blinding may overestimate treatment effects (or may provide you with misleading results).
    • One systematic review estimates that clinical trials without adequate concealment of allocation produce estimates of effect that are, on average, 40% larger than those from trials with adequately concealed random allocation (Kunz and Oxman, BMJ 1998; 317: 1185-90).
    • The bias, however, can go in either direction.
  • Be cautious about accepting the treatment effects of low quality RCTs and non-randomized clinical trials because they may provide inaccurate estimates of treatment effects.

So for those who point to the sometimes agreement between the two study types, we quote one of our friends who points out that even a stopped clock is right at least twice a day.


More on The Problem with Drawing Cause-Effect Conclusions from Observational Studies

Our last teaching engagement was in Framingham, Massachusetts, and reminded us of the value of observational studies in developing risk stratification models. The Framingham Study began in 1948 as the first prospective study of cardiovascular disease and is important because, through observation, it has identified cardiovascular disease (CVD) risk factors associated with morbidity and mortality.

But there is good evidence that basing cause-and-effect conclusions on observational studies is unreliable. Cause-and-effect conclusions should be based on randomized controlled trials (RCTs), in which bias, confounding and chance have been ruled out as possible explanations for the observed association between the intervention and the outcome. Because so many observational studies are published each week, and because we keep seeing health professionals inappropriately base treatment decisions on them, it is worthwhile summarizing an excellent review of the literature on this topic.

The study and literature review can be found in the reference:

Deeks JJ, Dinnes J, D'Amico R, Sowden AJ, Sakarovitch C, Song F, Petticrew M, Altman DG. Evaluating non-randomised intervention studies. Health Technology Assessment 2003; Vol. 7: No. 27.

Some key points from this article:

  • Comparisons of the results of randomized and non-randomized studies across multiple interventions demonstrate that, in the majority of cases, the results of observational studies are not consistent with the results of RCTs
  • This study, using meta-epidemiological techniques, demonstrates that —
    • Results of observational studies using historical or concurrent controls cannot be adequately adjusted for bias
    • Logistic regression on average increases bias when applied to observational studies
  • Non-randomized studies may give seriously misleading results even when treated and control groups appear similar on key prognostic factors
  • Standard methods of case-mix adjustment do not guarantee removal of bias
  • Omission of important confounding factors may explain why adjustment fails as a substitute for randomization
  • There is no known method for reliably adjusting for confounding factors in observational studies

Delfini Commentary
Extreme caution is urged when considering results of observational studies of interventions for screening, prevention and therapy. Cause-and-effect conclusions should only be drawn from RCTs.

One reason for this is that there may be major differences in the characteristics (prognostic factors) of individuals who choose a therapy compared to people who do not. A classic example is hormone replacement therapy after myocardial infarction (MI) in women. Most observational studies reported that roughly twice as many women who did not choose to take hormone replacement therapy (HRT) had a recurrent MI compared to women who chose to take HRT. This led people to believe, incorrectly, that HRT caused this benefit. Later, well-done RCTs were conducted, and no such benefit was found. Why? The most likely reason is that the observational studies were highly prone to biases resulting from differences between the groups, differences that could not be eliminated even with statistical adjustments in which researchers try to balance known confounders (such as smoking) between the groups.

Another reason is that in observational studies investigators do not “control” all elements of the study as they do in RCTs. As a result, other aspects of care and follow-up are almost certain to differ between the groups in important ways that may explain or affect the study results.

Key Point
Any difference between groups — except for what is being studied (e.g., HRT use) — is a bias.

In the case of HRT after MI, selection bias was present: women who chose to take HRT were probably more likely to be “health-conscious,” to exercise, to watch their diets, etc., making them different from the women who did not take HRT. It is also likely that the two groups experienced their health care differently, because in observational studies there is no formal protocol; the groups may therefore differ in many ways that could affect observed outcomes, such as other therapies used, how outcomes are assessed, frequency of follow-up, and so on.

Even with statistical adjustment for known and suspected differences in prognostic characteristics between the groups, bias cannot be reliably eliminated, because whatever is actually responsible for the outcome (i.e., the confounder) is what would have to be adjusted for. That would require advance knowledge of cause and effect, which is precisely why the study is being conducted. Statistical adjustment also has inherent limits: how could every single factor that made the HRT users different be adjusted for? People differ on a practically unlimited number of characteristics and exposures.

Comparisons of RCTs and observational studies of the same interventions have repeatedly demonstrated that, even with the most meticulous statistical adjustments, bias cannot be reliably eliminated from observational studies. The key message is that without randomization and assurance that interventions and assessments are the same for both study and comparison groups, one cannot reliably draw conclusions about cause-and-effect relationships. Associations between interventions and outcomes in observational studies are very likely to be due to bias or confounding. Therefore, for questions of preventive, screening or therapeutic interventions, observational studies are useful only for generating hypotheses.
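The confounding problem described above can be made concrete with a small simulation. This is a minimal, illustrative sketch; all probabilities and names below are our own invented numbers, not data from HRT studies or any other source. A treatment with no true effect appears beneficial when patients self-select into it (as in an observational study), because a single unmeasured trait drives both the choice of treatment and the outcome; coin-flip assignment removes the distortion.

```python
import random

random.seed(42)

def simulate(n=200_000, randomized=False):
    """Simulate a treatment with NO true effect. A single unmeasured
    'health-conscious' trait both makes people likelier to choose the
    treatment and lowers their risk of the bad outcome (illustrative
    probabilities only)."""
    counts = {True: [0, 0], False: [0, 0]}   # treated? -> [events, total]
    for _ in range(n):
        healthy = random.random() < 0.5                            # the confounder
        if randomized:
            treated = random.random() < 0.5                        # coin-flip assignment
        else:
            treated = random.random() < (0.8 if healthy else 0.2)  # self-selection
        event = random.random() < (0.05 if healthy else 0.15)      # driven only by confounder
        counts[treated][0] += event
        counts[treated][1] += 1
    treated_risk = counts[True][0] / counts[True][1]
    control_risk = counts[False][0] / counts[False][1]
    return treated_risk / control_risk

print(f"observational risk ratio: {simulate(randomized=False):.2f}")  # well below 1.0: spurious 'benefit'
print(f"randomized risk ratio:    {simulate(randomized=True):.2f}")   # close to 1.0: the true null
```

The “observational” analysis prints a risk ratio well below 1.0 even though the treatment does nothing; only randomization balances the unmeasured trait across the two groups. Note that adjusting for measured variables would not rescue the observational estimate here, because the trait driving the outcome is, by construction, unmeasured.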

Database Studies
Some groups have tried to demonstrate improved health outcomes (e.g., reductions in death, stroke, etc.) through studies of their databases. It should be remembered that this type of study is an observational study and prone to bias and confounding for the reasons explained above; in addition, it is highly prone to chance findings of statistical significance. Therefore, database studies may be useful for suggesting areas for further study, but they should not be thought of as valid studies from which cause-and-effect relationships can be concluded.
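The multiplicity problem with database studies can also be sketched in a few lines (again with illustrative numbers of our own choosing): when many truly null associations are tested in one database, the chance that at least one reaches “statistical significance” by chance alone grows rapidly, following 1 - (1 - alpha)^k for k independent tests.

```python
import random

random.seed(7)

def chance_of_false_positive(n_outcomes=20, alpha=0.05, n_sims=10_000):
    """Simulate mining one database for n_outcomes truly null associations,
    each tested at significance level alpha. Return how often at least one
    comes up 'significant' by chance alone."""
    hits = sum(
        any(random.random() < alpha for _ in range(n_outcomes))
        for _ in range(n_sims)
    )
    return hits / n_sims

# Close to the analytic value 1 - (1 - 0.05)**20, about 0.64
print(f"{chance_of_false_positive():.2f}")
```

With just 20 outcomes examined at the conventional 0.05 threshold, roughly two databases in three will yield at least one spurious “significant” association, which is why such findings should prompt further study rather than conclusions.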


Bias in Observational Studies—More on HRT in Menopause

After RCTs (e.g., WHI and HERS) reported no benefit of hormone replacement therapy in preventing coronary heart disease when compared to placebo, many authors explained the dramatic difference from the (presumed) benefit reported in observational studies on the basis of the “healthy user effect,” a type of selection bias. The conclusion was that, in the observational studies, women who elected to take HRT were different (i.e., healthier) from those who did not seek out HRT.

In the Dec. 2 Annals of Internal Medicine, SG Pauker points out that there are other potential differences between the RCTs and the observational studies that might explain the dramatic differences in reported coronary artery disease outcomes:

  1. In the Nurses’ Health Study (an observational study), silent MI was not studied.
  2. HRT users who believed that HRT protected against coronary heart disease (the subjects in the observational studies) might not interpret ischemic pain as related to their hearts and therefore might not seek care. This illustrates why blinding of study subjects is so important.
  3. HRT users who believed that HRT protected against coronary heart disease and with atypical ischemic pain might describe their symptoms in a way that was interpreted as non-cardiac by their physicians—again illustrating the importance of blinding.
  4. Physicians completing death certificates might also believe that HRT protects against CHD and assign the cause of death to a condition other than CHD—an assessment bias.

This article is a nice reminder that in observational studies there is always potential for selection bias, performance bias and assessment bias.

Use of our website implies agreement to our Notices. Citations for references available upon request.

© 2002-2017 Delfini Group, LLC. All Rights Reserved Worldwide.