NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Lau J, Balk E, Rothberg M, et al. Management of Clinically Inapparent Adrenal Mass. Rockville (MD): Agency for Healthcare Research and Quality (US); 2002 May. (Evidence Reports/Technology Assessments, No. 56.)

  • This publication is provided for historical reference only and the information may be out of date.



Management of Clinically Inapparent Adrenal Mass.


Appendix C

Biases of Case Series and Potential Hazards in Making Recommendations from Case Series Reports


Case reports and case series have been the earliest method of accumulating evidence in medicine. The Hippocratic writings abound in descriptions of specific classical disease conditions. Most diseases in the past were defined on the basis of case reports or case series that brought new entities into focus. With the advent of controlled studies, the prominence of case series in medicine and in the hierarchy of evidence has diminished (Harbour and Miller, 2001; U.S. Preventive Services Task Force, 1996), but they continue to enjoy substantial popularity among clinicians (Jenicek, 2001). Clinicians may often feel intimidated by the “inroad of statistics” into their disciplines, as this is reflected in sophisticated controlled clinical research and the new methods of meta-analysis, decision-analysis and cost-effectiveness analysis that comprise the foundation of evidence-based medicine. However, clinicians still appreciate the interesting case report and find it easy to understand the findings of a case series. Moreover, the majority of studies published in several major clinical specialty peer-reviewed journals still pertain to uncontrolled data. In this section, we discuss some of the hazards that exist in the use of evidence from case series for making clinical decisions and generating recommendations for medical practice.

Uses of Case Series to Inform Clinical Practices

Situations where Case Series May Be the Predominant Evidence

Many conditions may be rare enough to make it impractical to perform controlled research. Controlled trials are typically performed to test hypotheses, and power calculations are conducted to estimate the required sample sizes. For rare diseases, controlled trials and power calculations are often thought to be exercises in futility. Nevertheless, an underpowered controlled trial may still provide useful evidence to estimate the magnitude of a therapeutic effect. Such information would be less prone to bias compared with the evidence that may be obtained with the same subjects with an uncontrolled case series design. Bayesian inference may be used to interpret evidence from small trials. Unfortunately, most trial designs are still dependent on frequentist considerations. Finally, single case reports or small case series may be the only way to discover and study new or uncommon diseases, such as genetic syndromes (Simpson and Griggs, 1985).
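As a rough illustration of why power calculations can look futile for rare conditions, the following sketch computes an approximate per-arm sample size for comparing two proportions. It uses the standard normal-approximation formula for a two-sided test; the function name and the event rates are hypothetical and are not taken from this report.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate subjects needed per arm to compare two proportions.

    Normal-approximation formula for a two-sided test; illustrative only.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    effect = p_control - p_treated
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / effect ** 2)


if __name__ == "__main__":
    # Hypothetical event rates, chosen only to show how the requirement grows
    # as the difference between the arms shrinks.
    for p_c, p_t in [(0.50, 0.30), (0.50, 0.40), (0.10, 0.08)]:
        print(p_c, p_t, sample_size_per_arm(p_c, p_t))
```

Under these assumptions, small absolute differences in event rates require sample sizes that may exceed the number of patients available for a rare disease, which is the situation in which an underpowered trial or a case series becomes the practical alternative.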

Situations where Controlled Designs Are Not Established or Not Feasible

In some situations, controlled designs may be difficult to implement because of logistic considerations. For example, controlled designs and in particular randomized trials are relatively rare in many surgical disciplines and therefore the case reports and case series have remained very popular in the surgical literature. The technical aspects of a new surgical procedure may present difficulties in creating an appropriate control arm. Parameters such as the experience of the surgeon and the point in the learning curve of each surgeon may require particular ingenuity to incorporate in randomized designs. Therefore, case series are very common for many surgical subjects. In medical domains, the adoption of controlled designs varies considerably. In some areas such as cardiology and oncology there is relatively wide appreciation of the merits of randomized controlled trials. In other specialties such as rheumatology, uncontrolled designs are still common research practice.

Common Diseases where Large Amounts of Old Uncontrolled Evidence Exists

Health care practices sometimes become established on the basis of data accumulated before the availability of evidence from randomized trials. For example, the effectiveness of Pap tests in reducing cervical cancer mortality has never been tested with the rigor of randomized trials. However, it is difficult to question its use at the moment, despite the fact that the diagnostic characteristics (sensitivity and specificity) of this test are far from optimal (Fahey, Irwig, and Macaskill, 1995). In general, for interventions that seemed extremely effective in past uncontrolled research, one may now perceive an ethical barrier to testing them under controlled conditions. While many interventions based on evidence from non-randomized data may truly be efficacious (Ioannidis, Haidich, Pappa, et al., 2001), unfortunately we cannot tell whether some of them may lead to practices that will later prove to be erroneous when randomized evidence becomes available (Antman, Lau, Kupelnick, et al., 1992).

Situations where Evidence May Only Be Obtained from Uncontrolled Study Designs

For some research questions, such as estimation of incidence and prevalence or descriptive research, non-controlled case series may be the only legitimate mode for gathering information. Conducting clinical trials with rare events is often unfeasible. High costs may also make conducting a trial impractical. Actually, controlled studies may offer less reliable estimates of frequency and other descriptive parameters since they are usually focused on selected populations that have already satisfied the eligibility criteria of the controlled study.

One should caution that case series are sometimes used to make inferences about the prevalence or incidence of a disease. While, in some cases, this may be the only approach possible, the denominator (total population) is typically not well defined. For example, one cannot be certain that all cases in a certain “catchment” area during a specific period of time have been included in a given case series. Thus, whenever possible, case series should not replace more appropriate prevalence studies.

Biases in Case Series Reports

Retrospective vs. Prospective Design

Most case series are based on retrospective examination of information, but sometimes information may be collected prospectively. Retrospective information is easier to collect than prospective information. Typical sources include medical records, electronic hospital and physician archives, and large-scale databases from insurance or other organizations (Kelsey, Whittemore, Evans, et al., 1996). Recording of information in retrospective case series may be more selective and subject to larger error than information that is collected prospectively with a specific plan in mind. Retrospective data are also likely to be more heterogeneous in both quality and quantity. Since there is no a priori research plan when the information is being collected, the type and quality of data that are captured for each patient may vary enormously even within the same database. In general, retrospective data that pertain to parameters that are not routinely captured and unequivocally defined are likely to be incomplete. These biases do not exist with prospective collection of information, where data may be captured according to a predefined plan and standardized definitions.

Random Error (Measurement Error)

Measurement error is likely to be larger in uncontrolled designs than in controlled studies. The problem may be larger in retrospective case series, where there was no common a priori rule on how the pertinent measurements should be made and recorded. Repeatability conveys the concordance of successive measurements of the same parameter performed by the same evaluator (or laboratory) on different days or at different runs of the assay. Reproducibility refers to the concordance of successive measurements of the same parameter performed by different evaluators or different laboratories. Repeatability and reproducibility decrease when measurements are performed by different individuals or different methods, with different standards and different interpretation and recording procedures. Often, but not always, in controlled prospective research, there are assurances that both the repeatability and reproducibility of the measured parameters will be appropriately high. Quality control programs may exist in multicenter trials, for example, that assure that measurements of the parameters of interest deviate only to a reasonable extent between different centers, laboratories and evaluators. In case series, usually there are no such assurances.

A special consequence of measurement error is the phenomenon of regression-to-the-mean (Davis, 1976). Regression-to-the-mean occurs when the following conditions are satisfied: (a) subjects are selected on the basis of having extreme values (higher than a certain cutoff or lower than a certain cutoff) in a given parameter of interest; (b) only one measurement of this parameter is done at baseline; and (c) the change in this parameter is used to evaluate the outcome of a subject during follow-up. Regression-to-the-mean is one common explanation why case series report spurious therapeutic efficacy for ineffective interventions. Subjects with extreme characteristics are often the targets of intervention studies. For example, it is common to select high-risk patients for a new intervention. Alternatively, one may select only low-risk patients if the intervention is conceived to be potentially toxic or its toxicity is unknown. Even in the absence of any true therapeutic effect, a group selected on the basis of an extreme value in a given parameter is likely to show less extreme values on average upon re-measurement. For example, let us assume that patients with rheumatoid arthritis are selected on the basis of having an elevated erythrocyte sedimentation rate (ESR) >60 mm per hour. Upon re-measurement, the mean ESR of the group will be lower than the baseline measurement, even if there is no true change in the disease activity. Regression-to-the-mean can be eliminated if a separate measurement is performed after the screening measurement and this second measurement is then used as the baseline. Unfortunately, this simple correction is rarely performed in case series, even in studies published in prestigious journals.
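The sedimentation rate example can be sketched as a small simulation. The distribution parameters below (a true ESR with mean 40 and standard deviation 15 mm per hour, and a measurement error with standard deviation 10) are hypothetical assumptions chosen for illustration; only the cutoff of 60 mm per hour comes from the text. No treatment effect is simulated, so any change between measurements is due to measurement error alone.

```python
import random
from statistics import mean


def simulate_regression_to_mean(n_subjects=20000, cutoff=60.0, seed=1):
    """Return mean screening, second, and third ESR among selected subjects.

    Subjects are selected only if their screening measurement exceeds the
    cutoff. Each subject's true ESR never changes between measurements.
    """
    rng = random.Random(seed)
    screening, second, third = [], [], []
    for _ in range(n_subjects):
        true_esr = rng.gauss(40.0, 15.0)          # assumed between-subject spread
        first = true_esr + rng.gauss(0.0, 10.0)   # assumed measurement error
        if first <= cutoff:
            continue                               # not selected into the series
        screening.append(first)
        second.append(true_esr + rng.gauss(0.0, 10.0))
        third.append(true_esr + rng.gauss(0.0, 10.0))
    return mean(screening), mean(second), mean(third)


if __name__ == "__main__":
    m_screen, m_second, m_third = simulate_regression_to_mean()
    print("screening vs. second:", round(m_screen - m_second, 1))
    print("second vs. third:", round(m_second - m_third, 1))
```

In this sketch the group mean falls between the screening and the second measurement although nothing was done to the subjects, whereas the second and third measurements agree on average. This mirrors the correction described above: using a measurement taken after screening as the baseline removes the spurious improvement.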

Besides regression-to-the-mean, one also needs to consider the natural course of the disease, over both the short and the long term. Uncontrolled studies do not allow us to evaluate what would have happened to subjects if they had not received the intervention(s) that they actually received. For some continuously progressive diseases, simple stabilization of a lesion or of the patient's overall condition may be a success. For other diseases, where severity may naturally improve over time, stabilization of the patient's condition may actually signal a detrimental, toxic intervention that should be avoided. For the same reasons, in the treatment of malignant or pre-malignant lesions, uncontrolled studies may give us biased estimates of the effectiveness of specific interventions in delaying their development and consequences.

Systematic Errors (Bias)

Subjects who are included in case series may suffer from various manifestations of selection bias. Selection bias occurs when the sample of a case series is not representative of the general population with the condition that it represents. In some situations, even self-selection may occur; for example, patients with some characteristics suggestive of worse prognosis may be selectively lost to follow-up, and the remaining group of subjects may subsequently and falsely give a picture of a group with a more favorable prognosis than is true. Another form of selection bias is diagnosis bias, where the attribution of a diagnosis may depend on the availability of additional information. For example, a lesion may be more likely to be screened for malignant features by a radiologist if the radiologist knows that the patient has clinical signs that are suggestive of malignancy. More exhaustive imaging studies may be performed in such cases. Moreover, the radiologist may also be tempted to make a diagnosis of malignant lesion if such clinical information is available.

Confounding is very difficult to handle in case series. The definition of a confounder (Rothman and Greenland, 1998) relies on the following criteria: (a) the confounder is a risk factor for the disease among those that do not have the studied parameter; (b) the confounder is associated with the studied parameter in the source population from which the subjects arise; and (c) the confounder is not affected by the disease or the studied parameter. In most case series research, information on confounders is not collected or may be collected erratically and selectively. Thus it is very difficult to evaluate how confounding impacts the results.
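A minimal worked example may help show how a confounder that satisfies these criteria can distort a crude comparison. The counts below are hypothetical, and disease severity is assumed to be the confounder: it predicts improvement among the untreated, and it is associated with who receives treatment.

```python
# Hypothetical counts (improved, total), not data from the report.
# Disease severity plays the role of the confounder.
STRATA = {
    "mild": {"treated": (64, 80), "untreated": (16, 20)},
    "severe": {"treated": (6, 20), "untreated": (24, 80)},
}


def stratum_risks(strata):
    """Proportion improved per treatment group within each severity stratum."""
    return {
        name: {group: improved / total
               for group, (improved, total) in groups.items()}
        for name, groups in strata.items()
    }


def crude_risks(strata):
    """Proportion improved per treatment group, ignoring severity."""
    improved_sum = {}
    total_sum = {}
    for groups in strata.values():
        for group, (improved, total) in groups.items():
            improved_sum[group] = improved_sum.get(group, 0) + improved
            total_sum[group] = total_sum.get(group, 0) + total
    return {group: improved_sum[group] / total_sum[group]
            for group in total_sum}


if __name__ == "__main__":
    print("crude:", crude_risks(STRATA))
    print("by stratum:", stratum_risks(STRATA))
```

With these assumed counts, the crude proportions improved are 70 percent in treated and 40 percent in untreated subjects, yet within each severity stratum the two groups improve at the same rate. The apparent benefit arises only because treated subjects were mostly mild cases. A case series that does not record severity cannot make this stratified comparison.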

Biases Seen from a Diagnostic Test Perspective

Case series that aim to differentiate between two or more diagnoses based on various parameters of interest may be conceived as applications of diagnostic test research. Typical biases that occur in this setting (Mulrow, Linn, Gaul, et al., 1989) include, among others: (1) lack of accurate definitions of the diagnostic categories; (2) verification bias (different diagnostic work-up depending on the features and categorization of the lesion); (3) improper handling of gray measurements and uninterpretable results; (4) lack of accurate and reproducible definitions for the putative discriminating parameters; and (5) information bias (misclassification) that may be non-differential or differential. The issues are similar to the biases that exist for other types of case series.
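Verification bias, item (2) above, can be illustrated with simple arithmetic. The sketch below assumes a hypothetical cohort of 100 malignant and 900 benign lesions, a test with a true sensitivity of 0.80 and specificity of 0.90, and a work-up in which every test-positive lesion but only 10 percent of test-negative lesions receive the reference standard. All of these numbers and the function name are illustrative assumptions.

```python
def apparent_accuracy(n_diseased, n_healthy, sensitivity, specificity,
                      verify_positive=1.0, verify_negative=0.1):
    """Sensitivity and specificity as they appear among verified lesions only.

    verify_positive / verify_negative are the assumed fractions of
    test-positive and test-negative lesions that get the reference standard.
    """
    true_pos = n_diseased * sensitivity
    false_neg = n_diseased - true_pos
    true_neg = n_healthy * specificity
    false_pos = n_healthy - true_neg

    # Only verified lesions enter the accuracy table.
    v_true_pos = true_pos * verify_positive
    v_false_pos = false_pos * verify_positive
    v_false_neg = false_neg * verify_negative
    v_true_neg = true_neg * verify_negative

    apparent_sensitivity = v_true_pos / (v_true_pos + v_false_neg)
    apparent_specificity = v_true_neg / (v_true_neg + v_false_pos)
    return apparent_sensitivity, apparent_specificity


if __name__ == "__main__":
    sens, spec = apparent_accuracy(100, 900, 0.80, 0.90)
    print(round(sens, 3), round(spec, 3))
```

Under these assumptions, sensitivity appears inflated (about 0.98 rather than 0.80) and specificity appears deflated (about 0.47 rather than 0.90), because most missed malignancies and most correctly negative lesions never reach verification.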

Synthesis of Data from Case Series and Their Respective Biases

Selective Reporting and Preference for Extreme Cases - Publication Bias

Many case series may report selected data, rather than all the data that have been accumulated by the investigators. In retrospective research especially, the eligibility criteria for what is to be included in a study report may be decided after the database has been collected. A special concern is publication bias, i.e., the tendency to publish more easily results that show statistically significant findings, while at the same time leaving small studies with non-significant results unpublished. Such a bias may influence the strength of associations observed in isolated case series as well as in their overall synthesis. For example, it is possible that such biases may tend to create a picture that some parameters are stronger predictors of an outcome, or stronger discriminating factors for differentiating between two types of lesions, than is true. Publication bias is stronger in uncontrolled research (Easterbrook, Berlin, Gopalan, et al., 1991).
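The effect of publication bias on a synthesis can be sketched with a short simulation. It assumes many small studies that all estimate the same modest true effect (0.1, in arbitrary units) with the same standard error (0.2), and it assumes that only studies reaching two-sided significance at the 5 percent level are published. These values and the selection rule are hypothetical simplifications.

```python
import random
from statistics import mean


def simulate_publication_bias(n_studies=5000, true_effect=0.1,
                              standard_error=0.2, seed=7):
    """Compare the mean effect across all studies with the published subset.

    A study counts as published only if |estimate / SE| exceeds 1.96.
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, standard_error)
                 for _ in range(n_studies)]
    published = [e for e in estimates if abs(e) / standard_error > 1.96]
    return mean(estimates), mean(published), len(published)


if __name__ == "__main__":
    mean_all, mean_published, n_published = simulate_publication_bias()
    print("all studies:", round(mean_all, 2))
    print("published only:", round(mean_published, 2), "from", n_published)
```

In this sketch the average over all simulated studies stays close to the assumed true effect, while the average over the published subset is several times larger. A synthesis restricted to published reports would therefore overstate the strength of the association.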

Different Definitions of Parameters and Outcomes

The synthesis of data from uncontrolled research is further confounded by the fact that it is unlikely that the different research reports use the same definitions for the various parameters of interest or for the study outcomes. Some heterogeneity is unavoidable and to some extent it is even useful in assessing the generalizability of the conclusions. This is especially true for differences in eligibility and selection criteria. However, significant differences in definitions may be problematic. Reports of case series may give suboptimal information and it may not be feasible to merge all data adjusting them to common definitions.

Lack of Standardization for Confounders

Lack of standardized definitions is likely to be prominent even for putative confounders. Information on confounders may have been collected with different means in different studies, or according to different definitions. In many cases, information on important confounders may have been collected only in some studies, but not in others. Synthesizing such disparate data may be very problematic.

Quality of Data

The quality of the overall data is likely to differ across studies. Most importantly, the quality of the data may be difficult or even impossible to discern simply from the written report of an uncontrolled study. Decisions on data synthesis for case series should always apply a priori rules that define the minimal quality prerequisites a case series must meet in order to be included in the final quantitative synthesis. While it is important to be able to integrate knowledge obtained from case reports and case series into the larger frame of evidence-based medicine (Vandenbroucke, 1999; Vandenbroucke, 2001), quality issues should be taken seriously.

References for Appendix C

  1. Antman E M, Lau J, Kupelnick B. et al. A comparison of results of meta-analyses of randomized control trials and recommendations of clinical experts. Treatments for myocardial infarction. [see comments] JAMA. 1992 Jul 8;268(2):240–8. [PubMed: 1535110]
  2. Davis C E. The effect of regression to the mean in epidemiologic and clinical studies. American Journal of Epidemiology. 1976 Nov;104(5):493–8. [PubMed: 984023]
  3. Easterbrook P J, Berlin J A, Gopalan R. et al. Publication bias in clinical research. [see comments] Lancet. 1991 Apr 13;337(8746):867–72. [PubMed: 1672966]
  4. Fahey M T, Irwig L, Macaskill P. Meta-analysis of Pap test accuracy. [see comments] American Journal of Epidemiology. 1995 Apr 1;141(7):680–9. [PubMed: 7702044]
  5. Harbour R, Miller J. A new system for grading recommendations in evidence based guidelines. BMJ. 2001 Aug 11;323(7308):334–6. [PMC free article: PMC1120936] [PubMed: 11498496]
  6. Ioannidis J P, Haidich A B, Pappa M. et al. Comparison of evidence of treatment effects in randomized and nonrandomized studies. [Review] [59 refs] JAMA. 2001 Aug 15;286(7):821–30. [PubMed: 11497536]
  7. Jenicek M. Clinical case reporting in evidence-based medicine. 2nd ed. London: Arnold; 2001.
  8. Kelsey JL, Whittemore AS, Evans AS, Thompson WD, eds. Methods in observational epidemiology. 2nd ed. New York: Oxford University Press; 1996.
  9. Mulrow C D, Linn W D, Gaul M K. et al. Assessing quality of a diagnostic test evaluation. Journal of General Internal Medicine. 1989 Jul;4(4):288–95. [PubMed: 2760697]
  10. Rothman KJ and Greenland S. Modern epidemiology. 2nd ed. Philadelphia: Lippincott-Raven; 1998.
  11. Simpson R J Jr, Griggs T R. Case reports and medical progress. Perspectives in Biology & Medicine. 1985;28(3):402–6. [PubMed: 4011382]
  12. U.S. Preventive Services Task Force. Guide to clinical preventive services. 2nd ed. Baltimore: Williams and Wilkins; 1996:861–862.
  13. Vandenbroucke J P. Case reports in an evidence-based world. Journal of the Royal Society of Medicine. 1999 Apr;92(4):159–63. [PMC free article: PMC1297135] [PubMed: 10450190]
  14. Vandenbroucke J P. In defense of case reports and case series. Annals of Internal Medicine. 2001 Feb 20;134(4):330–4. [PubMed: 11182844]

