NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Mack C, Su Z, Westreich D. Managing Missing Data in Patient Registries: Addendum to Registries for Evaluating Patient Outcomes: A User’s Guide, Third Edition [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2018 Feb.

Reasons for Missing Data

Item Nonresponse

Registry data may be missing for many reasons. Item nonresponse, which occurs when a participant completes a case report form (CRF) or survey without providing a response for one or more of the data elements, may be the most common reason. As discussed in the chapter on “Data Collection and Quality Assurance,” CRFs typically incorporate checks to ensure that complete, valid data are entered. These checks may prevent CRFs from being marked as complete if data are missing. However, item nonresponse may still occur, either because CRFs are not marked as complete or because some data elements are optional. As a strategy for reducing the burden of data entry, registries often make only essential fields mandatory for completion of a CRF. The remaining fields are considered optional, and providers may enter data into only some of those fields, or into none of them. While these optional fields may not be essential for the primary objective of the registry, they may be critical to support secondary objectives or analyses of subpopulations within the registry. For example, a recent analysis of the characteristics of missing data in three patient registries found that 71 percent of patients in one registry were missing data for body mass index (BMI), an optional field.2 Item nonresponse also occurs when patients complete patient-reported outcome (PRO) measures using paper forms and leave some fields blank or enter illegible data.
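
As an illustration of how item nonresponse might be quantified, the following minimal sketch computes the proportion of records missing a value for each field. The field names ("age" as a mandatory field, "bmi" as an optional field) and the records are hypothetical and are not taken from any registry described here; a blank string is assumed to count as missing.

```python
from typing import Dict, List, Optional


def missing_rate_by_field(records: List[Dict[str, Optional[object]]],
                          fields: List[str]) -> Dict[str, float]:
    """Return, for each field, the fraction of records with no usable value.

    A value counts as missing if the key is absent, the value is None,
    or the value is an empty/whitespace-only string.
    """
    rates = {}
    for field in fields:
        missing = sum(
            1 for r in records
            if r.get(field) is None or str(r.get(field)).strip() == ""
        )
        rates[field] = missing / len(records) if records else 0.0
    return rates


# Hypothetical CRF records: "age" is mandatory, "bmi" is optional.
records = [
    {"age": 54, "bmi": 27.1},
    {"age": 61, "bmi": None},   # field present but left empty
    {"age": 47},                # field never entered
    {"age": 70, "bmi": ""},     # blank string from a paper form
]
rates = missing_rate_by_field(records, ["age", "bmi"])
print(rates)
```

A summary of this kind, run separately for mandatory and optional fields, can indicate which secondary analyses are most exposed to item nonresponse.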

Threats From the Left: Truncation

The issue of left truncation, a form of selection bias, arises when events of interest occur prior to a patient’s enrollment in the registry and (typically) pre-empt enrollment in the registry. Applebaum et al. define left truncation as occurring “when subjects who otherwise meet entry criteria do not remain observable for a later start of follow-up.”3 For example, in a study of miscarriage which enrolls pregnant women, some patients will be left truncated because “an unknown proportion of the source population experiences losses prior to enrollment.”4 Thus, left truncation results in data missing in the observed cohort due to non-enrollment, leading the study sample to not accurately reflect the underlying target population, in this example, pregnant women at risk for miscarriage.
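
The direction of this bias can be shown with a small worked example. The sketch below uses invented numbers (ten pregnancies, each with a hypothetical enrollment week and, where applicable, a week of loss) and is meant only to illustrate the mechanism, not to reproduce any cited study or to provide an analytic correction.

```python
# Each tuple is (gestational week at which the woman would enroll,
#                gestational week of loss, or None if no loss occurs).
# All values are hypothetical.
pregnancies = [
    (8, 5), (8, 6), (10, 7),      # loss occurs before enrollment: never observed
    (6, 12), (6, 9),              # loss occurs after enrollment: observed
    (9, None), (7, None), (10, None), (8, None), (9, None),
]


def risk(cohort):
    """Proportion of pregnancies in the cohort that end in a loss."""
    return sum(1 for _, loss in cohort if loss is not None) / len(cohort)


# A pregnancy is enrolled only if no loss has occurred by the enrollment week.
enrolled = [(e, l) for e, l in pregnancies if l is None or l > e]

true_risk = risk(pregnancies)   # risk in the full source population
naive_risk = risk(enrolled)     # risk computed from the enrolled cohort only

print(f"source population: {true_risk:.2f}, enrolled cohort: {naive_risk:.2f}")
```

In this constructed example the enrolled cohort excludes the three earliest losses, so the risk computed from enrolled women (2 of 7) understates the risk in the source population (5 of 10).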

A related bias can be introduced due to entry of already-exposed individuals into a registry. Consider, for example, a registry designed to study disease progression over several years in patients with a rare disease. Ideally, the registry would enroll only patients at the time of diagnosis, with the goal of collecting detailed baseline and diagnostic information for all patients. However, limiting the registry enrollment to only those newly diagnosed patients would reduce the sample size significantly, and, in the case of a rare disease, likely render the registry infeasible. To enroll sufficient patients, the registry may include both existing (prevalent) patients and newly diagnosed (incident) patients. This enrollment strategy, while practical, has the potential to introduce significant bias for numerous reasons, including under-ascertainment of early events. Examples of early events that can be under-ascertained in this way include venous thromboembolism in women taking third-generation oral contraceptives relative to earlier products, falls after initiating benzodiazepines, and peptic ulcers after initiating nonsteroidal anti-inflammatory drugs (NSAIDs).5

The concept of ‘baseline’ will be different for patients who are newly diagnosed versus those with an existing diagnosis at the time of enrollment, and comparisons of symptoms, treatment effectiveness, and disease progression would need to account for these differences. In particular, the patients with existing diagnoses may be missing information on symptoms at diagnosis or other tests or procedures related to their diagnosis that occurred prior to study enrollment.6 Ray gives an overview of this issue in the context of medication effects, suggesting that focusing on new users (or newly exposed people, generally) is a strategy which can minimize bias, and should be considered whenever logistically feasible.5
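
One practical step toward accounting for these differences is to flag each enrollee as incident or prevalent so that the two groups can be described and analyzed separately. The sketch below is one possible approach under stated assumptions: the function name, the date fields, and the 30-day window used to define a "newly diagnosed" patient are all hypothetical choices, and an actual registry would set its own definition.

```python
from datetime import date


def classify_enrollee(diagnosis_date: date,
                      enrollment_date: date,
                      window_days: int = 30) -> str:
    """Label a patient 'incident' or 'prevalent' at enrollment.

    A patient is treated as incident if diagnosed no more than
    `window_days` before enrollment (an assumed, adjustable threshold).
    """
    gap = (enrollment_date - diagnosis_date).days
    if gap < 0:
        raise ValueError("diagnosis date is after enrollment date")
    return "incident" if gap <= window_days else "prevalent"


# Hypothetical patients enrolled on the same day.
enrolled_on = date(2017, 1, 20)
new_patient = classify_enrollee(date(2017, 1, 10), enrolled_on)
existing_patient = classify_enrollee(date(2015, 6, 1), enrolled_on)
print(new_patient, existing_patient)
```

With such a flag in place, baseline variables that are expected to be missing for prevalent patients (for example, symptoms at diagnosis) can be examined by group rather than pooled.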

Threats From the Right: Loss to Followup, Censoring, Competing Risks

Loss to followup and right censoring occur when information is missing at the conclusion rather than the inception of the registry. In studies that collect long-term followup data, participants may be lost to followup if they formally withdraw from the registry or simply stop completing surveys or coming for scheduled visits. Attrition of this nature occurs for many reasons, including factors both related to the study objectives (e.g., the participant becomes too ill to complete study visits) and unrelated (e.g., the participant moves or changes his/her email address without notifying study staff). Broadly speaking, if the attrition is associated with the study outcomes, it introduces a form of selection bias into the registry (known as informative censoring in the context of randomized clinical trials) that must be described and accounted for in analyses to the extent possible.7 Whether it introduces bias or not, loss to followup can limit the ability of the registry to examine long-term outcomes and can have an impact on statistical power. Registries that aim to collect long-term followup data are encouraged to develop retention targets, actively monitor retention against those targets, and take proactive measures to minimize loss to followup, as needed. Strategies to retain participants and minimize loss to followup are discussed extensively in Chapters 3, 5, 10, and 13 of the User’s Guide.1

A concept related to loss to followup is administrative right censoring, which occurs when the registry ends before an outcome of interest has occurred for all subjects (which is typically the case). This is especially common in pregnancy registries, which are designed to assess outcomes of pregnancies during which the mother (or, in some cases, the father) was exposed to medical products. Pregnancy registries typically collect information on congenital defects that are ascertained at birth or shortly after birth (e.g., at 30-day followup or, often at most, at one year), but are not designed to detect defects or developmental delays that are diagnosed later in life.8 Right censoring occurs in other types of registries as well. For example, a registry designed to study the effectiveness of a cancer treatment may conduct survival analyses after following patients for five years. Some patients will have died during that period, and their survival after treatment will be known. However, for patients who are still alive at the conclusion of the study, survival after treatment will be right censored due to the close of the registry. In general, missing data due to administrative right censoring will not introduce bias in analysis, but bias is possible if there are strong temporal trends in risk of the outcome.
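
The cancer registry example can be sketched with a Kaplan-Meier (product-limit) estimate, a standard way of using right-censored followup without discarding patients whose outcome was not observed. The implementation below is a minimal illustration written for this purpose, and the six followup times are invented; it assumes censoring is unrelated to the risk of death, which is the situation described above for administrative censoring without temporal trends.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : followup duration for each patient
    events : 1 if the patient died at that time, 0 if right censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients whose followup ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        # Deaths and censored patients both leave the risk set after t.
        n_at_risk -= removed
    return curve


# Hypothetical followup in years. Event 0 means the patient was still
# alive when the registry closed (a late enrollee is censored at 2.5 years).
times = [1.0, 2.0, 2.5, 4.0, 5.0, 5.0]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"year {t}: estimated survival {s:.3f}")
```

Censored patients contribute to the risk set up to the time their followup ends, which is why the patient censored at 2.5 years still informs the estimates at years 1 and 2.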

Finally, competing risks must be considered. A competing risk is an event that prevents the outcome or outcomes of interest not merely from being observed, but from happening in the first place. For example, in a study of incidence of heart attack, death (by any cause besides heart attack) prevents incident heart attack from occurring; in a study of breast cancer, preventive double mastectomy likewise may be considered a competing risk for breast cancer. Competing risks can lead to missing data in certain settings; sometimes a study may be interested in the risk of breast cancer in all individuals – including those who, due to beliefs about their personal risks of developing breast cancer, undergo a mastectomy preemptively. In such a setting, the breast cancer status that these women would have had, had they not gotten a mastectomy, can be regarded as a variety of missing data; in other cases, competing risks do not lead to such clear instances of missing data. See Lau et al. for a more involved discussion of competing risks and missing data, as well as analytic approaches.9
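
For the heart attack example, one commonly used descriptive quantity is the cumulative incidence function, which estimates the probability of the event of interest while allowing that a competing event may remove a patient from risk first. The sketch below is a minimal illustration of that estimator (in the Aalen-Johansen form) using invented data; it is not presented as the approach recommended by Lau et al., and the cause codes are hypothetical.

```python
def cumulative_incidence(times, causes, cause_of_interest):
    """Return [(t, CIF(t))] at each time the event of interest occurs.

    times  : followup duration for each patient
    causes : 0 = censored, 1, 2, ... = type of event that ended followup
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    event_free = 1.0   # probability of having had no event of any type so far
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_interest = d_any = removed = 0
        while i < len(data) and data[i][0] == t:
            cause = data[i][1]
            if cause != 0:
                d_any += 1
            if cause == cause_of_interest:
                d_interest += 1
            removed += 1
            i += 1
        if d_interest > 0:
            # Increment uses the event-free probability just before t.
            cif += event_free * d_interest / n_at_risk
            curve.append((t, cif))
        if d_any > 0:
            event_free *= 1 - d_any / n_at_risk
        n_at_risk -= removed
    return curve


# Hypothetical data: 1 = heart attack, 2 = death from another cause,
# 0 = alive and event-free when followup ended.
times = [1, 2, 3, 4, 5, 5]
causes = [2, 1, 2, 1, 0, 0]
curve = cumulative_incidence(times, causes, cause_of_interest=1)
print(curve)
```

In this constructed example the estimated cumulative incidence of heart attack by year 4 is 2 of 6, or about 0.33. Treating the two competing deaths as ordinary censored observations in a Kaplan-Meier analysis would instead give 7/15, or about 0.47, which describes risk in a hypothetical setting where the competing event could not occur.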

