NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Gliklich RE, Dreyer NA, Leavy MB, editors. Registries for Evaluating Patient Outcomes: A User's Guide [Internet]. 3rd edition. Rockville (MD): Agency for Healthcare Research and Quality (US); 2014 Apr.

Chapter 4. Data Elements for Registries

1. Introduction

Selection of data elements for a registry requires a balancing of potentially competing considerations. These considerations include the importance of the data elements to the integrity of the registry, their reliability, their necessity for the analysis of the primary outcomes, their contribution to the overall response burden, and the incremental costs associated with their collection. Registries are generally designed for a specific purpose, and data elements not critical to the successful execution of the registry or to the core planned analyses should not be collected unless there are explicit plans for their analysis.

The selection of data elements for a registry begins with the identification of the domains that must be quantified to accomplish the registry purpose. The specific data elements can then be selected, with consideration given to clinical data standards, common data definitions, and the use of patient identifiers. Next, the data element list can be refined to include only those elements that are necessary for the registry purpose. Once the selected elements have been incorporated into a data collection tool, the tool can be pilot tested to identify potential issues, such as the time required to complete the form, data that may be more difficult to access than realized during the design phase, and practical issues in data quality (such as appropriate range checks). This information can then be used to modify the data elements and reach a final set of elements.

2. Identifying Domains

Registry design requires explicit articulation of the goals of the registry and close collaboration among disciplines, such as epidemiology, health outcomes, statistics, and clinical specialties. Once the goals of the study are determined, the domains most likely to influence the desired outcomes must be defined. Registries generally include personal, exposure, and outcomes information. The personal domain consists of data that describe the patient, such as information on patient demographics, medical history, health status, and any necessary patient identifiers. The exposure domain describes the patient's experience with the product, disease, device, procedure, or service of interest to the registry. Exposure can also include other treatments that are known to influence outcome but are not necessarily the focus of the study, so that their confounding influence can be adjusted for in the planned analyses. The outcomes domain consists of information on the patient outcomes that are of interest to the registry; this domain should include both the primary endpoints and any secondary endpoints that are part of the overall registry goals.

In addition to the goals and desired outcomes, the need to create important subsets should be considered when defining the domains. Potential confounding factors (variables that are associated with both the exposure and the outcome) should also be identified at this stage of registry development so that they can be measured. Collecting data on potential confounders will allow for analytic or design control. (See Chapters 3 and 13.)

Understanding the time reference for all variables that can change over time is critical in order to distinguish cause-and-effect relationships. For example, a drug taken after an outcome is observed cannot possibly have contributed to the development of that outcome. Time reference periods can be addressed by including start and stop dates for variables that can change; they can also be addressed categorically, as is done in some quality improvement registries. For example, the Paul Coverdell National Acute Stroke Registry organized its patient-level information into categories to reflect the timeframe of the stroke event from onset through treatment to followup. In this case, the domains were categorized as prehospital, emergency evaluation and treatment, in-hospital evaluation and treatment, discharge information, and postdischarge followup.1
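The role of start and stop dates can be illustrated with a small Python sketch. The `Exposure` class and the `could_contribute` and `active_on` helpers are hypothetical names chosen for illustration, not part of any registry standard, and the dates are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Exposure:
    """A time-varying exposure recorded with start and (optional) stop dates."""
    name: str
    start: date
    stop: Optional[date] = None  # None means still ongoing at last contact

def could_contribute(exposure: Exposure, outcome_date: date) -> bool:
    """True only if the exposure began on or before the outcome date.
    An exposure that starts after the outcome cannot have contributed to it."""
    return exposure.start <= outcome_date

def active_on(exposure: Exposure, day: date) -> bool:
    """True if the exposure was ongoing on the given day."""
    return exposure.start <= day and (exposure.stop is None or day <= exposure.stop)

outcome = date(2013, 6, 1)
drug_a = Exposure("drug A", start=date(2013, 1, 15), stop=date(2013, 3, 31))
drug_b = Exposure("drug B", start=date(2013, 7, 2))
print(could_contribute(drug_a, outcome))  # True: started before the outcome
print(could_contribute(drug_b, outcome))  # False: started after the outcome
print(active_on(drug_a, outcome))         # False: stopped before the outcome
```

Recording a stop date as well as a start date is what allows the second question (was the exposure still active at the time of the outcome?) to be answered, not just the first.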

3. Selecting Data Elements

Once the domains have been identified, the process of selecting data elements begins with identification of the data elements that best quantify that domain and the source(s) from which those data elements can be collected. When selecting data elements, gaining consensus among the registry stakeholders is important, but this must be achieved without undermining the purpose of the registry by including elements solely to please a stakeholder. Each data element should support the purpose of the registry and answer an explicit scientific question or address a specific issue or need. The most effective way to select data elements is to start with the study purpose and objective, and then decide what types of groupings, measurements, or calculations will be needed to analyze that objective. Once the plan of analysis is clear, it is possible to work backward to define the data elements necessary to implement that analysis plan. This process keeps the group focused on the registry purpose and limits the number of extraneous (“nice to know”) data elements that may be included.2 (See Case Example 5.)
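Working backward from the analysis plan can be made concrete with a minimal Python sketch, assuming each planned analysis is written down together with the data elements it needs. All analysis names and element names below are hypothetical.

```python
# Hypothetical analysis plan: each planned analysis lists the elements it needs.
analysis_plan = {
    "primary: 1-year mortality by treatment": ["treatment", "enrollment_date", "death_date"],
    "adjusted model": ["treatment", "age", "sex", "diabetes", "death_date"],
    "site comparison": ["site_id", "death_date"],
}

# Elements proposed by stakeholders (two are "nice to know"; site_id was forgotten).
candidate_elements = {
    "treatment", "enrollment_date", "death_date", "age", "sex",
    "diabetes", "eye_color", "hair_color",
}

def required_elements(plan):
    """Union of the elements needed by every planned analysis."""
    needed = set()
    for elements in plan.values():
        needed.update(elements)
    return needed

def review_candidates(plan, candidates):
    """Split candidates into those some analysis needs and those none needs,
    and report needed elements that nobody proposed."""
    needed = required_elements(plan)
    return {
        "keep": sorted(candidates & needed),
        "extraneous": sorted(candidates - needed),
        "missing": sorted(needed - candidates),
    }

print(review_candidates(analysis_plan, candidate_elements))
```

Elements in the "extraneous" list are candidates for removal unless an explicit analysis is added for them, which mirrors the rule stated in the introduction.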

The data element selection process can be simplified if clinical data standards for a disease area exist. (See Case Example 7.) While there is a great need for common core data sets for conditions, few consensus-based or broadly accepted sets of standard data elements and data definitions exist for most disease areas. Thus, different studies of the same disease state may use different definitions of fundamental concepts, such as the diagnosis of myocardial infarction or the definition of worsening renal function.

To address this problem and to support more consistent data elements so that comparisons across studies can be more easily accomplished, some specialty societies and organizations are beginning to compile clinical data standards. For example, the American College of Cardiology (ACC) has created clinical data standards for acute coronary syndromes, heart failure, and atrial fibrillation.3-5 These are used by registries such as the National Cardiovascular Data Registry (NCDR)® ICD Registry™ for implantable cardioverter defibrillators and leads, which derived its publicly posted data elements and definitions from the American College of Cardiology/American Heart Association (ACC/AHA) Key Data Elements and Definitions for Electrophysiological Studies and Procedures.6 The National Cancer Institute (NCI) provides the Cancer Data Standards Registry and Repository (caDSR), which includes the caBIG® (Cancer Biomedical Informatics Grid®)–NCI data standards and the Cancer Therapy Evaluation Program (CTEP) common data element initiative.7, 8 The North American Association of Central Cancer Registries (NAACCR) has developed a set of standard data elements and a data dictionary, and it promotes and certifies the use of these standards.9 The American College of Surgeons National Cancer Database (NCDB) considers its data elements to be nationally standardized and open source.10

To a lesser extent, other disease areas also have begun to catalog data element lists and definitions. In the area of trauma, the International Spinal Cord Society has developed an International Spinal Cord Injury Core data set to facilitate comparison of studies from different countries,11 and the National Center for Injury Prevention and Control has developed Data Elements for Emergency Department Systems (DEEDS), which are uniform specifications for data entered into emergency department patient records.12 In the area of neurological disorders, the National Institute of Neurological Disorders and Stroke (NINDS) maintains a list of several hundred data elements and definitions (Common Data Elements).13 In the area of infection control, the National Vaccine Advisory Committee (NVAC) in 2007 approved a new set of core data elements for immunization information systems, which are used as functional standards by groups such as the American Immunization Registry Association (AIRA).14, 15 Currently, there is more than one list for some conditions (e.g., cancer), and there is no central method to search broadly across disease areas.

Some standards organizations are also working on core data sets. The Clinical Data Interchange Standards Consortium (CDISC) Clinical Data Acquisition Standards Harmonization (CDASH) is a global, consensus-based effort to recommend minimal data sets in 16 domains. While developed primarily for clinical trials, these domains have significant utility for patient registries. They comprise adverse events, comments, prior and concomitant medications, demographics, disposition, drug accountability, electrocardiogram test results, exposure, inclusion and exclusion criteria, laboratory test results, medical history, physical examination, protocol deviations, subject characteristics, substance abuse, and vital signs. The CDASH Standards information also includes a table on best practices for developing case report forms.16

The use of established data standards, when available, is essential so that registries can maximally contribute to evolving medical knowledge. Standard terminologies—and to a greater degree, higher level groupings into core data sets for specific conditions—not only improve efficiency in establishing registries but also promote more effective sharing, combining, or linking of data sets from different sources. Furthermore, the use of well-defined standards for data elements and data structure ensures that the meaning of information captured in different systems is the same. This is critical for “semantic” interoperability between information systems, which will be increasingly important as health information system use grows. This is discussed more in Chapter 15, Section 6.2.

Clinical data standards are important to allow comparisons between studies, but when different sets of standards overlap without being harmonized, the lack of alignment may cause confusion during analyses. To consolidate and align standards that have been developed for clinical research, CDISC, the HL7 (Health Level 7) Regulated Clinical Research Information Management Technical Committee (RCRIM TC), NCI, and the U.S. Food and Drug Administration (FDA) have collaborated to create the Biomedical Research Integrated Domain Group (BRIDG) model. The purpose of this project is to provide an overarching model that can be used to harmonize standards between the clinical research domain and the health care domain. BRIDG is a domain analysis model (DAM), meaning that it provides a common representation of the semantics of protocol-driven clinical and preclinical research, along with the associated data, resources, rules, and processes used to formally assess a drug, treatment, or procedure.17 The BRIDG model is freely available to the public as part of an open-source project. It is hoped that the BRIDG model will guide clinical researchers in selecting approaches that will enable their data to be compared with other clinical data, regardless of the study phase or data collection method.18

In cases where clinical data standards for the disease area do not exist, established data sets may be widely used in the field. For example, the United Network for Organ Sharing (UNOS) collects a large amount of data on organ transplant patients. Creators of a registry in the transplant field should consider aligning their data definitions and data element formats with those of UNOS to simplify the training and data abstraction process for sites.

Other examples of widely used data sets are the Joint Commission and the Centers for Medicare & Medicaid Services (CMS) data elements for hospital data submission programs. These data sets cover a range of procedures and diseases, from heart failure and acute myocardial infarction to pregnancy and surgical infection prevention. Hospital-based registries that collect data on these conditions may want to align their data sets with the Joint Commission and CMS. However, one limitation of tying elements and definitions to another data collection program rather than a fixed standard is that these programs may change their elements or definitions. With Joint Commission core measure elements, for example, this has occurred with some frequency.

If clinical data standards for the disease area and established data sets do not exist, it is still possible to incorporate standard terminology into a registry. This will make it easier to compare the registry data with the data of other registries and reduce the training needs and data abstraction burden on sites. Examples of several standard terminologies used to classify important data elements are listed in Table 4–1. Standard terminologies and suggestions for minimal data sets specific to pregnancy registries are provided in Chapter 21.

Table 4–1. Standard terminologies.

In addition to these standard terminologies, numerous useful commercial code listings target specific needs, such as support for checking drug interactions or compatibility with widely used electronic medical record systems. Mappings between many of these element lists are also increasingly available. For example, SNOMED CT® (Systematized Nomenclature of Medicine Clinical Terms) can currently be mapped to ICD-9-CM (International Classification of Diseases, 9th Revision, Clinical Modification), and mapping between other standards is planned or underway.20

After investigating clinical data standards, registry planners may find that there are no useful standards or established data sets for the registry, or that these standards comprise only a small portion of the data set. In these cases, the registry will need to define and select data elements with the guidance of its project team, which may include an advisory board.

When selecting data elements, it is often helpful to gather input from statisticians, epidemiologists, psychometricians, and experts in health outcomes assessment who will be analyzing the data, as they may notice potential analysis issues that need to be considered at the time of data element selection. Data elements may also be selected based on performance or quality measures in a clinical area. (See Case Examples 6 and 53.)

When beginning the process of defining and selecting data elements, it can be useful to start by considering the registry design. Since many registries are longitudinal, sites often collect data at multiple visits. In these cases, it is necessary to determine which data elements can be collected once and which data elements should be collected at every visit. Data elements that can be collected once are often collected at the baseline visit.

In other cases, the registry may be collecting data at an event level, so all of the data elements will be collected during the course of the event rather than in separate visits. In considering when to collect a data element, it is also important to determine the most appropriate order of data collection. Data elements that are related to each other in time (e.g., dietary information and a fasting blood sample for glucose or lipids) should be collected in the same visit rather than in different visit case report forms.
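One way to record these scheduling decisions is a per-element specification, sketched here in Python under the assumption that visit 1 is the baseline visit. The `ElementSpec` fields (`schedule`, `same_visit_group`) and the element names are hypothetical; a real registry would keep this information in its data dictionary.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class ElementSpec:
    name: str
    schedule: str                           # "baseline" (once) or "every_visit"
    same_visit_group: Optional[str] = None  # related elements that must share a visit form

# Hypothetical specification for a longitudinal registry.
SPECS = [
    ElementSpec("date_of_birth", "baseline"),
    ElementSpec("medical_history", "baseline"),
    ElementSpec("weight_kg", "every_visit"),
    ElementSpec("dietary_recall", "every_visit", same_visit_group="fasting_labs"),
    ElementSpec("fasting_glucose", "every_visit", same_visit_group="fasting_labs"),
]

def elements_for_visit(specs: List[ElementSpec], visit_number: int) -> List[str]:
    """Visit 1 (baseline) collects every element; later visits collect only
    the elements scheduled for every visit."""
    if visit_number == 1:
        return [s.name for s in specs]
    return [s.name for s in specs if s.schedule == "every_visit"]

def split_groups(specs: List[ElementSpec], form_elements: List[str]) -> List[str]:
    """Names of same-visit groups that are only partly present on a form,
    i.e., related elements that would be split across visit forms."""
    groups = {}
    for s in specs:
        if s.same_visit_group:
            groups.setdefault(s.same_visit_group, set()).add(s.name)
    present = set(form_elements)
    return sorted(g for g, members in groups.items()
                  if members & present and not members <= present)

print(elements_for_visit(SPECS, 2))
print(split_groups(SPECS, ["weight_kg", "fasting_glucose"]))
```

Here a draft form that carries the fasting glucose but not the dietary recall is flagged, because the two were declared as belonging to the same visit.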

International clinician and patient participation may be required to meet certain registry data objectives. In such situations, it is desirable to consider the international participation when selecting data elements, especially if it will be necessary to collect and compare data from individual countries. Examination and laboratory test results or units may differ among countries, and standardization of data elements may become necessary at the data-entry level. Data elements relating to cost-effectiveness studies may be particularly challenging, since there is substantial variation among countries in health care delivery systems and practice patterns, as well as in the cost of medical resources that are used as “inputs.”

Alternatively, if capture of internationally standardized data elements is not desirable or cannot be achieved, registry stakeholders should consider provisions to capture data elements according to local standards. Data captured in multiple countries can then be converted and merged outside the database, as needed, for uniform reporting or comparison, provided that the study design ensures that all data necessary for such conversions have been collected.
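Laboratory units show why the data needed for later conversion must be collected: if each value is captured together with its local unit, it can be converted to a common unit afterward. This Python sketch assumes the registry standardizes on mmol/L for glucose and umol/L for creatinine; the table and function names are hypothetical, and the conversion factors follow from the molar masses noted in the comments.

```python
# Factors to the registry's (assumed) standard unit.
# Glucose: 1 mmol/L = 18.016 mg/dL (molar mass 180.16 g/mol).
# Creatinine: 1 mg/dL = 88.4 umol/L (molar mass 113.12 g/mol).
TO_STANDARD = {
    ("glucose", "mg/dL"): ("mmol/L", 1 / 18.016),
    ("glucose", "mmol/L"): ("mmol/L", 1.0),
    ("creatinine", "mg/dL"): ("umol/L", 88.4),
    ("creatinine", "umol/L"): ("umol/L", 1.0),
}

def to_standard(analyte, value, unit):
    """Convert a locally reported lab value to the registry's standard unit.

    Raises KeyError for analyte/unit pairs the registry has not defined, so an
    unexpected local unit is caught instead of being silently merged.
    """
    std_unit, factor = TO_STANDARD[(analyte, unit)]
    return round(value * factor, 2), std_unit

# The same kind of value as captured at two hypothetical sites.
print(to_standard("glucose", 126, "mg/dL"))   # (6.99, 'mmol/L')
print(to_standard("glucose", 7.0, "mmol/L"))  # (7.0, 'mmol/L')
```

If the unit had not been recorded alongside each value, the first and second readings could not be reconciled after the fact.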

Table 4–2 provides examples of possible baseline data elements. The actual baseline data elements selected for a specific registry will vary depending on the design, nature, and goals of the registry. Examples listed include patient identifiers (e.g., for linkage to other databases), contact information (e.g., for followup), and residence location of enrollee (e.g., for geographic comparisons). Other administrative data elements that may be collected include the source of enrollment, enrollee sociodemographic characteristics, and information on provider locations.

Table 4–2. Examples of possible baseline data elements.

Depending on the purpose of a registry, other sets of data elements may be required. Table 4–3 provides examples of possible additional data elements; again, the data elements selected for a specific registry will vary and should be driven by the design and purpose of the registry.

Table 4–3. Examples of possible additional enrollee, provider, and environmental data elements.

In addition, data elements that may be needed for specific types of registries are outlined here:

  • For registries examining questions of safety for drugs, vaccines, procedures, or devices, key information includes history of the exposure and data elements that will permit analysis of potential confounding factors that may affect observed outcomes, such as enrollee characteristics (e.g., comorbidities, concomitant therapies, socioeconomic status, ethnicity, environmental and social factors) and provider characteristics. For drug exposures, data on use (start and stop dates), as well as data providing continuing evidence that the drug was actually used (data on medication persistence and/or adherence), may be important. In some instances, it is also useful to record reasons for discontinuation and whether pills were split or shared with others. Refer to Chapter 19 for more information on using registries for product safety assessments. For registries designed to study devices, unique device identifier information may be collected. See Chapter 23 for more information on issues specific to medical devices.
  • For registries examining questions of effectiveness and cost-effectiveness, key information includes the history of exposure and data elements that will permit analysis of potential confounding factors that may affect observed outcomes. It may be particularly useful to collect information to assess confounding by indication, such as the reason for prescribing a medication. In addition to the data elements mentioned above for safety, data elements may include individual behaviors and provider and/or system characteristics. For assessment of cost-effectiveness, information may be recorded on the financial and economic burden of illness, such as office visits, visits to urgent care or the emergency room, and hospitalizations, including length of stay. Information on indirect or productivity costs (such as absenteeism and disability) may also be collected. For some studies, a quality-of-life instrument that can be analyzed to provide quality-adjusted life years or similar comparative data across conditions may be useful.
  • For registries assessing quality of care and quality improvement, data that categorize and possibly differentiate among the services provided (e.g., equipment, training, or experience level of providers, type of health care system) may be sought, as well as information that identifies individual patients as potential candidates for the treatment (Chapter 22). In addition, patient-reported outcomes are valuable to assess the patients' perception of quality of care (Chapter 5).
  • For registries examining the natural history of a condition, the selection of data elements would be similar to those of effectiveness registries.
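The safety bullet above mentions data on medication persistence and adherence. One commonly used adherence measure that can be computed from fill dates and days' supply is the proportion of days covered (PDC). The Python sketch below assumes fills are recorded as (fill date, days' supply) pairs and uses a simple variant that does not shift early refills forward; published PDC definitions differ on that point, so the function is illustrative only.

```python
from datetime import date, timedelta

def proportion_of_days_covered(fills, window_start, window_end):
    """Proportion of days covered (PDC) over [window_start, window_end].

    fills: list of (fill_date, days_supply) tuples.
    A day counts as covered if any fill's supply spans it. Overlapping supply
    is not carried forward in this simple variant.
    """
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if window_start <= day <= window_end:
                covered.add(day)
    total_days = (window_end - window_start).days + 1
    return len(covered) / total_days

# Invented fills: 30 days from Jan 1, then 30 days from Feb 15 (a gap in between).
fills = [(date(2013, 1, 1), 30), (date(2013, 2, 15), 30)]
pdc = proportion_of_days_covered(fills, date(2013, 1, 1), date(2013, 3, 31))
print(round(pdc, 3))  # 0.667 (60 of 90 days)
```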

If one goal of a registry is to identify patient subsets that are at higher risk for particular outcomes, more detailed information on patient and provider characteristics should be collected, and a larger sample size may also be required. This information may be important in registries that look at the usage of a procedure or treatment. Quality improvement registries also use this information to understand how improvement differs across many types of institutions.

Another question that may arise during data element selection relates to endpoint adjudication. Some important endpoints may be difficult to confirm without review of the medical record (e.g., stroke), or may not be specific to a single disease and therefore be difficult to attribute without such review (e.g., mortality). While clinical trials commonly use an adjudication process for such endpoints to better assess the endpoint or its most likely cause, this is much less common in registries. The use of adjudication for endpoints will depend on the purpose of the registry.

3.1. Patient Identifiers

When selecting patient identifiers, there are a variety of options to use (e.g., the patient's name, date of birth, or some combination thereof) that are subject to legal and security considerations. When the planned analyses require linkage to other data (such as medical records), more specific patient information may be needed, depending on the planned method of linkage (e.g., probabilistic or deterministic). (For more information on linkage considerations, see Chapter 16.) In selecting patient identifiers, some thought should be given to the possibility that patient identifiers may change during the course of the registry. For example, patients may change their names during the course of the registry following marriage/divorce, or patients may move or change their telephone numbers. Patient identifiers can also be inaccurate because of intentional falsification by the patient (e.g., for privacy reasons in a sexually transmitted disease registry), unintentional misreporting by the patient or a parent (e.g., wrong date of birth), or typographical errors by clerical staff. In these cases, having more than one patient identifier for linking patient records can be invaluable. In addition, identifier needs will differ based on the registry goals. For example, a registry that tracks children will need identifiers related to the parents, and registries that are likely to include twins (e.g., immunization registries) should plan for the duplication of birth dates and other identifiers. In selecting patient identifiers for use in a registry, registry planners will need to determine what data are necessary for their purpose and plan for potential inaccurate and changing data.
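The value of having more than one identifier can be sketched as a deterministic, multi-pass match in Python: each pass compares a different combination of identifiers, and a pass is accepted only when it yields exactly one candidate, so that records sharing identifiers (for example, twins with the same date of birth and surname) are not merged on a guess. The pass order, field names, and normalization rules are hypothetical choices for illustration; probabilistic linkage (Chapter 16) instead weighs partial agreement across fields.

```python
import unicodedata
from typing import Dict, Optional, Tuple

def normalize_name(name: str) -> str:
    """Uppercase, strip accents, and keep letters only, so that formatting
    differences such as "O'Brien" vs "OBRIEN" do not block a match."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if ch.isascii() and ch.isalpha()).upper()

# Hypothetical pass order: most specific identifiers first.
MATCH_PASSES = [
    ("mrn", "dob"),
    ("last", "first", "dob"),
    ("first", "dob", "sex"),  # tolerates a last-name change, e.g., after marriage
]
NAME_FIELDS = {"last", "first"}

def match_key(record: Dict[str, str], fields) -> Optional[Tuple[str, ...]]:
    values = []
    for f in fields:
        v = record.get(f)
        if not v:
            return None  # this pass cannot be used if any identifier is absent
        values.append(normalize_name(v) if f in NAME_FIELDS else v)
    return tuple(values)

def link(record, registry):
    """Return (registry_id, pass_index) for the first pass with exactly one
    candidate, else (None, None). Ambiguous passes are skipped, not guessed."""
    for i, fields in enumerate(MATCH_PASSES):
        key = match_key(record, fields)
        if key is None:
            continue
        hits = [rid for rid, r in registry.items() if match_key(r, fields) == key]
        if len(hits) == 1:
            return hits[0], i
    return None, None

registry = {
    "R1": {"mrn": "A100", "last": "Smith", "first": "Ana", "dob": "2001-04-02", "sex": "F"},
    "R2": {"mrn": "A101", "last": "Smith", "first": "Eva", "dob": "2001-04-02", "sex": "F"},  # twin
    "R3": {"mrn": "B200", "last": "O'Brien", "first": "José", "dob": "1970-11-30", "sex": "M"},
}
print(link({"last": "Jones", "first": "Ana", "dob": "2001-04-02", "sex": "F"}, registry))
print(link({"last": "OBRIEN", "first": "Jose", "dob": "1970-11-30"}, registry))
```

The first lookup still finds R1 after a surname change because the third pass does not use the last name; the second succeeds despite differences in punctuation and accents.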

Generally, patient identifiers simplify the process of identifying and tracking patients for followup. They also make it possible to identify patients who are lost to followup because of death (e.g., through the National Death Index) and to link to birth certificates for studies in children. In addition, unique patient identifiers allow duplicate patient records to be identified and removed before analysis.

When considering the advantages of patient identifiers, it is important to take into account the potential challenges that collecting patient identifiers can present and the privacy and security concerns associated with the collection and use of patient identifiers. Obtaining consent for the use of patient-identifiable information can be an obstacle to enrollment, as it can lead to the refusal of patients to participate. Chapter 7 contains more information on the ethical and legal considerations of using patient identifiers.

In addition to the data points related to primary and secondary outcomes, it is important to plan for patients who will leave the registry. While the intention of a registry is generally for all patients to remain in the study until planned followup is completed, planning for patients to leave the study before completion of full followup may reduce analysis problems. By designing a final study visit form, registry planners can more clearly document when losses to followup occurred and possibly collect important information about why patients left the study. Not all registries will need a study discontinuation form, as some studies collect data on the patient only once and do not include followup information (e.g., in-hospital procedure registries).

3.2. Data Definitions

Creating explicit data definitions for each variable to be collected is essential to the process of selecting data elements. This is important to ensure internal validity of the proposed study so that all participants in data collection are acquiring the requisite information in the same reproducible way. (See Chapter 11.) The data definitions should include the ranges and acceptable values for each individual data element, as well as the potential interplay of different data elements. For example, logic checks for the validity of data capture may be created for data elements that should be mutually exclusive.
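A minimal Python sketch of data definitions that carry their own ranges and permitted values, plus one cross-element logic check for mutually exclusive fields. The element names, limits, and the `validate` function are hypothetical; actual limits should come from the registry's data definitions.

```python
# Hypothetical data definitions: acceptable ranges and permitted values.
DEFINITIONS = {
    "age_years": {"type": "range", "min": 18, "max": 110},
    "systolic_bp": {"type": "range", "min": 50, "max": 300},
    "smoking_status": {"type": "values", "allowed": {"current", "former", "never"}},
}

# Pairs of yes/no elements that should never both be "yes".
MUTUALLY_EXCLUSIVE = [("discharged_alive", "died_in_hospital")]

def validate(record):
    """Return a list of human-readable problems found in one case report form."""
    problems = []
    for name, rule in DEFINITIONS.items():
        value = record.get(name)
        if value is None:
            continue  # missingness is tracked separately from validity
        if rule["type"] == "range" and not (rule["min"] <= value <= rule["max"]):
            problems.append(f"{name}={value} outside {rule['min']}-{rule['max']}")
        if rule["type"] == "values" and value not in rule["allowed"]:
            problems.append(f"{name}={value!r} not an allowed value")
    for a, b in MUTUALLY_EXCLUSIVE:
        if record.get(a) == "yes" and record.get(b) == "yes":
            problems.append(f"{a} and {b} cannot both be 'yes'")
    return problems

print(validate({"age_years": 142, "smoking_status": "sometimes",
                "discharged_alive": "yes", "died_in_hospital": "yes"}))
```

Keeping the limits in a table rather than in scattered code makes it easier to keep the checks consistent with the written data definitions.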

When deciding on data definitions, it is important to determine which data elements are required and which elements may be optional. This is particularly true in cases where the registry may collect a few additional “nice to know” data elements. The determination will differ depending on whether the registry is using existing medical record documentation to obtain a particular data element or whether the clinician is being asked directly. For example, the New York Heart Association Functional Class for heart failure is an important staging element but is often not documented.21 However, if clinicians are asked to provide the data point prospectively, they can readily do so. Consideration should also be given to accounting for missing or unknown data. In some cases, a data element may be unknown or not documented for a particular patient, and followup with the patient to answer the question may not be possible. Including an option on the form for “not documented” or “unknown” will allow the person completing the case report form to provide a response to each question rather than leaving it blank. Depending on the analysis plans for the registry, the distinction between undocumented data and missing data may be important.
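The distinction between a blank field and an explicit "unknown" or "not documented" response can be preserved in the data rather than lost at entry. In this Python sketch the status labels, the required and optional element sets, and the element names (including one for NYHA class) are hypothetical.

```python
from enum import Enum

class Status(Enum):
    ANSWERED = "answered"
    UNKNOWN = "unknown"                # respondent explicitly chose "unknown"
    NOT_DOCUMENTED = "not documented"  # explicitly recorded as absent from the chart
    MISSING = "missing"                # field left blank

EXPLICIT = {"unknown": Status.UNKNOWN, "not documented": Status.NOT_DOCUMENTED}

# Hypothetical element lists for a heart failure registry form.
REQUIRED = {"nyha_class", "ejection_fraction"}
OPTIONAL = {"six_minute_walk_m"}

def classify(value):
    """Map a raw form value to one of the Status categories."""
    if value is None or value == "":
        return Status.MISSING
    if isinstance(value, str) and value.strip().lower() in EXPLICIT:
        return EXPLICIT[value.strip().lower()]
    return Status.ANSWERED

def completeness_report(form):
    """Classify every defined element and list required elements left blank.
    Explicit "unknown"/"not documented" answers count as responses."""
    statuses = {name: classify(form.get(name)) for name in sorted(REQUIRED | OPTIONAL)}
    blank_required = sorted(n for n in REQUIRED if statuses[n] is Status.MISSING)
    return statuses, blank_required

statuses, blank = completeness_report({"nyha_class": "Not documented", "ejection_fraction": None})
print(statuses["nyha_class"], blank)
```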

3.3. Patient-Reported Outcomes

When collecting data for patient outcomes analysis, it is important to use patient-reported outcomes (PROs) that are valid, reliable, responsive, interpretable, and translatable. PROs reflect the patients' perceptions of their status and their perspective on health and disease. PROs have become an increasingly important avenue of investigation, particularly in light of the 2001 Institute of Medicine report calling for a more patient-centered health care system.22 The FDA also noted the importance of PRO data in understanding certain treatment effects in its 2009 guidance document.23 The use of PROs in registries is discussed in more detail in Chapter 5.

When using an instrument to gather data on PROs, it is important both to collect the individual question responses and to calculate the summary or composite score. The summary score, which may be for the entire instrument or for individual domains, is ultimately used to report results. However, if the registry collects only the summary score, it will not be possible to examine how the patients scored on different components of the instrument during the registry analysis phase.
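Storing item responses alongside the scores can be sketched in Python for a hypothetical six-item, two-domain instrument. The 0-100 rescaling and the rule that a domain is scored only if at least half of its items were answered are assumptions for illustration; a real PRO instrument's scoring manual defines its own rules, and those should be followed exactly.

```python
# Hypothetical 6-item instrument with two domains; items scored 1 (worst) to 5 (best).
DOMAINS = {
    "physical": ["q1", "q2", "q3"],
    "emotional": ["q4", "q5", "q6"],
}

def domain_score(responses, items):
    """Mean of the answered items rescaled to 0-100, or None if fewer than
    half were answered (a hypothetical missing-item rule)."""
    answered = [responses[i] for i in items if responses.get(i) is not None]
    if len(answered) * 2 < len(items):
        return None
    mean = sum(answered) / len(answered)
    return round((mean - 1) / 4 * 100, 1)

def score_instrument(responses):
    scores = {d: domain_score(responses, items) for d, items in DOMAINS.items()}
    all_items = [i for items in DOMAINS.values() for i in items]
    scores["total"] = domain_score(responses, all_items)
    return scores

# Keep the item-level responses with the scores, not the scores alone.
record = {"items": {"q1": 5, "q2": 4, "q3": 3, "q4": 2, "q5": None, "q6": 2}}
record["scores"] = score_instrument(record["items"])
print(record["scores"])
```

Because the items are retained, a later analysis can rescore the instrument (for example, under a different missing-item rule) or examine individual components.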

4. Registry Data Map

Once data elements have been selected, a data map should be created. The data map identifies all sources of data (Chapter 6) and explains how the sources of data will be integrated. Data maps are useful to defend the validity and/or reliability of the data, and they are typically an integral part of the data management plan (Chapter 11, Section 2.5).
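A data map can also be kept in machine-readable form so that gaps are easy to detect. This Python sketch assumes a simple dictionary keyed by element name, with hypothetical source labels and integration rules; it flags selected elements that have no source and multi-source elements that lack a rule for reconciling them.

```python
# Hypothetical data map: each element, its sources, and how sources are reconciled.
DATA_MAP = {
    "date_of_birth": {"sources": ["site_crf", "ehr_extract"], "rule": "prefer site_crf"},
    "death_date": {"sources": ["site_crf", "national_death_index"], "rule": "earliest non-null"},
    "pro_total_score": {"sources": ["patient_survey"], "rule": "single source"},
}

SELECTED_ELEMENTS = ["date_of_birth", "death_date", "pro_total_score", "hba1c"]

def unmapped(selected, data_map):
    """Selected elements that have no documented source in the data map."""
    return [e for e in selected if not data_map.get(e, {}).get("sources")]

def multi_source_without_rule(data_map):
    """Elements fed by several sources but lacking an integration rule."""
    return [e for e, m in data_map.items()
            if len(m.get("sources", [])) > 1 and not m.get("rule")]

print(unmapped(SELECTED_ELEMENTS, DATA_MAP))
print(multi_source_without_rule(DATA_MAP))
```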

5. Pilot Testing

After the data elements have been selected and the data map created, it is important to pilot test the data collection tools to determine the time needed to complete the form and the resulting subject/abstractor burden. For example, through pilot testing, registry planners might determine that it is wise to collect certain data elements that are either highly burdensome or only “nice to know” in a subset of participating sites (nested registry) that agree to the more intensive data collection, so as not to endanger participation in the registry as a whole. Pilot testing should also help to identify the rate of missing data and any validity issues with the data collection system.
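Two of the pilot-testing measures named above, the rate of missing data and the time needed to complete the form, can be tabulated directly from pilot forms. The Python sketch below uses invented pilot data and a hypothetical 25 percent threshold for flagging elements.

```python
from statistics import median

# Invented pilot data: one dict per completed form; None means left blank.
pilot_forms = [
    {"minutes": 22, "ejection_fraction": 35, "nyha_class": None, "smoking": "never"},
    {"minutes": 31, "ejection_fraction": None, "nyha_class": None, "smoking": "former"},
    {"minutes": 18, "ejection_fraction": 50, "nyha_class": "II", "smoking": "current"},
    {"minutes": 40, "ejection_fraction": 40, "nyha_class": None, "smoking": None},
]

def missing_rates(forms, elements):
    """Fraction of forms on which each element was left blank."""
    return {e: sum(f.get(e) is None for f in forms) / len(forms) for e in elements}

def flag_elements(rates, threshold=0.25):
    """Elements missing more often than the (hypothetical) threshold."""
    return sorted(e for e, r in rates.items() if r > threshold)

rates = missing_rates(pilot_forms, ["ejection_fraction", "nyha_class", "smoking"])
print(rates)
print(flag_elements(rates))
print(median(f["minutes"] for f in pilot_forms))  # median completion time
```

An element flagged this way is a candidate for a clearer definition, a "not documented" option, or collection only in a nested subset of sites.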

The burden of form collection is a major factor in a registry's success or failure, with major implications for the cost of participation and for the overall acceptance of the registry by hospitals and health care personnel. Moreover, knowing the anticipated time needed for patient recruitment and enrollment allows clearer communication to potential sites about the scope and magnitude of the commitment required to participate in the study. Registries that obtain information directly from patients face the additional issue of participant burden, which can cause participant fatigue and lead participants to leave items unanswered. Highly burdensome questions can be collected in a prespecified subset of subjects. The purpose of these added questions should be carefully considered when determining the subset so that useful and accurate conclusions can be achieved.

Pilot testing the registry also allows the opportunity to identify issues and make refinements in the registry-specific data collection tools, including alterations in the format or order of data elements and clarification of item definitions. Alterations to validated PRO measures are generally not advised unless they are revalidated. Validated PRO measures that are not used in the validated format may be perceived as invalid or unreliable.

Piloting may also uncover problems in registry logistics, such as the ability to accurately or comprehensively identify subjects for inclusion. A fundamental aspect of pilot testing is evaluation of the accuracy and completeness of registry questions and the comprehensiveness of both instructional materials and training in addressing these potential issues. Gaps in clarity concerning questions can result in missing or misclassified data, which in turn may cause bias and result in inaccurate or misleading conclusions. For example, time points, such as the time to radiologic interpretation of an imaging test, may be difficult to obtain retrospectively and, if they do exist in the chart, may not be consistently documented. Without additional instruction, some hospitals may indicate the time the image was read by the radiologist and others may use the time when the interpretation was recorded in the chart. The two time points can differ substantially, depending on the documentation practices of the institution.

Pilot testing ranges in practice from ad hoc assessments of the face validity of instruments and materials in clinical sites, to trial runs of the registry in small numbers of sites, to highly structured evaluations of inter-rater agreement. The level of pilot testing is determined by multiple factors. Accuracy of data entry is a key criterion to evaluate during the pilot phase of the registry. When a “gold standard” exists, the level of agreement with that reference standard (criterion validity) may be measured.24 Data collected by seasoned abstractors or auditors following strict operational criteria can serve as the gold standard by which to judge accuracy of abstraction for chart-based registries.25
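When a reference abstraction is available, agreement for a single element can be summarized in a few lines. This Python sketch computes percent agreement, plus sensitivity and specificity treating the reference as correct, for a hypothetical yes/no element re-abstracted from eight charts; the data are invented.

```python
def percent_agreement(site, reference):
    """Share of records on which the site abstraction matches the reference."""
    if not reference or len(site) != len(reference):
        raise ValueError("both lists must cover the same, non-empty set of records")
    return sum(s == r for s, r in zip(site, reference)) / len(reference)

def sensitivity_specificity(site, reference, positive="yes"):
    """Treat the reference abstraction as correct and return (sensitivity,
    specificity) of the site abstraction for a yes/no element.
    Assumes the reference contains both positive and negative records."""
    pairs = list(zip(site, reference))
    ref_pos = [s for s, r in pairs if r == positive]
    ref_neg = [s for s, r in pairs if r != positive]
    sensitivity = sum(s == positive for s in ref_pos) / len(ref_pos)
    specificity = sum(s != positive for s in ref_neg) / len(ref_neg)
    return sensitivity, specificity

site    = ["yes", "no", "yes", "yes", "no", "no", "yes", "no"]
auditor = ["yes", "no", "no",  "yes", "no", "yes", "yes", "no"]
print(percent_agreement(site, auditor))        # 0.75
print(sensitivity_specificity(site, auditor))  # (0.75, 0.75)
```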

In instances where no reference standard is available, the reproducibility of responses to registry elements across abstractors (inter-rater reliability) or the test-retest agreement of subject responses may be assessed.26 Reliability and/or validity of a data element should be tested in the pilot phase whenever the element is collected in new populations or for new applications. Mechanisms similar to those used during the pilot phase can be used during data quality assurance (Chapter 11, Section 3). The kappa statistic, a measure of how much the agreement between two or more observers exceeds the agreement expected by chance alone, is the most common method for measuring the reliability of categorical and ordinal data. The intraclass correlation coefficient, or inter-rater reliability coefficient, provides information on the degree of agreement for continuous data. It is a proportion (the share of total variance attributable to differences between subjects) that ranges from zero to one. Item-specific agreement represents the highest standard for registries; it has been employed in cancer registries and to assess the quality of data in statewide stroke registries. Other methods, such as the Bland and Altman method,26 may also be chosen, depending on the type of data and the registry purpose.
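
The agreement statistics named above can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not a validated statistical package: kappa here is Cohen's unweighted kappa for two raters; the intraclass correlation uses the ICC(2,1) form described by Shrout and Fleiss (two-way random effects, absolute agreement, single rater), which is only one of several ICC forms; and the Bland and Altman limits use the conventional bias ± 1.96 SD of the differences. The function names and pilot data are hypothetical. Note that sample ICC estimates can fall slightly below zero when between-subject variation is smaller than error variation, even though the underlying proportion cannot.

```python
from collections import Counter
from statistics import mean, stdev
from typing import Hashable, Sequence


def cohens_kappa(rater1: Sequence[Hashable], rater2: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters: (observed - chance agreement) / (1 - chance agreement)."""
    n = len(rater1)
    if n == 0 or n != len(rater2):
        raise ValueError("need two equal-length, non-empty rating lists")
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    counts1, counts2 = Counter(rater1), Counter(rater2)
    # Chance agreement: product of the two raters' marginal proportions, summed over categories.
    p_chance = sum(counts1[c] * counts2[c] for c in set(counts1) | set(counts2)) / n**2
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    `ratings` has one row per subject and one column per rater (complete grid)."""
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 subjects x 2 raters in a complete grid")
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                   # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)


def bland_altman_limits(a: Sequence[float], b: Sequence[float]) -> tuple[float, float, float]:
    """Mean difference (bias) and 95% limits of agreement: bias +/- 1.96 * SD of the differences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    diffs = [x - y for x, y in zip(a, b)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical pilot data: stroke type abstracted independently by two abstractors
# (I = ischemic, H = hemorrhagic, T = transient ischemic attack).
abstractor_a = ["I", "I", "I", "H", "H", "T", "I", "T", "I", "H"]
abstractor_b = ["I", "I", "H", "H", "H", "T", "I", "I", "I", "H"]
kappa = cohens_kappa(abstractor_a, abstractor_b)  # (0.80 - 0.39) / (1 - 0.39), about 0.67

# Hypothetical continuous element (hemoglobin, g/dL) recorded by two abstractors for five patients.
hgb = [[10.0, 10.2], [12.0, 12.4], [9.0, 9.0], [11.0, 11.4], [13.0, 13.0]]
icc = icc_2_1(hgb)  # about 0.986
bias, lower, upper = bland_altman_limits([r[0] for r in hgb], [r[1] for r in hgb])
# bias -0.20 g/dL; limits of agreement about -0.59 to 0.19 g/dL
```

The two continuous measures answer different questions: the ICC summarizes agreement relative to between-patient spread, whereas the Bland and Altman limits express disagreement in the element's own units, which is often easier to judge against a clinically acceptable difference.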

6. Summary

The selection of data elements requires balancing such factors as their importance for the integrity of the registry and for the analysis of primary outcomes, their reliability, their contribution to the overall burden for respondents, and the incremental costs associated with their collection. Data elements should be selected with consideration for established clinical data standards, common data definitions, and whether patient identifiers will be used. It is also important to determine which elements are absolutely necessary and which are desirable but not essential. Once data elements have been selected, a data map should be created, and the data collection tools should be pilot tested. Overall, the choice of data elements should be guided by parsimony, validity, and a focus on achieving the registry's purpose.

Case Examples for Chapter 4

Case Example 5: Selecting data elements for a registry

Description: The Dosing and Outcomes Study of Erythropoiesis-stimulating Therapies (DOSE) Registry was designed to understand anemia management patterns and clinical, economic, and patient-reported outcomes in oncology patients treated in outpatient oncology practice settings across the United States. The prospective design of the DOSE Registry enabled data capture from oncology patients treated with erythropoiesis-stimulating therapies.
Sponsor: Centocor Ortho Biotech Services, LLC
Year Started: 2003
Year Ended: 2009
No. of Sites: 71
No. of Patients: 2,354

Challenge

Epoetin alfa was approved for patients with chemotherapy-induced anemia in 1994. In 2002, the U.S. Food and Drug Administration approved a second erythropoiesis-stimulating therapy (EST), darbepoetin alfa, for a similar indication. While multiple clinical trials described outcomes following intervention with ESTs, little information was available on real-world practice patterns and outcomes in oncology patients. The registry team determined that a prospective observational effectiveness study in this therapeutic area was needed to gain this information. The three key challenges were to make the study representative of real-world practices and settings (e.g., hospital-based clinics, community oncology clinics); to collect data elements that were straightforward so as to minimize potential data collection errors; and to collect sufficient data to study effectiveness, while ensuring that the data collection remained feasible and time efficient for outpatient oncology clinics.

Proposed Solution

The registry team began selecting data elements by completing a thorough literature review. Because this would be one of the first prospective observational studies in this therapeutic area, and because many clinical trials had already been published, the team wanted to ensure that study results could be presented to health care professionals and decisionmakers in a manner consistent with those trials. The team also intended to make the data reports from this study comparable with clinical trial reports. To meet these objectives, the team selected data elements (e.g., baseline demographics, dosing patterns, hemoglobin levels) similar to those used in the clinical trials identified in the review whenever possible.

For the patient-reported outcomes component of the registry, the team incorporated standard validated instruments. This decision allowed the team to avoid developing and validating new instruments and supported consistency with clinical trial literature, as many trials had incorporated these instruments. To capture patient-reported data, the team selected two instruments, the Functional Assessment of Cancer Therapy–Anemia (FACT-An) and the Linear Analog Scale Assessment (LASA) tool. The FACT-An tool, developed from the FACT-General scale, had been designed and validated to measure the impact of anemia in cancer patients. The LASA tool enables patients to report their energy level, activity level, and overall quality of life on a scale of 0 to 100. Both tools are commonly used to gather patient-reported outcomes data for cancer patients.

Following the literature review, an advisory board was convened to discuss the registry objectives, data elements, and study execution. The advisory board included representatives from the medical and nursing professions. The multidisciplinary board provided insights into both the practical and clinical aspects of the registry procedures and data elements. Throughout the process, the registry team remained focused on both the overall registry objectives and user-friendly data collection. In particular, the team worked to make each question clear and unambiguous in order to minimize confusion and enable a variety of site personnel, as well as the patients, to complete the registry data collection.

Results

The registry was launched in 2003 as one of the first prospective observational effectiveness studies in this therapeutic area. Seventy-one sites and 2,354 patients enrolled in the study. The sites participating in the registry represented a wide geographic distribution and a mixture of outpatient practice settings.

Key Point

Use of common data elements, guided by a literature review, and validated patient-reported outcomes instruments enhanced data generalizability and comparability with clinical trial data. A multidisciplinary advisory board also helped to ensure collection of key data elements in an appropriate manner from both a clinical and practical standpoint.

For More Information

Larholt K, Burton TM, Hoaglin DC, et al. Clinical and patient-reported outcomes based on achieved hemoglobin levels in chemotherapy-treated cancer patients receiving erythropoiesis-stimulating agents. Commun Oncol. 2009;6:403–8.

Larholt K, Pashos CL, Wang Q, et al. Dosing and Outcomes Study of Erythropoiesis-Stimulating Therapies (DOSE): a registry for characterizing anaemia management and outcomes in oncology patients. Clin Drug Invest. 2008;28(3):159–67 [PubMed: 18266401].

Case Example 6: Understanding the needs and goals of registry participants

Description: The Prospective Registry Evaluating Myocardial Infarction: Events and Recovery (PREMIER) studied the health status of patients for one year after discharge for a myocardial infarction. The registry focused on developing a rich understanding of the patients' symptoms, functional status, and quality of life by collecting extensive baseline data in the hospital and completing followup interviews at 1, 6, and 12 months.
Sponsor: CV Therapeutics and CV Outcomes
Year Started: 2003
Year Ended: 2004
No. of Sites: 19
No. of Patients: 2,498

Challenge

With the significant advances in myocardial infarction (MI) care over the past 20 years, many studies have documented the improved mortality and morbidity associated with these new treatments. These studies typically have focused on in-hospital care, with little to no followup component. As a result, information on the transition from inpatient to outpatient care has been lacking, as have data on health status outcomes.

PREMIER was designed to address these gaps by collecting detailed information on MI patients during the hospital stay and through followup telephone interviews conducted at 1, 6, and 12 months. The goal of the registry was to provide a rich understanding of patients' health status (their symptoms, function, and quality of life) 1 year after an acute MI. The registry also proposed to quantify the prevalence, determinants, and consequences of patient and clinical factors in order to understand how the structures and processes of MI care affect patients' health status.

To develop the registry data set, the team began by clearly defining the phases of care and recovery and identifying the clinical characteristics that were important in each of these phases. These included patient characteristics upon hospital arrival, details of inpatient care, and details of outpatient care. The team felt that information on each of these phases was necessary, since the variability of any outcome over 1 year may be explained by patient, inpatient treatment, or outpatient factors. Health status also includes many determinants beyond the clinical status of disease, such as access to care, socioeconomic status, and social support; the registry needed to collect these additional data in order to fully understand the health status outcomes.

Proposed Solution

While registries often try to include as many eligible patients and sites as possible by reducing the burden of data entry, this registry took an alternative approach. The team designed a data set that included more than 650 baseline data elements and more than 200 followup interview-assessed data elements. Instead of allowing retrospective chart abstraction, the registry required hospitals to complete a five-page patient interview while the patient was in the hospital. The registry demanded significant resources from the participating sites. For each patient, the registry required about 4 hours of time, with 15 minutes for screening, 2 hours for chart abstraction, 45 minutes for interviews, 45 minutes for data entry, and 15 minutes of a cardiologist's time to interpret the electrocardiograms and angiograms. A detailed, prespecified sampling plan was developed by each site and approved by the data coordinating center to ensure that the patients enrolled at each center were representative of all of the patients seen at that site.
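
A quick check that the listed per-patient tasks add up to the stated total (all values in minutes, taken from the paragraph above):

```latex
\underbrace{15}_{\text{screening}}
+ \underbrace{120}_{\text{chart abstraction}}
+ \underbrace{45}_{\text{interviews}}
+ \underbrace{45}_{\text{data entry}}
+ \underbrace{15}_{\text{cardiologist review}}
= 240\ \text{min} = 4\ \text{h per patient}
```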

The registry team developed this extremely detailed data set and data collection process through extensive consultations with the registry participants. The coordinators and steering committees reviewed the data set multiple times, with some sites giving extensive feedback. Throughout the development process, there was an ongoing dialog among the registry designers, the steering committee, and the registry sites.

The registry team also used standard definitions and established instruments whenever possible to enable the registry data to be cross-referenced to other studies and to minimize the training burden. The team used the American College of Cardiology Data Standards for Acute Coronary Syndromes for data definitions of any overlapping fields. To measure other areas of the patient experience, the team used the Patient Health Questionnaire to examine depression, the ENRICHD Social Support Inventory to measure social support, the Short Form-12 to quantify overall mental and physical health, and the Seattle Angina Questionnaire (SAQ) to understand the patients' perspective on how coronary disease affects their life.

Results

The data collection burden posed some challenges. Two of the 19 sites dropped out of the registry soon after it began. Two other sites fell behind on their chart abstractions. Turnover of personnel and multiple commitments at participating sites also delayed the study.

Despite these challenges, the registry experienced very little loss of enthusiasm or loss of sites once it was up and running. The remaining 17 sites completed the registry and collected data on nearly 2,500 patients. In return for this data collection, sites enjoyed the academic productivity and collaborative nature of the study. The data coordinating center created a Web site that offered private groups for the principal investigators, so that each investigator had access to all of the abstract ideas and all of the research that was being done. This structure provided nurturing and support for the investigators, and they viewed the registry as a way to engage themselves and their institutions in research with a prominent, highly respected team.

On the patient side, the registry met followup goals. More than 85 percent of participants provided 12-month followup information. The registry team attributed this followup rate to the strong rapport that the interviewers developed with the patients during the course of the followup period.

Key Point

This example illustrates that there is no maximum or minimum number of data elements for a successful registry. Instead, a registry can best achieve its goals by ensuring that sufficient information is collected to achieve the purpose of the registry while remaining feasible for the participants. An open, ongoing dialog with the participants or a subgroup of participants can help determine what is feasible for a particular registry and ensure that the registry retains the participants for the life of the study.

For More Information

Spertus JA, Peterson E, Rumsfeld JS, et al. The Prospective Registry Evaluating Myocardial Infarction: Events and Recovery (PREMIER)– evaluating the impact of myocardial infarction on patient outcomes. Am Heart J. 2006;151(3):589–97. [PubMed: 16504619].

Case Example 7: Using standardized data elements in a registry

Description: The Caris Registry is a national, multicenter, Web-based registry that tracks long-term outcomes for patients who have undergone Caris Molecular Intelligence™ Services.
Sponsor: Caris Life Sciences
Year Started: 2009
Year Ended: Ongoing
No. of Sites: 96
No. of Patients: >1,400

Challenge

Molecular biomarker data may be valuable in guiding treatment decisions for cancer patients. Caris Life Sciences offers a commercial molecular profiling service (Caris Molecular Intelligence™) that combines biomarker analysis of a patient's tumor with an analysis of the published scientific literature in order to report personalized, evidence-based treatment options. These data may impact the physicians' and patients' treatment decisions at one point in time, but collection of longitudinal data would allow for correlation of treatment recommendations to clinical outcomes. In addition, longitudinal data could support collaborative investigator-initiated research that may be focused on using molecular profiling as a tool to improve treatment selection and associated outcomes for patients with cancer.

Proposed Solution

The Caris Registry employs a scientifically valid and regulatory-compliant protocol that is intended to capture clinical disease, treatment, and outcome data over the course of five years from patients who have had Caris Molecular Intelligence™ Services performed. Medical history, disease status, treatments, and outcomes are captured at enrollment (defined as the date of the report) and every 9 months for 5 years. The registry is maintained as a limited data set, and all biological and laboratory data are de-identified.
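
As a rough sketch of the assessment schedule described above, the following computes due dates from the enrollment (report) date at 9-month intervals across a 5-year window. The function names, the rule of clamping to the last day of shorter months, and the treatment of month 0 as the enrollment assessment are assumptions for illustration; an actual protocol would also define visit windows around each due date.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day to the length of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def followup_schedule(enrollment: date, interval_months: int = 9, duration_months: int = 60) -> list[date]:
    """Enrollment date (month 0) plus every follow-up due date that falls within the study window."""
    return [add_months(enrollment, m) for m in range(0, duration_months + 1, interval_months)]


# Hypothetical enrollment (report) date.
for due in followup_schedule(date(2012, 1, 31)):
    print(due.isoformat())
# prints, one per line: 2012-01-31, 2012-10-31, 2013-07-31, 2014-04-30,
# 2015-01-31, 2015-10-31, 2016-07-31
```

Because 60 is not a multiple of 9, the last due date in this sketch falls at month 54; how the end-of-study assessment is timed relative to that date is a protocol decision not modeled here.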

During the planning phase of the registry, the sponsor elected to use standardized data elements wherever possible, in order to maintain flexibility and to anticipate multiple future uses of registry data. The National Cancer Institute's Cancer Data Standards Registry and Repository (caDSR) standardized data dictionary contains common data elements (CDEs) that can be reused for multiple purposes. The registry used some of these CDEs exactly as they appear in the caDSR (e.g., demographics). Other data elements that the sponsor wished to collect were not present in the caDSR (e.g., “Did the patient receive molecular-guided therapy?”). For these elements, the sponsor collaborated with the Center for Biomedical Informatics and Information Technology group to create new CDEs that were incorporated into the caDSR data dictionary. Of the 100 clinical data elements in the registry, 87 were incorporated directly from the caDSR data dictionary and 13 were added to the data dictionary through collaboration with the National Cancer Institute.
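
One way to keep the reuse of standardized elements auditable is to record, for each element in the registry's data dictionary, whether it was used exactly as published in the caDSR or newly created and submitted. The sketch below is hypothetical: the class and field names are illustrative, no actual caDSR identifiers are shown, and the two sample entries simply paraphrase the examples in the paragraph above.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    REUSED_FROM_CADSR = "used exactly as published in the caDSR"
    NEW_CDE_SUBMITTED = "created for the registry and added to the caDSR"


@dataclass(frozen=True)
class DataElement:
    name: str           # hypothetical internal field name
    question_text: str  # wording shown on the case report form
    provenance: Provenance


def provenance_counts(elements: list[DataElement]) -> dict[Provenance, int]:
    """Count data elements by provenance, including categories with zero elements."""
    counts = Counter(e.provenance for e in elements)
    return {p: counts[p] for p in Provenance}


# Hypothetical entries paraphrasing the examples above.
dictionary = [
    DataElement("patient_sex", "Patient sex", Provenance.REUSED_FROM_CADSR),
    DataElement(
        "molecular_guided_therapy",
        "Did the patient receive molecular-guided therapy?",
        Provenance.NEW_CDE_SUBMITTED,
    ),
]
for provenance, count in provenance_counts(dictionary).items():
    print(f"{provenance.value}: {count}")
```

Applied to the full Caris data dictionary, a tally like this would report the 87 reused and 13 newly added elements described above.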

Results

To date, 1,400 patients from 96 centers across the United States have been enrolled in the Caris Registry. At least 1,124 of these patients have followup data capturing disease status, treatments, and clinical outcomes, and 500 of those have completed end-of-study reports capturing vital status and cancer-related deaths.

In the first half of 2013, Caris agreed to participate in a retrospective study of registry data titled, “A Retrospective Investigation To Evaluate The Use Of Target Now™ Assay in Selecting Treatment in Patients with Advanced Stage Metastatic Cancer.”

Key Point

Common data elements endorsed by recognized standards organizations are available for registry planners and may be useful for registries in some disease areas. Use of CDEs can increase opportunities for standardized collaboration, linkage, and additional exploratory analysis.

For More Information

Sanders S, Schroeder W, Wright A, et al. The Caris registry: Building a biomarker-focused database to advance patient care; Abstract presented at the 2012 ASCO Annual Meeting; June 1–5, 2012; [August 27, 2012].

References for Chapter 4

1. Wattigney WA, Croft JB, Mensah GA, et al. Establishing data elements for the Paul Coverdell National Acute Stroke Registry: Part 1: proceedings of an expert panel. Stroke. 2003 Jan;34(1):151–6. [PubMed: 12511767]
2. Good PI. A manager's guide to the design and conduct of clinical trials. New York: John Wiley & Sons, Inc.; 2002.
3. Cannon CP, Battler A, Brindis RG, et al. American College of Cardiology key data elements and definitions for measuring the clinical management and outcomes of patients with acute coronary syndromes. A report of the American College of Cardiology Task Force on Clinical Data Standards (Acute Coronary Syndromes Writing Committee). J Am Coll Cardiol. 2001 Dec;38(7):2114–30. [PubMed: 11738323]
4. McNamara RL, Brass LM, Drozda JP Jr., et al. ACC/AHA key data elements and definitions for measuring the clinical management and outcomes of patients with atrial fibrillation: a report of the American College of Cardiology/American Heart Association Task Force on Clinical Data Standards (Writing Committee to Develop Data Standards on Atrial Fibrillation). Circulation. 2004 Jun 29;109(25):3223–43. [PubMed: 15226233]
5. Radford MJ, Arnold JM, Bennett SJ, et al. ACC/AHA key data elements and definitions for measuring the clinical management and outcomes of patients with chronic heart failure: a report of the American College of Cardiology/American Heart Association Task Force on Clinical Data Standards (Writing Committee to Develop Heart Failure Clinical Data Standards): developed in collaboration with the American College of Chest Physicians and the International Society for Heart and Lung Transplantation: endorsed by the Heart Failure Society of America. Circulation. 2005 Sep 20;112(12):1888–916. [PubMed: 16162914]
6. National Cardiovascular Disease Registry. ICD Registry Data Collection. [July 16, 2013]. https://www​​/webncdr/icd/home/datacollection.
7. National Cancer Institute. Cancer Data Standards Registry and Repository (caDSR). [August 15, 2012]. http://ncicb.nci.nih.gov/infrastructure/cacore_overview/cadsr.
8. National Cancer Institute. CTEP Common Data Elements. [July 16, 2013]. https://wiki.nci.nih.gov/display/caDSR/CTEP+Common+Data+Elements.
9. The North American Association of Central Cancer Registries. Data Standards & Data Dictionary (Volume II). [August 15, 2012]. http://www​​/StandardsandRegistryOperations​/VolumeII.aspx.
10. The American College of Surgeons Commission on Cancer. National Quality Forum Endorsed Commission on Cancer Measures for Quality of Cancer Care for Breast and Colorectal Cancers. [August 15, 2012]. http://www​​/qualitymeasures.html.
11. DeVivo M, Biering-Sorensen F, Charlifue S, et al. International Spinal Cord Injury Core Data Set. Spinal Cord. 2006 Sep;44(9):535–40. [PubMed: 16955073]
12. National Center for Injury Prevention and Control. DEEDS – Data Elements for Emergency Department Systems. [August 15, 2012]. http://www​​/pub-res/deedspage.htm.
13. National Institute of Neurological Disorders and Stroke. Common Data Elements. [August 15, 2012]. http://www​.commondataelements​
14. Centers for Disease Control and Prevention. Vaccines & Immunizations. IIS Recommended Core Data Elements. [August 15, 2012]. http://www​​/programs/iis/core-data-elements​.html.
15. American Immunization Registry Association. Immunization Registry Functional Standards. [July 16, 2013]. http://www.immregistries.org/resources/standards/functional-standards.
16. Clinical Data Interchange Standards Consortium. Clinical Data Acquisition Standards and Harmonization (CDASH). [August 15, 2012]. http://www​
17. Biomedical Research Integrated Domain Group (BRIDG). [August 15, 2012]. http://www​
18. HL7. HL7 and CDISC mark first anniversary of renewed associate charter agreement, joint projects result from important healthcare-clinical research industry collaboration. [August 15, 2012]. [press release]. http://www​​/public​_temp_A7EA2C6E-1C23-BA17-0CA05816D86FD311​/pressreleases/20051012b.pdf.
19. Kim K. iHealthReports. California HealthCare Foundation. Clinical data standards in health care: five case studies. [August 15, 2012]. http://www​​.cfm?itemID=112795.
20. Imel M. A closer look: the SNOMED clinical terms to ICD-9-CM mapping. J AHIMA. 2002 Jun;73(6):66–9. quiz 71–2. [PubMed: 12066386]
21. Yancy CW, Fonarow GC, Albert NM, et al. Influence of patient age and sex on delivery of guideline-recommended heart failure care in the outpatient cardiology practice setting: findings from IMPROVE HF. Am Heart J. 2009 Apr;157(4):754–62 e2. [PubMed: 19332206]
22. Institute of Medicine; Committee on Quality of Health Care in America. Crossing the Quality Chasm: A New Health System for the 21st Century. Washington DC: National Academy Press, Institute of Medicine; 2001. [PubMed: 25057539]
23. U.S. Food and Drug Administration. Guidance for Industry: Patient Reported Outcome Measures: Use in Medical Product Development and Labeling Claims. Dec, 2009. [August 15, 2012]. http://www​​/Drugs/GuidanceComplianceRegulatoryInformation​/Guidances/UCM193282.pdf. [PMC free article: PMC1629006] [PubMed: 17034633]
24. Goldberg J, Gelfand HM, Levy PS. Registry evaluation methods: a review and case study. Epidemiol Rev. 1980;2:210–20. [PubMed: 7000537]
25. Sorensen HT, Sabroe S, Olsen J. A framework for evaluation of secondary data sources for epidemiological research. Int J Epidemiol. 1996 Apr;25(2):435–42. [PubMed: 9119571]
26. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986 Feb 8;1(8476):307–10. [PubMed: 2868172]

