Data Sources for Registries

Richard E Gliklich; Nancy A Dreyer

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Gliklich RE, Dreyer NA, editors. Registries for Evaluating Patient Outcomes: A User's Guide. 2nd edition. Rockville (MD): Agency for Healthcare Research and Quality (US); 2010 Sep.

Chapter 6. Data Sources for Registries

Introduction

Identification and evaluation of suitable data sources should be done within the context of the registry purpose and availability of the data of interest. A single registry may have multiple purposes and integrate data from various sources. While some data in a registry are collected directly for registry purposes (primary data collection), important information also can be transferred into the registry from existing databases. Examples include demographic information from a hospital admission, discharge, and transfer system; medication use from a pharmacy database; and disease and treatment information, such as details of the coronary anatomy and percutaneous coronary intervention from a catheterization laboratory information system, electronic medical record, or medical claims databases. In addition, observational studies can generate as many hypotheses as they test, and secondary sources of data can be merged with the primary data collection to allow for analyses of questions that were unanticipated when the registry was conceived.

This chapter will review the various sources of both primary and secondary data, comment on their strengths and weaknesses, and provide some examples of how data collected from different sources can be integrated to help answer important questions.

Types of Data

The types of data to be collected are guided by the registry design and data collection methods. The form, organization, and timing of required data are important components in determining appropriate data sources. Data elements can be grouped into categories identifying the specific variable or construct they are intended to describe. One framework for grouping data elements into categories follows:

Patient identifiers: Some registries may use patient identifiers to link data. In these registries, data elements are linked to the specific patient through a unique patient identifier or registry identification number. The use of patient identifiers may not be possible in all registries due to privacy regulations. (See Chapter 8.)
Patient selection criteria: The eligibility criteria in a registry protocol or study plan determine the group that will be included in the registry. These criteria may be very broad or restrictive, depending on the purpose. Criteria often include demographics (e.g., target age group), a disease diagnosis, a treatment, or diagnostic procedures and laboratory tests. Health care provider, health care facility or system, and insurance criteria may also be included in certain types of registries (e.g., following care patterns of specific conditions at large medical centers compared with small private clinics).
Treatments and tests: Treatments and tests are necessary to describe the natural history of patients. Treatments can include pharmaceutical, biotechnology, or device therapies, or procedures such as surgery or radiation. Evaluation of the treatment itself is often a primary focus of registries (e.g., treatment safety and effectiveness over 5 years). Results of laboratory testing or diagnostic procedures may be included as registry outcomes and may also be used in defining a diagnosis or condition of interest.
Confounders: Confounders are elements or factors that have an independent association with the outcomes of interest. These are particularly important because patients are typically not randomized to therapies in registries. Confounders such as comorbidities (disease diagnoses and conditions) can confuse analysis results and interpretation of causality. Information on the health care provider, treatment facility, concomitant therapies, or insurance may also be considered.
Outcomes: The focus of this document is on patient outcomes. Outcomes are end results and are defined for each condition. Outcomes may include patient-reported outcomes (PROs). Some registries use surrogate markers, such as biomarkers or other interim outcomes (e.g., hemoglobin A1c levels in diabetes), that are highly reflective of the longer term end results.

Before considering the potential sources for registry data, it is important to understand the types of data that may be collected in a registry. The types of data elements included in this framework are further described in Chapter 5 and below with respect to their source or the utility of the data for linking to other sources. Many of these may be available through data sources outside of the registry system. Several types of data that may be gathered from other sources in some registries are described below.

Cost/resource utilization—Cost and/or resource utilization data may be necessary to examine the cost-effectiveness of a treatment. Resource utilization data reflect the resources consumed (both services and products), while cost data reflect a monetary value assigned to those resources. Examples include the actual cost of the treatment (e.g., medication, screening, procedure) and the associated costs of the intervention (e.g., treatment of side effects, expenses incurred traveling to and from clinicians’ appointments). Costs that are avoided due to the treatment (e.g., the cost to treat the avoided disease) and costs related to lost workdays may also be important to collect, depending on the objectives of the study. Registries that collect cost data over long periods of time (i.e., many years) may need to adjust costs for inflation during the analysis phase of the study.
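The inflation adjustment mentioned for long-running cost collection can be sketched as a rescaling by a price index. The following is a minimal illustration only: the index values and the function name `adjust_cost` are hypothetical, and a real analysis would use a published index appropriate to the costs being studied.

```python
# Hypothetical price index by year (illustrative values only, not real data).
PRICE_INDEX = {2005: 100.0, 2006: 104.0, 2007: 108.0, 2008: 112.0}


def adjust_cost(cost, cost_year, base_year, index=PRICE_INDEX):
    """Express a cost recorded in cost_year in base_year currency units."""
    if cost_year not in index or base_year not in index:
        raise KeyError("no price index value for the requested year")
    # Scale by the ratio of the base-year index to the index in the cost year.
    return cost * index[base_year] / index[cost_year]


# A 500.00 cost recorded in 2005, expressed in 2008 currency units.
adjusted = adjust_cost(500.00, cost_year=2005, base_year=2008)
print(round(adjusted, 2))
```

Choosing a single base year for all costs keeps amounts recorded in different years comparable within one analysis.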

Patient identifiers—Depending on the data sources required, some registries may utilize certain personal identifiers for patients in order to locate them in other databases and link the data. For example, Social Security Numbers (SSNs), as well as a combination of other personal identifiers, can be utilized to identify individuals in the National Death Index (NDI). Patient contact information, such as address and phone numbers, may be collected to support tracking of participants over time. Information for additional contacts (e.g., family members) may be collected to support followup in cases where the patient cannot be reached. In many cases, patient informed consent and appropriate privacy authorizations are required to utilize personal identifiers for registry purposes, and the use of personal identifiers may not be possible in some registries; Chapter 8 discusses the legal requirements for including patient identifiers. Systems and processes must be in place to manage security and confidentiality of these data. Confidentiality can be enhanced by assigning a registry-specific identifier via a crosswalk algorithm, as discussed below. Demographics, such as date of birth (to calculate age at any time point), gender, and ethnicity, are typically collected and may be used to stratify the registry population.
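Storing date of birth rather than age allows age to be derived at any time point, as noted above. A minimal sketch of that calculation follows; the function name `age_on` is an assumption made for this example.

```python
from datetime import date


def age_on(date_of_birth, on_date):
    """Return age in completed years on on_date."""
    if on_date < date_of_birth:
        raise ValueError("on_date precedes date of birth")
    years = on_date.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in on_date's year.
    if (on_date.month, on_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


print(age_on(date(1950, 6, 15), date(2010, 6, 14)))  # day before the 60th birthday: 59
print(age_on(date(1950, 6, 15), date(2010, 6, 15)))  # on the 60th birthday: 60
```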

Disease/condition—Disease or condition data include those related to the disease or condition of focus for the registry and may incorporate comorbidities. Elements of interest related to the confirmation of a diagnosis or condition could be date of diagnosis and the specific diagnostic results that were used to make the diagnosis, depending on the purpose of the registry. Disease or condition is often a primary eligibility or outcome variable in registries, whether the intent is to answer specified treatment questions (e.g., measure effectiveness or safety) or to describe the natural history. This information may also be collected in constructing a medical history for a patient. In addition to “yes” or “no” to indicate presence or absence of the diagnosis, it may be important to capture responses such as “missing” or “unknown.”
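One way to keep "missing" and "unknown" distinct from "yes" and "no" is to store the field as an explicit set of allowed values rather than a true/false flag. The sketch below is illustrative: the names `DiagnosisStatus` and `parse_status` are hypothetical, and the working distinction between "unknown" (asked but not known) and "missing" (nothing recorded) is an assumption that a registry would define for itself.

```python
from enum import Enum


class DiagnosisStatus(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"  # assumed meaning: asked, but the answer is not known
    MISSING = "missing"  # assumed meaning: no value was recorded at all


def parse_status(raw):
    """Map a raw field value to a DiagnosisStatus without guessing."""
    if raw is None or raw.strip() == "":
        return DiagnosisStatus.MISSING
    value = raw.strip().lower()
    for status in DiagnosisStatus:
        if status.value == value:
            return status
    # Anything else is rejected rather than silently coerced to "no".
    raise ValueError(f"unrecognized diagnosis status: {raw!r}")
```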

Treatment/therapy—Treatment or therapy data include specific identifying information for the primary treatment (e.g., drug name or code, biologic, device product or component parts, or surgical intervention, such as organ transplant or coronary artery bypass graft) and may include information on concomitant treatments. Dosage (or parameters for devices), route of administration, and prescribed exposure time, such as daily or three times weekly for four weeks, should be collected. Pharmacy data may include dispensing information, such as the primary date of dispensation and subsequent refill dates. Data in device registries can include the initial date of dispensation or implantation and subsequent dates and specifics of required evaluations or modifications. Compliance data may also be collected if pharmacy representatives or clinic personnel are engaged to conduct and report pill counts or volume measurements on refill visits or return visits for device evaluations and modifications.

Laboratory/procedures—Laboratory data include a broad range of testing, such as blood, tissue, catheterization, and radiology. Specific test results, units of measure, and laboratory reference ranges or parameters are typically collected. Laboratory databases are becoming increasingly accessible for electronic transfer of data, whether through a system-wide institutional database or a private laboratory database. Diagnostic testing or evaluation may include procedures such as psychological or behavioral assessments. Results of these procedures and clinician exam procedures may be difficult to obtain through data sources other than the patient medical record.

Biosamples—The increased collection, testing, and storage of biological specimens as part of a registry (or independently as a potential secondary data source such as those described further below) provides another source of information that includes both information from genetic testing (such as genetic markers) and actual specimens.

Health care provider characteristics—Information on the health care provider (e.g., physician, nurse, or pharmacist) may be collected, depending on the purpose of the registry. Training, education, or specialization may account for differences in care patterns. Geographic location has also been used as an indicator of differences in care or medical practice.

Hospital/clinic/health plan—System interactions include office visits, outpatient clinic visits, emergency room visits, inpatient hospitalizations, procedures, and pharmacy visits, as well as associated dates. Data on all procedures as defined by the registry protocol or plan (e.g., physical exam, psychological evaluation, chest x-ray, CAT scan), including measurements, results, and units of measure where applicable, should be collected. Cost accounting data may also be available to match these interactions and procedures. Descriptive information related to the points of care may be useful in capturing differences in care patterns and can also be used to track patterns of referral of care (e.g., outpatient clinic, inpatient hospital, academic center, emergency room, pharmacy).

Insurance—The insurance system or payer claims data can provide useful information on interactions with the health care systems, including visits, procedures, inpatient stays, and costs associated with these events. When using these data, it is important to understand what services were covered under the various insurance plans at the time the data were collected, as this may affect utilization patterns.

Data Sources

Data sources are classified as primary or secondary based on the relationship of the data to the registry purpose. Primary data sources incorporate data collected for direct purposes of the registry (i.e., primarily for the registry). Primary data sources are typically used when the data of interest are not available elsewhere or, if available, are unlikely to be of sufficient accuracy and reliability for the planned analyses and uses. Primary data collection increases the probability of completeness, validity, and reliability because the registry drives the methods of measurement and data collection. (See Chapter 5.) These data are prospectively planned and collected under the direction of a protocol or study plan, using common procedures and the same format across all registry sites and patients. The data are readily integrated for tracking and analyses. Since the data entered can be traced to the individual who collected them, primary data sources are more readily reviewed through automated checks or followup queries from a data manager than is possible with many secondary data sources.

Secondary data sources consist of data originally collected for purposes other than the registry under consideration (e.g., standard medical care, insurance claims processing). Data that are collected as primary data for one registry would be considered secondary data from the perspective of a second registry if linking were done. These data are often stored in electronic format and may be available for use with appropriate permissions. Data from secondary sources may be used in two ways: (1) the data may be transferred and imported into the registry, becoming part of the registry database, or (2) the secondary data and the registry data may be linked to create a new, larger dataset for analysis. This chapter primarily focuses on the first use for secondary data, while Chapter 7 discusses the complexities of linking registries with other databases.

When considering secondary data sources, it is important to note that health professionals are accustomed to entering the data for defined purposes, and additional training and support for data collection are not required. Often, these data are not constrained by a data collection protocol and they represent the diversity observed in real-world practice. However, there may be increased probability of errors and underreporting because of inconsistencies in measurement, reporting, and collection. Staff changes can further complicate data collection and may affect data quality. There may also be increased costs for linking the data from the secondary source to the primary source and dealing with any potential duplicate or unmatched patients.

Sufficient identifiers are also necessary to accurately match data between the secondary sources and registry patients. The potential for mismatch errors and duplications must be managed. (See Case Example 19.) The complexity and obligations inherent in the collection and handling of personal identifiers have previously been mentioned (e.g., obligations for informed consent, appropriate data privacy, and confidentiality procedures).
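Managing mismatches and duplications usually starts with sorting incoming records into those that match exactly one registry patient, those that match several (possible duplicates), and those that match none. The sketch below assumes hypothetical field names (`last_name`, `first_name`, `date_of_birth`, `registry_id`, `source_id`) and a simple exact-match key; it is not a description of any particular registry's procedure.

```python
from collections import defaultdict


def match_key(record):
    """Build a comparison key from several identifiers (hypothetical fields)."""
    return (
        record["last_name"].strip().lower(),
        record["first_name"].strip().lower(),
        record["date_of_birth"],
    )


def classify_matches(registry_records, secondary_records):
    """Sort secondary records into matched, ambiguous, and unmatched groups."""
    by_key = defaultdict(list)
    for rec in registry_records:
        by_key[match_key(rec)].append(rec["registry_id"])

    matched, ambiguous, unmatched = [], [], []
    for rec in secondary_records:
        candidates = by_key.get(match_key(rec), [])
        if len(candidates) == 1:
            matched.append((rec["source_id"], candidates[0]))
        elif len(candidates) > 1:
            # More than one registry patient shares the key: possible duplicates.
            ambiguous.append((rec["source_id"], candidates))
        else:
            unmatched.append(rec["source_id"])
    return matched, ambiguous, unmatched
```

Records in the ambiguous and unmatched groups would typically be routed to manual review rather than loaded automatically.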

Case Example 19

Integrating Data From Multiple Sources With Patient ID Matching. In the 1990s, the Rhode Island Department of Health recognized that its data on children’s health were fragmented and program specific. The State had many children’s health (more...)

Some of the secondary data sources do not collect information at a specific patient level but are anonymous and intended to reflect group or population estimates. For example, census tract or ZIP-Code-level data are available from the Census Bureau and can be merged with registry data. These data can be used as “ecological variables” to support analyses of income or education when such socioeconomic data are missing from registry primary data collection. The intended use of the data elements will determine whether patient-level information is required.
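Merging area-level estimates with registry records amounts to a lookup on the geographic key. The sketch below is illustrative only: the ZIP Codes, values, and field names are made up for the example and are not Census Bureau data.

```python
# Hypothetical area-level table keyed by ZIP Code (values are illustrative only).
AREA_DATA = {
    "00001": {"median_income": 41000, "pct_college": 38.5},
    "00002": {"median_income": 52000, "pct_college": 29.0},
}


def add_area_variables(patients, area_data=AREA_DATA):
    """Return copies of patient records with area-level variables attached."""
    enriched = []
    for patient in patients:
        area = area_data.get(patient.get("zip_code"))
        row = dict(patient)
        # Area values describe the neighborhood, not the individual patient.
        row["area_median_income"] = area["median_income"] if area else None
        row["area_pct_college"] = area["pct_college"] if area else None
        enriched.append(row)
    return enriched
```

Prefixing the merged fields with `area_` keeps it visible at analysis time that they are ecological variables rather than patient-level measurements.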

The potential for data completeness, variation, and specificity must be evaluated in the context of the registry and intended use of the data. It is advisable to have a solid understanding of the original purpose of the secondary data collection, including processes for collection and submission, and verification and validation practices. Questions to ask include: Is data collection passive or active? Are standard definitions or codes used in reporting data? Are standard measurement criteria or instruments utilized (e.g., diagnoses, symptoms, quality of life)? The existence and completeness of claims data, for example, will depend on insurance company coverage policies. One company may cover many preventive services, whereas another may have more restricted coverage. Also, coverage policies can change over time. These variations must be known and carefully documented to prevent misinterpretation of use rates. Additionally, secondary data may not all be collected in the format (e.g., units of measure) required for registry purposes and may require transformation for integration and analyses.
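The unit transformation mentioned above can be sketched as a lookup of conversion factors applied at import time. This is a minimal example under stated assumptions: weight and height are chosen only for illustration, and the function and table names are hypothetical.

```python
# Conversion factors to the registry's target units (both are exact by definition).
TO_KILOGRAMS = {"kg": 1.0, "lb": 0.45359237}
TO_CENTIMETERS = {"cm": 1.0, "in": 2.54}


def _convert(value, unit, factors, kind):
    """Apply a conversion factor, rejecting units that are not listed."""
    key = unit.strip().lower()
    if key not in factors:
        raise ValueError(f"unsupported {kind} unit: {unit!r}")
    return value * factors[key]


def to_kg(value, unit):
    """Convert a weight to kilograms."""
    return _convert(value, unit, TO_KILOGRAMS, "weight")


def to_cm(value, unit):
    """Convert a length to centimeters."""
    return _convert(value, unit, TO_CENTIMETERS, "length")
```

Rejecting unrecognized units, instead of passing the value through unchanged, avoids silently mixing scales in the registry database.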

An overview of secondary data sources that may be used for registries is given below. Table 8 identifies some key strengths and limitations of the identified data sources.

Table 8

Key Data Sources—Strengths and Limitations.

Medical chart abstraction—Medical charts primarily contain information collected as a part of routine medical care. These data reflect the practice of medicine or health care in general and at a specific level (e.g., geographical, by specialty care provider). Charts also reflect uncontrolled patient behavior (e.g., noncompliance). Collection of standard medical practice data is useful in looking at treatments and outcomes in the real world, including all of the confounders that affect the measurement of effectiveness (as distinguished from efficacy) and safety outside of the controlled conditions of a clinical trial. Chart documentation is often much poorer than one might expect, and there may be more than one patient-specific medical record (e.g., hospital and clinical records). A pilot collection is recommended for this labor-intensive method of data collection to explore the availability and reproducibility of the data of interest. It is important to recognize that physicians and other clinicians do not generally use standardized data definitions in entering information into medical charts, meaning that one clinician’s documented diagnosis of “chronic sinusitis” or “osteoarthritis” or description of “pedal edema” may differ from that of another clinician.

Electronic health records—The use of electronic health records (EHRs), sometimes called electronic medical records (EMRs), is increasing. EHRs have an advantage over paper medical records because the data in some EHRs can be readily searched and integrated with other information (e.g., laboratory data). The ease with which this is accomplished depends on whether the information is in a relational database or exists as scanned documents. An additional challenge relates to terminology and relationships. For example, including the term “fit” in a search for patients with epilepsy can yield a record for someone who was noted as “fit,” meaning “healthy.” Relationships can also be difficult to identify through searches (e.g., “Patient had breast cancer” vs. “Patient’s mother had breast cancer”). The quality of the information has the same limitations as described in the paragraph above. Both the availability and standardization of EHR data are expected to grow significantly in the near future. The Department of Veterans Affairs Computerized Patient Record System (CPRS) is already estimated to cover 4.2 million lives, and some data suppliers cite individual datasets exceeding 10 million lives.¹ Further, it is anticipated that more significant standardization of EHR data will result from the “EHR certification” requirements being developed in phases under the American Recovery and Reinvestment Act of 2009 (ARRA). Such standardization should increase not only the availability and utility of EHR records, but also the ability to aggregate them into larger data sources.
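The terminology and relationship problems described above can be reproduced with a few lines of text search. The notes below are invented examples, and the family-term filter is a deliberately crude stand-in for real context handling, shown only to make the failure modes concrete.

```python
import re

# Invented free-text notes (not real patient data).
NOTES = [
    "Patient had a fit last week; started anticonvulsant.",
    "Patient is fit and healthy, no complaints.",
    "Patient had breast cancer in 2004.",
    "Patient's mother had breast cancer.",
]

FAMILY_TERMS = re.compile(
    r"\b(mother|father|sister|brother|aunt|uncle)\b", re.IGNORECASE
)


def naive_search(notes, term):
    """Return notes containing the term as a whole word, ignoring context."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [note for note in notes if pattern.search(note)]


def exclude_family_history(notes):
    """Drop notes that mention a relative (a crude heuristic, easily fooled)."""
    return [note for note in notes if not FAMILY_TERMS.search(note)]


fit_hits = naive_search(NOTES, "fit")               # matches both senses of "fit"
cancer_hits = naive_search(NOTES, "breast cancer")  # matches patient and mother
patient_only = exclude_family_history(cancer_hits)
```

The keyword search cannot tell a seizure from good health, and the relative filter would wrongly discard a note such as one saying the patient attended with her mother, which is why results of such searches generally need review.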

Institutional or organizational databases—Institutional or organizational databases may be evaluated as potential sources of a wide variety of data. System-wide institutional or hospital databases are central data repositories, or data warehouses, that are highly variable from institution to institution. They may include a portion of everything from admission, discharge, and transfer information to data reflecting diagnoses and treatment, pharmacy prescriptions, and specific laboratory tests. Laboratory test data might be chemistry or histology laboratory data, including patient identifiers with associated dates of specimen collection and measurement, results, and standard “normal” or reference ranges. Catheterization laboratory data for cardiac registries may be accessible and may include details on the coronary anatomy and percutaneous coronary intervention. Other organizational examples are computerized order entry systems, pharmacies, blood banks, and radiology departments.

Administrative databases—Private and public medical insurers collect a wealth of information in the process of tracking health care, evaluating coverage, and managing billing and payment. Information in the databases includes patient-specific information (e.g., insurance coverage and copays; identifiers such as name, demographics, SSN or plan number, and date of birth) and health care provider descriptive data (e.g., identifiers, specialty characteristics, locations). Typically, private insurance companies organize health care data by physician care (e.g., physician office visits) and hospital care (e.g., emergency room visits, hospital stays). Data include procedures and associated dates, as well as costs charged by the provider and paid by the insurers. Amounts paid by insurers are often considered proprietary and unavailable. Standard coding conventions are utilized in the reporting of diagnoses, procedures, and other information. Coding conventions include the Current Procedural Terminology (CPT) for physician services and International Classification of Diseases (ICD) for diagnoses. The databases serve the primary function of managing and implementing insurance coverage, processing, and payment.

Medicare and Medicaid claims files are two examples of commonly used administrative databases. The Medicare program covers nearly 45 million people in the United States, including almost everyone over the age of 65, people under the age of 65 who qualify for Social Security Disability, and people with end-stage renal disease.² The Medicaid program covers low-income children and their mothers; pregnant women; and blind, aged, or disabled people. As of 2007, approximately 40 million people were covered by Medicaid.³ Medicare and Medicaid claims files, maintained by the Centers for Medicare & Medicaid Services (CMS), can be obtained for inpatient, outpatient, physician, skilled nursing facility, durable medical equipment, and hospice services. As of 2006, Medicare claim files for prescription drugs can also be obtained. The claims files generally contain person-specific data on providers, beneficiaries, and recipients, including individual identifiers that would permit the identity of a beneficiary or physician to be deduced. Data with personal identifiers are clearly subject to privacy rules and regulations. As such, the information is confidential and to be used only for reasons compatible with the purpose(s) for which the data are collected. The Research Data Assistance Center (ResDAC), a CMS contractor at the University of Minnesota, provides assistance to academic, government, and nonprofit researchers interested in using Medicare and/or Medicaid data for their research.⁴

Death and birth records—Death indexes are national databases tracking population death data (e.g., the NDI⁵ and the Death Master File [DMF] of the Social Security Administration [SSA]⁶). Data include patient identifiers, date of death, and attributed causes of death. These indexes are populated through a variety of sources. For example, the DMF includes death information on individuals who had an SSN and whose death was reported to the SSA. Reports may come in to the SSA by different paths, including from survivors or family members requesting benefits or from funeral homes. However, because of the importance of tracking Social Security benefits, all States, nursing homes, and mortuaries are required to report all deaths to the SSA, thus ensuring virtually 100-percent complete mortality ascertainment for those eligible for SSA benefits. The NDI is updated annually with computer death records submitted by State vital statistics offices and has all, or nearly all, deaths in the United States. The NDI can be used to provide both fact of death and cause of death, as recorded on the death certificate. Cause-of-death data in the NDI are relatively reliable (93–96 percent) compared with death certificates.⁷,⁸ Time delays in death reporting should be considered when using these sources, and vital status should not be assumed to be alive by the absence of information at a recent point in time. These indexes are a valuable source of data for death tracking. Of course, mortality data can be accessed directly through queries of State vital statistics offices and health departments when targeting information on a specific patient or within a State. Likewise, birth certificates are available through State departments and may be useful in registries of children or births.

Area-level databases—Two sources of area-level data are the U.S. Census and the Area Resource File (ARF). The U.S. Census Bureau databases⁹ provide population-level data utilizing survey sampling methodology. The Census Bureau conducts many different surveys, the main one being the population census. The primary use of the data is to determine the number of seats assigned to each State in the House of Representatives, although the data are used for many other purposes. These surveys calculate estimates through statistical processing of the sampled data. Estimates can be provided with a broad range of granularity, from population numbers for large regions (e.g., specific States), to ZIP Codes, all the way down to a household level (e.g., neighborhoods identified by street addresses). Information collected includes demographic, gender, age, education, economic, housing, and work data. The data are not collected at an individual level but may serve other registry purposes, such as understanding population numbers in a specific region or by specific demographics. The ARF is maintained by the Health Resources and Services Administration, which is part of the Department of Health and Human Services. The ARF includes county-level data on health facilities, health professions, measures of resource scarcity, health status, economic activity, health training programs, and socioeconomic and environmental characteristics.¹⁰

Provider-level databases—Data on medical facilities and physicians may be important for categorizing registry data or conducting subanalyses. Two sources of such data are the American Hospital Association’s Annual Survey Data and the American Medical Association’s Physician Masterfile Data Collection. The Annual Survey Data is a longitudinal database that collects 700 data elements, covering organizational structure, personnel, hospital facilities and services, and financial performance, from more than 6,000 hospitals in the United States.¹¹ Each hospital in the database has a unique ID, allowing the data to be linked to other sources; however, there is a data lag of about 2 years, and the data may not provide enough nuanced detail to support some analyses of cost or quality of care. The Physician Masterfile Data Collection contains current and historic data on nearly one million physicians and residents in the United States. Data on physician professional medical activities, hospital and group affiliations, and practice specialties are collected each year.

Existing registry and other databases—There are numerous national and regional registries and other databases that may be leveraged for incorporation into other registries (e.g., disease-specific registries managed by nonprofit organizations, professional societies, or other entities). An example is the National Marrow Donor Program (NMDP),¹² a global database of cord blood units and volunteers who have consented to donate marrow and blood cells. Databases maintained by the NMDP include identifiers and locators in addition to information on the transplants, such as samples from the donor and recipient, histocompatibility, and outcomes. NMDP actively encourages research and utilization of registry data through a data application process and submission of research proposals.

In accessing data from one registry for the purposes of another, it is important to recognize that data may have changed during the course of the source registry, and this may or may not have been well documented by the providers of the data. For example, in the United States Renal Data System (USRDS),¹³ a vital part of personal identification is CMS 2728, an enrollment form that identifies the incident data for each patient as well as other pertinent information, such as the cause of renal failure, initial therapy, and comorbid conditions. Originally created in 1973, this form is in its third version, having been revised in 1995 and again in 2005. Consequently, there are data elements that exist in some versions and not others. In addition, the coding for some variables has changed over time. For example, race has been redefined to correspond with Office of Management and Budget directives and Census Bureau categories. Furthermore, form CMS 2728 was optional in the early years of the registry, so until 1983 it was filled out for only about one-half of the subjects. Since 1995, it has been mandatory for all persons with end-stage renal disease. These changes in form content, data coding, and completeness would not be evident to most researchers trying to access the data.

Other Considerations for Secondary Data Sources

The discussion below focuses on logistical and data issues to consider when incorporating data from other sources. Chapter 10 fully explores data collection, management, and quality assurance for registries.

Before incorporating a secondary data source into a registry, it is critical to consider the potential impact of the data quality of the secondary data source on the overall data quality of the registry. The potential impact of quality issues in the secondary data sources depends on how the data are used in the primary registry. For example, quality would be significant for secondary data that are intended to be populated throughout the registry (i.e., used to populate specific data elements in the entire registry over time), particularly if these populated data elements are critical to determining a primary outcome. Quality of the secondary data would have less effect on overall registry quality if the secondary data are to be linked to registry data only for a specific analytic study. For more information on data quality, see Chapter 10.

The importance of patient identifiers for linking to secondary data sources cannot be overstated. Multiple patient identifiers should be used, and primary data for these identifiers should not be entered into the registry unless the identifying information is complete and clear. While an SSN is very useful, high-quality probabilistic linkages can be made to secondary data sources using various combinations of such information as name (last, middle initial, and first), date of birth, and gender. For example, the NDI will make possible matches when at least one of seven matching conditions is met (e.g., one matching condition is “exact month and day of birth, first name, and last name”). As noted earlier, the various types of data (e.g., personal history, adverse events, hospitalization, and drug use) have to be linked through a common identifier. It is usual in clinical trials to embed some intelligence into that identifier, such as SSN, initials, or site identifiers. While this may make sense for a closed system, it raises privacy concerns. A more complete discussion of both statistical and privacy issues in linkage is provided in Chapter 7.
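Probabilistic linkage on combinations of name, date of birth, and gender is commonly scored with agreement weights in the style of the Fellegi-Sunter model. The sketch below illustrates that general technique only; it is not the NDI's matching procedure, and the field names and the m- and u-probabilities are hypothetical values chosen for the example.

```python
import math

# Hypothetical m- and u-probabilities per field (illustrative values only):
# m = P(field agrees | same person), u = P(field agrees | different people).
FIELD_PARAMS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "birth_month_day": (0.97, 1 / 365),
    "gender": (0.98, 0.50),
}


def match_weight(record_a, record_b, params=FIELD_PARAMS):
    """Sum log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = record_a.get(field), record_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)  # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total
```

Agreement on a discriminating field such as last name adds far more weight than agreement on gender, which is why several identifiers used together can support a confident link even without an SSN. In practice, pairs above an upper threshold are accepted, pairs below a lower threshold are rejected, and those in between are reviewed.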

The best identifier is one that is not only unique but has no embedded personal identification, unless that information is scrambled and the key for unscrambling it is stored remotely and securely. The group operating the registry should have a process by which each new entry to the registry is assigned a unique code and there is a crosswalk file to enable the system to append this identifier to all new data as they are accrued. The crosswalk file should not be accessible by persons or entities outside the management group.
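The process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed implementation: the `Crosswalk` class, the `deidentify` function, and the field names are hypothetical, the crosswalk is held in memory for brevity, and a real registry would store it in a secured location accessible only to the management group.

```python
import secrets


class Crosswalk:
    """Maps identifying information to an opaque registry ID.

    In this sketch the mapping lives in memory; it stands in for the
    crosswalk file that only the registry management group can access.
    """

    def __init__(self):
        self._id_by_person = {}
        self._issued = set()

    def registry_id(self, person_key):
        """Return the existing ID for this person, or assign a new random one."""
        if person_key not in self._id_by_person:
            new_id = secrets.token_hex(8)  # random, so no personal information is embedded
            while new_id in self._issued:  # collisions are unlikely but cheap to rule out
                new_id = secrets.token_hex(8)
            self._issued.add(new_id)
            self._id_by_person[person_key] = new_id
        return self._id_by_person[person_key]


def deidentify(record, identifying_fields, crosswalk):
    """Strip identifying fields from a record and append the registry ID."""
    person_key = tuple(record[field] for field in identifying_fields)
    clean = {k: v for k, v in record.items() if k not in identifying_fields}
    clean["registry_id"] = crosswalk.registry_id(person_key)
    return clean


# Invented example: two incoming records for the same person receive the same ID.
ID_FIELDS = ("last", "first", "dob")
crosswalk = Crosswalk()
lab = deidentify(
    {"last": "Doe", "first": "Jane", "dob": "1950-03-14", "hba1c": 7.1},
    ID_FIELDS, crosswalk,
)
admission = deidentify(
    {"last": "Doe", "first": "Jane", "dob": "1950-03-14", "admit": "2009-01-02"},
    ID_FIELDS, crosswalk,
)
```

Because the ID is random rather than derived from the SSN, initials, or site, nothing about the patient can be read from it, and the link back to the person exists only in the crosswalk.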

In addition, consideration should be given to the fact that a registry may need to accept and link datasets from more than one outside organization. Each institution contributing data to the registry will have unique requirements for patient data, access, privacy, and duration of use. While having identical agreements with all institutions would be ideal, this may not always be possible from a practical perspective. Because all registries have resource constraints, decisions about including particular institutions must weigh the resources available to negotiate specialized agreements or to maintain specialized requirements. Agreements should be coordinated as much as possible so that the function of the registry is not greatly impaired by variability among agreements. All organizations participating in the registry should have a common understanding of the rules regarding access to the data. Although exceptions can be made, it should be agreed that access to data will be based on independent assessment of research protocols and that participating organizations will not have veto power over access.

When data from secondary sources are used, agreements should specify ownership of the source data and clearly permit data use by the recipient registry. The agreements should also specify the roles of each institution, its legal responsibilities, and any oversight issues. It is critical that these issues be resolved and the agreements be put in place before data are transferred so that there are no ambiguities or unforeseen restrictions on the recipient registry later on.

Some registries may wish to incorporate data from more than one country. In these cases, it is important to ensure that the data are being collected in the same manner in each country or to plan for any necessary conversion. For example, height and weight data collected from sites in Europe will likely be in different units than height and weight data collected from sites in the United States. Laboratory test results may also be reported in different units, and there may be variations in the types of pharmaceutical products and medical devices that are approved for use in the participating countries. Understanding these issues prior to incorporating secondary data sources from other countries is extremely important to maintain the integrity and usefulness of the registry database.
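As a minimal sketch of the conversion planning described above, the following Python code normalizes height and weight to metric units at the point of import. The function names and unit labels are assumptions for illustration; the two conversion factors are exact by definition. Laboratory results are deliberately left out, because their conversion factors depend on the specific analyte.

```python
CM_PER_INCH = 2.54           # exact by definition
KG_PER_POUND = 0.45359237    # exact by definition

# Hypothetical unit labels; a real feed would document its own codes.
HEIGHT_TO_CM = {"cm": 1.0, "m": 100.0, "in": CM_PER_INCH}
WEIGHT_TO_KG = {"kg": 1.0, "g": 0.001, "lb": KG_PER_POUND}


def to_cm(value, unit):
    """Convert a height to centimeters; reject units the registry has not planned for."""
    try:
        return value * HEIGHT_TO_CM[unit]
    except KeyError:
        raise ValueError(f"unrecognized height unit: {unit!r}") from None


def to_kg(value, unit):
    """Convert a weight to kilograms; reject units the registry has not planned for."""
    try:
        return value * WEIGHT_TO_KG[unit]
    except KeyError:
        raise ValueError(f"unrecognized weight unit: {unit!r}") from None


# Invented example: a U.S. site reports inches and pounds.
height_cm = to_cm(70, "in")
weight_kg = to_kg(154, "lb")
```

Raising an error on an unrecognized unit, rather than passing the value through, keeps values in unplanned units from silently entering the registry database.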

When incorporating other data sources, consideration should also be given to the registry update schedule. A mature registry will usually have a mix of data update schedules. The registry may receive an annual update of large amounts of data, or there could be monthly, weekly, or even daily transfers of data. Regardless of the schedule of data transfer, routine data checks should be in place to ensure proper transfer of data. These should include simple counts of records as well as comparisons of key variables against predefined distributions. Conference calls or even routine meetings to go over recent transfers will help avoid mistakes that might not otherwise be picked up until much later. An example of the need for regular communication is a situation that arose with the United States Renal Data System (USRDS) a few years ago. The United Network for Organ Sharing (UNOS) changed the coding for donor type in its transplant records, which resulted in an apparent 100-percent loss of living donors in a calendar year. The change was not conveyed to USRDS and was not detected by USRDS staff. After USRDS learned about the change, the standard analysis files containing the errors, which had already been sent to researchers, had to be replaced.
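The routine checks described above (record counts and distributions of key variables) can be sketched as follows. This is a minimal illustration, not a description of any registry's actual procedure: the function names, the thresholds, and the `donor_type` codes are assumptions, and the example data are invented to resemble a recoding of the kind described.

```python
from collections import Counter


def shares(records, key):
    """Return each value's share of the records for one key variable."""
    counts = Counter(record.get(key) for record in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}


def check_transfer(previous, current, key,
                   max_count_change=0.25, max_share_change=0.10):
    """Compare a new transfer with the previous one and return a list of warnings.

    The thresholds are assumed defaults; a registry would predefine its own.
    """
    warnings = []
    if previous:
        change = abs(len(current) - len(previous)) / len(previous)
        if change > max_count_change:
            warnings.append(f"record count changed by {change:.0%}")
    old, new = shares(previous, key), shares(current, key)
    for value in sorted(set(old) | set(new), key=str):
        before, after = old.get(value, 0.0), new.get(value, 0.0)
        if abs(after - before) > max_share_change:
            warnings.append(f"{key}={value!r}: share {before:.0%} -> {after:.0%}")
    return warnings


# Invented example: the sender recodes "living" as "LIV" without notice.
last_year = [{"donor_type": "deceased"}] * 60 + [{"donor_type": "living"}] * 40
this_year = [{"donor_type": "deceased"}] * 62 + [{"donor_type": "LIV"}] * 40
transfer_warnings = check_transfer(last_year, this_year, "donor_type")
```

In this example the record count barely changes, so a count check alone would pass; it is the distribution check that flags both the disappearance of one code and the appearance of a new one.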

Distributed data networks are another model for sharing data. In a distributed data network, data sharing may be limited to the results of analyses or aggregated data only. There is much interest in the potential of distributed data networks, particularly for safety monitoring or public health surveillance. However, the complexities of data sharing within a distributed data network are still being addressed, and it is premature to discuss good practice for this area.
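To illustrate only what sharing "aggregated data only" means, the following minimal Python sketch has each site compute counts locally and return them to a coordinator, which pools them. It is not offered as good practice for distributed data networks; the function names, field names, and data are hypothetical.

```python
def site_summary(records, exposure_key, event_key):
    """Run locally at a site; returns counts only, never patient-level rows."""
    exposed = [r for r in records if r[exposure_key]]
    return {
        "n": len(records),
        "n_exposed": len(exposed),
        "events_among_exposed": sum(1 for r in exposed if r[event_key]),
    }


def pool(summaries):
    """Run by the coordinator; adds up the counts received from each site."""
    totals = {}
    for summary in summaries:
        for name, count in summary.items():
            totals[name] = totals.get(name, 0) + count
    return totals


# Invented example with two sites; only the summaries leave each site.
site_a = [{"on_drug": True, "event": False}, {"on_drug": True, "event": True},
          {"on_drug": False, "event": False}]
site_b = [{"on_drug": True, "event": False}, {"on_drug": False, "event": True}]
pooled = pool([
    site_summary(site_a, "on_drug", "event"),
    site_summary(site_b, "on_drug", "event"),
])
```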

Summary

In summary, a registry is not a static enterprise. The management of registry data sources requires attention to detail, constant feedback to all participants, and a willingness to make adjustments to the operation as dictated by changing times.

References for Chapter 6

1. Federal Coordinating Council for Comparative Effectiveness Research. Report to the President and the Congress. U.S. Department of Health and Human Services; Jun 30, 2009. Available at http://www.hhs.gov/recovery/programs/cer/cerannualrpt.pdf.
2. Kaiser Family Foundation. Medicare Now and in the Future. [Accessed July 10, 2009]. Available at http://www.kff.org/medicare/upload/7821.pdf.
3. DeNavas-Walt C, Proctor BD, Smith JC. Current Population Reports. Washington, D.C: U.S. Bureau of the Census; 2008. Income, poverty, and health insurance. Coverage in the United States: 2007. pp. 60–235. Available at http://www.census.gov/prod/2008pubs/p60-235.pdf.
4. Research Data Assistance Center. [Accessed July 9, 2009]. Available at http://www.resdac.umn.edu.
5. National Center for Health Statistics. [Accessed July 9, 2009]. Available at http://www.cdc.gov/nchs/ndi.htm.
6. Social Security Administration. Death Master File. National Technical Information Service; [Accessed July 9, 2009]. Available at http://www.ntis.gov/products/ssa-dmf.aspx.
7. Doody MM, Hayes HM, Bilgrad R. Comparability of National Death Index Plus and standard procedures for determining causes of death in epidemiologic studies. Ann Epidemiol. 2001;11(1):46–50. [PubMed: 11164119]
8. Sathiakumar N, Delzell E, Abdalla O. Using the National Death Index to obtain underlying cause of death codes. J Occup Environ Med. 1998;40(9):808–13. [PubMed: 9777565]
9. U.S. Bureau of the Census. [Accessed July 9, 2009]. Available at www.census.gov.
10. Health Resources and Services Administration. Area Resource File (ARF). [Accessed July 9, 2009]. Available at http://www.arfsys.com/.
11. American Hospital Association. AHA Data and Directories. [Accessed July 9, 2009]. Available at http://www.aha.org/aha/resource-center/Statistics-and-Studies/data-and-directories.html.
12. National Marrow Donor Program. [Accessed July 9, 2009]. Available at http://www.marrow.org.
13. United States Renal Data System. [Accessed July 9, 2009]. Available at http://www.usrds.org.

Bookshelf ID: NBK49435
