NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Gliklich RE, Leavy MB, Dreyer NA, editors. Tools and Technologies for Registry Interoperability, Registries for Evaluating Patient Outcomes: A User’s Guide, 3rd Edition, Addendum 2 [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2019 Oct.

Chapter 4. Obtaining Data From Electronic Health Records

, M.P.H., D.Sc., Professor, , M.H.I. M.D. Ph.D., (lead author), Assistant Professor, Assistant Director, , M.D. Ph.D., Associate Professor, and , Ph.D., Assistant Professor of Medicine.


There is growing interest in using data captured in electronic health records (EHRs) for patient registries. Both EHRs and patient registries capture and use patient-level clinical information, but conceptually, they are designed for different purposes. A patient registry is defined as “an organized system that uses observational study methods to collect uniform data (clinical and other) to evaluate specified outcomes for a population defined by a particular disease, condition, or exposure and that serves one or more predetermined scientific, clinical, or policy purposes.”1

An EHR is an electronic system used and maintained by healthcare systems to collect and store patients’ medical information.c EHRs are used across clinical care and healthcare administration to capture a variety of medical information from individual patients over time, as well as to manage clinical workflows. EHRs contain different types of patient-level variables, such as demographics, diagnoses, problem lists, medications, vital signs, and laboratory data. According to the National Academies of Medicine, an EHR has multiple core functionalities, including the capture of health information, orders and results management, clinical decision support, health information exchange, electronic communication, patient support, administrative processes, and population health reporting.2

In summary, registries are patient-centered, purpose-driven, and designed to derive information on defined exposures and health outcomes. In contrast, EHRs are visit-centered and transactional. Despite these differences, EHRs capture a wealth of data that are relevant to patient registries. EHRs also may assist in certain functions that a patient registry requires (e.g., data collection, data cleaning, data storage), and a registry may augment the value of the information collected in an EHR (e.g., comparative safety, effectiveness and value, population management, quality reporting).3

EHRs provide a unique opportunity for health systems to develop internal registries or contribute to external registries. Within a health system, registries are often developed by integrating registry functionalities with existing EHR platforms (i.e., EHR-integrated registries); however, these registries are limited to the health system’s patient population and may be unable to capture longitudinal data from different provider settings. Registries that capture EHR data from multiple health systems typically interface with EHRs to receive data on an interval basis (i.e., EHR-linked or EHR-reported registries), although automating such efforts and creating a bidirectional exchange of information are still challenging.

The Meaningful Use program (see Chapter 1) has propelled the development of both EHR-linked and EHR-integrated registries. For example, EHR-integrated registries have expanded to meet EHR certification requirements and to help health systems meet requirements for workflow efficiency and quality improvement to achieve value-based criteria (e.g., improving population health). EHR-linked registries have grown as the Meaningful Use program specifically requires the reporting of EHR data to external registries (e.g., public health registries, quality reporting registries).4 Meaningful Use Stage-1 provided an optional objective (which became a mandatory objective in Meaningful Use Stage-2) for eligible hospitals and professionals to submit EHR-extracted electronic data to immunization registries.5 Meaningful Use Stage-2 further expanded EHR reporting to cancer registries and other specialized registries (e.g., birth defects, chronic diseases, and traumatic injury registries).6

Driven in large part by Meaningful Use, EHR vendors and clinical providers are incentivized to develop processes that would facilitate the design and launch of EHR-based registries in the United States. Yet, despite these incentives, the practice of using EHR-based registries is still relatively immature and, like all evolving research programs, faces many challenges.7

The purpose of this chapter is to describe the opportunities and challenges related to fully integrating or linking EHRs and patient registries. The chapter reviews common and emerging EHR data types that can be incorporated in registries, provides sample use cases of integrating EHRs and registries, and proposes a series of hypothetical technical architectures to link or integrate a registry with an EHR. The chapter closes with a discussion of possible future directions for EHR-registry integration. Key questions to consider when planning to incorporate data from EHRs as well as other sources are provided in Appendix B.

Common and Emerging EHR Data Types

EHRs provide various types of data that can be linked, integrated, or merged directly into a registry. The Meaningful Use program has led to the collection of a Common Clinical Data Set (CCDS) across most providers. These data are now generally available in EHRs; the set of commonly available data will likely continue to expand as the Office of the National Coordinator, under the 21st Century Cures Act, moves toward building the U.S. Core Data for Interoperability (USCDI) requirements.8 EHRs can also provide data types of emerging interest to registries. Both types are described in Tables 4-1 and 4-2.

Table 4-1. Common data types of EHRs that can be integrated/interfaced with internal/external registries.

Table 4-2. Emerging data types of EHRs that can be integrated/interfaced with internal/external registries.

In addition to these data, EHRs capture a considerable amount of unstructured data (e.g., clinical notes) that can be further processed to extract specific data of importance to a registry (e.g., specific information extracted from radiology reports to determine eligibility).

Data types commonly extracted from EHRs and imported into registries are patient identifiers, demographics, diagnoses, medications, procedures, laboratory results, vital signs, and utilization events. These are discussed further below.

Patient Identifiers

EHRs are designed to facilitate the identification of individual patients in clinical workflows. Patient identifiers include the patient’s full name, date of birth, contact information such as address and phone numbers, name and contact information of the next of kin, emergency contact information, and other personal information deemed necessary for healthcare delivery operations (e.g., employer information, insurance information). For internal operations, EHRs generate a unique patient ID (i.e., medical record number) that is used within the care setting to identify a specific patient. Organizations that provide care at multiple facilities (e.g., a health system with multiple hospitals and outpatient facilities) often have a second patient identifier that can be used to find a patient across the entire health network (i.e., master patient record). If a health system is connected to a statewide or regional health information exchange (HIE), the EHR may include a third patient identifier that has been issued by the HIE (i.e., statewide master patient index).9

Conditional on receiving proper consents and adhering to Health Insurance Portability and Accountability Act (HIPAA) policies,10 patient identifiers stored in EHRs can be used to merge patient EHR records with a patient registry. For example, a registry may collaborate with a statewide HIE to locate the master patient indexes of all registry patients and then ask multiple providers to locate the EHR records of those individuals using the HIE-issued master patient indexes. However, many registries do not have the option of acquiring master patient indexes from an HIE. These registries typically use alternative methods for matching patient identifiers and importing EHR data. Mistakes in matching registry patients with EHR patients may lead to quality issues such as incomplete or inaccurate data.
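To illustrate what an alternative matching method might look like when no HIE-issued index is available, the following minimal Python sketch links registry patients to EHR records by an exact match on normalized name and date of birth. The field names (registry_id, mrn, first_name, last_name, birth_date) and the matching rule are assumptions made for this example, not a method described in this chapter; real registries often rely on probabilistic linkage and manual review instead.

```python
import unicodedata
from datetime import date


def normalize_name(name: str) -> str:
    """Strip accents, punctuation, spaces, and case so that
    "O'Brien " and "OBrien" compare as equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if ch.isalpha()).lower()


def match_key(record: dict) -> tuple:
    """Build the comparison key (hypothetical field names)."""
    return (
        normalize_name(record["last_name"]),
        normalize_name(record["first_name"]),
        record["birth_date"],
    )


def link_records(registry_rows, ehr_rows):
    """Return (matches, ambiguous, unmatched).

    matches   : registry_id -> mrn, only when exactly one EHR record fits
    ambiguous : registry_ids with several candidate EHR records
    unmatched : registry_ids with no candidate EHR record
    """
    index = {}
    for row in ehr_rows:
        index.setdefault(match_key(row), []).append(row)

    matches, ambiguous, unmatched = {}, [], []
    for reg in registry_rows:
        candidates = index.get(match_key(reg), [])
        if len(candidates) == 1:
            matches[reg["registry_id"]] = candidates[0]["mrn"]
        elif candidates:
            ambiguous.append(reg["registry_id"])
        else:
            unmatched.append(reg["registry_id"])
    return matches, ambiguous, unmatched
```

Keeping ambiguous and unmatched patients in separate lists, rather than forcing a best guess, makes the incomplete or inaccurate links discussed above visible for review.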


Demographics

EHRs generally contain patient demographic information such as age, gender, and ethnicity/race. These data are needed for clinical operations and are mandated by the Meaningful Use objectives. The quality of data on age and gender is often acceptable because of the various mandates to collect them accurately.11-13 However, the quality of demographics data may be affected by other factors including mode of measurement, user mistakes, and data conversion issues.14 EHRs often have a moderate to high missing data rate for non-essential demographic information such as income, marital status, education, employment status, and nationality.15, 16

Coding standards for demographic data have been published but are not always used. Demographic data such as education and nationality are often not coded in a standardized way. Age data are governed by HIPAA and have sharing limitations if they contain a certain level of granularity (e.g., age represented by the exact date of birth, or ages above a certain limit).17 Demographic data are often used by registries to match patient records across data sources. Thus, legal limitations on sharing demographic data may hinder the development of multi-source/multi-site EHR-based registries that require demographic data for these purposes.
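The age-granularity limitation can be sketched in code. The example below assumes the HIPAA Safe Harbor de-identification approach, under which exact dates of birth are not shared and ages over 89 are grouped into a single category of 90 or older; the function names and the "90+" label are illustrative choices, and a registry operating under a different arrangement (e.g., a limited data set with a data use agreement) may be able to share more detail.

```python
from datetime import date

AGE_TOP_CODE = 90  # Safe Harbor: ages over 89 are reported as one category


def age_in_years(birth_date: date, as_of: date) -> int:
    """Completed years of age on the as_of date."""
    years = as_of.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in as_of's year.
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def shareable_age(birth_date: date, as_of: date) -> str:
    """Replace an exact date of birth with a coarsened age value."""
    age = age_in_years(birth_date, as_of)
    return "90+" if age >= AGE_TOP_CODE else str(age)
```

Note that coarsening age in this way also removes the exact date of birth that record matching often depends on, which is the tension described in the paragraph above.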


Diagnoses

Diagnosis is often a key variable in evaluating a patient for inclusion in a registry. The quality of diagnosis data is often acceptable, in part due to various mandates to collect these data accurately.10-12 EHRs also include problem lists as a way to capture active versus non-active diagnoses, but the quality of data found in problem lists may need further validation.

Some established vocabulary standards are available to encode diagnosis data. These include the International Classification of Diseases (ICD),18 International Classification of Primary Care (ICPC),19 Systematized Nomenclature of Medicine (SNOMED),20 Diagnostic and Statistical Manual of Mental Disorders (DSM),21 and Read Codes.22 In the U.S., ICD is the most commonly used system to capture diagnostic data in both EHRs and registries. Mapping diagnostic data from one coding system to another is challenging; even mapping diagnoses from one version of a coding system to another version is difficult (e.g., mapping ICD-9 to ICD-10). In addition, certain diagnostic codes – such as HIV status and mental illness diagnoses – are protected by various federal and state-level laws23 that may limit the ability to extract these codes for use in external registries.


Medications

In addition to diagnosis, registries often use medication data as eligibility criteria. Many registries also capture medication data to study treatment effect and/or safety. EHRs contain information on prescriptions that are written, while pharmacy claims data contain information on prescriptions that were filled. When EHR medication data are coupled with pharmacy claims data, a number of important constructs, such as medication adherence and reconciliation rates (e.g., medication regimen complexity index)24 can be derived and reported to a registry.
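As one possible illustration of deriving an adherence construct, the sketch below computes the proportion of days covered (PDC) from fill records. PDC is a widely used adherence measure, but it is chosen here as an example and is not a measure this chapter specifies. The sketch assumes that the EHR prescription has already identified the medication of interest and that pharmacy claims supply (fill_date, days_supply) pairs for it; both input shapes are hypothetical.

```python
from datetime import date, timedelta


def proportion_of_days_covered(fills, window_start: date, window_end: date) -> float:
    """Fraction of days in [window_start, window_end] with medication on hand.

    fills: iterable of (fill_date, days_supply) tuples from pharmacy claims.
    Overlapping fills are not double counted because days are kept in a set.
    """
    if window_end < window_start:
        raise ValueError("window_end must not precede window_start")

    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if window_start <= day <= window_end:
                covered.add(day)

    total_days = (window_end - window_start).days + 1
    return len(covered) / total_days
```

For example, two 30-day fills on January 1 and February 15, 2019, evaluated over January 1 through March 31, 2019, cover 60 of 90 days.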

The quality of EHR medication data is often acceptable due to various mandates to collect medication data in EHRs. Common vocabulary standards for medications include National Drug Codes (NDCs),25 RxNorm,26 Systematized Nomenclature of Medicine’s (SNOMED)20 Chemical axis, Anatomical Therapeutic Chemical Classification System (ATC),27 and a number of commercial drug codes such as MediSpan®, Multum®, Generic Product Identifier® (GPI), and First Databank® (FDB). Each coding standard addresses different aspects of a medication (e.g., drug class, ingredients, dosage).

Potential semantic interoperability issues may arise when medication data are combined from multiple sources and mapped from one coding system to another. For example, an RxNorm code (drug class) may map to multiple NDC codes (packaged drug). Furthermore, some EHR-derived medication information may not be specific enough for research purposes (e.g., data on generics, like biosimilars, generally do not reflect which generic product was supplied to the patient).
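The one-to-many relationship described above can be handled explicitly when combining sources. The following sketch inverts a mapping from RxNorm-style concepts to NDC-style package codes so that package-level records can be grouped under a single concept. All identifiers are placeholders invented for this example, not real RxNorm or NDC codes, and the mapping table itself is assumed to come from an external terminology source.

```python
from collections import defaultdict

# Placeholder identifiers only; these are NOT real RxNorm or NDC codes.
RXNORM_TO_NDC = {
    "RXN-EXAMPLE-1": ["NDC-EXAMPLE-1A", "NDC-EXAMPLE-1B"],
    "RXN-EXAMPLE-2": ["NDC-EXAMPLE-2A"],
}


def invert_mapping(rxnorm_to_ndc: dict) -> dict:
    """Build an NDC -> RxNorm lookup, rejecting conflicting assignments."""
    ndc_to_rxnorm = {}
    for rxnorm_code, ndc_codes in rxnorm_to_ndc.items():
        for ndc_code in ndc_codes:
            existing = ndc_to_rxnorm.setdefault(ndc_code, rxnorm_code)
            if existing != rxnorm_code:
                raise ValueError(f"{ndc_code} maps to both {existing} and {rxnorm_code}")
    return ndc_to_rxnorm


def group_by_rxnorm(ndc_records, ndc_to_rxnorm: dict):
    """Group package-level records by concept; return (grouped, unmapped)."""
    grouped = defaultdict(list)
    unmapped = []
    for record in ndc_records:
        rxnorm_code = ndc_to_rxnorm.get(record["ndc"])
        if rxnorm_code is None:
            unmapped.append(record)  # keep for manual review, do not drop
        else:
            grouped[rxnorm_code].append(record)
    return dict(grouped), unmapped
```

Returning unmapped records separately, rather than discarding them, keeps mapping gaps from silently reducing the medication data that reach the registry.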


Procedures

Procedure data include clinical procedures such as surgery, radiology, pathology, and laboratory. Procedure data can be extracted directly from EHRs and reported to registries; however, procedures reported from one EHR generally only include those procedures taking place within the premises of a provider using the same EHR and may not include procedures that occurred elsewhere.

Vocabulary standards for procedures include International Classification of Diseases’ Clinical Modification (ICD-CM),18 Current Procedural Terminology (CPT)28 and Healthcare Common Procedure Coding System (HCPCS).29 Each coding system is designed to capture procedures within a specific clinical context (e.g., primary care, hospital facility). EHR-based procedure data may not have the level of detail necessary for a registry (e.g., techniques used in a clinical procedure such as a surgical process). These procedure nuances are often entered as unstructured data that usually do not accompany structured EHR-extracts for registries.

Laboratory Data

Currently, the best sources of laboratory data are the information systems used by standalone laboratories, which are frequently but not always incorporated into the EHR. Laboratory data include both lab orders and lab results. Coding standards for lab orders and lab results include the Logical Observation Identifiers Names and Codes (LOINC),30 the Systematized Nomenclature of Medicine (SNOMED),20 and the Current Procedural Terminology (CPT).28 Currently, there is no mandated laboratory coding system for certified EHRs, and the majority of healthcare providers rely on local coding systems for lab orders/results. This limits the interoperability of multi-site EHR-derived lab data for registries.

In addition, different healthcare facilities may use different laboratory tests to measure the same analyte, each of which has a different laboratory code. Discussion is needed across the provider network on how to link lab items, preferably using automated tools and not a manual process, so that a single query across the network will return all the desired data from multiple EHRs for a single registry. In addition, certain lab results are protected by federal and state laws (e.g., lab tests revealing HIV status) and thus might be missing from EHR-extracts reporting to external registries. Further, some laboratory data are accessible to clinicians without incorporation into the EHR; in fact, some lab data require active steps by the clinician to import into the EHR. Inaccurate interpretations may be made without understanding why some lab data are missing from an EHR.
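The idea of linking lab items so that a single query returns results from multiple EHRs can be sketched as a lookup from site-specific codes to a shared analyte key. The site names, local codes, and analyte keys below are hypothetical; in practice the shared key could be a standard code such as a LOINC term agreed on across the provider network.

```python
# Hypothetical (site, local code) pairs mapped to a shared analyte key.
LOCAL_TO_COMMON = {
    ("site_a", "GLU-01"): "glucose",
    ("site_b", "CHEM_GLUC"): "glucose",
    ("site_b", "A1C"): "hba1c",
}


def query_analyte(results, analyte: str, mapping=LOCAL_TO_COMMON):
    """Return (hits, unmapped_codes) for one analyte across all sites.

    results: iterable of dicts with 'site', 'local_code', and 'value' keys.
    unmapped_codes lists (site, local_code) pairs missing from the mapping,
    so that absent data can be traced to a mapping gap rather than assumed
    to mean the test was never performed.
    """
    hits, unmapped = [], set()
    for row in results:
        key = (row["site"], row["local_code"])
        common = mapping.get(key)
        if common is None:
            unmapped.add(key)
        elif common == analyte:
            hits.append(row)
    return hits, sorted(unmapped)
```

Reporting unmapped codes alongside the hits addresses the concern raised above: a registry analyst can see why some lab data are missing instead of misinterpreting the gap.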

Vital Signs

EHRs are a primary source of vital sign data. Vital sign data include physiological variables such as height, weight, body mass index, pulse rate, blood pressure, respiratory rate, and temperature. LOINC is the common coding standard for vital signs. Most provider organizations, however, do not actively use LOINC codes to capture vital signs in their EHRs, as doing so is not mandated by the Meaningful Use program.

The completeness of EHR-derived vital signs such as height and weight is often acceptable for use in registries. Issues with human errors and units of measurement may affect data quality; thus, data cleaning is essential before use for registries.31 For example, weight and height data may include incorrect units (e.g., pounds reported as kilograms). EHRs also may lack proper metadata that are important for the clinical interpretation of the data (e.g., sitting versus standing blood pressure measurements).
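A minimal sketch of the kind of cleaning step described above follows. It converts recorded weights to kilograms and flags values outside a plausibility range. The accepted unit spellings and the adult range of 20 to 350 kg are assumptions chosen for illustration, not validated clinical thresholds.

```python
KG_PER_LB = 0.45359237  # exact definition of the international pound

# Hypothetical plausibility bounds for adult body weight, in kilograms.
PLAUSIBLE_ADULT_KG = (20.0, 350.0)


def clean_weight(value, unit: str):
    """Return (weight_kg, flag), where flag is 'ok', 'implausible',
    or 'unknown_unit' (in which case weight_kg is None)."""
    unit = unit.strip().lower()
    if unit in ("kg", "kilogram", "kilograms"):
        kg = float(value)
    elif unit in ("lb", "lbs", "pound", "pounds"):
        kg = float(value) * KG_PER_LB
    else:
        return None, "unknown_unit"

    low, high = PLAUSIBLE_ADULT_KG
    flag = "ok" if low <= kg <= high else "implausible"
    return round(kg, 1), flag
```

A range check of this kind catches only gross errors. A weight of 180 pounds mislabeled as 180 kg still falls inside the range, so detecting such cases generally requires comparison with the patient's other recorded measurements.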


Utilization

Utilization data can be extracted from EHRs, especially when insurance claims data are not available. Note that EHR-level utilization data are limited to events that have occurred within a particular provider’s facilities and often do not contain utilization data from other providers. Utilization can be defined as cost, hospitalization, readmission, emergency room admission, or other significant healthcare events. The quality and completeness of utilization data are often acceptable due to reimbursement guidelines.12

There are no specific standard utilization coding terminologies for EHRs; however, most EHRs adhere to the utilization guidelines of claims submission policies. A number of reimbursement policies recommend specific reference-coding systems to encode utilization events. Certain utilization events are protected by various federal and state-level laws (e.g., mental health visit), and a registry may not receive utilization data related to those conditions from an EHR.


Survey Data

Survey data are usually collected from self-reported questionnaires; however, clinical data captured by surveys are increasingly stored within EHRs for various purposes. Some EHRs provide standardized surveys that can be accessed via patient portals to capture patient-reported outcomes or symptoms (e.g., the Patient-Reported Outcomes Measurement Information System, or PROMIS).32 Risk factors and self-reported behaviors often are important to registries, and such data can be derived from EHR-integrated surveys (e.g., smoking status, socioeconomic status, housing condition). Also, registries may add and integrate their own customized questionnaires in EHRs so that patients can directly enter the information needed for a registry (e.g., determine eligibility; collect additional data for a study).

EHR-integrated surveys are prone to sampling, selection, response, and social-desirability biases. The quality of EHR-integrated survey data varies considerably depending on the questionnaire, and the validity and reliability of custom-built EHR-integrated surveys are often difficult to measure in the context of a clinical practice.

Surveys cover variable domains and often do not adhere to coding standards. Indeed, surveys measuring the same concept may code their variables differently. One approach to reduce bias and error in survey-collected EHR data is to use standardized questionnaires across EHRs and healthcare providers. Some of the many standardized questionnaires include the Patient Reported Outcomes Measures (PROMs), Patient Health Questionnaires (PHQ), Health Risk Assessments (HRA), Life Event Checklist (LEC), and Generalized Anxiety Disorder (GAD) screening tools.

Social Data

Social data include variables ranging from individual-level factors to community-level elements (e.g., smoking status, socio-economic status, housing condition). Social variables are often considered important factors in registries as these variables enable researchers to understand the underlying social context and potential disparities associated with the outcome of interest. As an example, social data captured within a registry can be used to assess treatment affordability or understand heterogeneity of treatment effects. Although increasingly recognized as important variables, social and behavioral data are not routinely captured in EHRs.14 EHR-derived social data are often incomplete and limited to a few data types.33 Moreover, social determinants of health that could be imported from data sources such as social services organizations are usually missing in EHRs and registries due to the lack of interoperability.34

Although a number of coding standards have been proposed to standardize social data, most EHRs use proprietary coding vocabularies. Social data are often of low quality, mainly due to incomplete survey responses and the subjective nature of many social questions. Although most social data are not subject to HIPAA, they can still be subject to other privacy rules such as the Family Educational Rights and Privacy Act (FERPA).35 Establishing linkages among patient-level EHR records, social service records, and registries has faced both technical and regulatory challenges in the past.

Patient-Generated Data

Patient-generated data can include a wide array of variables (e.g., physical activity, sleep patterns, self-reported signs and symptoms, uploaded blood sugar levels) and may be captured within an EHR through various means (e.g., integrated personal health records, mobile-health exchange platforms, wearable device interfaces).36 EHR-based patient-generated data are highly customized and inconsistent across EHRs. Standards are becoming more available for mobile health and wearable devices,37 but have not yet been widely adopted for patient-generated data captured within EHRs. Although the quality of the data collected by mobile health and wearable devices is improving, accuracy and comparability are still challenging when such data are collected using different devices. Self-entered data collected via surveys (e.g., entering physical activity types) are subject to a variety of selection factors and errors (e.g., overestimating recall of time spent exercising). Data interoperability may become more challenging as more non-standardized devices enter the market. Additionally, consenting processes via internet and mobile health solutions may be complex, and the creation of large EHR-integrated registries using patient-generated data requires careful attention to legal and regulatory issues.38, 39

Sample Use Cases and Architecture of EHR-Based Registries

Registries that incorporate EHR data may use a variety of IT system architectures. Registry architects must consider the number of participating sites (single-site or multi-site), variety of underlying EHRs (one enterprise-level EHR, multiple EHR installations of the same vendor, multiple EHRs from different vendors), existence and connectivity to Health Information Exchanges (HIEs) (centralized, federated or distributed), and other factors that affect interoperability.

Following are examples of three “hypothetical” EHR-based registry types, each with a different combination of stakeholders and IT infrastructures (Table 4-3). Registries designed to support clinical care are often based on single enterprise-level EHRs, while registries designed for research are often hosted external to EHRs but may receive EHR extracts from multiple sources. Public health registries, similar to registries designed for research, are often hosted by health departments outside of a single EHR environment but receive EHR reports on a regular basis. Note that these are generalized examples; actual IT infrastructure and features may vary.

Table 4-3. IT infrastructure and other features of sample registry types using EHR data.

In a fully interoperable ecosystem, registry-specific functionality could be presented in a software-as-a-service or middleware model, interacting with the EHR as the presentation layer on one end and the registry database on the other.3 In this ideal model, the EHR is a gateway to multiple registries and clinical research activities through an open architecture that leverages best-in-class functionality and connectivity. Full interoperability would enable registries to interact across multiple EHRs, and EHRs to interact with multiple registries. Comprehensive interoperability, however, has not yet been realized, and customized IT architectures are required to facilitate the integration and interfacing of EHRs with registries.3 The following are examples of IT architectures that could support EHR-integrated/linked registries for clinical operations, research projects, and public health missions.

EHR-Integrated Registries To Support Clinical Care

Healthcare providers often develop and manage EHR-based registries that are used to support clinical care and meet operational goals (referred to here as ‘clinical registries’). To develop clinical registries, providers typically use EHR-based tools that are developed by EHR vendors. These EHR-based registries can facilitate clinical workflow, monitor quality metrics, enable disease/cohort management, and offer population health management features. In particular, the Triple Aim of care, health and cost has provided a framework to achieve value-based care while reducing cost.40 This framework promotes ‘population health’ while enhancing the individual’s experience of care and lowering cost.41 Effective population health management is essential to ensuring that resources are directed towards improving health outcomes of patients at the highest risk for developing undesired outcomes. The notion of population health management necessitated that health providers develop EHR-based registries to focus on high-risk subpopulations (e.g., patients at high risk for mortality and morbidity, cost, hospital and emergency room admission or who have a chronic condition that requires direct management, such as diabetes).42, 43

A major challenge with EHR-integrated clinical registries is the lack of out-of-network data in a health network’s EHR.44 In other words, data generated during patient encounters with out-of-network providers, who may not be using the same EHR, will be missed in the registry resulting in incomplete and sometimes outdated data. Individual health networks often complement their EHR data with insurance claims to generate a more complete picture of a patient’s health status; however, use of insurance claims is not always practical given that a large patient population of a health delivery network may use dozens, if not hundreds, of different insurers. Many challenges of EHR-based population health registries are derived from the overarching challenges within the broader domain of population health informatics.45

Clinical registries usually use a centralized architecture and often have an EHR data warehouse as their backbone along with multiple data marts containing various registry data. The centralized architecture accumulates and manages data in a single and centralized repository. The advantages of a centralized model are: simplicity and efficiency; greater data consistency; and easier patient linkage if the same patient identifiers are used across the healthcare network. Potential disadvantages of a centralized model include: data capture that is limited to users of a single EHR vendor across the healthcare network (e.g., trouble with integrating a different EHR vendor if a new facility joins the network); and difficult data exchange with registries developed by other networks due to a lack of interoperability.

Healthcare networks often develop clinical registries based on their underlying enterprise-wide EHR architecture (Figure 4-1). Data collected at different facilities of a healthcare delivery network (e.g., hospitals and outpatient clinics) are aggregated in a common data repository such as an EHR’s data warehouse. Facilities not using the same EHR platform face extra work to harmonize and standardize their data before feeding it into the data warehouse. Data warehouses can be used to develop multiple data marts feeding into various registries for different purposes such as quality measures, disease management, population health management, and public health reporting. Internal clinical registries are sometimes linked to external registries for reporting purposes (e.g., PQRS reporting),46 although interoperability challenges may limit such exchanges.
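The warehouse-to-data-mart pattern can be illustrated with a small in-memory database. In this sketch, a single diagnosis table stands in for the EHR data warehouse, and a SQL view stands in for a disease management data mart. The table layout, facility names, and patient identifiers are invented for the example; the ICD-10-CM category E11 (type 2 diabetes mellitus) is used as a simple, assumed inclusion rule.

```python
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    """Create a toy warehouse with one data mart defined as a view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Stand-in for diagnoses aggregated from several facilities.
        CREATE TABLE diagnosis (
            patient_id  TEXT,
            facility    TEXT,
            icd10_code  TEXT,
            recorded_on TEXT
        );
        INSERT INTO diagnosis VALUES
            ('P1', 'hospital_a', 'E11.9',  '2019-01-10'),
            ('P1', 'clinic_b',   'I10',    '2019-02-01'),
            ('P2', 'clinic_b',   'I10',    '2019-03-05'),
            ('P3', 'clinic_c',   'E11.65', '2019-04-20');

        -- Data mart: one row per patient with a type 2 diabetes code.
        CREATE VIEW diabetes_registry AS
            SELECT patient_id, MIN(recorded_on) AS first_recorded_on
            FROM diagnosis
            WHERE icd10_code LIKE 'E11%'
            GROUP BY patient_id;
        """
    )
    return conn


def registry_members(conn: sqlite3.Connection):
    """List (patient_id, first_recorded_on) rows in the registry view."""
    query = (
        "SELECT patient_id, first_recorded_on "
        "FROM diabetes_registry ORDER BY patient_id"
    )
    return conn.execute(query).fetchall()
```

Because the registry is defined as a view over the warehouse, additional registries (e.g., for quality measures or public health reporting) could be added as further views without copying the underlying data, which is one way to read the multiple data marts described above.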

This figure includes a diagram depicting the high-level architecture of EHR-integrated registries to support clinical care. The diagram includes four rows of data sources. Each layer feeds data to the next level of databases. The first level of databases includes electronic health record databases of various facilities of a health network feeding data into the next level. The second level includes a centralized EHR-derived data warehouse that collects data from various EHR databases. The EHR data warehouse feeds data into multiple databases of the next level. The third level of databases includes various internal clinical registries that can be used for various purposes. As an example, this level includes quality improvement, population health management, public health, and disease management registries. The first, second, and third levels of databases are enclosed within one healthcare network; however, the fourth level of databases is external to the healthcare network. The fourth level of databases includes various external registries that receive data from the healthcare network registry databases. These external databases include examples such as a community registry, public health registry, PQRS or physician quality reporting registry, and a research registry.

Figure 4-1

Common architecture of EHR-integrated registries to support clinical care. CDM = Chronic Disease Management; EHR = Electronic Health Record; PH = Public Health; PHM = Population Health Management; PQRS = Physician Quality Reporting System; QI = Quality (more...)

EHR-Linked Registries Designed for Research

Registries designed for research purposes (referred to here as ‘research registries’) may use EHR data on a variety of levels. At the low end, research registries may use EHR data to identify and enroll eligible patients into studies that use supplementary registry-specific data collection. In this scenario, EHR data are used to identify eligible patients (based on the registry’s inclusion and exclusion criteria), and minimal EHR data (e.g., family history of breast cancer) are imported into the registry. The remaining registry-specific data are captured through another means, usually a dedicated data repository that allows for entry of eCRFs and web-based survey forms. On the other end of the spectrum, some research registries have been built entirely using EHR data (e.g., California Cancer Registry).47 Many other research registries use a combination of self-reported and EHR data (e.g., Autism Treatment Network).47 Registries in which EHR-based extracts are merged with registry data on a periodic basis are referred to here as EHR-linked registries.

The increasing semantic and syntactic interoperability among healthcare providers is a major driver for EHR-linked registries. EHR-linked research registries often use application programming interfaces hosted by healthcare providers to extract and share standardized EHR data and then use semi-automated approaches to merge the EHR data with existing registry records. Moreover, bi-directionally interoperable EHR-linked registries may also serve an important role by delivering relevant information from a registry back to a clinician (e.g., natural history of disease, safety, effectiveness, and quality).

EHR-linked research registries collect EHR data using a variety of mechanisms, ranging from automated EHR-embedded push protocols to manual ad-hoc EHR-database pulls. Triggers for EHR data extraction include standardized protocols that follow the inclusion and exclusion criteria of the research registry (i.e., phenotyping queries; retrieve protocols). After receiving the EHR data, research registries use a multi-phase process to import incoming EHR data (Figure 4-2). Extract, transform, and load functions may include data curation activities such as data preparation, data standardization, secure data transfer, data mapping, data redaction, data integration/merging, and data reconciliation. Various organizations such as the Clinical Data Interchange Standards Consortium (CDISC) and the Standards and Interoperability (S&I) Framework have introduced detailed mechanisms to automate and standardize the incorporation of EHR data for other purposes including registries (e.g., CDISC Link Initiative48). Additionally, the growing number of common data models has enabled registry developers to adhere to specific predefined standards that facilitate integration of EHR-based data as well as data sharing among registries (e.g., Clinical Information Modeling Initiative’s (CIMI) Reference Model,49 FDA Sentinel Initiative,50 and Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM)).51 Chapter 5 describes common data models in more detail.
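The multi-phase import described above can be sketched as three small functions. This is a simplified illustration under stated assumptions, not the architecture of any particular registry: the record layout (patient_id, name, code), the placeholder diagnosis codes, and the choice to redact only a name field are all hypothetical, and secure transfer and patient index matching are omitted.

```python
def extract(ehr_rows, include_codes: set, exclude_codes: set):
    """Phenotyping query: keep rows for patients who have at least one
    inclusion code and no exclusion code."""
    codes_by_patient = {}
    for row in ehr_rows:
        codes_by_patient.setdefault(row["patient_id"], set()).add(row["code"])
    eligible = {
        pid
        for pid, codes in codes_by_patient.items()
        if (codes & include_codes) and not (codes & exclude_codes)
    }
    return [row for row in ehr_rows if row["patient_id"] in eligible]


def transform(rows, code_map: dict, redact_fields=("name",)):
    """Map local codes to registry codes and drop redacted fields."""
    cleaned = []
    for row in rows:
        out = {k: v for k, v in row.items() if k not in redact_fields}
        out["code"] = code_map.get(row["code"], row["code"])
        cleaned.append(out)
    return cleaned


def load(registry: dict, rows):
    """Merge rows into the registry (patient_id -> list of records),
    skipping records that are already present."""
    for row in rows:
        entries = registry.setdefault(row["patient_id"], [])
        if row not in entries:
            entries.append(row)
    return registry
```

Making the load step skip records that are already present means a periodic extract can be re-run without duplicating data, which matters when EHR extracts are merged into the registry on an interval basis.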

Importing and merging data from EHRs into research registries is challenging. Automating the data imports requires high degrees of interoperability, data curation, and post-hoc harmonization as well as attention to data quality. For example, if inclusion criteria are encoded differently in different EHRs, the comparability of data may be impacted, creating artificial distortion between outcomes measured by different EHRs.52 Merging EHR data-imports with existing patient data in a registry also requires reliable master patient indexing to avoid inaccurate patient-matching which would compromise any inferences drawn from the data.9, 53 Data curation is critical, as integration of EHR data can expose data quality issues that may affect research findings.54

Data governance must be considered as well. Registries designed for research may be funded and managed by a broad range of organizations (e.g., federal, state, non-profit, private). Although patient privacy is safeguarded and protected under federal and state laws,55 data governance policies vary, resulting in different barriers for different registries when importing and integrating EHR data.56 Additionally, the incentives and liabilities associated with extracting and pushing data from an EHR to an internal or external registry are not always clear for healthcare providers.57

This figure provides an overview of the common architecture of EHR-linked research registries. The diagram includes three columns of various data sources. The left and right column databases feed into the middle/central database. The left column depicts three healthcare networks, each with a separate EHR database, feeding data to the middle column. The right column includes various non-EHR data sources such as survey data, patient-generated data, genetic and laboratory data, insurance claims data, and other registries. The right column data sources also feed data to the middle column. The middle column depicts a central database for a research registry, with the potential to generate subspecialized registries. Data fed by EHR databases go through multiple ETL (extract, transform, and load) processes, depicted by a box between the left and middle columns, before being deposited into the central research repository. The ETL process lists the following functions as bullet points: (1) data and code harmonization; (2) content and structure standardization; (3) standardizing EHR forms and templates; (4) using common information models; (5) embedding EHR functions; (6) using secure transport mechanisms; (7) identifying patient and phenotyping protocol; (8) matching patient indexes; and (9) data mapping, merging, and reconciliation between the EHR databases and the central research registry. The diagram also includes a box showing that data can be fed back from the registry (the middle column) to the EHR databases (left column) and non-EHR data sources (right column).

Figure 4-2

Common architecture of EHR-linked research registries. EHR = Electronic Health Record; ETL = Extract, Transform, and Load

EHR-Linked Public Health Registries

Public health agencies have long used registries for surveillance and tracking purposes. For example, local and state public health departments usually maintain immunization registries that receive information from clinicians and other entities such as schools and pharmacies. Other common public health registries include syndromic surveillance and specialized registries such as birth defects, chronic diseases, and traumatic injury registries. In recent years, coincident with the rising EHR adoption among providers, public health entities began to link various registries with EHRs. A significant driver of increased EHR integration has been the Meaningful Use program, which incentivized clinicians to share EHR immunization and syndromic surveillance data with public health agencies.7 Other drivers have included the maturation of data standards (both semantic and syntactic) for automating and improving the transmission of EHR data to public health registries (e.g., distributed population queries),58 and the increased interest of value-based care provider organizations in assessing the needs and improving the health of the communities they serve (e.g., community health needs assessment).59 Most EHR-linked public health registries have relied on semi-automated processes; only recently have more automated mechanisms been introduced and adopted (e.g., vaccination registries). EHR-linked public health registries follow a similar architecture to that of EHR-linked research registries (Figure 4-2); however, the methods used to collect data from EHRs may vary as not all public health registries require patient-level data (e.g., counts are sufficient for some purposes). 
Methods used include, but are not limited to: (1) semi-automated forms/templates to collect public health-specific information about patients who meet certain criteria (e.g., S&I Framework SDC);60 (2) data exchange protocols for receiving case reports from certified EHRs (e.g., MU public health reporting objectives);7 (3) tools to mine EHR and HIE data for signs and symptoms relevant to public health emergencies and outbreaks (e.g., ESSENCE Syndromic Surveillance System);61 and (4) distributed data network queries to collect aggregated data from multiple providers when the identity of patients is not relevant (e.g., PopMedNet).62
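The idea behind the fourth method, a distributed query that returns only aggregated data, can be sketched in a few lines of Python. This is an illustration of the general pattern only, not the protocol of PopMedNet or any other network; the function names and record layout are assumptions made for the example.

```python
def local_count(site_records, condition):
    """Runs at each provider site: returns a count only, never patient-level rows."""
    return sum(1 for record in site_records if condition(record))


def distributed_query(sites, condition):
    """Send the same question to every site and combine the aggregate answers.

    `sites` maps a site name to that site's local records (hypothetical layout).
    """
    per_site = {name: local_count(records, condition) for name, records in sites.items()}
    return {"per_site": per_site, "total": sum(per_site.values())}
```

For example, calling distributed_query(sites, lambda r: r["vaccinated"]) would give the public health registry a count per site and an overall total while the patient-level rows stay with each provider.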

Some public health agencies have been able to directly integrate their registries with the EHRs of clinicians who provide care in their jurisdiction. The prime example of such a fully integrated EHR-linked public health registry is the New York City (NYC) Population Health Registry.63 This registry collects information from NYC’s eligible healthcare professionals across several domains (e.g., influenza-like illnesses). NYC’s Population Health Registry has been successful because most eligible professionals in NYC use the same EHR system, one that is capable of reporting data in real time to local public health agencies. The Population Health Registry is part of the NYC Macroscope Hub,64 a surveillance system for tracking conditions managed by primary care practices (e.g., obesity, diabetes, hypertension, and smoking).

Technical Issues and Operational Challenges of EHR-Based Registries

EHR-based registries fulfill different purposes and use different IT system architectures, but many technical issues and operational challenges are common across the range of registries. This section describes several common challenges, such as identification of eligible patients; data quality; unstructured data; interoperability and data sharing; data access and patient privacy; and human resources.

Identifying Eligible Patients

Retrieval protocols and phenotyping methods are commonly applied against EHR data to define the denominator of interest and identify eligible patients for screening, clinical trials, and inclusion in registries.52 Computational phenotyping involves operationalizing process, outcome, and case definitions as a set of measures that can be captured during regular episodes of clinical care and that are stored in the EHR. General categories of data that are drawn from EHRs for computational phenotyping include medications, laboratory tests, and diagnoses.52 Operationalized definitions can be used for a number of applications, including cohort screening and identification to enable clinical research, assessments of current healthcare delivery processes and outcomes, and assessments of changes due to new healthcare practices and interventions. Common to all of these applications is the need to evaluate the operational definitions that are used. Given that EHR data are collected for the purpose of documentation and are collected at various points in time for each patient, there are a number of opportunities for potential biases to arise and for data to be missing. As such, a sound evaluation of the measurement approach is required prior to the use of those measures for secondary analyses or for cohort screening and identification. To date, a significant number of studies requiring cohort identification have reported common measures such as positive and negative predictive values, sensitivity, and specificity prior to conducting downstream analyses. The evaluation of measure results depends in part on the intended use of an operational definition and on the EHR data source(s). Some frameworks have been developed to assist investigators in characterizing potential limitations to the use of operational definitions with EHR registry data so that when analyses are performed the confidence level of those findings can be quantified.52, 65
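As an illustration of how an operational definition and its evaluation might look in practice, the following Python sketch applies a simple rule over diagnoses, medications, and laboratory values and then compares the rule's output with a reference standard (for example, manual chart review). The rule itself, including the concept names and the laboratory threshold, is a hypothetical example and not a validated phenotype definition.

```python
def meets_phenotype(patient):
    """Hypothetical rule: a diagnosis plus supporting medication or lab evidence."""
    has_diagnosis = "diabetes" in patient["diagnoses"]
    has_medication = "metformin" in patient["medications"]
    has_lab = any(value >= 6.5 for value in patient["hba1c"])  # illustrative threshold
    return has_diagnosis and (has_medication or has_lab)


def validation_metrics(predicted, reference):
    """Compare phenotype output with a reference standard, case by case."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)

    def ratio(numerator, denominator):
        # None signals that the measure is undefined for this sample.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

Reporting all four measures matters because a rule can have a high positive predictive value in one EHR and still miss many true cases (low sensitivity) in another where the same facts are documented differently.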

Various challenges with denominator and variable selection exist when extracting data from EHRs for registries. Ambiguous phenotyping algorithms and lack of standardized retrieval protocols often result in selecting a denominator of patients from an EHR that is irrelevant, skewed, or biased for a registry. Multiple factors can be used to modify and refine the definition of a population denominator (e.g., age, gender, diagnoses, medications, lab results, radiologic findings, special conditions such as disability, and administrative information such as insurance coverage). Selecting the timeframes of the EHR data extract is also complex and may result in incomplete temporal data represented in registries. Despite the higher interoperability of EHR data and standardization of phenotyping protocols, fine details of EHR data may affect the selection results. Some of the challenges include:

  • Process of Care: different providers or clinical workflows generate different data values for the same event or fact; hence, the same fact or event might be represented differently in the same EHR.
  • Nature of Intervention: different interventions with different levels of risk may be encoded similarly, meaning the EHR does not contain the true risk factors for those interventions.43, 66

Data Quality

As a basic good practice, registries should use some form of data curation to review and assess data quality. In the context of EHR-based registries, data quality issues stem from the fact that data extracted from EHRs often require extensive cleaning and preparation before being imported into registries. EHRs are designed to manage the transaction of healthcare and support clinical workflow and documentation for billing. The purpose of an EHR is not to conduct research, and EHRs are not designed to systematically collect research-grade longitudinal data. As a result, data captured by EHRs are of variable quality.14, 45 For example, EHRs often house reliable laboratory and medication data for clinical purposes, but EHRs typically lack consistent and sufficiently detailed data on risk factors, levels of education, or socioeconomic status.16 The quality of source data can affect both the underlying data as represented in a registry and the results generated using such data. Thus, EHR data may not be appropriate for some research purposes.

Data quality can be defined from various perspectives. The most impactful aspects of data quality for registries are:14

  • Accuracy: the extent to which data captured in the EHR accurately reflect the state of interest, which is often complex to measure because the true value of a given variable remains unknown.
  • Completeness: the level of missing data for a particular data element in the EHR for the population of interest; this is commonly measured as a data quality indicator for EHR-integrated registries. It is important to note that for research purposes, a distinction is made between “must-have” and “nice-to-have” data, recognizing that completeness of “must-have” data is most important.
  • Timeliness: the length of time between the initial capture of a value and the time the value becomes available in the EHR.
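Two of these dimensions, completeness and timeliness, lend themselves to simple numeric indicators. The Python sketch below shows one possible way to compute them; the record layout and function names are assumptions made for illustration, and completeness is expressed here as the share of non-missing values (the complement of the level of missing data).

```python
from datetime import datetime


def completeness(records, field):
    """Share of records with a non-missing value for `field` (None if no records)."""
    if not records:
        return None
    present = sum(1 for record in records if record.get(field) not in (None, ""))
    return present / len(records)


def timeliness_hours(captured_at, available_at):
    """Hours between initial capture of a value and its availability in the EHR."""
    return (available_at - captured_at).total_seconds() / 3600
```

A registry could run completeness() separately over its “must-have” and “nice-to-have” fields and track both figures after each import, which also helps detect changes introduced by EHR upgrades.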

It is important to note that data quality varies across EHRs used by different healthcare organizations. Moreover, changes may be made to EHR systems “behind the scenes” that affect data quality. For example, upgrades intended to improve performance or add features may inadvertently result in poor record linkage or may require updating record extraction protocols. Evaluation of data quality, including completeness and accuracy, should be conducted as an ongoing process and not as a one-time exercise.

Unstructured Data

EHRs contain a considerable amount of unstructured data, such as progress notes. The loosely structured nature of typed text (also known as ‘free text’) is effective in day-to-day clinical workflows but presents a major challenge for automating EHR-based registries. The unstructured data may contain key patient information missing in structured data, extra information complementing structured data, or even data that may contradict information represented by structured data. The complexities of unstructured data, along with the fact that existing text mining tools and natural language processing applications have limited accuracy in extracting information from free text,67 have prompted some registries to ask for a manual chart review of individual patients before final inclusion in the registry. Unstructured data limits the application of automated computational phenotyping methods and increases the likelihood of low data quality (e.g., missing data) when data are extracted from structured EHR data only.
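The following Python sketch illustrates why free text is difficult to process automatically. It searches a note for a term and applies a deliberately naive negation check within the same sentence. The function name, the list of negation cues, and the three-way output are assumptions for this example; the approach is far simpler than real clinical natural language processing tools and will misclassify many notes.

```python
import re

# A small, illustrative set of negation cues; real notes use many more.
NEGATION = re.compile(r"\b(no|denies|denied|without|negative for)\b[^.]*$", re.IGNORECASE)


def mentions(note, term):
    """Return 'affirmed', 'negated', or 'absent' based on the first mention of `term`."""
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        match = re.search(r"\b" + re.escape(term) + r"\b", sentence, re.IGNORECASE)
        if match:
            # Look for a negation cue earlier in the same sentence.
            preceding_text = sentence[: match.start()]
            return "negated" if NEGATION.search(preceding_text) else "affirmed"
    return "absent"
```

A plain keyword search would count “Patient denies chest pain” as a mention of chest pain; even this small negation check changes the answer, and it still ignores misspellings, abbreviations, family history, and statements that contradict the structured data.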

Many EHRs also allow a choice of places where important data may be entered. For example, some EHRs have been set up to facilitate quick entry of “easy treatments,” which then results in fragmented storage of treatment information. Treatment information may also be buried in clinical notes, which may not be accessible for research purposes since notes often include a patient’s name and other personally identifiable information that can be difficult to spot and redact systematically.


Interoperability

Interoperability is defined as the ability of a system to exchange electronic health information with, and use electronic health information from, other systems without special effort on the part of the user.68 Interoperability requires multiple stages: ‘sending’, ‘receiving’, ‘finding’, and eventually ‘using’ the data.68 As discussed in Chapter 1, interoperability spans multiple dimensions of standards: regulatory, contractual, privacy, exchange formats, content, and technology.68, 69 In the context of EHRs and registries, syntactic interoperability is the ability of heterogeneous health information systems to exchange data with a registry, and semantic interoperability implies that the registry understands the data exchanged at the level of defined domain concepts.

From an EHR/registry perspective, functional interoperability could be described as a standards-based solution that achieves the following set of requirements: “The ability of any EHR to exchange valid and useful information with any registry, on behalf of any willing provider, at any time, in a manner that improves the efficiency of registry participation for the provider and the patient, and does not require significant customization to the EHR or the registry system.”3

Although interoperability of EHRs with other EHRs and health IT systems has increased over the last decade,70 most health systems do not share in-depth EHR-level data with other health systems. Lack of interoperability is a major limiting factor for the extraction, integration, and linkage of EHR data for registries. Most EHRs are not fully interoperable in the core functions that would enable them to participate in various registries without a significant effort.3 This deficiency is directly related to a combination of technical and economic barriers to EHRs’ adoption and deployment of standards-based interoperability solutions.3 EHR vendors also provide heavily customized versions of their own systems for each client thus creating additional barriers to interoperability.3 Since registries seek data across large and generalizable populations, making EHRs interoperable across providers is a key step in facilitating EHR-based registry efforts.

Data sharing and interoperability challenges are not limited to incoming EHR data for a registry. In a learning health system, a bidirectional registry shares its findings with providers that have shared their EHR data. In such a reciprocal model, the findings are turned into knowledge and can effectively be used to change the delivery of care and improve outcomes across all participating providers. Currently, there are no common standards on how to distribute registry findings while protecting the identity of individual healthcare providers. Sharing findings about data quality issues with data providers is also challenging, as it may result in legal ramifications (e.g., individual providers might become liable when data are captured inaccurately).

Linking and integrating various EHR data sources for registries also requires matching patients across databases. HIEs are sometimes required to generate master patient indexes (MPIs) to match patients across diverse EHR data sources. Developing and utilizing an MPI is a complex process and may introduce error and bias in registries despite many tools being available to accomplish this process.9 It is worth noting that most of the data elements needed to create an MPI are considered protected health information according to HIPAA regulations and may not be available for registries to complete the matching process.
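A minimal Python sketch of deterministic patient matching is shown below to make the idea of an MPI concrete. It assumes a hypothetical record layout (source_id, family_name, given_name, birth_date, sex) and matches only on exact agreement of normalized demographics. This is a simplification: it misses nicknames, typographical errors, and name changes, and the sketch should be read as an illustration of the concept rather than a recommended matching method.

```python
import unicodedata


def normalize(text):
    """Lowercase, strip accents, and drop punctuation and spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def match_key(record):
    """Build a comparison key from normalized demographic fields."""
    return (
        normalize(record["family_name"]),
        normalize(record["given_name"]),
        record["birth_date"],
        record["sex"].upper(),
    )


def build_index(records):
    """Group source record identifiers that share the same demographic key."""
    index = {}
    for record in records:
        index.setdefault(match_key(record), []).append(record["source_id"])
    return index
```

Every field in the key is an identifier of the kind restricted under HIPAA, which is why a registry that receives only de-identified data cannot perform this step itself.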

EHR Infrastructure and Deployment

EHRs may provide IT infrastructure and tools to support the development of an EHR-based registry, but they typically do not provide turnkey solutions for functional registries. Over the last decade, a variety of EHR tools have been developed that could form the building blocks of EHR-based registries. For example, EHR-based clinical data warehouses collect and store EHR data across an entire health network. These system-wide data warehouses often serve as the backbone of data products that eventually support an EHR-integrated registry (see Chapter 2). However, challenges with updating, maintaining, scaling, and sharing such tools across healthcare providers still hinder the development of registries.

In addition, the architecture of an EHR deployment within a healthcare delivery system may influence the usefulness of the EHR for different registry applications. For example, a health system that lacks an enterprise-level EHR architecture may find it challenging to develop a system-wide EHR-integrated registry, as each of its entities operates a standalone EHR with no interoperable solution to share data among them.

Data Access, Privacy, and Use

Data access and privacy challenges are complex in multi-site EHR-based registries. Chapters 7 and 8 of the User’s Guide provide more information on ethics, informed consent, and protecting patient privacy. Data sharing is an additional concern in the context of EHR-based registries. Decisions must be made about whether a single institutional review board (IRB) will suffice or whether all sites will require local IRB approval. Governance is also challenging as the rules around sharing of data (identifiable or de-identified) vary depending on the organizations involved and the purpose of the research.

Human Resources

Most healthcare providers, especially small office-based practices, do not have adequate staff time or even the necessary expertise to solve all potential challenges with EHR-registry integration/linkage. Indeed, several types of expertise are needed, such as:

  • Regulatory/ethics – what data can we share?
  • Scientific – what question is important?
  • Research design – how do we answer the question?
  • Clinical – do the data mean what we think they mean?
  • Informatics – do the data maintain their epistemological integrity from clinical collection to analysis?
  • Information technology (IT) – how do we curate and manage the data?
  • Statistics and epidemiology – how do we answer the question with the data obtained?

In addition, although EHRs may offer cost-effective solutions for registry use, the need to capture comprehensive data for registries may counter this cost-effectiveness balance (e.g., requiring costly changes to the clinical workflow). Assuming that all data objectives for a registry can be met within an EHR, data collection for EHR-based registries hypothetically could be achieved at the time of a clinical encounter, thus reducing the cost of data collection; however, this has yet to be achieved on a widespread basis.

Other Factors

Other factors may also affect the usefulness of EHRs as a foundation for internal registries and/or for contributing to external registries. These include challenges with collecting patient consent within clinical workflows, incorporating patient-reported data, and safeguarding the security of the data.71

International Perspective on EHR-Based Registries

Some international registries are derived from national data collected in the context of national health insurance programs. In the Nordic countries, the unique constellation of universal coverage, a network of population-wide registries and databases, and individual-level linkage72 makes registries optimally suited for observational medical research in multiple clinical domains73 and, increasingly, for pragmatic trials.74–76 In some countries, EHRs can be readily linked with the registry data using nationwide individual identifiers. For example, Nordic countries maintain a wide network of continuously updated databases that collectively cover most health events and that can be linked at the individual level in combinations dictated by the needs of a given study. In the United Kingdom, the Clinical Practice Research Datalink (CPRD)77 and The Health Improvement Network (THIN) are important sources of routinely collected data originating in EHRs. Both CPRD and THIN capture information routinely gathered in the course of daily operations of participating general practices. The data undergo a set of built-in data checks before being made available for research. In some instances, additional data are linked (e.g., hospital records or basic socioeconomic data). All patients registered with the participating practices, regardless of their disease, are included in the resulting dataset as long as they are enrolled in a participating practice.

Similarly, routine records are also being collected in some form in many countries in Europe, though generally with less national coverage than in England; a non-exhaustive list includes the Netherlands,78, 79 Italy,79, 80 Scotland,81 Germany,82 France,83 and Spain.84 In North America, routine health records from a single-payer system are maintained by provinces in Canada;85 such records are also increasingly maintained in Asia, including in South Korea86 and Taiwan.87 Although not originally established for research, routine data have been playing an increasingly important role in studies of health and disease, including post-marketing risk-management commitments.

The Future of EHR-Based Registries

The true promise of EHRs for registries is in facilitating the achievement of a practical, scalable, and efficient means of collecting registry data for multiple purposes. Scalability constraints on patient registries can be dramatically reduced by using digitized information.3 Paper records are inherently limited because of the associated difficulty of systematically identifying eligible patients for research activities and the effort required to re-enter information into a database.3 Digitized information has the potential to make it easier to meet both of these requirements, enabling larger, more diverse patient populations and avoiding duplication of effort by participating clinicians and patients.3 However, duplication of effort can be reduced only to the extent that EHRs capture data elements and outcomes with specific, consistent, and interoperable definitions — or that data can be found and transformed by other processes and technologies (e.g., natural language processing) into standardized formats that match registry specifications.3

Despite the challenges and barriers of using EHRs for registries, EHRs will likely play a key role in expanding and developing existing and future registries. Multiple factors are poised to increase the role of EHRs in registries in the near future, such as:

  • increasing adoption of light-weight and efficient interoperability standards (e.g., HL7 FHIR);88
  • new methods to measure EHR interoperability;69
  • innovative technical frameworks to harmonize the extraction of data from EHRs (e.g., S&I Framework SDC);60
  • introduction of new EHR-embedded tools to develop EHR-integrated registries (e.g., define and apply retrieval protocols; additional EHR-integrated forms for registries);
  • incentivizing healthcare providers to share EHR data with registries (e.g., Meaningful Use);89
  • aligning value-based efforts and population health management goals with reporting of EHR data to registries across providers (e.g., MACRA);90 and
  • providing additional clarifications about the application of HIPAA and other privacy protection rules in the context of EHR-based registries for both operational purposes and research.91

EHRs can be linked or integrated with registries in many formats and for various purposes. Future research should focus on developing and disseminating additional guidelines and technical documentation about registry integration with EHRs for public use. Finally, achieving a fully interoperable EHR-based registry, so that EHRs and patient registries function seamlessly with one another, is unlikely to be accomplished in the near future.3 However, it is critical that a level of interoperability be achieved to prevent the creation of information silos within proprietary informatics systems that make it difficult or impossible to develop large EHR-based registries and conduct research across diverse practices and populations.



EHRs are sometimes referred to as Electronic Medical Records (EMRs). This chapter uses both terms interchangeably.

References for Chapter 4

Gliklich R, Dreyer N, Leavy M, eds. Registries for Evaluating Patient Outcomes: A User’s Guide. Third edition. Two volumes. (Prepared by the Outcome DEcIDE Center [Outcome Sciences, Inc., a Quintiles company] under Contract No. 290 2005 00351 TO7.) AHRQ Publication No. 13(14)-EHC111. Rockville, MD: Agency for Healthcare Research and Quality. April 2014. http://www.effectivehealthcare.ahrq.gov.
Aspden P, Corrigan JM, Wolcott J, et al. Key Capabilities of an Electronic Health Record System: Letter Report. Institute of Medicine of the National Academies; 2004.
Gliklich RE, Dreyer NA, Leavy MB. Interfacing Registries With Electronic Health Records. Registries for Evaluating Patient Outcomes: A User’s Guide. 2. Third ed. Rockville, MD: Agency for Healthcare Research and Quality (AHRQ); 2014. p. 3–22. [PubMed: 24945055]
Dixon BE, Gibson PJ, Grannis SJ. Estimating Increased Electronic Laboratory Reporting Volumes for Meaningful Use: Implications for the Public Health Workforce. Online Journal of Public Health Informatics. 2014;5(3):225. PMID: 24678378. DOI: 10.5210/ojphi.v5i3.4939. [PMC free article: PMC3959912] [PubMed: 24678378] [CrossRef]
Electronic Health Record Incentive Program, 42 CFR 412, 413, 422, 495 (2010).
Centers for Disease Control and Prevention (CDC). Summary of Public Health Objectives in Stage 2 Meaningful Use ONC and CMS Final Rules Version 1.1. 2014; https://www.cdc.gov/ehrmeaningfuluse/docs/summary-of-ph-objectives-in-stage-2-mu-onc-and-cms-final-rules_04_01_2014.pdf. Accessed August 15, 2019.
Centers for Disease Control and Prevention (CDC). Meaningful Use. https://www.cdc.gov/ehrmeaningfuluse/. Accessed August 15, 2019.
U.S. Core Data for Interoperability. Office of the National Coordinator for Health Information Technology. Version 1. https://www.healthit.gov/isa/us-core-data-interoperability-uscdi. Accessed June 18, 2019.
The Office of the National Coordinator for Health IT (ONC). Patient Identification and Matching: Final Report. https://www.healthit.gov/sites/default/files/patient_identification_matching_final_report.pdf. Accessed August 15, 2019.
U.S. Department of Health and Human Services (DHHS). Part 164-Security and Privacy. https://www.gpo.gov/fdsys/pkg/CFR-2011-title45-vol1/pdf/CFR-2011-title45-vol1-part164.pdf. Accessed August 16, 2019.
Jha AK, Burke MF, DesRoches C, et al. Progress Toward Meaningful Use: Hospitals’ Adoption of Electronic Health Records. Am J Manag Care. 2011;17(12 Spec No.):SP117–24. PMID: 22216770. [PubMed: 22216770]
Centers for Medicare and Medicaid Services (CMS). Health Information Technology: Standards, Implementation Specifications, and Certification Criteria for Electronic Health Record Technology, 2014 edition. Fed Regist. 2012 Sep 4; 77(171):54163–292. PMID: 22946139. [PubMed: 22946139]
Centers for Medicare and Medicaid Services (CMS). Acute Care Hospital Inpatient Prospective Payment System. Medicare Learning Network. December 21, 2012.
Chan KS, Fowles JB, Weiner JP. Review: electronic health records and the reliability and validity of quality measures: a review of the literature. Med Care Res Rev. 2010;67(5):503–27. PMID: 20150441. DOI: 10.1177/1077558709359007. [PubMed: 20150441] [CrossRef]
Madden JM, Lakoma MD, Rusinak D, et al. Missing clinical and behavioral health data in a large electronic health record (EHR) system. J Am Med Inform Assoc. 2016;23(6):1143–9. PMID: 27079506. DOI: 10.1093/jamia/ocw021. [PMC free article: PMC5070522] [PubMed: 27079506] [CrossRef]
Mendelsohn AB, Dreyer NA, Mattox PW, et al. Characterization of Missing Data in Clinical Registry Studies. Therapeutic Innovation & Regulatory Science. 2015;49(1):146–54. PMID: 30222467. DOI: 10.1177/2168479014532259. [PubMed: 30222467] [CrossRef]
Health Insurance Portability and Accountability Act of 1996 (HIPAA), Pub. L. No. 104-191, 110 Stat. 139 (1996) (codified as amended in scattered sections of 42 U.S.C.). HIPAA Privacy Rule Regulations codified at 45 CFR pts. 160 & 164 (2010).
Centers for Disease Control and Prevention (CDC). International Classification of Diseases Tenth Revision Clinical Modification (ICD-10-CM). https://www.cdc.gov/nchs/icd/icd10cm.htm. Accessed August 16, 2019.
World Health Organization (WHO). International Classification of Primary Care, Second Edition (ICPC-2). http://www.who.int/classifications/icd/adaptations/icpc2/en/. Accessed August 16, 2019.
National Library of Medicine (NLM). SNOMED CT. 2017; https://www.snomed.org/. Accessed June 10, 2019.
American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders (DSM-5®): APA Publishing; 2013.
National Library of Medicine (NLM). Unified Medical Language System (UMLS). https://www.nlm.nih.gov/research/umls/index.html. Accessed August 15, 2019.
42 CFR Part 2 - Confidentiality of Alcohol and Drug Abuse Patient Records. https://www.govinfo.gov/app/details/CFR-2010-title42-vol1/CFR-2010-title42-vol1-part2. Accessed August 16, 2019.
George J, Phun YT, Bailey MJ, et al. Development and validation of the medication regimen complexity index. Ann Pharmacother. 2004;38(9):1369–76. PMID: 15266038. DOI: 10.1345/aph.1D479. [PubMed: 15266038] [CrossRef]
Food and Drug Administration (FDA). National Drug Code Directory. 2017; https://www​.fda.gov/drugs​/drug-approvals-and-databases​/national-drug-code-directory. Accessed August 16, 2019.
National Library of Medicine (NLM). RxNorm. Unified Medical Language System (UMLS) https://www​.nlm.nih.gov​/research/umls/rxnorm/. Accessed June 10, 2019.
World Health Organization (WHO). The Anatomical Therapeutic Chemical Classification System with Defined Daily Doses (ATC/DDD). http://www​.who.int/classifications​/atcddd/en/. Accessed August 16, 2019.
(AMA) AMA. Current Procedural Terminology (CPT). https://www​.ama-assn​.org/amaone/cpt-current-procedural-terminology. Accessed August 16, 2019.
Centers for Medicare and Medicaid Services (CMS). HCPCS - General Information. https://www.cms.gov/Medicare/Coding/MedHCPCSGenInfo/index.html. Accessed August 16, 2019.
Regenstrief Institute. LOINC. https://loinc.org/. Accessed June 10, 2019.
Townsend N, Rutter H, Foster C. Improvements in the data quality of a national BMI measuring programme. Int J Obes (Lond). 2015;39(9):1429–31. PMID: 25869597. DOI: 10.1038/ijo.2015.53. [PubMed: 25869597] [CrossRef]
National Institutes of Health (NIH). Patient-Reported Outcomes Measurement Information System (PROMIS). http://www.healthmeasures.net. Accessed August 16, 2019.
Committee on the Recommended Social and Behavioral Domains and Measures for Electronic Health Records; Board on Population Health and Public Health Practice; Institute of Medicine. Capturing Social and Behavioral Domains and Measures in Electronic Health Records: Phase 2. Washington (DC): National Academies Press (US); 2015 Jan 8. Abstract. https://www.ncbi.nlm.nih.gov/books/NBK269341/. Accessed August 16, 2019. [PubMed: 25590118]
Adler NE, Stead WW. Patients in context-EHR capture of social and behavioral determinants of health. N Engl J Med. 2015;372(8):698–701. PMID: 25693009. DOI: 10.1056/NEJMp1413945. [PubMed: 25693009] [CrossRef]
U.S. Department of Education. Family Educational Rights and Privacy Act (FERPA). https://www2.ed.gov/policy/gen/guid/fpco/ferpa/index.html. Accessed August 16, 2019.
Workman TA. Engaging Patients in Information Sharing and Data Collection: The Role of Patient-Powered Registries and Research Networks. AHRQ Community Forum White Paper. AHRQ Publication No. 13-EHC124-EF. Rockville, MD: Agency for Healthcare Research and Quality; September 2013. [PubMed: 24156118]
Health Level Seven (HL7). Mobile Health. http://www.hl7.org/Special/committees/mobile/. Accessed August 16, 2019.
Arora S, Yttri J, Nilsen W. Privacy and Security in Mobile Health (mHealth) Research. Alcohol Res. 2014;36(1):143–51. PMID: 26259009. [PMC free article: PMC4432854] [PubMed: 26259009]
Dreyer NA, Blackburn S, Hliva V, et al. Balancing the Interests of Patient Data Protection and Medication Safety Monitoring in a Public-Private Partnership. JMIR Med Inform. 2015;3(2). PMID: 25881627. DOI: 10.2196/medinform.3937. [PMC free article: PMC4414957] [PubMed: 25881627] [CrossRef]
Berwick DM, Nolan TW, Whittington J. The triple aim: care, health, and cost. Health Aff (Millwood). 2008;27(3):759–69. PMID: 18474969. DOI: 10.1377/hlthaff.27.3.759. [PubMed: 18474969] [CrossRef]
Institute for Healthcare Improvement (IHI). The IHI Triple Aim. http://www.ihi.org/Engage/Initiatives/TripleAim/Pages/default.aspx. Accessed August 16, 2019.
Eggleston E, Klompas M. Rational Use of Electronic Health Records for Diabetes Population Management. Current Diabetes Reports. 2014;14(4):479. PMID: 24615333. DOI: 10.1007/s11892-014-0479-z. [PubMed: 24615333] [CrossRef]
Duncan I. Healthcare Risk Adjustment and Predictive Modeling. Winsted, CT: ACTEX Publications; 2011.
Kharrazi H, Weiner JP. IT-enabled Community Health Interventions: Challenges, Opportunities, and Future Directions. EGEMS (Wash DC). 2014;2(3):1117. PMID: 25848627. DOI: 10.13063/2327-9214.1117. [PMC free article: PMC4371402] [PubMed: 25848627] [CrossRef]
Kharrazi H, Lasser EC, Yasnoff WA, et al. A proposed national research and development agenda for population health informatics: summary recommendations from a national expert workshop. J Am Med Inform Assoc. 2017;24(1):2–12. PMID: 27018264. DOI: 10.1093/jamia/ocv210. [PMC free article: PMC5201177] [PubMed: 27018264] [CrossRef]
Centers for Medicare and Medicaid Services. Physician Quality Reporting System (PQRS). https://www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/PQRS/Downloads/PQRS_OverviewFactSheet_2013_08_06.pdf. Accessed June 20, 2019.
Patient-Centered Outcomes Research Institute. Comprehensive Inventory of Research Networks. https://www.pcori.org/funding-opportunities/research-support-funding-opportunities/research-support-funding/comprehensive. Accessed August 16, 2019.
Clinical Data Interchange Standards Consortium (CDISC). The CDISC Healthcare Link Initiative. https://www.cdisc.org/system/files/all/standard_category/application/pdf/healthcare_link_chapter.pdf. Accessed August 16, 2019.
Clinical Information Modeling Initiative (CIMI). CIMI Reference Model. https://wiki.hl7.org/index.php?title=Proposed_CIMI_Reference_Model. Accessed August 16, 2019.
Observational Health Data Sciences and Informatics (OHDSI). https://www.ohdsi.org/. Accessed June 10, 2019.
Hripcsak G, Albers DJ. Next-generation phenotyping of electronic health records. J Am Med Inform Assoc. 2013;20(1):117–21. PMID: 22955496. DOI: 10.1136/amiajnl-2012-001145. [PMC free article: PMC3555337] [PubMed: 22955496] [CrossRef]
Dreyer NA, Rodriguez AM. The fast route to evidence development for value in healthcare. Curr Med Res Opin. 2016;32(10):1697–700. PMID: 27314301. DOI: 10.1080/03007995.2016.1203768. [PubMed: 27314301] [CrossRef]
Arts DG, De Keizer NF, Scheffer GJ. Defining and improving data quality in medical registries: a literature review, case study, and generic framework. J Am Med Inform Assoc. 2002;9(6):600–11. PMID: 12386111. DOI: 10.1197/jamia.m1087. [PMC free article: PMC349377] [PubMed: 12386111] [CrossRef]
Modifications to the HIPAA Privacy, Security, Enforcement, and Breach Notification Rules Under the Health Information Technology for Economic and Clinical Health Act and the Genetic Information Nondiscrimination Act, 45 CFR Parts 160 and 164 (2013). [PubMed: 23476971]
The Office of the National Coordinator for Health IT (ONC). Health Information Exchange Governance. https://www.healthit.gov/sites/default/files/governancehitweekpresentation.pdf. Accessed August 16, 2019.
Fiks A, Grundmeier R, Steffes J. Comparative Effectiveness Research Through a Collaborative Electronic Reporting Consortium. Pediatrics. 2015;136(1):e215–24. PMID: 26101357. DOI: 10.1542/peds.2015-0673. [PubMed: 26101357] [CrossRef]
The Office of the National Coordinator for Health Information Technology (ONC). Distributed Population Queries. https://www.healthit.gov/sites/default/files/052412_hitsc_queryhealthpresentation.pdf. Accessed August 19, 2019.
Centers for Disease Control and Prevention (CDC). Community Health Assessments and Health Improvement Plans. https://www.cdc.gov/publichealthgateway/cha/plan.html. Accessed August 16, 2019.
Lombardo JS, Burkom H, Pavlin J. ESSENCE II and the framework for evaluating syndromic surveillance systems. MMWR Suppl. 2004;53:159–65. PMID: 15714646. [PubMed: 15714646]
Massachusetts eHealth Institute. PopMedNet: Distributed Data Network. http://mehi.masstech.org/programs/past-programs/mdphnet-project/popmednet-distributed-data-network. Accessed August 16, 2019.
New York State Department of Health. Population Health Registry. https://www.health.ny.gov/health_care/medicaid/redesign/ehr/registry/phr.htm. Accessed August 19, 2019.
New York City Health Department. The NYC Macroscope. http://www1.nyc.gov/site/doh/data/health-tools/nycmacroscope.page. Accessed August 16, 2019.
Pathak J, Kho AN, Denny JC. Electronic health records-driven phenotyping: challenges, recent advances, and perspectives. J Am Med Inform Assoc. 2013;20(e2):e206–11. PMID: 24302669. DOI: 10.1136/amiajnl-2013-002428. [PMC free article: PMC3861925] [PubMed: 24302669] [CrossRef]
Richesson RL, Hammond WE, Nahm M, et al. Electronic health records based phenotyping in next-generation clinical trials: a perspective from the NIH Health Care Systems Collaboratory. J Am Med Inform Assoc. 2013;20(e2):e226–31. PMID: 23956018. DOI: 10.1136/amiajnl-2013-001926. [PMC free article: PMC3861929] [PubMed: 23956018] [CrossRef]
Ford E, Carroll JA, Smith HE, et al. Extracting information from the text of electronic medical records to improve case detection: a systematic review. J Am Med Inform Assoc. 2016;23(5):1007–15. PMID: 26911811. DOI: 10.1093/jamia/ocv180. [PMC free article: PMC4997034] [PubMed: 26911811] [CrossRef]
Benson T. Principles of Health Interoperability HL7 and SNOMED. Health Informatics. 2010.
Samarth A, Sorace J, Patel V. Measurement of Interoperable Electronic Health Care Records Utilization. US Department of Health and Human Services. https://aspe.hhs.gov/pdf-report/measurement-interoperable-electronic-health-care-records-utilization. Accessed August 16, 2019.
Swain M, Charles D, Patel V, et al. Health Information Exchange Among U.S. Non-federal Acute Care Hospitals: 2008–2014. ONC Data Brief. 2014(17).
Kharrazi H, Chi W, Chang H-Y, et al. Comparing Population-based Risk-stratification Model Performance Using Demographic, Diagnosis and Medication Data Extracted From Outpatient Electronic Health Records Versus Administrative Claims. Medical Care. 2017;55(8):789–96. PMID: 28598890. DOI: 10.1097/MLR.0000000000000754. [PubMed: 28598890] [CrossRef]
Frank L. Epidemiology. When an Entire Country Is a Cohort. Science. 2000;287(5462):2398–9. PMID: 10766613. DOI: 10.1126/science.287.5462.2398. [PubMed: 10766613] [CrossRef]
Krueger WS, Anthony MS, Saltus CW, et al. Evaluating the Safety of Medication Exposures During Pregnancy: A Case Study of Study Designs and Data Sources in Multiple Sclerosis. Drugs Real World Outcomes. 2017;4(3):139–49. PMID: 28756575. DOI: 10.1007/s40801-017-0114-9. [PMC free article: PMC5567459] [PubMed: 28756575] [CrossRef]
Raungaard B, Jensen LO, Tilsted HH, et al. Zotarolimus-eluting durable-polymer-coated stent versus a biolimus-eluting biodegradable-polymer-coated stent in unselected patients undergoing percutaneous coronary intervention (SORT OUT VI): a randomised non-inferiority trial. Lancet. 2015;385(9977):1527–35. PMID: 25601789. DOI: 10.1016/S0140-6736(14)61794-3. [PubMed: 25601789] [CrossRef]
Frobert O, Lagerqvist B, Olivecrona GK, et al. Thrombus aspiration during ST-segment elevation myocardial infarction. N Engl J Med. 2013;369(17):1587–97. PMID: 23991656. DOI: 10.1056/NEJMoa1308789. [PubMed: 23991656] [CrossRef]
Shurlock B. Randomization Within Quality Registries: A Cost-Effective Complement to Classical Randomized Trials. Eur Heart J. 2014;35(1):1–2. PMID: 24382633. DOI: 10.1093/eurheartj/eht493. [PubMed: 24382633] [CrossRef]
Herrett E, Gallagher AM, Bhaskaran K, et al. Data Resource Profile: Clinical Practice Research Datalink (CPRD). Int J Epidemiol. 2015;44(3):827–36. PMID: 26050254. DOI: 10.1093/ije/dyv098. [PMC free article: PMC4521131] [PubMed: 26050254] [CrossRef]
Pharmo Record Linkage System. https://www.pharmo.nl/. Accessed August 16, 2019.
Coloma PM, Schuemie MJ, Trifiro G, et al. Combining electronic healthcare databases in Europe to allow for large-scale drug safety monitoring: the EU-ADR Project. Pharmacoepidemiol Drug Saf. 2011;20(1):1–11. PMID: 21182150. DOI: 10.1002/pds.2053. [PubMed: 21182150] [CrossRef]
Avillach P, Coloma PM, Gini R, et al. Harmonization process for the identification of medical events in eight European healthcare databases: the experience from the EU-ADR project. J Am Med Inform Assoc. 2013;20(1):184–92. PMID: 22955495. DOI: 10.1136/amiajnl-2012-000933. [PMC free article: PMC3555316] [PubMed: 22955495] [CrossRef]
Alvarez-Madrazo S, McTaggart S, Nangle C, et al. Data Resource Profile: The Scottish National Prescribing Information System (PIS). Int J Epidemiol. 2016;45(3):714–5f. PMID: 27165758. DOI: 10.1093/ije/dyw060. [PMC free article: PMC5005947] [PubMed: 27165758] [CrossRef]
The German Pharmacoepidemiological Research Database. https://www.bips-institut.de/en/research/research-infrastructures/gepard.html. Accessed August 16, 2019.
Boudemaghe T, Belhadj I. Data Resource Profile: The French National Uniform Hospital Discharge Data Set Database (PMSI). Int J Epidemiol. 2017;46(2):392-d. PMID: 28168290. DOI: 10.1093/ije/dyw359. [PubMed: 28168290] [CrossRef]
Bolíbar B, Fina Avilés F, Morros R, et al. [SIDIAP Database: Electronic Clinical Records in Primary Care as a Source of Information for Epidemiologic Research]. Medicina Clínica. 2012;138(14):617–21. [PubMed: 22444996]
Garies S, Birtwhistle R, Drummond N, et al. Data Resource Profile: National electronic medical record data from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN). Int J Epidemiol. 2017;46(4):1091–2f. PMID: 28338877. DOI: 10.1093/ije/dyw248. [PubMed: 28338877] [CrossRef]
Cheol Seong S, Kim Y-Y, Khang Y-H, et al. Data Resource Profile: The National Health Information Database of the National Health Insurance Service in South Korea. International Journal of Epidemiology. 2016;46(3):799–800. [PMC free article: PMC5837262] [PubMed: 27794523]
Chen Y-C, Yeh H-Y, Wu J-C, et al. Taiwan’s National Health Insurance Research Database: Administrative Health Care Database as Study Object in Bibliometrics. Scientometrics. 2011;86(2):365–80.
Health Level 7 (HL7). Welcome to FHIR. 2017; https://www.hl7.org/fhir/. Accessed June 10, 2019.
Centers for Medicare & Medicaid Services (CMS). Stage 3 Program Requirements for Providers Attesting to their State’s Medicaid EHR Incentive Program. Regulations and Guidance 2016; https://www.cms.gov/Regulations-and-Guidance/Legislation/EHRIncentivePrograms/Stage3Medicaid_Require.html. Accessed August 16, 2019.
Centers for Medicare and Medicaid Services (CMS). MACRA: Delivery System Reform Medicare Payment Reform. 2017; https://www.cms.gov/medicare/quality-initiatives-patient-assessment-instruments/value-based-programs/macra-mips-and-apms/macra-mips-and-apms.html. Accessed June 11, 2019.
U.S. Department of Health and Human Services (DHHS). Clinical Data Repositories - OHRP Correspondence. 2015; https://www.hhs.gov/ohrp/regulations-and-policy/guidance/june-25-2015-letter-to-robert-portman/index.html. Accessed August 16, 2019.
©2019 United States Government, as represented by the Secretary of the Department of Health and Human Services, by assignment.

All rights reserved. The Agency for Healthcare Research and Quality (AHRQ) permits members of the public to reproduce, redistribute, publicly display, and incorporate this work into other materials provided that it must be reproduced without any changes to the work or portions thereof, except as permitted as fair use under the U.S. Copyright Act. This work contains certain tables and figures noted herein that are subject to copyright by third parties. These tables and figures may not be reproduced, redistributed, or incorporated into other materials independent of this work without permission of the third-party copyright owner(s). This work may not be reproduced, reprinted, or redistributed for a fee, nor may the work be sold for profit or incorporated into a profit-making venture without the express written consent of AHRQ. This work is subject to the restrictions of Section 1140 of the Social Security Act, 42 U.S.C. § 1320b-10. When parts of this work are used or quoted, the following citation should be used:

Bookshelf ID: NBK551878