All rights reserved. The Agency for Healthcare Research and Quality (AHRQ) permits members of the public to reproduce, redistribute, publicly display, and incorporate this work into other materials provided that it must be reproduced without any changes to the work or portions thereof, except as permitted as fair use under the U.S. Copyright Act. This work contains certain tables and figures noted herein that are subject to copyright by third parties. These tables and figures may not be reproduced, redistributed, or incorporated into other materials independent of this work without permission of the third-party copyright owner(s). This work may not be reproduced, reprinted, or redistributed for a fee, nor may the work be sold for profit or incorporated into a profit-making venture without the express written consent of AHRQ. This work is subject to the restrictions of Section 1140 of the Social Security Act, 42 U.S.C. § 1320b-10. When parts of this work are used or quoted, the following citation should be used:
Gliklich RE, Leavy MB, Dreyer NA, editors. Tools and Technologies for Registry Interoperability, Registries for Evaluating Patient Outcomes: A User’s Guide, 3rd Edition, Addendum 2 [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2019 Oct.
Introduction
Electronic health data that are relevant for registries may come from a wide variety of sources, including electronic health records (EHRs), administrative claims databases, laboratory systems, imaging systems, medical devices, and consumer devices. A 2017 survey of patient registries in the United States found that 68 percent of registries extract some data from EHRs, and 35 percent capture some data from other electronic data sources. While use of data from electronic data sources has grown, most registries (88 percent) still use manual data capture for at least some data.1
Integrating data sources with patient registries can take many forms, depending on the type(s) of data and the purpose and architecture of the registry. In some cases, registries may work directly with individual systems to integrate or link to data, while, in other cases, registries may work with sources in which the data have already been aggregated and standardized, such as clinical data warehouses and health information exchanges. The purpose of this chapter is to describe several common sources of data that may be incorporated into a patient registry and discuss the strengths and limitations of these data. Chapters 4 and 5 describe the technical approaches that may be used to incorporate these data into a patient registry, and key questions to consider when planning to incorporate data from another source are summarized in Appendix B.
When selecting data sources, registries should consider the registry purpose and the suitability of the potential data source – in terms of scope, data quality, and timeliness – for addressing that purpose. Chapter 6 of the User’s Guide provides more information on selecting data sources for use in a registry.
In addition to technical and scientific considerations, registries must pay careful attention to issues of patient privacy, informed consent, and data ownership when incorporating data from multiple sources. Registries should understand, at minimum, the purpose for which the data were collected originally (e.g., treatment, payment or healthcare operations as defined by the Health Insurance Portability and Accountability Act Privacy Rule; for research purposes with documented individual consent; for research purposes with an Institutional Review Board (IRB) waiver of consent); the type of data contained in the data source (e.g., protected health information [PHI], sensitive information such as information about mental health conditions or infectious diseases); and who owns the data. More information on the legal and ethical framework under which data may be shared across systems in the United States can be found in Chapters 7 and 8 of the User’s Guide.
Lastly, it is important to note that the following discussion focuses on sources that may contribute data to a registry. This discussion does not cover the issue of when or how registries should report data back to these other sources. For example, EHR data may be sent to a registry, but registry data (such as patient-reported outcome measures or data obtained from other providers for registry purposes specifically) are typically not sent back to the EHR. Many questions exist about the appropriateness and feasibility of creating these types of continuous exchanges of data, and these issues are beyond the scope of this document.
Electronic Health Records (EHRs)
An EHR is “a longitudinal electronic record of patient health information generated by one or more encounters in any care delivery setting.”2 EHRs include information on patient demographics, progress notes, problem lists, medications, vital signs, past medical history, immunizations, laboratory data, and radiology reports. While much of this information is extremely valuable for patient registries, EHRs are designed primarily to support clinical care (as opposed to research). Patient registries may leverage data contained in EHRs by integrating with the EHR to allow for real-time or nearly real-time data exchange or by linking with the EHR to allow for periodic transfers of data into the registry. The decision of whether and how to incorporate data from an EHR is complex and should be guided by many factors, including the purpose and scope of the registry and the availability of the necessary data elements within an EHR. These considerations are discussed in detail in Chapter 4 (Obtaining Data from EHRs).
Claims Data
Public and private medical insurers collect a wide range of data as part of evaluating coverage, tracking health services utilization, and managing billing and payment. These data, commonly referred to as ‘claims data,’ contain patient-specific information such as demographics, insurance coverage and copayments, healthcare provider data (e.g., specialty characteristics, locations), and treatment details such as procedures, office visits, and hospitalizations. Pharmacy claims data provide specific information on the dispensing of pharmaceutical products. Standard coding systems are used to record diagnoses, procedures, and other data; these include Current Procedural Terminology (CPT) for physician services and International Classification of Diseases (ICD) for diagnoses and hospital inpatient procedures.3 Similarly, pharmacy claims use standard medication coding systems, such as National Drug Code (NDC) identifiers.
Medicare and Medicaid claims files are commonly used administrative databases in the United States. Together, the programs cover nearly 133 million people in the United States. The Medicare program covers some 59 million individuals ages 65 and older, as well as younger individuals with end-stage renal disease or who qualify for Social Security Disability.4 Medicaid and Children’s Health Insurance Program (CHIP) together cover an additional 73.8 million individuals.5 Both programs are administered by the Centers for Medicare and Medicaid Services (CMS). Claim files for these programs can be obtained for inpatient, outpatient, physician, skilled nursing facility, durable medical equipment, hospital services, and prescription drugs. These data, which are subject to privacy rules and regulations, can be linked to other databases with appropriate permissions. The Research Data Assistance Center (ResDAC) is a CMS contractor that supports researchers interested in using Medicare and/or Medicaid data for research purposes.6
While Medicare and Medicaid data files are tremendously valuable for some research purposes, they are restricted to patients who are eligible for these programs. A limited number of other data sources are available at the federal and state level.7 One such example is the Healthcare Cost and Utilization Project (HCUP) databases managed by the Agency for Healthcare Research and Quality (AHRQ). HCUP databases contain encounter-level data from all payers, dating back to 1988. The databases use a uniform format to provide longitudinal information that can be used to support research on cost and quality of health services, practice patterns, access to care, and outcomes of treatment.8 It is important to note that many of the databases contain a sample of data, as opposed to all data. For example, the National Inpatient Sample (NIS) contains a 20 percent stratified systematic random sample of all discharges. In addition, linkage of these data to registry data at the individual patient level is generally not feasible; however, these data can provide useful information to inform the design of a registry and provide context for the findings of a registry. Databases available under the HCUP program are summarized in Table 2-1.9
Table 2-1
Databases available through the HCUP program.
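To make the sampling design described for the National Inpatient Sample more concrete, the following is a minimal sketch of a stratified systematic sample. It is an illustration under stated assumptions and not HCUP's actual procedure: the record fields (`id`, `region`), the choice of stratum and sort order, and the function name are all hypothetical. Within each stratum, records are sorted, a random starting point is drawn, and every fifth record is kept, which yields roughly a 20 percent sample.

```python
import random
from collections import defaultdict


def stratified_systematic_sample(records, stratum_key, sort_key, interval=5, seed=0):
    """Return about 1/interval of the records, sampled systematically within each stratum."""
    rng = random.Random(seed)

    # Group records by stratum (hypothetical example: hospital region).
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[stratum_key(rec)].append(rec)

    sample = []
    for stratum in sorted(by_stratum):
        # Sort within the stratum so the fixed interval spreads across the ordering.
        ordered = sorted(by_stratum[stratum], key=sort_key)
        # Random start, then every interval-th record.
        start = rng.randrange(interval)
        sample.extend(ordered[start::interval])
    return sample


# Hypothetical discharge records: 100 discharges split evenly across two regions.
discharges = [{"id": i, "region": "north" if i % 2 else "south"} for i in range(100)]
sample = stratified_systematic_sample(
    discharges,
    stratum_key=lambda r: r["region"],
    sort_key=lambda r: r["id"],
)
print(len(sample), "of", len(discharges), "discharges sampled")
```

Because only a fraction of discharges is retained, a given registry patient will usually not appear in such a sample, which is one reason patient-level linkage to sampled databases is generally not practical.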
More recently, some efforts have focused on creating all-payer claims databases (APCDs) at the state level that can be used to produce price, resource use, and quality information for consumers. APCDs compile medical claims, pharmacy claims, dental claims, and eligibility and provider files from private and public payers, providing a more comprehensive look at healthcare services provided within a state. To date, 18 states have mandated the creation and use of APCDs; the actual implementation and use of these systems vary, as do policies for data access for research purposes.10
In the private sector, some companies have compiled data from private insurers and in some cases combined these data with other sources (e.g., data from EHRs). This is an area of rapid growth. While these databases may be useful in the context of patient registries, their applicability and limitations vary widely depending on the size and scope of the database and the research question(s) of interest.7 A full review of these private sector companies is beyond the scope of this document.
Strengths & Limitations of Claims Data
In the context of patient registries, claims data offer a relatively quick way to access large volumes of health information. Use of claims data is typically less expensive and faster than longitudinal data collection directly from providers or patients.11 Claims data may also fill gaps in data from other sources. For example, an EHR may capture detailed clinical data on a patient undergoing total joint replacement surgery, including patient characteristics and any immediate post-surgery complications. Claims data may provide information on followup care, such as physical therapy or issues that emerged later (e.g., revision surgery). Claims data are also useful for monitoring practice patterns or disease prevalence at a national or regional level, since these data often cover a wider geographic area than EHR data. It is important to distinguish between claims that are submitted by healthcare providers and the amount paid by health insurers. Health insurers have made substantial investments in claims “scrubbing” activities in which all claims are reviewed for accuracy. Aside from changes in what is paid compared to what was submitted, claims that have undergone the adjudication process employed by health insurers are considered to be more reliable than submitted claims, which have not likely undergone the same degree of curation.
Claims data have several limitations that should be considered before using these data in registry-based studies. First, claims databases are limited to individuals who were insured by a specific program (e.g., Medicare, a private payer plan). Uninsured patients are not included in these databases. Depending on the research question, the population included in a claims database may or may not be generalizable to the target population. While patients are tracked longitudinally in private payer claims data, they only remain in the dataset while they are covered by the same plan; patients typically become lost to followup when they change plans. This can limit the ability to track long-term outcomes through private payer claims data. In addition, claims databases only record billable events that are covered by the individual’s plan. For example, prescriptions that were given to a patient but not filled by the patient are not included in claims databases; similarly, claims databases do not include claims for prescriptions that were dispensed but not a covered benefit. Treatments sought outside of covered settings (e.g., by a non-covered provider, alternative treatments) are also not included. Insurance plans can vary widely with respect to the services or drugs that are covered, and patients with different plans typically have different deductibles or copays. These factors can influence treatment patterns and make comparisons across plans challenging.
In addition to issues related to the scope of the data, some questions have been raised about the accuracy of claims data compared to medical records data. For example, a 2013 study examined the agreement between administrative claims data and the medical records for 13 commonly reported comorbidities and complications in patients undergoing total joint arthroplasty. The study found that the specificity of administrative claims data is generally high (greater than 92 percent for many outcomes), but the sensitivity is variable and often lower (ranging from 29 to 100 percent). The authors concluded that comorbidities and complications coded in the administrative record were highly accurate but often incomplete.12 Data quality issues in claims data may occur due to clerical errors, different interpretations of healthcare documentation, or errors resulting from lack of education when codes change (e.g., annual updates of CPT codes, switch from ICD-9 to ICD-10).13
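The sensitivity and specificity figures cited above can be illustrated with a short calculation. The sketch below treats the medical record as the reference standard and compares it with claims-coded flags for a single condition. The data are invented for illustration and are not drawn from the cited study, and the function name is hypothetical.

```python
def sensitivity_specificity(claims_flags, chart_flags):
    """Compare claims-coded flags with chart review, treated as the reference standard."""
    if len(claims_flags) != len(chart_flags):
        raise ValueError("Both lists must describe the same patients in the same order.")

    tp = fp = tn = fn = 0
    for claim, chart in zip(claims_flags, chart_flags):
        if chart and claim:
            tp += 1  # condition in chart and coded in claims
        elif chart and not claim:
            fn += 1  # condition in chart but missing from claims
        elif claim:
            fp += 1  # coded in claims but not supported by chart
        else:
            tn += 1  # absent from both sources

    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Invented example: ten patients, one comorbidity.
chart = [True, True, True, True, False, False, False, False, False, False]
claims = [True, True, False, False, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(claims, chart)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this invented example, claims miss half of the chart-documented cases (low sensitivity) while rarely coding a condition that is absent from the chart (high specificity), which mirrors the pattern of accurate but incomplete coding described above.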
Linkages of Claims Data With Research Studies
Claims data may facilitate registry-based research in several ways. First, claims data may be useful in the registry planning phase. Claims data can provide information on treatment patterns, such as the frequency of a specific procedure, that can be helpful when planning enrollment and targeting recruitment efforts. Once registries are in the operational phase, data for all registry patients or for a subset may be linked with claims data to address a specific research question. For example, a recent study linked data from the Transcatheter Valve Therapies Registry to Medicare claims data to examine the prevalence of death, stroke, heart failure-related hospitalization, and mitral valve intervention at one year post transcatheter aortic valve replacement.14 In another example, data from the CORRONA registry were linked to Medicare data to examine the economic savings associated with remission among rheumatoid arthritis patients.15 Registries have also been linked with claims data to assess the generalizability of the registry population.16 Linked datasets can be valuable tools for research; one of the largest linkages of disease registry and claims data is the SEER-Medicare dataset, which has supported a wide range of cancer-related research projects and resulted in over 1,700 publications.17 Linkage with claims data is more difficult for registries that do not focus on the Medicare-eligible population; participants in these registries are often covered by a variety of payers, and linkage of data from multiple payers is rarely feasible.
Multiple approaches for linking registry data to claims data are available; the technical and legal aspects of these approaches are explored in the “Linking Registry Data With Other Data Sources To Support New Studies” chapter of the third edition of the User’s Guide.
Patient-Generated Health Data
In recent years, there has been increasing interest in incorporating patient-generated health data into patient registries, EHRs, and other data collection efforts. Patient-generated health data (PGHD) are defined as ‘health-related data created, recorded, or gathered by or from patients (or family members or other caregivers) to help address a health concern.’18 PGHD may include information on the patient’s health history, treatment history, symptoms, biometric data (e.g., blood glucose reading), and lifestyle choices (e.g., activity level tracked using a wearable device). These data differ from data captured in clinical settings in two ways: patients are responsible for recording these data, and patients decide if and how to share these data with healthcare providers.
The availability of PGHD has expanded as consumers increasingly use smartphones, mobile apps, remote monitoring devices, and wearable devices that are capable of capturing health data.19 For example, apps and wearable devices are available to track fitness,20 sleep,21, 22 heart rate and rhythm,23, 24 blood pressure,25–27 blood glucose,28, 29 and oxygen saturation.30 A 2016 report found more than 259,000 mobile health apps available in app stores such as Apple App Store and Google Play.31 In addition, the growth in provider usage of EHRs and the introduction of patient portals have created new tools to connect patients and providers and to integrate PGHD into clinical care. Some devices have even received approval from the U.S. Food and Drug Administration for use in clinical workflows.32
Strengths of PGHD
PGHD are valuable to providers, researchers, and other stakeholders for several reasons. First, these data may supplement data collected during clinical encounters with more frequent measurements of health status, providing clinicians with a better overall picture of the patient’s health status. Patients may also benefit from improved understanding of their health; for example, heart failure patients in the Connected Cardiac Care Program at Partners HealthCare reported learning more about their condition and feeling more in control of their health after regularly monitoring and sharing data on their weight, heart rate, pulse, and blood pressure.33 Ideally, frequent monitoring could lead to timely interventions that prevent more significant complications, such as a change in prescription to reduce the likelihood of an asthma exacerbation.34 Some evidence suggests that some PGHD, particularly from sensor data, may be more reliable than data collected in the clinic, since measures like a 6-minute walk test can be estimated through sensors that are free of the influence of healthcare professionals who may coach some patients differently than others.
Researchers are interested in using PGHD to capture important information outside of regularly scheduled visits with a provider and to follow patients over time, particularly when patients change providers or no longer need to return for followup visits (e.g., post-surgery). In addition, the ability to capture PGHD may enable researchers to recruit from a larger pool of patients efficiently, rather than relying on traditional site-based enrollment models. In fact, a recent review on the potential value of PGHD for comparative effectiveness research concluded that ‘leveraging the emerging wealth of big data being generated by patient-facing technologies such as systems to collect patient-reported outcomes data and patient-worn sensors is critical to developing the evidence base that informs decisions made by patients, providers, and policy makers in pursuit of high-value medical care.’35
There is also a growing body of literature on the validity and utility of PGHD for pharmacovigilance. For example, in the European PROTECT (Pharmacoepidemiological Research on Outcomes of Therapeutics) Consortium, funded by the Innovative Medicines Initiative, data were collected directly from pregnant women recruited online from the United Kingdom, Denmark, The Netherlands and Poland. The PROTECT study examined medication use during pregnancy using bi-weekly or monthly questionnaires administered via the Internet, with the frequency of followup determined according to the participant’s choosing.36 The study compared patient-reported medication use for Danish patients with data from the Danish national prescription register and showed reasonably strong agreement; moreover, the PGHD also provided rich information about non-prescription medications (and recreational drug use) not available through other sources. It should be noted that patients consented to data linkage and provided their national identity number. The actual data linkage was accomplished through use of a trusted third party; similar approaches are being used in the United States.
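One way a trusted third party might link patient-reported data to a prescription register without passing raw identifiers to researchers is to replace each identifier with a keyed hash before joining the two sources. The sketch below illustrates that general idea only; it is an assumption for illustration and does not describe the method used in PROTECT. The field names, the example identifiers, and the secret key are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(identifier, secret_key):
    """Return a keyed hash of an identifier; only the holder of secret_key can reproduce it."""
    normalized = identifier.strip().replace("-", "")
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link(survey_rows, register_rows, secret_key):
    """Join survey and register rows on the pseudonymized identifier."""
    dispensed_by_token = {}
    for row in register_rows:
        token = pseudonymize(row["national_id"], secret_key)
        dispensed_by_token.setdefault(token, []).append(row["drug"])

    linked = []
    for row in survey_rows:
        token = pseudonymize(row["national_id"], secret_key)
        # The output carries the token, never the raw identifier.
        linked.append({
            "token": token,
            "reported": row["reported_drug"],
            "dispensed": dispensed_by_token.get(token, []),
        })
    return linked


# Hypothetical inputs held only by the trusted third party.
key = b"example-key-held-by-third-party"
survey = [{"national_id": "010190-1234", "reported_drug": "drug_a"}]
register = [
    {"national_id": "0101901234", "drug": "drug_a"},
    {"national_id": "0202902345", "drug": "drug_b"},
]

linked = link(survey, register, key)
print(linked[0]["reported"], linked[0]["dispensed"])
```

The design choice here is that normalization and hashing happen in one place, so formatting differences between sources (such as a hyphen in the identifier) do not prevent a match, and researchers receive only tokens.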
Several efforts have been launched in recent years to support the use of PGHD in clinical care and research. At the Federal level, the Office of the National Coordinator for Health Information Technology (ONC) launched a project in 2015 to identify best practices, gaps, and opportunities for use of PGHD; project findings include a report on PGHD intended to inform future policy work in this area, two pilot demonstrations, and a practical guide.31 These findings are intended to support the long-term implementation of the PGHD requirements included in the Federal Health IT Strategic Plan, the ONC Interoperability Roadmap, the 2015 Certification Rule, Stage 3 of the CMS Meaningful Use Rule, the CMS Quality Payment Program, and the Precision Medicine Initiative at the National Institutes of Health (NIH). NIH’s All of Us Research Program, under the Precision Medicine Initiative, aims to collect data including PGHD from at least one million U.S. participants.37 ONC also recently updated its Patient Engagement Playbook to include strategies for integrating PGHD into clinical care.
In the private sector, Apple released HealthKit, a common framework to support sharing of PGHD among apps, services, and providers, in 2014. The related ResearchKit was released in 2015 to provide researchers with an open source framework to build apps to support smartphone-based research. ResearchKit enables researchers to use the iPhone’s sensors as well as third-party devices to monitor health variables captured in HealthKit and share those data with researchers and EHRs. In 2018, the American Medical Association and Google co-sponsored an innovation challenge aimed at improving interoperability and developing new methods of collecting and managing PGHD.38 The Patient-Centered Outcomes Research Institute (PCORI) has also devoted funding to building a sustainable foundation to support the use of PGHD in patient-centered outcomes research.39 More information on how to use ResearchKit and other similar tools to capture PGHD for use within a registry can be found in Chapter 5.
Limitations of PGHD
While interest in PGHD has increased, some barriers to the routine use of PGHD in research and clinical care still exist. First, as noted in the Duke-Margolis Center for Health Policy’s mHealth action plan, ‘interoperability as well as common data elements (and tightly bound self-defining metadata) and definitions will be critical, as disparate data streams will increasingly need to be combined to create actionable insights for maintaining an individual’s health and treating disease.’40 Currently, PGHD sources differ in terms of what is measured, how it is measured, how data are structured, and how data may be transferred to other systems. Second, guidance on how to determine if a PGHD source is ‘fit for purpose’ would be useful. For example, some research has raised questions about the accuracy of some devices compared to other sources of information;41 these concerns need to be considered in the context of the study objectives and measures of interest. In addition, multiple types of devices are available for many areas (e.g., Fitbit, Jawbone for activity tracking), and it is unclear if these devices operate in the same manner and if data from these devices are interchangeable.42 There is also little guidance on how to transform data from continuous monitoring, often over long periods, into clinically meaningful endpoints.
On the patient side, some patients may not have access to the necessary technology (e.g., a smartphone, remote monitoring devices) to generate and share PGHD. Even patients with access to the technology may be unwilling to complete the steps required to capture and share data; for example, in a pilot study involving patients with asthma, patients needed to complete a setup process that included installing and activating the MyChart app, installing and consenting to the Asthma Health app, and permitting data sharing.34 Patients also may not recognize or understand the potential value of recording these data, or they may be reluctant to share the data with providers because of privacy concerns. Patients and other users of the data may also have different views on ownership of PGHD.43 At the provider level, workflow changes and analytic tools may be necessary to incorporate review of PGHD and appropriate outreach to patients with concerning data. Providers may also have concerns about the accuracy and validity of PGHD from various devices and about setting realistic patient expectations for how these data are used in clinical decision making. For example, providers may be concerned about potential liability if they do not act promptly on urgent information provided through PGHD channels or if they do act on inaccurate PGHD.
Researchers also face challenges when attempting to use PGHD in the context of a clinical study. As noted above, questions about the validity of the data exist, and researchers who enroll patients remotely may have difficulty verifying participant eligibility. Once patients are enrolled, researchers must trust that the submitted PGHD were generated by the enrolled participant (and not, for example, by a family member who borrowed a device). Researchers must also address the selection bias inherent in studies that require use of a specific technology. From an ethical standpoint, researchers may encounter difficulties with Institutional Review Board (IRB) approval and the informed consent process (e.g., with regard to the data collection and privacy practices of third party developers of apps or devices), although this is a rapidly changing area.43 The ONC report on PGHD noted that “the security and privacy protections that apply to PGHD are uneven and do not establish a consistent legal and regulatory framework.”31 Lastly, researchers planning to collect data over a long period must address issues related to technological change and device abandonment. The PGHD landscape is changing constantly with the rapid introduction of new devices and apps and the disappearance of others, making it possible that researchers will need to modify the study protocol to accommodate these changes. Patients may also lose interest in using a device over time and stop tracking data or submitting data to the study.
Further research is needed to support the efficient and effective use of PGHD in clinical practice and research. Specifically, research is needed to identify best practices for incorporating PGHD into research studies and ideally into the patient’s EHR to inform clinical decision making. More research is also needed to understand patients’ views about sharing PGHD with providers and researchers and to address their concerns. On the technical side, standardization of common PGHD measurements could increase the reliability and validity of these data if uniformly applied. In particular, standardized measures that could be captured through patient devices as well as in the clinical setting would increase the utility of these data for research and clinical practice. These standards could be based on existing, patient-centered standardized outcome measures, such as those developed through the AHRQ-funded Outcome Measures Framework (OMF) project (Chapter 3).
Genomic Data
Genomic data originate from an individual’s DNA and may refer to both the information from genetic tests (e.g., genetic markers) as well as the actual biospecimens. Due to recent advances in genomic technology, sequencing and analysis of biospecimens have produced large amounts of genomic data that could be linked to clinical data to help diagnose diseases, identify risk factors for diseases, and monitor responses to treatment. There is significant interest in using genomic data in clinical care and in research.
In clinical care, genomic data form the foundation for precision medicine efforts. Precision medicine refers to the use of genomic and other data to guide the selection of the appropriate drug and dosage for an individual patient. The concept has received much attention in recent years, particularly with the creation of the NIH’s Precision Medicine Initiative in 2015, the passage of the 21st Century Cures Act in 2016, and the launch of NIH’s All of Us Research Program in 2017.37 While there is still much work to be done before precision medicine becomes broadly useful in clinical care, the practice of using genetic testing to guide treatment is already common in some areas. For example, in lung cancer, genetic testing is done to detect molecular biomarkers such as EGFR that guide treatment choices. Biomarkers also play a critical role in guiding treatment decisions for patients with invasive breast cancer.44 Beyond oncology, genomic data are used for many purposes, such as diagnosing rare diseases and detecting chromosomal abnormalities of the fetus during pregnancy.
The interest in genomic data and precision medicine has led to substantial investments in research examining how to use genomic data across a wide range of condition areas.45 In addition to individual research studies, several efforts have focused on creating biorepositories or biobanks to store biosamples for use in future research. One of the largest repositories of genomic data in the United States is the National Cancer Institute’s Genomic Data Commons (GDC). Genomic data generated from cancer research studies are available through the GDC for re-use in new research projects, subject to controlled access terms to protect patient privacy.46 Similarly, the RD-Connect project links genomic and phenotypic data to patient registries and other clinical databases, with the goal of streamlining multi-national rare disease research efforts.47
Patient registries may collect genomic data to address many research questions. For example, the American Association for Cancer Research (AACR) launched a registry in 2015 to capture genomic sequencing data from patients with late-stage cancers and link these data to clinical outcomes. The data are aggregated and analyzed to identify possible ways to improve treatment decisions and patient outcomes.48 In another example, the Muscular Dystrophy Association (MDA) has launched the NeuroMuscular ObserVational Research (MOVR) data hub, with the goal of capturing and linking genomic data with clinical data at the national level to support research for four rare diseases: amyotrophic lateral sclerosis, spinal muscular atrophy, Duchenne muscular dystrophy, and Becker muscular dystrophy.49 In this registry, clinical data are captured during routine care and linked with other data, such as genomic data and patient-reported outcomes.
While the use of these data has substantial potential, many barriers remain. Because genomic data contain highly sensitive information, some individuals may be unwilling to provide biosamples for research purposes.50 Many investigators are unwilling to share genetic testing results with patients because there is no clear impact on clinical decision making; this reduces the attractiveness of study participation for patients, who may wish to acquire this information in hopes of future benefits. Concerns have also been raised about the ethical implications of genetic testing, whether information should be shared with family members who may have the same genetic risk factor, and the possibility for identification of genetic mutations unrelated to the patient’s current treatment decision.51, 52 Patient registries that intend to incorporate genomic data must consider these issues during the registry planning phase.
From an interoperability perspective, patient registries typically capture the results of genetic testing (e.g., presence of a specific mutation) within the registry dataset and, in some cases, link to a biorepository containing biosamples and more complete genomic data (see ‘Biorepositories and Registries’ white paper). However, as genome sequencing becomes more widespread and as the ability to store large amounts of data increases, registries may wish to store the results from both array-based sequencing and next generation sequencing, as well as new types of genomic data. The choice of file format affects how much data must be stored: Variant Call Format (VCF) files contain information only about the genomic positions that differ from the reference genome, whereas genomic Variant Call Format (gVCF) files contain all assayed nucleotide positions, regardless of whether they are variant.
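The difference between the two file types can be illustrated with a minimal Python sketch that reduces gVCF-style records to VCF-style, variant-only records. It assumes one common convention (used, for example, by GATK) in which non-variant blocks list only the symbolic `<NON_REF>` allele, or a `.`, in the ALT column. The sample records and function names are hypothetical and are not drawn from any registry.

```python
# Hypothetical tab-delimited gVCF body lines (header lines omitted).
# Columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
GVCF_LINES = [
    "chr1\t100\t.\tA\t<NON_REF>\t.\t.\tEND=199",   # non-variant block
    "chr1\t200\t.\tG\tT,<NON_REF>\t50\tPASS\t.",   # variant position
    "chr1\t201\t.\tC\t<NON_REF>\t.\t.\tEND=350",   # non-variant block
]

# Assumption: these ALT values mark a position with no observed variant.
NON_VARIANT_ALLELES = {"<NON_REF>", "."}


def is_variant(line: str) -> bool:
    """Return True if the record lists at least one real alternate allele."""
    alt_column = line.split("\t")[4]
    alleles = set(alt_column.split(","))
    return bool(alleles - NON_VARIANT_ALLELES)


def variant_only(lines):
    """Keep only variant records, skipping any header lines."""
    return [ln for ln in lines if not ln.startswith("#") and is_variant(ln)]


variants = variant_only(GVCF_LINES)
print(f"{len(GVCF_LINES)} gVCF records -> {len(variants)} variant record(s)")
```

Because most positions in a genome match the reference, the variant-only subset is usually far smaller than the full gVCF, which is one reason the intended use of the data should guide which format a registry stores.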
Radiological Image Data
Imaging data include x rays, magnetic resonance imaging (MRI) scans, ultrasounds, computed tomography (CT) scans, and positron emission tomography (PET) scans that may be used for diagnosis and monitoring purposes. Increasingly, medical imaging plays an important role in guiding treatment decisions, and patient registries may wish to capture these data to support specific research objectives. When considering imaging data, it is important to distinguish between the interpretation or findings from the imaging study and the images themselves. Many registries currently store the findings from imaging studies (e.g., tumor location and size, degree of vertebral slip in lumbar spondylolisthesis). However, in some cases, registries may be interested in storing or linking to the images, as opposed to storing only the interpretation of the image. Access to the original images may be important to confirm a diagnosis, adjudicate study outcomes, or support new research questions that emerge over the course of the registry. In addition, interest in using machine learning and artificial intelligence methods to read medical images is increasing, and registries that link rich clinical data with images could be important resources as training and validation datasets.53
While interest in storing images is increasing, many challenges still exist. First, different imaging technology can result in incompatible imaging files, even within one healthcare setting. This issue becomes even more complicated when attempting to include images in a patient registry that captures data from multiple healthcare settings. Digital Imaging and Communications in Medicine (DICOM) is the current standard for image file format and communication profile for many types of images. This standard provides a format for metadata describing the patient, exam, and other image details, which should facilitate data exchange and interoperability. However, some researchers have noted that many fields are entered incorrectly or left blank, creating complex issues when merging datasets. Linking images from different databases can also be challenging in the absence of a master patient identifier, and direct inclusion of images in registry databases (as opposed to linkages) increases the registry data storage requirements. Further work is needed to explore how best to link or import image files into patient registries.
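The problem of blank metadata fields can be screened for before images from different sources are merged. The following Python sketch checks image metadata for missing or empty values. It works on plain dictionaries rather than on DICOM files. The attribute names follow DICOM keywords, but the choice of which fields a registry treats as required is an assumption made for illustration, as are the sample values.

```python
# Assumption: the registry requires these attributes for linkage and analysis.
REQUIRED_FIELDS = ("PatientID", "StudyDate", "Modality")


def missing_fields(metadata: dict, required=REQUIRED_FIELDS) -> list:
    """List required attributes that are absent or contain only whitespace."""
    missing = []
    for name in required:
        value = metadata.get(name)
        if value is None or str(value).strip() == "":
            missing.append(name)
    return missing


# Hypothetical metadata already extracted from three image headers.
headers = [
    {"PatientID": "A-001", "StudyDate": "20190610", "Modality": "MR"},
    {"PatientID": "", "StudyDate": "20190611", "Modality": "CT"},
    {"PatientID": "A-003", "Modality": "  "},
]

# Map each image's position in the list to its missing attributes.
report = {index: missing_fields(h) for index, h in enumerate(headers)}
for index, problems in report.items():
    print(index, "OK" if not problems else f"missing: {problems}")
```

A check of this kind finds only blank fields. Values that are present but entered incorrectly would need separate validation rules.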
Clinical Data Warehouses
Clinical data warehouses (CDWs) are used for a variety of clinical, research, and administrative purposes. A CDW is a database or repository containing clinical data from a variety of sources that are standardized for use in analysis and reporting. A widely used definition of a CDW is a “subject-oriented, integrated, time-variant collection of data to support decision making.”54 Other terms used to refer to CDWs include enterprise data warehouse, medical data warehouse, biomedical data warehouse, biomedical information warehouse, healthcare data warehouse, and clinical data repository. It is important to note that the terms “clinical data warehouse” and “clinical data repository” are often used interchangeably, but they may have specific, distinct meanings within an institution.
CDWs are developed to organize and standardize data that exist in separate silos within or across organizations, making analysis and reporting both feasible and efficient. Within an organization, data from billing systems, registries, EHRs, pharmacy systems, and laboratory systems often reside in different places. When these data are loaded into a common CDW, they can be linked together at the patient level and used in tandem to answer questions that could not be addressed within each individual data silo. For example, prescription fill data from pharmacies may be used in concert with EHR medication orders to examine patient medication use and adherence.
CDWs are designed to contain complex and heterogeneous data. Ideally, CDWs have a flexible schema model that allows for the addition of new data sources and types of data at any point in the lifecycle of the CDW.55 Most CDWs contain administrative data, such as billing data, as well as clinical data. Clinical data may come from inpatient or hospital EHR systems, disease or quality improvement registries, laboratories, pharmacies, and imaging centers. A common, unique patient identifier is required to link these disparate data sources together in the CDW. If the input data sources use different patient identifiers, a Master Patient Index must be maintained in the CDW as well.
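The role of a Master Patient Index can be sketched in a few lines of Python. The example below assumes a simple lookup table that maps each source system's local identifier to a single master identifier. All identifiers, source names, and records are hypothetical, and real indexes typically also involve matching rules that are not shown here.

```python
from collections import defaultdict

# Hypothetical index: (source system, local patient ID) -> master patient ID.
MASTER_PATIENT_INDEX = {
    ("ehr", "E123"): "MP-1",
    ("pharmacy", "RX-9"): "MP-1",
    ("lab", "L77"): "MP-2",
}

# Hypothetical records arriving from three source systems.
records = [
    {"source": "ehr", "local_id": "E123", "item": "medication order"},
    {"source": "pharmacy", "local_id": "RX-9", "item": "prescription fill"},
    {"source": "lab", "local_id": "L77", "item": "HbA1c result"},
    {"source": "lab", "local_id": "L99", "item": "lipid panel"},
]


def link_by_master_id(records, index):
    """Group records by master ID; collect records with no index entry."""
    linked = defaultdict(list)
    unmatched = []
    for record in records:
        master_id = index.get((record["source"], record["local_id"]))
        if master_id is None:
            unmatched.append(record)
        else:
            linked[master_id].append(record)
    return dict(linked), unmatched


linked, unmatched = link_by_master_id(records, MASTER_PATIENT_INDEX)
print({master_id: len(items) for master_id, items in linked.items()})
print("unmatched:", [r["local_id"] for r in unmatched])
```

Keeping unmatched records in a separate list, rather than discarding them, lets them be reviewed and added to the index later.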
Both structured and unstructured data may be included in the CDW. Natural language processing and other data mining techniques may be used to extract or manipulate data for inclusion in a CDW. Some examples of different types of data are provided below:
- Pathology: Pathology data may arrive in a report from an outside institution that is scanned and attached to a patient record in the EHR. Important information such as description of pathologic findings in tumor specimen and pathologic staging information must be extracted from these reports into a discrete element for integration into the warehouse. Various technologies are being used to accomplish this.56
- Medical Imaging: As discussed above, medical imaging data are large and complex. Special planning is required to provide the end user of a CDW with access to the image (often through a URL to a web-based picture archive) while maintaining patient privacy.
- Genomics: As discussed above, storing genomic data, such as gVCF data, can require a huge amount of database space and may result in slower query run-time, so the end use for the data must be carefully considered when selecting the data to include in the CDW.57 Consideration may be given to mapping variants to Human Genome Organization gene names and indexes.56 As whole genome sequencing decreases in cost and becomes more widely available, CDW storage issues regarding the size of these data will need to be addressed.
In addition to allowing linkage of data from different sources at the patient level, CDWs are used to standardize data elements for ease of analysis. This may include ensuring that data elements are in a common format (e.g., a diagnosis code from the EHR in the format 270.10 vs. a code from claims data in the format 27010) or mapping data elements to a standard terminology/ontology (e.g., ICD-10, LOINC, SNOMED). Standardization can be extremely time consuming and resource intensive, and the extent to which data are standardized within a CDW depends upon both the intended use cases for the data and the available resources within an organization.
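The diagnosis code example can be expressed as a small normalization step. The Python sketch below assumes that the only difference between the two source formats is the presence of the decimal point, so both are reduced to a dot-free form before comparison. The function names are hypothetical, and real code sets may need additional rules.

```python
def strip_decimal(code: str) -> str:
    """Reduce a diagnosis code to a canonical form without the decimal point."""
    return code.strip().upper().replace(".", "")


def same_code(code_a: str, code_b: str) -> bool:
    """Compare two codes after removing formatting differences."""
    return strip_decimal(code_a) == strip_decimal(code_b)


ehr_code = "270.10"     # format used by the EHR in the example above
claims_code = "27010"   # format used by the claims data in the example above

print(same_code(ehr_code, claims_code))  # True: same code, different format
print(same_code("270.1", claims_code))   # False: the codes themselves differ
```

Choosing the dot-free form as the canonical one avoids having to know where the decimal belongs in each code set. Mapping to a standard terminology is a separate and usually much larger task.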
In addition to use within a single organization, CDWs may be used to provide a central repository for data from multiple organizations to facilitate shared analysis and reporting.58 CDWs that incorporate data from multiple organizations are typically organized in one of three ways. First, sites may upload their data directly into a centralized CDW that integrates and stores all of the study/registry specific data from all participating sites.59 This model allows for efficient, centralized analyses, but resources are required to maintain a central database. This model may also trigger concerns about patient privacy, security, and data access. Alternatively, individual sites may each maintain their own CDW, often with a common data model (CDM) that is utilized by all participating sites. Analyses may be run at each site using shared code, since the underlying data architecture of each CDW is the same. This is known as a federated or distributed research network.60 Lastly, individual sites may maintain their own CDW, but use a centralized server to store information for data exchange, such as the data model, controlled terminologies, and other metadata.61
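The federated model can be illustrated with a short Python sketch in which the same query function runs against each site's data and only aggregate counts leave the site. The site names, field names, and diagnosis code placeholder are hypothetical. The sketch also assumes that all sites use the same field names (standing in for a shared data model) and that patients do not appear at more than one site.

```python
def count_patients(rows, diagnosis_code):
    """Shared analysis code: runs locally and returns only an aggregate count."""
    return len({r["patient_id"] for r in rows if r["diagnosis_code"] == diagnosis_code})


# Hypothetical site-level tables that follow the same structure at every site.
SITE_DATA = {
    "site_a": [
        {"patient_id": "a1", "diagnosis_code": "DX-A"},
        {"patient_id": "a1", "diagnosis_code": "DX-A"},  # repeat encounter
        {"patient_id": "a2", "diagnosis_code": "DX-B"},
    ],
    "site_b": [
        {"patient_id": "b1", "diagnosis_code": "DX-A"},
        {"patient_id": "b2", "diagnosis_code": "DX-A"},
    ],
}


def run_distributed(site_data, query, **params):
    """Run the shared query at each site and combine the aggregate results."""
    per_site = {site: query(rows, **params) for site, rows in site_data.items()}
    # Assumption: no patient is counted at more than one site.
    return per_site, sum(per_site.values())


per_site, total = run_distributed(SITE_DATA, count_patients, diagnosis_code="DX-A")
print(per_site, total)
```

In this arrangement the patient-level rows never leave the site. Only the counts are shared, which is the main privacy advantage over a centralized warehouse.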
Once implemented, CDWs support a variety of objectives. Some CDWs are enterprise wide and provide broad data services to the entire organization.62–64 Others are narrower in scope and may exist only to meet the needs of a specific group within an organization. They may be used to generate ad hoc queries from researchers or clinicians, to run automated reports, or to identify patient populations of interest. Several examples of how CDWs are used in practice are provided below:
- Clinical care: Data from a CDW can be used to provide actionable feedback to clinicians. For example, the Intermountain CDW integrates data from 22 hospitals and 179 outpatient facilities that are part of the Intermountain health system. The CDW is updated daily with data from the EHR, and automated reports are run to identify patients who have new positive MRSA cultures and to notify infection specialists so that transmission can be prevented in the hospital or office setting.62
- Precision medicine for improving treatment for individual patients: Rutgers Cancer Institute created a CDW with a focus on integrating all data sources of importance in the treatment of an individual, including pathology, exon sequencing, radiology images and other data types which are often difficult to store and access.56 The availability of all data points of interest in one warehouse enables clinicians to access the full array of data needed to tailor treatment for an individual and provides a rich resource for clinical studies.
- Research: CDWs are used to identify patients for recruitment for clinical trials or observational research studies. The availability of diverse data elements can be used to design a targeted and efficient search strategy for appropriate patients.65 In addition, linked data in the CDW can be used to conduct retrospective studies of populations of interest.
- Machine learning and artificial intelligence: Machine learning algorithms use large volumes of complex data to make predictions. The data available in a CDW are particularly suitable for machine learning.
Patient registries interact with CDWs in a variety of ways. The data collected by a registry may be uploaded into a CDW and then linked to additional data sources for analysis. Data supplied by other systems within the CDW may be used to validate the data reported in a registry. For example, pharmacy fill data may be used to validate reported medication use. A CDW can be used to generate or enrich a registry population and may be used to feed data into a traditional registry electronic data capture (EDC) system from EHR, laboratory, or other data tables in the CDW.68 Automatically feeding clinical data into a registry can reduce the time required to enter data and ensure timely availability of data elements, such as laboratory test results, within the registry. However, the registry should carefully consider the impact of automatic data feeds on registry data quality.
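The medication validation example can be sketched as a simple cross-check in Python. The sketch assumes that registry reports and pharmacy fills have already been linked to the same patient identifier and that drug names match apart from letter case. All records and names are hypothetical.

```python
# Hypothetical medication use reported to the registry.
registry_reports = [
    {"patient_id": "MP-1", "drug": "metformin"},
    {"patient_id": "MP-1", "drug": "lisinopril"},
    {"patient_id": "MP-2", "drug": "atorvastatin"},
]

# Hypothetical pharmacy fill records held in the CDW.
pharmacy_fills = [
    {"patient_id": "MP-1", "drug": "Metformin"},
    {"patient_id": "MP-2", "drug": "atorvastatin"},
]


def flag_unconfirmed(reports, fills):
    """Return registry reports that have no matching pharmacy fill."""
    filled = {(f["patient_id"], f["drug"].lower()) for f in fills}
    return [
        report
        for report in reports
        if (report["patient_id"], report["drug"].lower()) not in filled
    ]


needs_review = flag_unconfirmed(registry_reports, pharmacy_fills)
print(needs_review)
```

A report without a matching fill is not necessarily wrong, since a prescription may have been filled at a pharmacy that does not contribute data. Flagged records are candidates for review rather than confirmed errors.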
CDWs are powerful tools for integrating disparate data sources into a single data repository, but the design of the warehouse schema and the data standardization rules must be carefully planned, both for the initial use cases and to accommodate future use cases that may arise. While it may be desirable to have all data cleaned and mapped, doing so requires a great deal of resources on an on-going basis, and choices must be made as to what is the most efficient model for the specific CDW. As with any existing data source, the data in the CDW will only be as good as the data in the source files. Incomplete or incorrect data in the source files will be replicated in the CDW. Methods to cross-validate and error check are needed but will not be able to account for all data issues.
The patient linkage inherent in CDWs also raises concerns about patient privacy. CDWs must have the necessary access and security controls to ensure appropriate access to patient-level data. In addition, de-identified data that are transferred to the CDW may become identifiable when combined with other data in the warehouse. Data access rules and honest brokers may be necessary to protect patient privacy in these circumstances.
Health Information Exchanges
Electronic Health Information Exchange (HIE) refers to the electronic transfer of patient health information between healthcare providers. HIEs address interoperability issues and enable the bi-directional exchange of data either through a centralized data repository or through a federated network of sites. Although primarily conceptualized as a means to improve the quality of care for patients and reduce healthcare costs, HIEs are tools that could be used for the creation or maintenance of a population-based registry. HIE organizations possess technical capabilities, such as data extraction from multiple organizations’ health IT systems, transformation of the data into a common format, and loading of data into a common repository,69 that are highly relevant to patient registries. Since much of the work to address interoperability issues has already been done by the HIE organization, a registry may leverage the existing infrastructure for its own purposes.
Population-based registries with state-mandated reporting requirements are increasingly interfacing with HIEs to allow providers to directly report to the registry through the HIE. For example, in Colorado, the Colorado Regional Health Information Organization (CORHIO) allows the Colorado Department of Public Health and Environment to access the network’s web portal for case finding activities for the Colorado Central Cancer Registry. The cancer registry uses data in the CORHIO network portal to augment information that might be missing on cases reported through pathology lab reporting systems and to identify cancer cases that were not reported.70 The cancer registry staff reduced time spent calling providers to obtain additional information and improved the completeness of case finding by collaborating with the HIE. Other states have developed use cases for reporting to cancer registries directly from the provider’s EHR system via HIE.
Direct reporting is also common with immunization registries. For example, the Michigan Care Improvement Registry (MCIR) is a lifespan registry that contains immunization records for all Michigan residents. The Great Lakes Health Connect HIE enables participants to directly transmit immunization records to MCIR, thereby reducing provider reporting time. The MCIR also uses the HIE to improve immunization record accessibility and to exchange immunization records across state lines. Indiana healthcare providers can send immunization records for Michigan residents to the MCIR via HIE through a collaboration between the Michigan Health Information Network Shared Services and the Michiana Health Information Network. Patients living in the border areas of these two states often travel back and forth across state lines and may access healthcare in both states. Enabling providers to share health information across state lines is an important step in improving continuity of care for these patients and in improving the completeness of immunization records.71
To date, HIE networks and registries have largely collaborated on state-mandated registries, but there are opportunities to leverage HIE networks to create or enhance registries. In particular, HIE data may be a useful source for identifying potential patients for inclusion in a registry. For example, the Maine Health InfoNet HIE network was used to identify patients with congestive heart failure and diabetes via natural language processing72, 73 and to predict incident essential hypertension using machine learning models.74 These efforts could be extended to the establishment of patient registries if appropriate patient consent and data governance rules are established.
Sharing of protected health information (PHI) is a significant barrier to leveraging HIE networks for registry development. State-mandated registries, such as those discussed above, typically have legal authority for the exchange of PHI without explicit informed consent from the patient. However, most other types of registries would need to address the issue of patient consent.75 The underlying model of an HIE may also affect its usefulness for registry activities. HIEs that use a centralized data repository are better suited for aggregating and analyzing data than models in which data are “owned” and maintained at the individual site and are not easily aggregated. However, data currency is an issue in any model where data must be “pushed” to a central repository. While HIEs represent a potential source of data for registries, further work is needed to understand barriers and to develop clear use cases beyond state-mandated registries.
Conclusion
As the ecosystem of health data expands, registries increasingly are interested in linking to or integrating data from other sources to minimize the burden of data entry and to address specific research objectives. Beyond EHRs, many sources of relevant data exist. However, incorporation of these data sources is challenging in many cases. Registries should carefully consider the purpose of the registry and the suitability of the data source for achieving that purpose, as well as the legal and ethical implications of incorporating other data sources, as a first step before addressing the technical interoperability challenges discussed in the next two chapters of this document. Further research to develop tools to help registries understand the quality of other data sources and the potential impact of incorporating these data into the registry database also would be useful to help inform these decisions.
References for Chapter 2
- 1.
- Blumenthal S. The Use of Clinical Registries in the United States: A Landscape Survey. EGEMS (Wash DC). 2017;5(1):26. PMID: 29930965. DOI: 10.5334/egems.248. [PMC free article: PMC5994955] [PubMed: 29930965] [CrossRef]
- 2.
- HIMSS. Electronic Health Records Definition. https://www.himss.org. Accessed June 10, 2019.
- 3.
- Virnig B, Parsons H. Strengths and Limitations of CMS Administrative Data in Research. Research Data Assistance Center (ResDAC). https://www.resdac.org/articles/strengths-and-limitations-cms-administrative-data-research. Accessed June 10, 2019.
- 4.
- Centers for Medicare & Medicaid Services. Medicare Enrollment Dashboard. https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Dashboard/Medicare-Enrollment/Enrollment%20Dashboard.html. Accessed June 10, 2019.
- 5.
- Centers for Medicare and Medicaid Services. March 2019 Medicaid & CHIP Enrollment Data Highlights. https://www.medicaid.gov/medicaid/program-information/medicaid-and-chip-enrollment-data/report-highlights/index.html. Accessed June 10, 2019.
- 6.
- Research Data Assistance Center (ResDAC). https://www.resdac.org/cms-data. Accessed June 10, 2019.
- 7.
- Doshi JA, Hendrick FB, Graff JS, et al. Data, Data Everywhere, but Access Remains a Big Issue for Researchers: A Review of Access Policies for Publicly-Funded Patient-Level Health Care Data in the United States. EGEMS (Wash DC). 2016;4(2):1204. PMID: 27141517. DOI: 10.13063/2327-9214.1204. [PMC free article: PMC4827788] [PubMed: 27141517] [CrossRef]
- 8.
- Healthcare Cost and Utilization Project (HCUP). Agency for Healthcare Research and Quality. https://www.hcup-us.ahrq.gov. Accessed June 10, 2019.
- 9.
- Healthcare Cost and Utilization Project (HCUP). Databases. Agency for Healthcare Research and Quality. https://www.hcup-us.ahrq.gov/databases.jsp. Accessed June 10, 2019.
- 10.
- All-Payer Claims Databases. Agency for Healthcare Research and Quality. https://www.ahrq.gov/professionals/quality-patient-safety/quality-resources/apcd/index.html. Accessed June 24, 2019.
- 11.
- Hammill BG, Hernandez AF, Peterson ED, et al. Linking inpatient clinical registry data to Medicare claims data using indirect identifiers. Am Heart J. 2009;157(6):995–1000. PMID: 19464409. DOI: 10.1016/j.ahj.2009.04.002. [PMC free article: PMC2732025] [PubMed: 19464409] [CrossRef]
- 12.
- Bozic KJ, Bashyal RK, Anthony SG, et al. Is administratively coded comorbidity and complication data in total joint arthroplasty valid? Clin Orthop Relat Res. 2013;471(1):201–5. PMID: 22528384. DOI: 10.1007/s11999-012-2352-1. [PMC free article: PMC3528892] [PubMed: 22528384] [CrossRef]
- 13.
- Alluri RK, Leland H, Heckmann N. Surgical research using national databases. Ann Transl Med. 2016;4(20):393. PMID: 27867945. DOI: 10.21037/atm.2016.10.49. [PMC free article: PMC5107400] [PubMed: 27867945] [CrossRef]
- 14.
- Joseph L, Bashir M, Xiang Q, et al. Prevalence and Outcomes of Mitral Stenosis in Patients Undergoing Transcatheter Aortic Valve Replacement: Findings From the Society of Thoracic Surgeons/American College of Cardiology Transcatheter Valve Therapies Registry. JACC Cardiovasc Interv. 2018;11(7):693–702. PMID: 29622149. DOI: 10.1016/j.jcin.2018.01.245. [PubMed: 29622149] [CrossRef]
- 15.
- Curtis JR, Chen L, Greenberg JD, et al. The clinical status and economic savings associated with remission among patients with rheumatoid arthritis: leveraging linked registry and claims data for synergistic insights. Pharmacoepidemiol Drug Saf. 2017;26(3):310–9. PMID: 28028867. DOI: 10.1002/pds.4126. [PMC free article: PMC5332325] [PubMed: 28028867] [CrossRef]
- 16.
- Reeves MJ, Fonarow GC, Smith EE, et al. Representativeness of the Get With The Guidelines-Stroke Registry: comparison of patient and hospital characteristics among Medicare beneficiaries hospitalized with ischemic stroke. Stroke. 2012;43(1):44–9. PMID: 21980197. DOI: 10.1161/STROKEAHA.111.626978. [PubMed: 21980197] [CrossRef]
- 17.
- National Cancer Institute. SEER-Medicare Publications by Journal & Year. https://healthcaredelivery.cancer.gov/seermedicare/overview/pubs_jour_year.php. Accessed June 10, 2019.
- 18.
- The Office of the National Coordinator of Health Information Technology. U.S. Department of Health and Human Services. Definition of Patient-Generated Health Data. https://www.healthit.gov/topic/scientific-initiatives/patient-generated-health-data. Accessed June 10, 2019.
- 19.
- Reeder B, David A. Health at hand: A systematic review of smart watch uses for health and wellness. J Biomed Inform. 2016;63:269–76. PMID: 27612974. DOI: 10.1016/j.jbi.2016.09.001. [PubMed: 27612974] [CrossRef]
- 20.
- Henriksen A, Haugen Mikalsen M, Woldaregay AZ, et al. Using Fitness Trackers and Smartwatches to Measure Physical Activity in Research: Analysis of Consumer Wrist-Worn Wearables. J Med Internet Res. 2018;20(3):e110. PMID: 29567635. DOI: 10.2196/jmir.9157. [PMC free article: PMC5887043] [PubMed: 29567635] [CrossRef]
- 21.
- de Zambotti M, Baker FC, Colrain IM. Validation of Sleep-Tracking Technology Compared with Polysomnography in Adolescents. Sleep. 2015;38(9):1461–8. PMID: 26158896. DOI: 10.5665/sleep.4990. [PMC free article: PMC4531414] [PubMed: 26158896] [CrossRef]
- 22.
- de Zambotti M, Goldstone A, Claudatos S, et al. A validation study of Fitbit Charge 2 compared with polysomnography in adults. Chronobiol Int. 2018;35(4):465–76. PMID: 29235907. DOI: 10.1080/07420528.2017.1413578. [PubMed: 29235907] [CrossRef]
- 23.
- Garabelli P, Stavrakis S, Po S. Smartphone-based arrhythmia monitoring. Curr Opin Cardiol. 2017;32(1):53–7. PMID: 27875477. DOI: 10.1097/HCO.0000000000000350. [PubMed: 27875477] [CrossRef]
- 24.
- Macinnes M, Martin N, Fulton H, et al. Comparison of a smartphone-based ECG recording system with a standard cardiac event monitor in the investigation of palpitations in children. Arch Dis Child. 2019;104(1):43–7. PMID: 29860228. DOI: 10.1136/archdischild-2018-314901. [PubMed: 29860228] [CrossRef]
- 25.
- Melville S, Teskey R, Philip S, et al. A Comparison and Calibration of a Wrist-Worn Blood Pressure Monitor for Patient Management: Assessing the Reliability of Innovative Blood Pressure Devices. J Med Internet Res. 2018;20(4):e111. PMID: 29695375. DOI: 10.2196/jmir.8009. [PMC free article: PMC5943631] [PubMed: 29695375] [CrossRef]
- 26.
- Chandrasekhar A, Kim CS, Naji M, et al. Smartphone-based blood pressure monitoring via the oscillometric finger-pressing method. Sci Transl Med. 2018;10(431). PMID: 29515001. DOI: 10.1126/scitranslmed.aap8674. [PMC free article: PMC6039119] [PubMed: 29515001] [CrossRef]
- 27.
- Milani RV, Lavie CJ, Bober RM, et al. Improving Hypertension Control and Patient Engagement Using Digital Tools. Am J Med. 2017;130(1):14–20. PMID: 27591179. DOI: 10.1016/j.amjmed.2016.07.029. [PubMed: 27591179] [CrossRef]
- 28.
- Heintzman ND. A Digital Ecosystem of Diabetes Data and Technology: Services, Systems, and Tools Enabled by Wearables, Sensors, and Apps. J Diabetes Sci Technol. 2015;10(1):35–41. PMID: 26685994. DOI: 10.1177/1932296815622453. [PMC free article: PMC4738231] [PubMed: 26685994] [CrossRef]
- 29.
- Basatneh R, Najafi B, Armstrong DG. Health Sensors, Smart Home Devices, and the Internet of Medical Things: An Opportunity for Dramatic Improvement in Care for the Lower Extremity Complications of Diabetes. J Diabetes Sci Technol. 2018;12(3):577–86. PMID: 29635931. DOI: 10.1177/1932296818768618. [PMC free article: PMC6154231] [PubMed: 29635931] [CrossRef]
- 30.
- Garde A, Dehkordi P, Wensley D, et al. Pulse oximetry recorded from the Phone Oximeter for detection of obstructive sleep apnea events with and without oxygen desaturation in children. Conf Proc IEEE Eng Med Biol Soc. 2015;2015:7692–5. PMID: 26738074. DOI: 10.1109/EMBC.2015.7320174. [PubMed: 26738074] [CrossRef]
- 31.
- The Office of the National Coordinator for Health Information Technology. U.S. Department of Health and Human Services. Conceptualizing a Data Infrastructure for the Capture, Use, and Sharing of Patient-Generated Health Data in Care Delivery and Research through 2024. White Paper. [Prepared by Accenture Federal Services for the under Contract No. HHSP233201500093I, Order No. HHSP23337001T]. January 2018. https://www.healthit.gov/sites/default/files/onc_pghd_final_white_paper.pdf. Accessed June 10, 2019.
- 32.
- Powell AC, Landman AB, Bates DW. In search of a few good apps. JAMA. 2014;311(18):1851–2. PMID: 24664278. DOI: 10.1001/jama.2014.2564. [PubMed: 24664278] [CrossRef]
- 33.
- Partners HealthCare: Connecting Heart Failure Patients to Providers through Remote Monitoring. Commonwealth Fund, January 30, 2013. https://www.commonwealthfund.org/publications/case-study/2013/jan/partners-healthcare-connecting-heart-failure-patients-providers. Accessed June 10, 2019.
- 34.
- Genes N, Violante S, Cetrangol C, et al. From smartphone to EHR: a case report on integrating patient-generated health data. npj Digital Medicine. 2018;1(1):23. DOI: 10.1038/s41746-018-0030-8. [PMC free article: PMC6550195] [PubMed: 31304305] [CrossRef]
- 35.
- Howie L, Hirsch B, Locklear T, et al. Assessing the value of patient-generated data to comparative effectiveness research. Health Aff (Millwood). 2014;33(7):1220–8. PMID: 25006149. DOI: 10.1377/hlthaff.2014.0225. [PubMed: 25006149] [CrossRef]
- 36.
- Dreyer NA, Blackburn SC, Mt-Isa S, et al. Direct-to-Patient Research: Piloting a New Approach to Understanding Drug Safety During Pregnancy. JMIR Public Health Surveill. 2015;1(2):e22. PMID: 27227140. DOI: 10.2196/publichealth.4939. [PMC free article: PMC4869223] [PubMed: 27227140] [CrossRef]
- 37.
- All of Us. National Institutes of Health. https://allofus.nih.gov/. Accessed June 11, 2019.
- 38.
- Bresnick J. AMA, Google Launch Health Data Interoperability, PGHD Challenge. Health IT Analytics. April 9, 2018.
- 39.
- Patient-Centered Outcomes Research Institute. Using Patient Generated Health Data to Transform Healthcare. https://www.pcori.org/research-results/2017/using-patient-generated-health-data-transform-healthcare. Accessed June 10, 2019.
- 40.
- Duke-Margolis Center for Health Policy. Mobilizing mHealth Innovation for Real-World Evidence Generation. September 2017. https://healthpolicy.duke.edu/sites/default/files/atoms/files/mobilizing_mhealth_innovation_for_real-world_evidence_generation.pdf. Accessed June 10, 2019.
- 41.
- Murakami H, Kawakami R, Nakae S, et al. Accuracy of Wearable Devices for Estimating Total Energy Expenditure: Comparison With Metabolic Chamber and Doubly Labeled Water Method. JAMA Intern Med. 2016;176(5):702–3. PMID: 26999758. DOI: 10.1001/jamainternmed.2016.0152. [PubMed: 26999758] [CrossRef]
- 42.
- Wood WA, Bennett AV, Basch E. Emerging uses of patient generated health data in clinical research. Mol Oncol. 2015;9(5):1018–24. PMID: 25248998. DOI: 10.1016/j.molonc.2014.08.006. [PMC free article: PMC5528746] [PubMed: 25248998] [CrossRef]
- 43.
- Bietz MJ, Bloss CS, Calvert S, et al. Opportunities and challenges in the use of personal health data for health research. J Am Med Inform Assoc. 2016;23(e1):e42–8. PMID: 26335984. DOI: 10.1093/jamia/ocv118. [PMC free article: PMC4954630] [PubMed: 26335984] [CrossRef]
- 44.
- Duffy MJ, Harbeck N, Nap M, et al. Clinical use of biomarkers in breast cancer: Updated guidelines from the European Group on Tumor Markers (EGTM). Eur J Cancer. 2017;75:284–98. PMID: 28259011. DOI: 10.1016/j.ejca.2017.01.017. [PubMed: 28259011] [CrossRef]
- 45.
- Ginsburg GS, Phillips KA. Precision Medicine: From Science To Value. Health Aff (Millwood). 2018;37(5):694–701. PMID: 29733705. DOI: 10.1377/hlthaff.2017.1624. [PMC free article: PMC5989714] [PubMed: 29733705] [CrossRef]
- 46.
- National Cancer Institute. About Genomic Data Commons. https://gdc.cancer.gov/about-data/data-sources. Accessed June 10, 2019.
- 47.
- Gainotti S, Torreri P, Wang CM, et al. The RD-Connect Registry & Biobank Finder: a tool for sharing aggregated data and metadata among rare disease researchers. Eur J Hum Genet. 2018;26(5):631–43. PMID: 29396563. DOI: 10.1038/s41431-017-0085-z. [PMC free article: PMC5945774] [PubMed: 29396563] [CrossRef]
- 48.
- AACR Project GENIE: Powering Precision Medicine through an International Consortium. Cancer Discovery. 2017;7(8):818. [PMC free article: PMC5611790] [PubMed: 28572459]
- 49.
- Howell RR, Zuchner S. MOVR-NeuroMuscular ObserVational Research, a unified data hub for neuromuscular diseases. Genet Med. 2019;21(3):536–8. PMID: 29934516. DOI: 10.1038/s41436-018-0086-5. [PubMed: 29934516] [CrossRef]
- 50.
- Adams JU. Genetics: Big hopes for big data. Nature. 2015;527(7578):S108–9. PMID: 26580158. DOI: 10.1038/527S108a. [PubMed: 26580158] [CrossRef]
- 51.
- Khan A, Capps BJ, Sum MY, et al. Informed consent for human genetic and genomic studies: a systematic review. Clin Genet. 2014;86(3):199–206. PMID: 24646408. DOI: 10.1111/cge.12384. [PubMed: 24646408] [CrossRef]
- 52.
- Liebeskind DS. Innovative Interventional and Imaging Registries: Precision Medicine in Cerebrovascular Disorders. Interv Neurol. 2015;4(1–2):5–17. PMID: 26600792. DOI: 10.1159/000438773. [PMC free article: PMC4640079] [PubMed: 26600792] [CrossRef]
- 53.
- Kohli MD, Summers RM, Geis JR. Medical Image Data and Datasets in the Era of Machine Learning-Whitepaper from the 2016 C-MIMI Meeting Dataset Session. J Digit Imaging. 2017;30(4):392–9. PMID: 28516233. DOI: 10.1007/s10278-017-9976-3. [PMC free article: PMC5537092] [PubMed: 28516233] [CrossRef]
- 54.
- Inmon WH. Building the data warehouse: John Wiley & Sons; 2005.
- 55.
- Huser V, Cimino JJ. Desiderata for healthcare integrated data repositories based on architectural comparison of three public repositories. AMIA Annu Symp Proc. 2013;2013:648–56. PMID: 24551366. [PMC free article: PMC3900207] [PubMed: 24551366]
- 56.
- Foran DJ, Chen W, Chu H, et al. Roadmap to a Comprehensive Clinical Data Warehouse for Precision Medicine Applications in Oncology. Cancer Inform. 2017;16:1176935117694349. PMID: 28469389. DOI: 10.1177/1176935117694349. [PMC free article: PMC5392017] [PubMed: 28469389] [CrossRef]
- 57.
- Horton I, Lin Y, Reed G, et al. Empowering Mayo Clinic Individualized Medicine with Genomic Data Warehousing. J Pers Med. 2017;7(3). PMID: 28829408. DOI: 10.3390/jpm7030007. [PMC free article: PMC5618153] [PubMed: 28829408] [CrossRef]
- 58.
- Ajayi OJ, Smith EJ, Viangteeravat T, et al. Multisite Semiautomated Clinical Data Repository for Duplication 15q Syndrome: Study Protocol and Early Uses. JMIR Res Protoc. 2017;6(10):e194. PMID: 29046268. DOI: 10.2196/resprot.7989. [PMC free article: PMC5666222] [PubMed: 29046268] [CrossRef]
- 59.
- Kunjan K, Toscos T, Turkcan A, et al. A Multidimensional Data Warehouse for Community Health Centers. AMIA Annu Symp Proc. 2015;2015:1976–84. PMID: 26958297. [PMC free article: PMC4765670] [PubMed: 26958297]
- 60.
- Davies M, Erickson K, Wyner Z, et al. Software-Enabled Distributed Network Governance: The PopMedNet Experience. EGEMS (Wash DC). 2016;4(2):1213. PMID: 27141522. DOI: 10.13063/2327-9214.1213. [PMC free article: PMC4827783] [PubMed: 27141522] [CrossRef]
- 61.
- Skripcak T, Belka C, Bosch W, et al. Creating a data exchange strategy for radiotherapy research: towards federated databases and anonymised public datasets. Radiother Oncol. 2014;113(3):303–9. PMID: 25458128. DOI: 10.1016/j.radonc.2014.10.001. [PMC free article: PMC4648243] [PubMed: 25458128] [CrossRef]
- 62.
- Evans RS, Lloyd JF, Pierce LA. Clinical Use of an Enterprise Data Warehouse. AMIA Annu Symp Proc. 2012;2012:189–98. PMID: 23304288. [PMC free article: PMC3540441] [PubMed: 23304288]
- 63.
- Danciu I, Cowan JD, Basford M, et al. Secondary use of clinical data: the Vanderbilt approach. J Biomed Inform. 2014;52:28–35. PMID: 24534443. DOI: 10.1016/j.jbi.2014.02.003. [PMC free article: PMC4133331] [PubMed: 24534443] [CrossRef]
- 64.
- Jannot AS, Zapletal E, Avillach P, et al. The Georges Pompidou University Hospital Clinical Data Warehouse: A 8-years follow-up experience. Int J Med Inform. 2017;102:21–8. PMID: 28495345. DOI: 10.1016/j.ijmedinf.2017.02.006. [PubMed: 28495345] [CrossRef]
- 65.
- Weng C, Bigger JT, Busacca L, et al. Comparing the effectiveness of a clinical registry and a clinical data warehouse for supporting clinical trial recruitment: a case study. AMIA Annu Symp Proc. 2010;2010:867–71. PMID: 21347102. [PMC free article: PMC3041383] [PubMed: 21347102]
- 66.
- Zhang Q, Matsumura Y, Teratani T, et al. The application of an institutional clinical data warehouse to the assessment of adverse drug reactions (ADRs). Evaluation of aminoglycoside and cephalosporin associated nephrotoxicity. Methods Inf Med. 2007;46(5):516–22. PMID: 17938772. [PubMed: 17938772]
- 67.
- O’Leary KJ, Devisetty VK, Patel AR, et al. Comparison of traditional trigger tool to data warehouse based screening for identifying hospital adverse events. BMJ Qual Saf. 2013;22(2):130–8. PMID: 23038408. DOI: 10.1136/bmjqs-2012-001102. [PubMed: 23038408] [CrossRef]
- 68.
- Connolly D, Adagarla B, Nair M, et al. SEINE: Methods for Electronic Data Capture and Integrated Data Repository Synthesis with Patient Registry Use Cases. 2014.
- 69.
- Harris AH, Chen C, Rubinsky AD, et al. Are Improvements in Measured Performance Driven by Better Treatment or “Denominator Management”? J Gen Intern Med. 2016;31(Suppl 1):21–7. PMID: 26951270. DOI: 10.1007/s11606-015-3558-1. [PMC free article: PMC4803672] [PubMed: 26951270] [CrossRef]
- 70.
- State Public Health Department Enhancing Disease Reporting With HIE Data. http://www.corhio.org/news/2015/9/30/547-state-public-health-department-enhancing-disease-reporting-with-hie-data. Accessed June 10, 2019.
- 71.
- Michigan, Indiana in Full Production for Interstate Health Information Exchange. https://detroit.cbslocal.com/2013/03/04/michigan-indiana-in-full-production-for-interstate-health-information-exchange/. Accessed June 10, 2019.
- 72.
- Wang Y, Luo J, Hao S, et al. NLP based congestive heart failure case finding: A prospective analysis on statewide electronic medical records. Int J Med Inform. 2015;84(12):1039–47. PMID: 26254876. DOI: 10.1016/j.ijmedinf.2015.06.007. [PubMed: 26254876] [CrossRef]
- 73.
- Zheng L, Wang Y, Hao S, et al. Web-based Real-Time Case Finding for the Population Health Management of Patients With Diabetes Mellitus: A Prospective Validation of the Natural Language Processing-Based Algorithm With Statewide Electronic Medical Records. JMIR Med Inform. 2016;4(4):e37. PMID: 27836816. DOI: 10.2196/medinform.6328. [PMC free article: PMC5124114] [PubMed: 27836816] [CrossRef]
- 74.
- Ye C, Fu T, Hao S, et al. Prediction of Incident Hypertension Within the Next Year: Prospective Study Using Statewide Electronic Health Records and Machine Learning. J Med Internet Res. 2018;20(1):e22. PMID: 29382633. DOI: 10.2196/jmir.9268. [PMC free article: PMC5811646] [PubMed: 29382633] [CrossRef]
- 75.
- Mello MM, Adler-Milstein J, Ding KL, et al. Legal Barriers to the Growth of Health Information Exchange-Boulders or Pebbles? Milbank Q. 2018;96(1):110–43. PMID: 29504197. DOI: 10.1111/1468-0009.12313. [PMC free article: PMC5835678] [PubMed: 29504197] [CrossRef]