Gliklich RE, Leavy MB, Dreyer NA, editors. Registries for Evaluating Patient Outcomes: A User’s Guide [Internet]. 4th edition. Rockville (MD): Agency for Healthcare Research and Quality (US); 2020 Sep.

Chapter 11. Obtaining Data and Quality Assurance

1. Introduction

This chapter focuses on the procedures for obtaining registry data and associated quality assurance principles. Data management—the integrated system for obtaining, cleaning, storing, monitoring, reviewing, and reporting on registry data—determines the utility of the data for meeting the goals of the registry. Quality assurance, on the other hand, aims to assure that the data were, in fact, collected in accordance with these procedures and that the data stored in the registry database meet the requisite standards of quality, which are generally defined based on the intended purposes. In this chapter, the term registry coordinating activities refers to the centralized procedures performed for a registry, and the term registry coordinating center refers to the entity or entities performing these procedures and overseeing the registry activities at the site and patient levels.

Because the range of registry purposes can be broad, a similar range of data collection procedures may be acceptable, but only certain methodologies may be suitable for particular purposes. Furthermore, certain end users of the data may require that data collection or validation be performed in accordance with their own guidelines or standards. For example, a registry that collects data electronically and intends for those data to be used by the U.S. Food and Drug Administration (FDA) should meet the systems validation requirements of that end user of the data, such as Title 21 of the Code of Federal Regulations Part 11 (21 CFR Part 11). Such requirements may have a substantial effect on the registry procedures. Similarly, registries may be subject to specific processes depending on the type of data collected, the types of authorization obtained, and the applicable governmental regulations.

Requirements for data collection and for quality assurance should be defined during the registry inception and creation phases. Certain requirements may have significant cost implications, and these should be assessed on a cost-to-benefit basis in the context of the intended purposes of the registry. This chapter describes a wide range of centralized and distributed data collection and quality assurance activities currently in use or expected to become more commonly used in patient registries.

2. Obtaining Data

2.1. Database Requirements and Case Report Forms

Chapter 1 defined key characteristics of patient registries for evaluating patient outcomes. They include specific and consistent data definitions for collecting data elements in a uniform manner for every patient. As in randomized controlled trials, the case report form (CRF) is the paradigm for the data structure of the registry. A CRF is a formatted listing of data elements that can be presented in paper or electronic formats. Those data elements and data entry options in a CRF are represented in the database schema of the registry by patient-level variables. Defining the registry CRFs and corresponding database schema are the first steps in data collection for a registry. Chapter 5 describes the selection of data elements for a registry. All data elements should be modeled within the CRF, even those that might be obtained from secondary sources (Chapter 6). For data that are obtained from secondary sources, the CRF should be structured so that it does not change the meaning of a secondary data element (for instance, when populating a series of co-morbidities on a CRF with diagnoses obtained from the electronic health record [EHR]).

Two related documents should also be considered part of the database specification: the data dictionary (including data definitions and parameters) and the data validation rules, also known as queries or edit checks. The data dictionary and definitions describe both the data elements and how those data elements are interpreted. The data dictionary contains a detailed description of each variable used by the registry, including the source of the variable, coding information if used, and normal ranges if relevant. For example, the term “current smoker” should be defined as to whether “smoker” refers to tobacco or other substances and whether “current” refers to active or within a recent time period (e.g., within the last year). Data validation rules refer to the logical checks on data entered into the database against predefined rules for either value ranges (e.g., systolic blood pressure less than 300 mmHg) or logical consistency with respect to other data fields for the same patient; these are described more fully below. While neither registry database structures nor database requirements are standardized, the Clinical Data Interchange Standards Consortium (CDISC)1 is actively working on representative models of data interchange and portability using standardized concepts and formats. Chapter 5 further discusses these models, which are applicable to registries as well as clinical trials.
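The two kinds of data validation rule described in this section, value-range checks and cross-field consistency checks, can be sketched roughly as follows. This is a minimal illustration, not a registry standard: the field names (`systolic_bp`, `diastolic_bp`), the rule table, and the function `run_edit_checks` are hypothetical, and only the 300 mmHg upper limit is taken from the text.

```python
# Range rules: field -> (lower bound, upper bound), both exclusive.
# Only the 300 mmHg upper limit comes from the text; the lower bound is assumed.
RANGE_RULES = {"systolic_bp": (0, 300)}


def run_edit_checks(record: dict) -> list:
    """Return a list of query messages raised for one patient record."""
    queries = []

    # Value-range checks on individual fields.
    for field, (low, high) in RANGE_RULES.items():
        value = record.get(field)
        if value is None:
            queries.append(f"{field}: missing value")
        elif not (low < value < high):
            queries.append(f"{field}: {value} outside range ({low}, {high})")

    # Logical consistency with another field for the same patient
    # (a hypothetical rule chosen for illustration).
    sbp = record.get("systolic_bp")
    dbp = record.get("diastolic_bp")
    if sbp is not None and dbp is not None and dbp >= sbp:
        queries.append("diastolic_bp: not lower than systolic_bp")

    return queries
```

In practice the rule table would be generated from the data dictionary so that the documented ranges and the implemented checks cannot drift apart.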

2.2. Procedures and Personnel

Data collection procedures need to be carefully considered in planning the operations of a registry. Successful registries depend on a sustainable workflow model that can be integrated into the day-to-day clinical practice of active physicians, nurses, pharmacists, and patients, with minimal disruption. Registry developers can benefit tremendously from preliminary input from the healthcare workers, study coordinators, or patients who are likely to be participants.

2.2.1. Pilot Testing

One method of gathering input from likely participants before the full launch of a registry is pilot testing. Whereas feasibility testing, which is discussed in Chapter 2, focuses on whether a registry should be implemented, pilot testing focuses on how it should be implemented. Piloting can range from testing a subset of the procedures, CRFs, or data capture systems, to a full launch of the registry at a limited subset of sites with a limited number of patients.

The key to effective pilot testing is to conduct it at a point where the results of the pilot can still be used to modify the registry implementation. Through pilot testing, one can assess comprehension, acceptance, feasibility, and other factors that influence how readily the patient registry processes will fit into patient lifestyles and the normal practices of the healthcare provider. Chapter 5 discusses pilot testing in more detail.

2.2.2. Documentation of Procedures

The data collection procedures for each registry should be clearly defined and described in a detailed manual. The term manual here refers to the reference information in any appropriate form, including hard copy, electronic, or via interactive Web or software-based systems. Although the detail of this manual may vary from registry to registry depending on the intended purpose, the required information generally includes protocols, policies, and procedures; the data collection instrument(s); and a listing of all the data elements and their full definitions. If the registry has optional fields (i.e., fields that do not have to be completed on every patient), these should be clearly specified.

In addition to patient inclusion and exclusion criteria, the screening process should be specified, as should any documentation to be retained at the site level and any plans for monitoring or auditing of screening practices. If sampling is to be performed, the method or systems used should be explained, and tools should be provided to simplify this process for the sites. The manual should clearly explain how patient identification numbers are created or assigned and how duplicate records should be prevented. Any required training for data collectors should also be described.

If paper CRFs are used, the manual should describe specifically how they are used and which parts of the forms (e.g., two-part or three-part no-carbon-required forms) should be retained, copied, submitted, or archived. If electronic CRFs are used, clear user manuals and instructions should be available. These procedures are an important resource for all personnel involved in the registry as well as for external auditors who might be asked to assure the quality of the registry.

The importance of standardizing procedures to ensure that the registry uses uniform and systematic methods for collecting data cannot be overstated. At the same time, some level of customization of data entry methods may be required or permitted to enable the participation of particular sites or subgroups of patients within some practices. As discussed in Chapter 10, if the registry provides payments to sites for participation, then the specific requirements for site payments should be clearly documented, and this information should be provided with the registry documents.

2.2.3. Personnel

All personnel involved in data collection should be identified, and their job descriptions and respective roles in data collection and processing should be described. Examples of such “roles” include patient, physician, data entry personnel, site coordinator, help desk, data manager, data analyst, quality analyst, terminologist, and monitor. The necessary documentation or qualification required for any role should be specified in the registry documentation. As an example, some registries require personnel documentation such as a curriculum vitae, protocol signoff, attestation of intent to follow registry procedures, or confirmation of completion of specified training.

2.3. Data Sources

The sources of data for a registry may include new information collected from the patient, new or existing information reported by or derived from the clinician and the medical record, and ancillary stores of patient information, such as laboratory results. Since registries for evaluating patient outcomes should employ uniform and systematic methods of data collection, all data-related procedures—including the permitted sources of data; the data elements and their definitions; and the validity, reliability, or other quality requirements for the data collected from each source—should be predetermined and defined for all collectors of data. As described in Section 3 below, data quality is dependent on the entire chain of data collection and processing. Therefore, the validity and quality of the registry data as a whole ultimately derive from the least, not the most, rigorous link.

In Chapter 6, data sources are classified as primary or secondary, based on the relationship of the data to the registry purpose and protocol. Primary data sources incorporate data collected for direct purposes of the registry (i.e., primarily for the registry). Secondary data sources consist of data originally collected for purposes other than the registry (e.g., standard medical care, insurance claims processing). A registry may contain one or both kinds of data sources. The sections below incorporate and expand on these definitions.

2.3.1. Primary Data Collection

The major sources of primary data in a registry are patients and clinicians. Patient-reported data are data specifically collected from the patient for the purposes of the registry rather than interpreted through a clinician or an indirect data source (e.g., laboratory value, pharmacy records). Such data may range from basic demographic information to validated scales of patient-reported outcomes (PROs). From an operational perspective, a wide range of issues should be considered in obtaining data directly from patients. These range from presentation (e.g., font size, language, reading level) to technologies (e.g., paper-and-pencil questionnaires, computer inputs, telephone or voice inputs, or hand-held patient diaries). Mistakes at this level can inadvertently bias patient selection, invalidate certain outcomes, or significantly affect cost. Limiting the access for patient reporting to particular languages or technologies may limit participation. Patients with specific diagnoses may have difficulties with specific technologies (e.g., small font size for visually impaired, paper and pencil for those with rheumatoid arthritis). Other choices, such as providing a PRO instrument in a format or method of delivery that differs from how it was validated (e.g., questionnaire rather than interview), may invalidate the results. For more information on patient-reported outcome development and use, see Chapters 4 and 5.

Clinician-reported or -derived data can also be divided into primary and secondary subcategories. As an example, specific clinician rating scales (e.g., the National Institutes of Health Stroke Scale)2 may be required for the registry but not routinely captured in clinical encounters. Some variables might be collected directly by the clinician for the registry. Data elements that the clinician must collect directly (e.g., because of a particular definition or need to assess a specific comorbidity that may or may not be routinely present in the medical record) should be specified. These designations are important because they determine who can collect the data for a particular registry or what changes must be made in the procedures the clinician follows in recording a medical record for a patient in a registry. Furthermore, the types of error that arise in registries (discussed in Section 3) will differ by the degree of use of primary and secondary sources, as well as other factors. As an example, registries that use medical chart abstracters, as discussed below, may be subject to more interpretive errors.3

2.3.2. Secondary Sources

Data from secondary sources can be obtained in several different ways. These include manual abstraction, direct import, transformation, and computational derivation. Each of these methods is described in the sections below, along with potential caveats related to data quality. Note that these quality issues are different from those that would be uncovered during traditional data cleaning procedures (Section 2.4). Those measures are concerned with data once they have been entered into the CRF. The issues described here occur upstream from that process, as data are transferred out of the secondary source.

Manual Abstraction

Manual abstraction is the process by which a data collector other than the clinician interacting with the patient extracts clinician-reported data. While physical examination findings, such as height and weight, or laboratory findings, such as white blood cell counts, are straightforward, abstraction usually involves varying degrees of judgment and interpretation.

Clarity of description and standardization of definitions are essential to the assurance of data quality and to the prevention of interpretive errors when using manual abstraction. Knowledgeable registry personnel should be designated as resources for the data collectors in the field, and processes should be put in place to allow the data collectors in the field continuous access to these designated registry personnel for questions on specific definitions and clinical situations. Registries that span long periods, such as those intended for surveillance, might be well served by a structure that permits the review of definitions on a periodic basis to ensure the timeliness and completeness of data elements and definitions, and to add new data elements and definitions. A new product or procedure introduced after the start of a registry is a common reason for such an update.

Abstracting data is often an arduous and tedious process, especially if free text is involved, and it usually requires a human reader. The reader, whose qualifications may range from a trained “medical record analyst” or other health professional to an untrained research assistant, may need to decipher illegible handwriting (paper or scanned documents), translate obscure abbreviations and acronyms, and understand the clinical content to sufficiently extract the desired information. Registry personnel should develop formal chart abstraction guidelines, documentation of processes and practical definitions of terms, and coding forms for the analysts and reviewers to use. Generally, the guidelines include instructions to search for specific types of data that will go into the registry (e.g., specific diagnoses or laboratory results). Often the analyst will be asked to code the data, using either standardized codes from a codebook (e.g., the ICD-10 [International Classification of Diseases, 10th Revision] code) corresponding to a text diagnosis in a chart, or codes that may be unique to the registry (e.g., a severity scale of 1 to 5).

All abstraction and coding instructions must be carefully documented and incorporated into a data dictionary for the registry. Because of lack of precision in natural language, the clinical data abstracted by different abstracters from the same documents may differ. This is a potential source of error in a registry. To reduce the potential for this source of error, registries should ensure proper training on the registry protocol and procedures, condition(s), data sources, data collection systems, and most importantly, data definitions and their interpretation. While training should be provided for all registry personnel, it is particularly important for non-clinician data abstracters. Training time depends on the nature of the source (charts or CRFs), complexity of the data, and number of data items. A variety of training methods, from live meetings to online meetings to interactive multimedia recordings, have all been used with success.4 Training often includes test abstractions using sample charts. For some purposes, it is best practice to train abstracters using standardized test charts. Such standardized tests can be further used both to obtain data on the inter-rater reliability of the CRFs, definitions, and coding instructions and to determine whether individual abstracters can perform up to a defined minimum standard for the registry. Registries that rely on medical chart abstraction should consider reporting on the performance characteristics associated with abstraction, such as inter-rater reliability.5 Examining and reporting on intra-rater reliability may also be useful. Some key considerations in standardizing medical chart abstractions are:

  • Standardized materials (e.g., definitions, instructions)
  • Standardized training
  • Testing with standardized charts
  • Reporting of inter-rater reliability

Direct Import

When data in the secondary source are in electronic format, the simplest method of transmitting secondary data to a registry is through direct import. In this case, there is a one-to-one correspondence between the fields in the secondary source and the registry CRF. The questions/fields in the secondary source have exactly the same meaning as those in the CRF, with the same data types and field formats, so no translation is necessary. It is rare for this scenario to occur for more than a handful of variables. Most fields will have differences in their value sets or formatting that require some sort of transformation or mapping, as described in the next section.

Transformation

The most common way that secondary data are imported into a registry is through a transformation process. Data are extracted from the secondary source, transformed to look like the target field, then loaded into the registry. This process is often called extract-transform-load, or ETL. For the purposes of this discussion, a distinction is made between transformation (described here) and derivation (described below). It is a somewhat artificial separation, but important in highlighting some of the issues that must be considered. Transformation refers to the process of translating data into a consistent format to support integration and analysis. In the context of transferring secondary data to a registry, transformation typically involves converting data elements from the secondary source to match the format required by the registry.

With numeric data, or data that are meant to represent numeric values (e.g., height and weight measurements), transformation rules should be established to specify whether to include or remove leading zeroes, decimals, dashes, etc. Even though these data are available in many source systems, they may not be represented in the same manner, so registry personnel should decide whether they want to receive the data as they are and then transform them centrally, or have personnel at each of the participating sites complete the transformation locally. The former approach can reduce the chance for error, since only one group is developing the transformation rules.
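A centrally maintained transformation rule for numeric values might look like the sketch below. It assumes, purely for illustration, that sites send measurements as strings, that a lone dash or an empty string means "no value", and that the registry stores one decimal place; the function name `normalize_numeric` and the placeholder set are hypothetical.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP
from typing import Optional

# Assumed site conventions for "no value recorded".
PLACEHOLDERS = {"", "-", "--"}


def normalize_numeric(raw: str, places: int = 1) -> Optional[str]:
    """Convert a source string to a fixed-precision numeric string.

    Leading zeroes and surrounding whitespace are dropped, and the value
    is rounded to `places` decimal places. Placeholders return None.
    """
    text = raw.strip()
    if text in PLACEHOLDERS:
        return None
    try:
        value = Decimal(text)
    except InvalidOperation:
        raise ValueError(f"cannot parse numeric value: {raw!r}") from None
    if not value.is_finite():
        raise ValueError(f"not a finite number: {raw!r}")
    quantum = Decimal(1).scaleb(-places)  # 0.1 when places == 1
    return str(value.quantize(quantum, rounding=ROUND_HALF_UP))
```

Unparseable values raise an error rather than being silently dropped, so that they can be routed back to the site as queries.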

For categorical variables, a simple type of transformation occurs when the same variable/concept is collected in different sources, but the value sets differ. This scenario can often be resolved by creating a mapping between the different value sets, but those mappings should be validated to ensure that the meaning of the data is not changed. More complicated situations occur when the secondary source captures two registry concepts in a single field, or vice versa. This frequently occurred with data on Race and Ethnicity that were captured in EHRs prior to Meaningful Use. Hispanic, which is an Ethnicity value, was often listed as a category under Race, whereas many registries captured Race and Ethnicity as two separate fields. When using the EHR data to populate demographics, the registry personnel had to decide how to transform these data (e.g., if “Hispanic” is entered as a value for Race, populate the Ethnicity field with Hispanic and leave the value for Race as “Unknown”), and what steps to take, if any, in dealing with the resulting missing data.
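The Race and Ethnicity rule given as an example above can be written down as a small transformation. This is only a sketch of that one rule: the function name `split_race_ethnicity`, the output keys, and the choice to set Ethnicity to "Unknown" for all other Race values are assumptions, since a combined field says nothing about the ethnicity of patients recorded under another category.

```python
def split_race_ethnicity(source_race: str) -> dict:
    """Split a combined source Race field into separate registry fields."""
    value = source_race.strip()

    if value.lower() == "hispanic":
        # Rule from the text: move the value to Ethnicity and leave
        # Race as "Unknown". The flag records the resulting missing data.
        return {"race": "Unknown", "ethnicity": "Hispanic", "race_missing": True}

    if not value:
        return {"race": "Unknown", "ethnicity": "Unknown", "race_missing": True}

    # Assumption: any other category is a Race value, and Ethnicity
    # cannot be inferred from it.
    return {"race": value, "ethnicity": "Unknown", "race_missing": False}
```

Keeping an explicit flag for records whose Race became "Unknown" through the transformation makes it possible to count, and later follow up on, the missing data the rule creates.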

Standardization of concepts is also an important type of transformation. For many medical terms, different data sources may have different local terms to represent the same concept (e.g., acetaminophen vs. Tylenol). In these cases, mapping the local terms to a standard vocabulary (e.g., LOINC, RxNorm) is critical to ensure that data from different data sources are integrated and interpreted appropriately.

If obtaining data from a source that has been mapped to a common data model (CDM), it is important to understand the mapping between the original source and the CDM and then from the CDM to the registry. In some cases, a CDM may have a limited value set for a given field, whereas there is a great deal of granularity in the source (e.g., EHR encounter type). In these cases, the CDM value set may not be sufficient for the needs of the registry, and it may be necessary to use the raw source values instead. Many CDMs allow these raw source values to be stored as part of each record, but sites may not always populate them.

Documentation of transformations is an obvious but critical part of maintaining traceability to the original source. A formal documentation process for transformations is not only good practice but may be required for registries used for regulatory purposes.

Computational Derivation

Due to the ubiquity of secondary data sources, particularly the EHR, it can be tempting to try to use as much of the data as possible to reduce the primary data collection burden. Given that the data elements in the secondary source may not have exactly the same meaning, some type of computational derivation is required to make the secondary data “fit” the primary registry element. Depending on the quality of these derivations, additional data collection or validation may still be necessary.

Examples of some common types of derivations are the use of EHR data to assign conditions or co-morbidities, or “computable phenotypes.” These can be relatively straightforward, such as declaring that someone has a condition based on the presence of specified diagnosis codes, or they can involve machine learning or other techniques that consider a large number of variables. These phenotype algorithms are often designed for specific purposes (e.g., recruiting for a clinical trial, identifying those with known disease, identifying those at risk), so registry personnel should ensure that they are using the algorithm that best fits their purpose. The Electronic MEdical Records and Genomics (eMERGE) Network has developed and published a number of computable phenotype algorithms over the years.6 Many of the network’s algorithms, and the algorithms of other researchers, can be found in the PheKB repository.7
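The simplest kind of computable phenotype mentioned above, one based on the presence of specified diagnosis codes, might be sketched as follows. The function `has_phenotype` and the `min_count` threshold are hypothetical, and the example uses the ICD-10 category E11 (type 2 diabetes mellitus) only as an illustration; it is not a validated algorithm from eMERGE or PheKB.

```python
from typing import Iterable


def has_phenotype(codes: Iterable[str], prefixes: tuple, min_count: int = 1) -> bool:
    """Return True if at least `min_count` diagnosis codes match a prefix.

    `prefixes` is a tuple of code prefixes that define the condition.
    """
    matches = sum(
        1 for code in codes if code.strip().upper().startswith(prefixes)
    )
    return matches >= min_count


# Illustrative definition: any code in ICD-10 category E11.
TYPE_2_DIABETES = ("E11",)
```

Raising `min_count` trades sensitivity for specificity, which is one reason an algorithm tuned for recruitment may not suit a registry that needs confirmed disease.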

An additional example is the use of medication orders to determine exposure or history (ever/never exposure). In this scenario, the completeness of the secondary source will determine how much information can be derived. For instance, with EHR orders, the presence of an order may be indicative of exposure (the “ever” case), but the absence of an order does not necessarily mean that the patient was never exposed. If a registry were collecting medication history, additional followup would likely be necessary.
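The asymmetry described here, where an order supports "ever" but the lack of an order does not support "never", can be made explicit in the derivation itself. The sketch below is an assumption-laden illustration: `derive_exposure` is a hypothetical function, and drug names are matched as plain strings rather than through a standard vocabulary.

```python
from typing import Iterable


def derive_exposure(ordered_drugs: Iterable[str], drug: str) -> str:
    """Derive ever/never exposure from EHR medication orders.

    Returns "ever" when an order exists. Returns "unknown" rather than
    "never" when none exists, because the EHR may be incomplete.
    """
    names = {name.strip().lower() for name in ordered_drugs}
    return "ever" if drug.strip().lower() in names else "unknown"
```

Records that come back as "unknown" are the ones that would need the additional followup mentioned above.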

Another form of computational derivation occurs when trying to extract meaning from narrative text. While a great deal of information is captured in the EHR in structured fields, a substantial portion of information is recorded as text. Physician progress notes, consultations, and radiology reports, are all examples of narrative text that may be typed directly by the clinician or dictated and transcribed (many EHRs also include the ability to generate text based on responses to structured fields, which would not fall into this category, since those responses can be extracted as structured data). While manual abstraction of free text occurs frequently, computational methods can be used to extract information from free text that is stored electronically. This is referred to as natural language processing (NLP), which is another form of computational derivation. The goal of NLP is to parse free text into meaningful components based on a set of rules or mathematical probabilities that enable the program to recognize key words, understand grammatical constructions, and resolve word ambiguities. Information can be extracted and delivered to the registry along with structured data, and both can be stored as structured data in the registry database. In registries where some sites are using NLP to populate a field while other sites use different methods (e.g., abstraction, transformation or direct import), it is worth noting the source of the data, as each introduces different types of potential error.

An increasing number of NLP software packages are available (e.g., cTAKES,8 CLAMP,9 MetaMap,10 MedLEE,11 and a number of commercial products). NLP software operates best when it is trained in specific clinical domains with structured documents (e.g., radiology, pathology) that are coupled with large training datasets. Despite significant investment and progress in recent years, it is still relatively difficult to deploy NLP at scale for all-purpose chart abstraction. Projects that have found success tend to operate centralized models with a single processing pipeline, which has a lower cost than trying to do NLP at every site. A centralized approach typically requires the transfer of protected health information (PHI), however, which can present legal and regulatory hurdles.

Computational derivations can add tremendously to the ability of registries to obtain efficient and accurate data from secondary sources. However, in order to use data derived with technologies such as machine learning or NLP, registry evaluators will need to understand, to the greatest extent possible, the accuracy of the derivations used and the methods by which they were validated. Increasingly, models used in derivations will be reported with standardized performance metrics, such as Area Under the Curve (AUC), positive predictive value (also called precision), recall, and so forth.
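Two of the metrics named above can be computed directly when a derivation has been checked against a reference standard, such as manual chart review. The sketch below assumes two equal-length lists of true/false labels, one from the derivation and one from the reference; the function name `precision_recall` is hypothetical.

```python
def precision_recall(predicted: list, reference: list) -> tuple:
    """Return (precision, recall) for boolean labels.

    Precision (positive predictive value) = TP / (TP + FP).
    Recall (sensitivity) = TP / (TP + FN).
    A metric is None when its denominator is zero.
    """
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")

    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)

    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall
```

Both values depend on how the validation sample was drawn, so the sampling method should be reported alongside them.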

2.4. Data Entry Systems

Once the primary and any secondary data sources for a registry have been identified, the registry team can determine how data will be entered into the registry database. Many techniques and technologies exist for entering or moving data into the registry database, including paper CRFs, direct data entry, facsimile or scanning systems, interactive voice response systems, and electronic CRFs. There are also different models for how quickly those data reach a central repository for cleaning, reviewing, monitoring, or reporting. Each approach has advantages and limitations, and each registry must balance flexibility (the number of options available) with data availability (when the central repository is populated), data validity (whether all methods are equally able to produce clean data), and cost. Appropriate decisions depend on many factors, including the number of data elements, number of sites, location (local preferences that vary by country, language differences, and availability of different technologies), registry duration, followup frequency, and available resources.

2.4.1. Paper CRFs

With paper CRFs, the clinician enters clinical data on the paper form at the time of the clinical encounter, or other data collectors abstract the data from medical records after the clinical encounter. CRFs may include a wide variety of clinical data on each patient gathered from different sources (e.g., medical chart, laboratory, pharmacy) and from multiple patient encounters. Before the data on formatted paper forms are entered into a computer, the forms should be reviewed for completeness, accuracy, and validity. Paper CRFs can be entered into the database by either direct data entry or computerized data entry via scanning systems.

With direct data entry, a computer keyboard is used to enter data into a database. Key entry has a variable error rate depending on personnel, so an assessment of error rate is usually desirable, particularly when a high volume of data entry is performed. Double data entry is a method of increasing the accuracy of manually entered data by quantifying error rates as discrepancies between two different data entry personnel; data accuracy is improved by having up to two individuals enter the data and a third person review and manage discrepancies. With upfront data validation checks on direct data entry, the likelihood of data entry errors significantly decreases. Therefore, the choice of single versus double data entry should be driven by the requirements of the registry for a particular maximal error rate and the ability of each method to achieve that rate in key measures in the particular circumstance. Double data entry, while a standard of practice for registrational trials, may add significant cost. Its use should be guided by the need to reduce an error rate in key measures and the likelihood of accomplishing that by double data entry as opposed to other approaches. In some situations, assessing the data entry error rates by re-entering a sample of the data is sufficient for reporting purposes.
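The comparison at the heart of double data entry can be sketched as below. This assumes each entry pass for a form is available as a dictionary of field names to entered values; the function name `compare_entries` is hypothetical. Note that the result is a discrepancy rate between the two passes, which bounds but does not equal the error rate of either pass.

```python
def compare_entries(first: dict, second: dict) -> tuple:
    """Compare two independent entry passes for the same form.

    Returns (fields that differ, discrepancy rate across all fields).
    A field present in only one pass counts as a discrepancy.
    """
    fields = sorted(set(first) | set(second))
    discrepancies = [f for f in fields if first.get(f) != second.get(f)]
    rate = len(discrepancies) / len(fields) if fields else 0.0
    return discrepancies, rate
```

The list of differing fields is what a third person would review and resolve; the same function applied to a re-entered sample gives the sampled error assessment mentioned at the end of the paragraph above.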

With hard-copy structured forms, entering data using a scanner and special software to extract the data from the scanned image is possible. If data are recorded on a form as marks in checkboxes, the scanning software enables the user to map the location of each checkbox to the value of a variable represented by the text item associated with the checkbox, and to determine whether the box is marked. The presence of a mark in a box is converted by the software to its corresponding value, which can then be transmitted to a database for storage. If the form contains hand-printed or typed text or numbers, optical character recognition software is often effective in extracting the printed data from the scanned image. However, the print font must be of high quality to avoid translation errors, and spurious marks on the page can cause errors. Error checking is based on automated parameters specified by the operator of the system for exception handling. The comments on assessing error rates in the section above are applicable for scanning systems as well.

2.4.2. Electronic CRFs

An electronic CRF (eCRF) is defined as an auditable electronic form designed to record information required by the clinical trial protocol to be reported to the sponsor on each trial subject.12 An eCRF allows clinician-reported data to be entered directly into the electronic system by the data collector (the clinician or other data collector). Site personnel in many registries still commonly complete an intermediate hard-copy worksheet representing the CRF and subsequently enter the data into the eCRF. While this approach increases work effort and error rates, it is still in use because it is not yet practical for all electronic data entry to be performed at the bedside, during the clinical encounter, or in the midst of a busy clinical day.

An eCRF may originate on local systems (including those on an individual computer, a local area network server, or a hand-held device) or directly from a central database server via an Internet-based connection or a private network. For registries that exist beyond a single site, the data from the local system must subsequently communicate with a central data system. An eCRF may be presented visually (e.g., computer screen) or aurally (e.g., telephonic data entry, such as interactive voice response systems). Specific circumstances will favor different presentations. For example, in one clozapine patient registry, both pharmacists and physicians can obtain and enter data via a telephone-based interactive voice response system as well as a Web-based system. The option is successful in this scenario because telephone access is ubiquitous in pharmacies and the eCRF is very brief.

A common method of electronic data entry is to use Web-based data entry forms. Such forms may be used by patients, providers, and interviewers to enter data into a local repository. The forms reside on servers, which may be located at the site of the registry or co-located anywhere on the Internet. To access a data entry form, a user on a remote computer with an Internet connection opens a browser window and enters the address of the Web server. Typically, a login screen is displayed and the user enters a user identification and password, provided by personnel responsible for the website or repository. Once the server authenticates the user, the data entry form is displayed, and the user can begin entering data. As described in “Cleaning Data,” many electronic systems can perform data validation checks or edits at the time of data entry. When data entry is complete, the user submits the form, which is sent over the Internet to the Web server.

Smartphones or other mobile devices may also be used to submit data to a server, to the extent that such transmissions can be done with appropriate information security controls. Mobility has become an important attribute for clinical data collection. Software has been developed that enables wireless devices to collect data and transmit them over the Internet to database servers in fixed locations. As wireless technology continues to evolve and data transmission rates increase, mobile devices are likely to become more important data entry tools for patients and clinicians.

2.4.3. Electronic Upload

Aside from manual entry, the most common way for data to be transferred to a registry is through electronic transfer, or electronic upload. While this can occur with some primary data, it typically occurs with secondary data sources. The ease of extracting data from electronic systems for use in a registry depends on the design of the registry systems and the ability of the source to make the requested data accessible. Registry systems may support a variety of input formats (e.g., flat files, web services), and organizations including HL7,13 the Office of the National Coordinator for Health Information Technology (ONC), the National Institute of Standards and Technology,14 CDISC, and others have worked to define a number of common formats.

When electronically transferring data from one system to the registry, additional steps for exception handling are necessary. This can include situations where a record in a secondary source is updated or deemed invalid after the data have already been transferred to the registry. The registry software must be able to receive that notification, flag the erroneous value as invalid, and insert the new, corrected value into its database. Additional logic may be necessary if a registry is receiving the same information from multiple sources that could potentially be in conflict (e.g., date of birth recorded in multiple systems). Finally, it is important to recognize that the use of an electronic-to-electronic interchange requires not only testing but also validation of the integrity and quality of the data transferred. For registries that intend to report data to FDA or to other sponsors or data recipients with similar requirements, including electronic signatures, audit trails, and rigorous system validation, the ways in which the registry interacts with these other systems must be carefully considered.
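
The additional logic for potentially conflicting sources can be sketched as follows. This is only an illustrative example: the `SourceValue` structure, the source labels, and the field names are assumptions, and a real registry would also need a documented rule for deciding which source prevails.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class SourceValue(NamedTuple):
    patient_id: str
    field: str
    value: str
    source: str  # hypothetical label for the sending system


def find_conflicts(
    values: Iterable[SourceValue],
) -> Dict[Tuple[str, str], Dict[str, Set[str]]]:
    """Map (patient_id, field) to {value: sources} wherever sources disagree."""
    seen = defaultdict(lambda: defaultdict(set))
    for item in values:
        seen[(item.patient_id, item.field)][item.value].add(item.source)
    # A conflict exists when more than one distinct value was received.
    return {
        key: dict(by_value) for key, by_value in seen.items() if len(by_value) > 1
    }
```

Conflicts returned by such a check would typically be routed to exception handling rather than resolved silently.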

2.5. Cleaning Data

Data cleaning refers to the correction or amelioration of data problems, including missing values, incorrect or out-of-range values, responses that are logically inconsistent with other responses in the database, and duplicate patient records. While all registries strive for “clean data,” in reality, this is a relative term. How and to what level the data will be cleaned should be addressed upfront in a data management manual that identifies the data elements that are intended to be cleaned, describes the data validation rules or logical checks for out-of-range values, explains how missing values and values that are logically inconsistent will be handled, and discusses how duplicate patient records will be identified and managed.

2.5.1. Data Management Manual

Data managers should develop formal data review guidelines for the reviewers and data entry personnel to use. The guidelines should include information on how to handle missing data; invalid entries (e.g., multiple selections in a single-choice field, alphabetic data in a numeric field); erroneous entries (e.g., patients of the wrong gender answering gender-based questions); and inconsistent data (e.g., an answer to one question contradicting the answer to another one). The guidelines should also include procedures to attempt to remediate these data problems. For example, with a data error on an interview form, it may be necessary to query the interviewer or the patient, or to refer to other data sources that may be able to resolve the problem. Documentation of any data review activity and remediation efforts, including dates, times, and results of the query, should be maintained.

For secondary data sources, the data analyst group should define formal data transformation, cleaning, and monitoring rules and procedures. For example, when multiple update transactions are uploaded from the data source, the rules and procedures should specify whether all the updates or only the last version of the record should be kept in the registry, and whether every field in the record should be updated with the most recent value or only a subset of fields.
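
One way to make such an update rule explicit is shown in the sketch below. It assumes each update transaction carries a `version` key that orders the transactions; that key, the function name, and the parameters are hypothetical choices made for illustration.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def apply_updates(
    current: Dict[str, object],
    updates: Iterable[Dict[str, object]],
    updatable_fields: Optional[Set[str]] = None,
    keep_history: bool = True,
) -> Tuple[Dict[str, object], List[Dict[str, object]]]:
    """Apply update transactions in 'version' order.

    updatable_fields: None updates every field; otherwise only the named subset.
    keep_history: True retains each superseded version of the record.
    """
    record = dict(current)
    history: List[Dict[str, object]] = []
    for update in sorted(updates, key=lambda u: u["version"]):
        if keep_history:
            history.append(dict(record))  # snapshot before overwriting
        for field, value in update.items():
            if field == "version":
                continue
            if updatable_fields is None or field in updatable_fields:
                record[field] = value
        record["version"] = update["version"]
    return record, history
```

Writing the policy as parameters makes the two decisions named above (all versions versus last version, all fields versus a subset) visible and testable.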

2.5.2. Automated Data Cleaning

Ideally, automated data checks are preprogrammed into the database for presentation at the time of data entry or data upload. These data checks are particularly useful for cleaning data at the site level while the patient or medical record is readily accessible. Even relatively simple edit checks, such as range checks for laboratory values, can have a significant effect on improving the quality of data. Many systems allow for the implementation of more complex data edit checks, and these checks can substantially reduce the amount of subsequent manual data cleaning. A variation of this method is to use data cleaning rules to deactivate certain data fields so that erroneous entries cannot even be made. A combination of these approaches can also be used. It should be noted that specifying that a data check is required (i.e., must be resolved before the user can proceed) can sometimes lower data completeness rates, as users may abandon data collection entirely if they cannot resolve the failed required check. For paper-based entry methods, automated data checks are not available at the time the paper CRF is being completed but can be incorporated when the data are later entered into the database.
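
The distinction between required checks and checks that only warn can be sketched as follows. The field names and numeric limits are placeholders chosen for illustration, not clinical recommendations, and the `RangeCheck` structure is an assumption rather than a feature of any specific system.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class RangeCheck:
    field: str
    low: float
    high: float
    required: bool = False  # True blocks submission; False only raises a warning


def run_checks(
    record: Dict[str, Optional[float]], checks: List[RangeCheck]
) -> Tuple[List[str], List[str]]:
    """Return (blocking_errors, warnings) for one record at entry time."""
    errors: List[str] = []
    warnings: List[str] = []
    for check in checks:
        value = record.get(check.field)
        if value is None:
            message = f"{check.field}: missing"
        elif not (check.low <= value <= check.high):
            message = f"{check.field}: {value} outside {check.low}-{check.high}"
        else:
            continue  # value passed this check
        (errors if check.required else warnings).append(message)
    return errors, warnings
```

Keeping most checks as warnings and reserving `required=True` for a small number of key fields is one way to limit the abandonment problem noted above.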

2.5.3. Manual Data Cleaning

Data managers perform manual data checks or queries to review data for unexpected discrepancies. This is the standard approach to cleaning data that are not entered into the database at the site (e.g., for paper CRFs entered via data entry or scanning). By carefully reviewing the data using both data extracts analyzed by algorithms and hand review, data managers identify discrepancies and generate “queries” to send to the sites to resolve. Even eCRF-based data entry with data validation rules may not be fully adequate to ensure data cleaning for certain purposes. Anticipating all potential data discrepancies at the time that the data management manual and edit checks are developed is very difficult. Therefore, even with the use of automated data validation parameters, some manual cleaning is often still performed. For fields populated with data from secondary data sources, remediation may not be possible (e.g., a field was not populated in the EHR during the patient’s visit) without resorting to primary data collection or including data from yet another secondary source. If multiple secondary sources are considered, data checks will be needed to handle situations where the information may be in conflict (e.g., discrepant values for date of death).

2.5.4. Query Reports

The registry coordinating center should generate, on a periodic basis, query reports that relate to the quality of the data received, based on the data management manual and, for some purposes, additional concurrent review by a data manager. The content of these reports will differ depending on what type of data cleaning is required for the registry purpose and how much automated data cleaning has already been performed. Query reports may include missing data, “out-of-range” data, or data that appear to be inconsistent (e.g., positive pregnancy test for a male patient). They may also identify abnormal trends in data, such as sudden increases or decreases in laboratory tests compared with patient historical averages or clinically established normal ranges. Qualified registry personnel should be responsible for reviewing the abnormal trends with designated site personnel. The most effective approach is for sites to provide one contact representative for purposes of queries or concerns by registry personnel. Depending on the availability of the records and resources at the site to review and respond to queries, resolving all queries can sometimes be a challenge. Creating systematic approaches to maximizing site responsiveness is recommended.
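
Two of the query types mentioned above, logical inconsistencies and abnormal trends, might be generated along the lines of the sketch below. The field names, the coding of sex and test results, and the 50 percent relative-change threshold are all assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, List


def consistency_queries(patients: List[Dict[str, object]]) -> List[str]:
    """Flag records whose fields contradict each other."""
    queries = []
    for patient in patients:
        if patient.get("sex") == "M" and patient.get("pregnancy_test") == "positive":
            queries.append(
                f"{patient['patient_id']}: positive pregnancy test recorded for male patient"
            )
    return queries


def trend_queries(
    history: Dict[str, List[float]],
    latest: Dict[str, float],
    max_relative_change: float = 0.5,
) -> List[str]:
    """Flag latest values that differ from the patient's historical mean
    by more than max_relative_change (0.5 means 50 percent)."""
    queries = []
    for patient_id, value in sorted(latest.items()):
        past = history.get(patient_id, [])
        if not past:
            continue  # no baseline to compare against
        baseline = mean(past)
        if baseline != 0 and abs(value - baseline) / abs(baseline) > max_relative_change:
            queries.append(
                f"{patient_id}: latest value {value} vs. historical mean {baseline:.1f}"
            )
    return queries
```

A comparison against clinically established normal ranges could be added in the same way; the appropriate thresholds depend on the laboratory test and the registry purpose.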

2.5.5. Data Tracking

For most registry purposes, tracking of data received (paper CRFs), data entered, data cleaned, and other parameters is an important component of active registry management. By comparing indicators, such as expected to observed rates of patient enrollment, CRF completion, and query rates, the registry coordinating center can identify problems and potentially take corrective action—either at individual sites or across the registry as a whole. This is discussed further in the Quality Assurance section below.
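
The comparison of expected to observed indicators can be sketched as a simple per-site check. The `SiteStatus` fields and the three thresholds below are hypothetical; each registry would set its own targets.

```python
from typing import Dict, Iterable, List, NamedTuple


class SiteStatus(NamedTuple):
    site: str
    expected_enrolled: int
    observed_enrolled: int
    crfs_expected: int
    crfs_completed: int
    open_queries: int


def flag_sites(
    sites: Iterable[SiteStatus],
    min_enrollment_ratio: float = 0.8,
    min_crf_completion: float = 0.9,
    max_queries_per_crf: float = 0.2,
) -> Dict[str, List[str]]:
    """Return, for each site that misses a target, the reasons it was flagged."""
    flags: Dict[str, List[str]] = {}
    for s in sites:
        reasons = []
        if s.expected_enrolled and s.observed_enrolled / s.expected_enrolled < min_enrollment_ratio:
            reasons.append("enrollment below expected")
        if s.crfs_expected and s.crfs_completed / s.crfs_expected < min_crf_completion:
            reasons.append("CRF completion below target")
        if s.crfs_completed and s.open_queries / s.crfs_completed > max_queries_per_crf:
            reasons.append("query rate above threshold")
        if reasons:
            flags[s.site] = reasons
    return flags
```

Running the same check on registry-wide totals would help distinguish a problem at individual sites from one affecting the registry as a whole.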

2.5.6. Coding Data

As further described earlier in this chapter and in Chapter 5, the use of standardized coding dictionaries is an increasingly important tool in the ability to aggregate registry data with other databases and reduce variation in information semantics. As the health information community adopts standards, registries should routinely apply them unless there are specific reasons not to use such standard codes. While such codes should be implemented in the data dictionaries during registry planning, including all codes in the interface is not always possible. Some free text may be entered as a result. When free text data are entered into a registry, recoding these data using standardized dictionaries (e.g., MedDRA, WHODRUG, SNOMED®) may be worthwhile. There is a cost associated with recoding, and in general, it should be limited to data elements that will be used in analysis or that need to be combined or reconciled with other datasets, such as when a common safety database is maintained across multiple registries and studies.

2.5.7. Storing and Securing Data

When data on a form are entered into a computer for inclusion in a registry, the form itself, as well as a log of the data entered, should be maintained for the regulatory archival period. Data errors may be discovered long after the data have been stored in the registry. The error may have been made by the patient or interviewer on the original form or during the data entry process. Examination of the original form and the data entry log should reveal the source of the error. If the error is on the form, correcting it may require re-interviewing the patient. If the error occurred during data entry, the corrected data should be entered and the registry updated. By then, the erroneous registry data may have been used to generate reports or create cohorts for population studies. Therefore, instead of simply replacing erroneous data with corrected data, the registry system should have the ability to flag data as erroneous without deleting them and to insert the corrected data for subsequent use.
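
The ability to flag data as erroneous without deleting them can be sketched with a field that keeps every version it has held. The class and attribute names are hypothetical, and a production system would also record who made each change.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ValueVersion:
    value: str
    entered_at: datetime
    is_erroneous: bool = False
    reason: Optional[str] = None


@dataclass
class AuditedField:
    versions: List[ValueVersion] = field(default_factory=list)

    def enter(self, value: str) -> None:
        """Append a new value with a UTC timestamp."""
        self.versions.append(ValueVersion(value, datetime.now(timezone.utc)))

    def current_version(self) -> Optional[ValueVersion]:
        """Most recent version that has not been flagged as erroneous."""
        for version in reversed(self.versions):
            if not version.is_erroneous:
                return version
        return None

    def current(self) -> Optional[str]:
        version = self.current_version()
        return version.value if version else None

    def correct(self, new_value: str, reason: str) -> None:
        """Flag the current value as erroneous (it is kept) and add the correction."""
        existing = self.current_version()
        if existing is not None:
            existing.is_erroneous = True
            existing.reason = reason
        self.enter(new_value)
```

Because the flagged value and its timestamp remain in the history, a report or cohort generated before the correction can still be traced to the data it used.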

Once data are entered into the registry, the registry database should be backed up on a regular basis. There are two basic types of backup, and both types should be considered for use as best practice by the registry coordinating center. The first type is real-time disk backup, which is done by the storage hardware used by the registry server. The second is a regular (e.g., daily) backup of the registry to removable media. In the first case, as data are stored on disk in the registry server, they are automatically replicated to two or more physical hard drives. In the simplest example, called “mirroring,” registry data are stored on a primary disk and an exact replica is stored on the mirrored disk. If either disk fails, data continue to be stored on the surviving disk until the failed disk is replaced. This failure can be completely transparent to the user, who may continue entering and retrieving data from the registry database during the failure. More complex disk backup configurations exist, in which arrays of disks are used to provide protection from single disk failures.

The second type, periodic backup, is needed for disaster recovery. Ideally, a daily backup copy of the registry database stored on removable media should be maintained off site. In case of failure of the registry server or a disaster that closes the data center, the backup copy can be brought to a functioning server and the registry database restored, with the only potential loss of data being for the interval between the regularly scheduled backups. The lost data can usually be reloaded from local data repositories or re-entered from hard copy. Other advanced and widely available database solutions and disaster recovery techniques may support a “standby” database that can be located at a remote data center. In case of a failure at the primary data center, the standby database can be used, minimizing downtime and preventing data loss.

2.6. Managing Change

As with all other registry processes, the extent of change management will depend on the types of data being collected, the source(s) of the data, and the overall timeframe of the registry. There are two major drivers behind the need for change during the conduct of a registry: internally driven change to refine or improve the registry or the quality of data collected, and externally driven change that comes as a result of changes in the environment in which the registry is being conducted.

Internally driven change is generally focused on changes to data elements or data validation parameters that arise from site feedback, queries, and query trends that may point to a question, definition, or CRF field that was poorly designed or missing. If this is the case, the registry can use the information coming back from sites or data managers to add, delete, or modify the database requirements, CRFs, definitions, or data management manual as required. At times, more substantive changes, such as the addition of new forms or changes to the registry workflow, may be desirable to examine new conditions or outcomes. Externally driven change generally arises in multiyear registries as new information about the disease and/or product under study becomes available, or as new therapies or products are introduced into clinical practice. Secondary data sources may also change, resulting in registry changes. Change and turnover in registry personnel is another type of change, and one that can be highly disruptive if procedures are not standardized and documented.

A more extensive form of change may occur when a registry either significantly changes its CRFs or changes the underlying database. Longstanding registries address this issue from time to time as information regarding the condition or procedure evolves and data collection forms and definitions require updating.

Proper management of change is crucial to the maintenance of the registry. A consistent approach to change management, including decision making, documentation, data mapping, and validation, is an important aspect of maintaining the quality of the registry and the validity of the data (see Chapter 2). While the specific change management processes might depend on the type and nature of the registry, change management in registries that are designed to evaluate patient outcomes requires, at the very least, the following structures and processes:

  • Detailed manual of procedures: As described earlier, a detailed manual that is updated on a regular basis—containing all the registry policies, procedures, and protocols, as well as a complete data dictionary listing all the data elements and their definitions—is vital for the functioning of a registry. The manual is also a crucial component for managing and documenting change management in a registry.
  • Governing body: As described in Chapters 2 and 8, registries require oversight and advisory bodies for a number of purposes. One of the most important is to manage change on a regular basis. Keeping the registry manual and data definitions up to date is one of the primary responsibilities of this governing body. Large prospective registries, such as the National Surgical Quality Improvement Program, have found it necessary to delegate the updating of data elements and definitions to a special definitions committee.
  • Infrastructure for ongoing training: As mentioned above, change in personnel is a common issue for registries. Specific processes and an infrastructure for training should be available at all times to account for any unanticipated changes and turnover of registry personnel or providers who regularly enter data into the registry.
  • Method to communicate change: Since registries frequently undergo change, there should be a standard approach and timeline for communicating to sites when changes will take place.

In addition to instituting these structures, registries should also plan for change from a budget perspective (Chapter 2) and from an analysis perspective (Chapter 13).

3. Quality Assurance

In determining the utility of a registry for decision making, it is critical to understand the quality of the procedures used to obtain the data and the quality of the data stored in the database. As patient registries that meet sufficient quality criteria (discussed in Chapters 1 and 14) are increasingly being seen as important means to generate evidence regarding effectiveness, safety, and quality of care, the quality of data within the registry must be understood in order to evaluate its suitability for use in decision making. Registry planners should consider how to ensure quality to a level sufficient for the intended purposes (as described below) and should also consider how to develop appropriate quality assurance plans for their registries. Those conducting the registry should assess and report on those quality assurance activities.

Methods of quality assurance will vary depending on the intended purpose of the registry. A registry intended to serve as key evidence for decision making15 (e.g., coverage determinations, product safety evaluations or other regulatory decision making, or performance-based payment) will require higher levels of quality assurance than a registry describing the natural history of a disease. Quality assurance activities generally fall under three main categories: (1) quality assurance of data, (2) quality assurance of registry procedures, and (3) quality assurance of computerized systems. Since many registries are large, the level of quality assurance that can be obtained may be limited by budgetary constraints.

To balance the need for sufficient quality assurance with reasonable resource expenditure for a particular purpose, a risk-based approach to quality assurance is highly recommended. A risk-based approach focuses on the most important sources of error or procedural lapses from the perspective of the registry’s purpose. Such sources of error should be defined during inception and design phases. As described below, registries with different purposes may be at risk for different sources of error and focus on different practices and levels of assessment. Standardization of methods for particular purposes (e.g., national performance measurement) will likely become more common in the future if results are to be combined or compared between registries.

3.1. Assurance of Data Quality

Structures, processes, policies, and procedures need to be put in place to ascertain the quality of the data in the registry and to guard against several types of errors, including:

  • Errors in interpretation or coding: An example of this type of error would be two abstracters looking for the same data element in a patient’s medical record but extracting different data from the same chart. Variations in coding of specific conditions or procedures also fall under the category of interpretive errors. Avoidance or detection of interpretive error includes adequate training on definitions, testing against standard charts, testing and reporting on inter-rater reliability, and re-abstraction.
  • Errors in data entry, transfer, or transformation accuracy: These occur when data are entered into the registry inaccurately—for example, a laboratory value of 2.0 is entered as 20. Avoidance or detection of accuracy errors can be achieved through upfront data quality checks (such as ranges and data validation checks), reentering samples of data to assess for accuracy (with the percent of data to be sampled depending on the study purpose), and rigorous attention to data cleaning.
  • Errors of intention: Examples of intentional distortion of data (often referred to as “gaming”) are inflated reporting of preoperative patient risk in registries that compare risk-adjusted outcomes of surgery or selecting only cases with good outcomes to report (“cherry-picking”). Avoidance or detection of intentional error can be challenging. Some approaches include checking for consistency of data between sites, assessing screening log information against other sources (e.g., billing data), and performing onsite audits (including monitoring of source records) either at random or “for cause.”
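
Inter-rater reliability, mentioned above as a way to detect interpretive error, is often summarized with an agreement statistic. The sketch below uses Cohen's kappa as one common choice; the chapter does not prescribe a particular statistic, and the function name and input layout are assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two abstracters coding the same charts.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must code the same non-empty set of charts")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(
        freq_a[category] * freq_b[category]
        for category in freq_a.keys() | freq_b.keys()
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)
```

For example, if two abstracters agree on three of four charts and chance agreement is 0.5, kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.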

Steps for assuring data quality include:

  • Provide training: Educate data collectors/abstracters in a structured manner.
  • Ensure data completeness: When possible, provide sites with immediate feedback on issues such as missing or out-of-range values and logical inconsistencies.
  • Maintain data consistency: Compare data across sites and over time and apply consistent data transformation rules across secondary data sources.
  • Use automatic data quality monitoring and alerting: Data quality control at scale is important for secondary data sources with vast amounts of patient data. Automatic data quality trending, variance, regression monitoring, and alerting based on set thresholds can be more cost efficient.
  • Complete onsite audits for a sample of sites: Review screening logs and procedures and/or samples of data.
  • Complete for-cause audits: Use both predetermined and data-informed methods to identify potential sites at higher suspicion for inaccuracy or intentional errors, such as discrepancies between enrollment and screening logs, narrow data ranges, and overly high or low enrollment.
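
As one example of a data-informed method, the sketch below flags sites whose values for a measure vary much less than the values across all sites (a "narrow data range"). The 0.5 ratio and the minimum of five observations are arbitrary illustrative thresholds, and a flag is only a prompt for review, not evidence of error.

```python
from statistics import pstdev
from typing import Dict, List


def narrow_range_sites(
    values_by_site: Dict[str, List[float]],
    min_ratio: float = 0.5,
    min_n: int = 5,
) -> List[str]:
    """Return sites whose standard deviation is below min_ratio times the
    standard deviation of all sites pooled together."""
    pooled = [v for values in values_by_site.values() for v in values]
    if len(pooled) < 2:
        return []
    overall_sd = pstdev(pooled)
    if overall_sd == 0:
        return []  # no variation anywhere, so nothing stands out
    flagged = []
    for site, values in sorted(values_by_site.items()):
        if len(values) >= min_n and pstdev(values) / overall_sd < min_ratio:
            flagged.append(site)
    return flagged
```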

To further minimize or identify these errors and to ensure the overall quality of the data, the following should be considered.

3.1.1. A Designated Individual Accountable for Data Quality at Each Site

Sites submitting data to a registry should have at least one person who is accountable for the quality of these data, irrespective of whether the person is collecting the data as well. This person, referred to here as the site coordinator, should be fully knowledgeable of all protocols, policies, procedures, and definitions in a registry. The site coordinator should ensure that all site personnel involved in the registry are knowledgeable and that all data transmitted to registry coordinating centers are valid and accurate.

3.1.2. Assessment of Training and Maintenance of Competency of Personnel

Thorough training and documentation of maintenance of competency, for both site and registry personnel, are imperative to the quality of the registry. A detailed and comprehensive operations manual, as described earlier, is crucial for the proper training of all personnel involved in the registry. Routine cognitive testing (surveys) of healthcare provider knowledge of patient registry requirements and appropriate product use should be performed to monitor maintenance of the knowledge base and compliance with patient registry requirements. Retraining programs should be initiated when survey results provide evidence of lack of knowledge maintenance. All registry training programs should provide means by which the knowledge of the data collectors about their registries and their competence in data collection can be assessed on a regular basis, particularly when changes in procedures or definitions are implemented.

3.1.3. Data Quality Audits

As described above, the level to which registry data will be cleaned is influenced by the objectives of the registry, the type of data being collected (e.g., clinical data vs. economic data), the sources of the data (e.g., primary vs. secondary), and the timeframe of the registry (e.g., 3-month followup vs. 10-year followup). These registry characteristics often affect the types and number of data queries that are generated, both electronically and manually. In addition to identifying missing values, incorrect or out-of-range values, or responses that are logically inconsistent with other responses in the database, specifically trained registry personnel can review the data queries to identify possible error trends and to determine whether additional site training is required. For example, such personnel may identify a specific patient outcome question or eCRF field that is generating a larger than average proportion of queries, either from one site or across all registry sites. Using this information, the registry personnel can conduct targeted followup with the sites to retrain them on the correct interpretation of the outcome question or eCRF field, with the goal of reducing the future query rate on that particular question or field. These types of “training tips” can also be addressed in a registry newsletter as a way to maintain frequent but unobtrusive communication with the registry sites.
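
Identifying a field that generates a larger than average share of queries can be sketched as a simple count. The input is assumed to be a list with one field name per query raised, and the factor of 2 is an illustrative threshold; note that fields with no queries at all do not enter the average in this simplified version.

```python
from collections import Counter
from typing import List


def high_query_fields(query_fields: List[str], factor: float = 2.0) -> List[str]:
    """Return fields whose query count exceeds factor times the average
    count among fields that received at least one query."""
    counts = Counter(query_fields)
    if not counts:
        return []
    average = sum(counts.values()) / len(counts)
    return sorted(name for name, count in counts.items() if count > factor * average)
```

Running the same count per site, as well as across all sites, would show whether retraining should be targeted or registry-wide.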

If the registry purpose requires more stringent verification of the data being entered into the database by registry participants, registry planners may decide to conduct audits of the registry sites. Like queries discussed above, the audit plan for a specific registry will be influenced by the purpose of the registry, the type of data being collected, the source of the data, and the overall timeframe of the registry. In addition, registry developers must find the appropriate balance between the extensiveness of an audit and the impact on overall registry costs. Based on the objectives of the registry, a registry developer can define specific data fields (e.g., key effectiveness variables or adverse event data) on which the audit can be focused.

The term audit may describe examination or verification, may take place onsite (sometimes called monitoring) or offsite, and may be extensive or very limited. The audit can be conducted on a random sample of participating sites (e.g., 5 to 20 percent of registry sites); “for cause” (meaning only when there is an indication of a problem, such as one site being an outlier compared with most others); on a random sample of patients; or using sampling techniques based on geography, practice setting (academic center vs. community hospital), patient enrollment rate, or query rate (“risk-based” audit strategy).
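
Combining a random sample of sites with "for cause" selections might look like the sketch below. The function name, the default 10 percent fraction (within the 5 to 20 percent range mentioned above), and the use of a fixed seed so that the selection can be reproduced and documented are all assumptions.

```python
import math
import random
from typing import Iterable, List, Optional


def select_audit_sites(
    site_ids: Iterable[str],
    fraction: float = 0.10,
    seed: Optional[int] = None,
    for_cause: Iterable[str] = (),
) -> List[str]:
    """Return for-cause sites followed by a random sample of the remaining sites.

    The random sample size is ceil(fraction * total number of sites).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    all_sites = sorted(set(site_ids))
    cause = sorted(set(for_cause) & set(all_sites))
    eligible = [s for s in all_sites if s not in cause]
    n_random = min(len(eligible), math.ceil(fraction * len(all_sites)))
    rng = random.Random(seed)  # seeded generator makes the draw repeatable
    return cause + sorted(rng.sample(eligible, n_random))
```

Stratifying the random draw by geography, practice setting, or enrollment rate would be a natural extension for a risk-based audit strategy.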

The approach to auditing the quality of the data should reflect the most significant sources of error with respect to the purpose of the registry. This is true for both primary and secondary sources of data. For example, registries used for performance measurement may have a higher risk of exclusion of higher risk patients (“cherry-picking”), and the focus of an audit might be on external sources of data to verify screening log information (e.g., billing data) in addition to data accuracy. Finally, the timeframe of the registry may help determine the audit plan. A registry with a short followup period (e.g., 3 months) may require only one round of audits at the end of the study, prior to database lock and data analysis. For example, in the OPTIMIZE-HF registry, a data quality audit was performed, based on predetermined criteria, on a 5-percent random sample of the first 10,000 patient records verified against source documents.16 For registries with multiyear followup, registry personnel may conduct site audits every 1 or 2 years for the duration of the registry.

In addition to the site characteristics mentioned above, sites that have undergone significant staffing changes during a multiyear registry should be considered prime audit targets to help confirm adequate training of new personnel and to quickly address possible inter-rater variability. To minimize any impact on the observational nature of the registry, the audit plan should be documented in the registry manual.

Subsequent to audits (onsite or remote), communication of findings with site personnel should be conducted face to face, along with followup written communication of findings and opportunities for improvement. As appropriate to meet registry objectives, the sponsor may request corrective actions from the site. Site compliance may also be enhanced with routine communication of data generated from the patient registry system to the site for reconciliation.

3.2. Registry Procedures and Systems

3.2.1. External Audits of Registry Procedures

If registry developers determine that external audits are necessary to ensure the level of quality for the specific purpose(s) of the registry, these audits should be conducted in accordance with pre-established criteria. Pre-established criteria could include monitoring of sites with high patient enrollment or with prior audit history of findings that require attention, or monitoring could be based on level of site experience, rate of serious adverse event reporting, or identified problems. The registry coordinating center may perform monitoring of a sample of sites, which could be focused on one or several areas. This approach could range from reviewing procedures and interviewing site personnel, to checking screening logs, to monitoring individual case records.

The importance of having a complete and detailed registry manual that describes policies, structures, and procedures cannot be overemphasized in the context of quality assurance of registry procedures. Such a manual serves both as a basis for conducting the audits and as a means of documenting changes emanating from these audits. As with data quality audits, feedback of the findings of registry procedure audits should be communicated to all stakeholders and documented in the registry manual.

3.2.2. Assurance of System Integrity and Security

All aspects of data management processes should fall under a rigorous life-cycle approach to system development and quality management. Each process should be clearly defined and documented. The concepts described below are consistent across many software industry standards and healthcare industry standards (e.g., 21 CFR Part 11, legal security standards), although some specifics may vary. An internal quality assurance function at the registry coordinating center should regularly audit the processes and procedures described. When third parties other than the registry coordinating center perform activities that interact with the registry systems and data, they are typically assessed for risk and are subject to regular audits by the registry coordinating center.

3.2.3. System Development and Validation

All software systems used for patient registries should follow the standard principles of software development, including following one of the standard software development life-cycle (SDLC) models that are well described in the software industry.

In parallel, quality assurance of system development uses approved specifications to create a validation plan for each project. Test cases are created by trained personnel and systematically executed, with results recorded and reviewed. Depending on regulatory requirements, a final validation report is often written and approved. Unresolved product and process issues are maintained and tracked in an issue tracking or CAPA (Corrective Action/Preventive Action) system.

Processes for development and validation should be similarly documented and periodically audited. The information from these audits is captured, summarized, and reviewed with the applicable group, with the aim of ongoing process improvement and quality improvement.

3.3. Security

All registries maintain health information, and therefore security is an important issue. Chapter 7 discusses applicable Federal laws and regulations. This section discusses some of the components of a security program. Security is achieved not simply through technology but by clear processes and procedures. Overall responsibility for security is typically assigned to a specific individual or group. Security procedures should be well documented and posted, and the documentation should also be used to train staff. Some registries may also maintain personal information, such as information needed to contact patients to remind them to gather or submit patient-reported outcome information. Like any large database, a registry may be vulnerable to cybersecurity threats. Registries should assess these risks and develop appropriate mitigation strategies, which may include some of the security components described below. However, a full discussion of cybersecurity as it relates to registries is beyond the scope of this document.

3.3.1. System Security Plan

A system security plan consists of documented policies and standard operating procedures defining the rules of systems, including administrative procedures, physical safeguards, technical security services, technical security mechanisms, electronic signatures, and audit trails, as applicable. The rules delineate roles and responsibilities. Included in the rules are the policies specifying individual accountability for actions, access rights based on the principle of least privilege, and the need for separation of duties. These principles and the accompanying security practices provide the foundation for the confidentiality and integrity of registry data. The rules also detail the consequences associated with noncompliance.

3.3.2. Security Assessment

Clinical data maintained in a registry can be assessed for the appropriate level of security. Standard criteria exist for such assessments and are based on the type of data being collected. Part of the validation process is a security assessment of the systems and operating procedures. One of the goals of such an assessment is effective risk management, based on determining possible threats to the system or data and identifying potential vulnerabilities.

3.3.3. Education and Training

All staff members of the registry coordinating center should be trained periodically on aspects of the overall systems, security requirements, and any special requirements of specific patient registries. Individuals should receive training relating to their specific job responsibilities and document that appropriate training has been received.

3.3.4. Access Rights

Access to systems and data should be based on the principles of least privilege and separation of duties. No individual should be assigned access privileges that exceed job requirements, and no individual should be in a role that includes access rights that would allow circumvention of controls or the repudiation of actions within the system. In all cases, access should be limited to authorized individuals.
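The two principles can be illustrated with a small role-based check. This is a sketch under stated assumptions: the role names, the permission names, and the list of conflicting permission pairs are all hypothetical and would be defined by each registry's own security plan.

```python
# Hypothetical roles and permissions; a real registry defines its own.
ROLE_PERMISSIONS = {
    "site_coordinator": {"enter_data", "respond_to_query"},
    "data_manager": {"raise_query", "close_query", "lock_record"},
    "auditor": {"view_audit_trail"},
    "system_admin": {"manage_accounts"},
}

# Pairs of permissions that one person should not hold together, because
# holding both could allow a control to be circumvented (assumed examples).
CONFLICTING_PERMISSIONS = [
    ("enter_data", "lock_record"),
    ("manage_accounts", "view_audit_trail"),
]


def permissions_for(roles):
    """Union of the permissions granted by the given roles."""
    granted = set()
    for role in roles:
        granted |= ROLE_PERMISSIONS.get(role, set())
    return granted


def is_allowed(roles, action):
    """Least privilege: an action is allowed only if a role explicitly grants it."""
    return action in permissions_for(roles)


def separation_violations(roles):
    """Return the conflicting permission pairs that this set of roles would combine."""
    granted = permissions_for(roles)
    return [pair for pair in CONFLICTING_PERMISSIONS
            if pair[0] in granted and pair[1] in granted]


print(is_allowed({"site_coordinator"}, "enter_data"))            # True
print(is_allowed({"site_coordinator"}, "lock_record"))           # False
print(separation_violations({"site_coordinator", "data_manager"}))
```

Checking for violations at the time roles are assigned, rather than at the time of use, keeps a conflicting combination from being granted in the first place.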

3.3.5. Access Controls

Access controls provide the basis for authentication and logical access to critical systems and data. Since the authenticity, integrity, and auditability of data stored in electronic systems depend on accurate individual authentication, management of electronic signatures (discussed below) is an important topic.

Logical access to systems and computerized data should be controlled in a way that permits only authorized individuals to gain access to the system. This is normally done through a unique access code, such as a unique user ID and password combination that is assigned to the individual whose identity has been verified and whose job responsibilities require such access. The system should require the user to change the password periodically and should detect possible unauthorized access attempts, such as multiple failed logins, and automatically deauthorize the user account if they occur. The identification code can also be an encrypted digital certificate stored on a password-protected device or a biometric identifier that is designed so that it can be used only by the designated individual.
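The lockout and password-aging behavior described above might look like the following in outline. This is a simplified sketch, not a production authentication system: the `Account` class, the limit of 5 failed attempts, and the 90-day password age are assumed values, and credential verification itself is represented only by a boolean passed in by the caller.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_FAILED_ATTEMPTS = 5                  # assumed lockout threshold
PASSWORD_MAX_AGE = timedelta(days=90)    # assumed password aging policy


@dataclass
class Account:
    user_id: str
    password_set_at: datetime
    failed_attempts: int = 0
    locked: bool = False


def attempt_login(account, credentials_ok, now):
    """Apply lockout and password-aging rules to one login attempt.

    credentials_ok stands in for real credential verification, which is
    outside the scope of this sketch.
    """
    if account.locked:
        return "locked"
    if not credentials_ok:
        account.failed_attempts += 1
        if account.failed_attempts >= MAX_FAILED_ATTEMPTS:
            account.locked = True  # deauthorize after repeated failures
            return "locked"
        return "denied"
    account.failed_attempts = 0
    if now - account.password_set_at > PASSWORD_MAX_AGE:
        return "password_change_required"
    return "granted"


now = datetime(2020, 2, 1)
account = Account("user1", password_set_at=datetime(2020, 1, 1))
results = [attempt_login(account, False, now) for _ in range(5)]
print(results)                             # four "denied", then "locked"
print(attempt_login(account, True, now))   # still "locked" despite valid credentials
```

Once locked, the account stays locked even when valid credentials are presented, so reinstatement has to go through whatever administrative procedure the registry has established.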

Rules should be established for situations in which access credentials are compromised. New password information should be sent to the individual by a secure method.

Intrusion detection and firewalls should be employed on sites accessible to the Internet, with appropriate controls and rules in place to limit access to authorized users. Desktop systems should be equipped with antivirus software, and servers should run the most recent security patches. System security should be reviewed throughout the course of the registry to ensure that management, operational, personnel, and technical controls are functioning properly.

3.3.6. Data Enclaves

With the growth of clinical data and demands for increasing amounts of clinical data by multiple parties and researchers, new approaches to access are evolving. Data enclaves are secure, remote-access systems that allow researchers to share respondents’ information in a controlled and confidential manner.17 The data enclave uses statistical, technical, and operational controls at different levels chosen for the specific viewer. This can be useful both for enhancing protection of the data and for enabling certain organizations to access data in compliance with their own organization or agency requirements. Data enclaves also can be used to allow other researchers to access a registry’s data in a controlled manner. With the growth of registries and their utility for a number of stakeholders, data enclaves have become increasingly important.18

3.3.7. Electronic Signatures

Electronic signatures provide one of the foundations of individual accountability, helping to ensure an accurate change history when used in conjunction with secure, computer-generated, time-stamped audit trails. Most systems use an electronic signature. For registries that report data to FDA, such signatures must meet criteria specified in 21 CFR Part 11 for general signature composition, use, and control (sections 11.100, 11.200, and 11.300). However, even registries that do not have such requirements should view these as reasonable standards. Before an individual is assigned an electronic signature, it is important to verify the person’s identity and train the individual in the significance of the electronic signature. In cases where a signature consists of a user ID and a password, both management and technical means should be used to ensure uniqueness and compliance with password construction rules. Password length, character composition, uniqueness, and validity life cycle should be based on industry best practices and guidelines published by the National Institute of Standards and Technology. Passwords used in electronic signatures should abide by the same security and aging constraints as those listed for system access controls.
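To show how a signature and a computer-generated, time-stamped audit trail work together to preserve change history, here is a minimal sketch. It is illustrative only and does not by itself satisfy 21 CFR Part 11. The class names, the field names, and the requirement that each change carry a reason are assumptions, and `signed_by` is assumed to be the ID of a user who has already been authenticated.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class AuditEntry:
    """One immutable audit trail entry (hypothetical structure)."""
    record_id: str
    field_name: str
    old_value: Any
    new_value: Any
    signed_by: str       # ID of the authenticated user making the change
    reason: str
    timestamp: datetime  # generated by the system, never supplied by the user


class AuditedRecord:
    def __init__(self, record_id, values):
        self.record_id = record_id
        self._values = dict(values)
        self._trail = []

    def get(self, field_name):
        return self._values.get(field_name)

    def update(self, field_name, new_value, signed_by, reason):
        """Change a value and append a signed, time-stamped entry to the trail."""
        if not signed_by or not reason:
            raise ValueError("a change needs an authenticated signer and a reason")
        entry = AuditEntry(
            record_id=self.record_id,
            field_name=field_name,
            old_value=self._values.get(field_name),
            new_value=new_value,
            signed_by=signed_by,
            reason=reason,
            timestamp=datetime.now(timezone.utc),
        )
        self._trail.append(entry)  # entries are only ever appended
        self._values[field_name] = new_value
        return entry

    @property
    def trail(self):
        return tuple(self._trail)  # read-only view of the change history


record = AuditedRecord("PT-0001", {"systolic_bp": 210})
record.update("systolic_bp", 120, signed_by="jsmith", reason="transcription error")
for entry in record.trail:
    print(entry.field_name, entry.old_value, "->", entry.new_value, entry.signed_by)
```

Because the old value is captured at the time of the change, the full history of a field can be reconstructed from the trail alone.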

3.3.8. Validation

Systems that store electronic records (or depend on electronic or handwritten signatures of those records) that are required to be acceptable to FDA must be validated according to the requirements set forth in the 21 CFR Part 11 Final Rule,19 dated March 20, 1997. The rule describes the requirements and controls for electronic systems that are used to fulfill records requirements set forth in agency regulations (often called “predicate rules”) and for any electronic records submitted to the agency. FDA publishes nonbinding guidance documents from time to time that outline its current thinking regarding the scope and application of the regulation. The current guidance document is Guidance for Industry: Part 11, Electronic Records; Electronic Signatures – Scope and Application,20 dated August 2003. In June 2017, FDA published draft guidance to clarify, update, and expand upon recommendations in the August 2003 guidance that pertain to clinical investigations conducted under 21 CFR parts 312 and 812.21

Other documents that are useful for determining validation requirements of electronic systems are Guidance for Industry: Computerized Systems Used in Clinical Investigations,22 dated May 2007; General Principles of Software Validation; Final Guidance for Industry and FDA Staff,23 dated January 11, 2002; and Guidance for Industry: Electronic Source Data in Clinical Investigations, dated September 2013.24

4. Resource Considerations

Costs for registries can be highly variable, depending on the registry purpose and objectives. Each of the elements described in this chapter has an associated cost. Table 11-1 provides a list of some of the activities of the registry coordinating center as an example. Not all registries will require or can afford all of the functions, options, or quality assurance techniques described in this chapter. Registry planners must evaluate benefit versus available resources to determine the most appropriate approach to achieve their goals.

Table 11-1. Data activities performed during registry coordination.


References for Chapter 11

1. Clinical Data Interchange Standards Consortium. http://www.cdisc.org. Accessed June 10, 2019.
2. National Institutes of Health. National Institutes of Health Stroke Scale. https://www.stroke.nih.gov/documents/NIH_Stroke_Scale_508C.pdf. Accessed June 10, 2019.
3. Luck J, Peabody JW, Dresselhaus TR, et al. How well does chart abstraction measure quality? A prospective comparison of standardized patients with the medical record. Am J Med. 2000;108(8):642–9. PMID: 10856412.
4. Reisch LM, Fosse JS, Beverly K, et al. Training, quality assurance, and assessment of medical record abstraction in a multisite study. Am J Epidemiol. 2003;157(6):546–51. PMID: 12631545. DOI: 10.1093/aje/kwg016.
5. Neale R, Rokkas P, McClure RJ. Interrater reliability of injury coding in the Queensland Trauma Registry. Emerg Med (Fremantle). 2003;15(1):38–41. PMID: 12656785.
6. eMERGE Network. Publications List. https://emerge.mc.vanderbilt.edu/publications/. Accessed June 10, 2019.
7. Phenotype KnowledgeBase. https://phekb.org/. Accessed June 10, 2019.
8. Apache cTAKES. http://ctakes.apache.org. Accessed June 10, 2019.
9. Clinical Language Annotation, Modeling, and Processing Toolkit. https://clamp.uth.edu. Accessed June 10, 2019.
10. MetaMap - A Tool For Recognizing UMLS Concepts in Text. National Library of Medicine. https://metamap.nlm.nih.gov. Accessed June 10, 2019.
11. Meystre SM, Savova GK, Kipper-Schuler KC, et al. Extracting information from textual documents in the electronic health record: a review of recent research. Yearb Med Inform. 2008:128–44. PMID: 18660887.
12. U.S. Food and Drug Administration. Guidance for Industry. E6 Good Clinical Practice: Consolidated Guidance. April 1996. https://clinicalcenter.nih.gov/ccc/clinicalresearch/guidance.pdf. Accessed June 10, 2019.
13. HL7. http://www.hl7.org/. Accessed June 10, 2019.
14. National Institute of Standards and Technology. http://www.nist.gov. Accessed June 10, 2019.
15. Mangano DT, Tudor IC, Dietzel C, et al. The risk associated with aprotinin in cardiac surgery. N Engl J Med. 2006;354(4):353–65. PMID: 16436767. DOI: 10.1056/NEJMoa051379.
16. Gheorghiade M, Abraham WT, Albert NM, et al. Systolic blood pressure at admission, clinical characteristics, and outcomes in patients hospitalized with acute heart failure. JAMA. 2006;296(18):2217–26. PMID: 17090768. DOI: 10.1001/jama.296.18.2217.
17. National Institutes of Health. NIH Data Sharing Policy and Implementation Guidance. http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm#enclave. Accessed June 10, 2019.
18. Platt R, Lieu T. Data Enclaves for Sharing Information Derived From Clinical and Administrative Data. JAMA. 2018;320(8):753–4. PMID: 30083726. DOI: 10.1001/jama.2018.9342.
19. U.S. Food and Drug Administration. CFR - Code of Federal Regulations Title 21, Part 11: Electronic Records; Electronic Signatures. https://www.ecfr.gov/cgi-bin/text-idx?SID=0e09bff792b1f80bc74436ed9d1eea4c&mc=true&tpl=/ecfrbrowse/Title21/21cfr11_main_02.tpl. Accessed June 10, 2019.
20. U.S. Food and Drug Administration. Guidance for Industry Part 11, Electronic Records; Electronic Signatures — Scope and Application. https://www.fda.gov/media/75414/download. Accessed June 10, 2019.
21. U.S. Food and Drug Administration. Use of Electronic Records and Electronic Signatures in Clinical Investigations Under 21 CFR Part 11 – Questions and Answers. DRAFT Guidance for Industry. https://www.fda.gov/media/105557/download. Accessed June 10, 2019.
22. U.S. Food and Drug Administration. Guidance for Industry Computerized Systems Used in Clinical Investigations. http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM070266.pdf. Accessed June 10, 2019.
23. U.S. Food and Drug Administration. General Principles of Software Validation; Final Guidance for Industry and FDA Staff. https://www.fda.gov/media/73141/download. Accessed June 10, 2019.
24. U.S. Food and Drug Administration. Guidance for Industry Electronic Source Data in Clinical Investigations. https://www.fda.gov/media/85183/download. Accessed June 10, 2019.
©2020 United States Government, as represented by the Secretary of the Department of Health and Human Services, by assignment.

All rights reserved. The Agency for Healthcare Research and Quality (AHRQ) permits members of the public to reproduce, redistribute, publicly display, and incorporate this work into other materials provided that it must be reproduced without any changes to the work or portions thereof, except as permitted as fair use under the U.S. Copyright Act. This work contains certain tables and figures noted herein that are subject to copyright by third parties. These tables and figures may not be reproduced, redistributed, or incorporated into other materials independent of this work without permission of the third-party copyright owner(s). This work may not be reproduced, reprinted, or redistributed for a fee, nor may the work be sold for profit or incorporated into a profit-making venture without the express written consent of AHRQ. This work is subject to the restrictions of Section 1140 of the Social Security Act, 42 U.S.C. § 1320b-10. When parts of this work are used or quoted, the following citation should be used:

Bookshelf ID: NBK562556

