NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Gliklich RE, Dreyer NA, Leavy MB, editors. Registries for Evaluating Patient Outcomes: A User's Guide [Internet]. 3rd edition. Rockville (MD): Agency for Healthcare Research and Quality (US); 2014 Apr.


11. Data Collection and Quality Assurance

1. Introduction

This chapter focuses on data collection procedures and quality assurance principles for patient registries. Data management—the integrated system for collecting, cleaning, storing, monitoring, reviewing, and reporting on registry data—determines the utility of the data for meeting the goals of the registry. Quality assurance, on the other hand, aims to assure that the data were, in fact, collected in accordance with these procedures and that the data stored in the registry database meet the requisite standards of quality, which are generally defined based on the intended purposes. In this chapter, the term registry coordinating activities refers to the centralized procedures performed for a registry, and the term registry coordinating center refers to the entity or entities performing these procedures and overseeing the registry activities at the site and patient levels.

Because the range of registry purposes can be broad, a similar range of data collection procedures may be acceptable, but only certain methodologies may be suitable for particular purposes. Furthermore, certain end users of the data may require that data collection or validation be performed in accordance with their own guidelines or standards. For example, a registry that collects data electronically and intends for those data to be used by the U.S. Food and Drug Administration (FDA) should meet the systems validation requirements of that end user of the data, such as Title 21 of the Code of Federal Regulations Part 11 (21 CFR Part 11). Such requirements may have a substantial effect on the registry procedures. Similarly, registries may be subject to specific processes depending on the type of data collected, the types of authorization obtained, and the applicable governmental regulations.

Requirements for data collection and quality assurance should be defined during the registry inception and creation phases. Certain requirements may have significant cost implications, and these should be assessed on a cost-to-benefit basis in the context of the intended purposes of the registry. This chapter describes a broad range of centralized and distributed data collection and quality assurance activities currently in use or expected to become more commonly used in patient registries.

2. Data Collection

2.1. Database Requirements and Case Report Forms

Chapter 1 defined key characteristics of patient registries for evaluating patient outcomes. They include specific and consistent data definitions for collecting data elements in a uniform manner for every patient. As in randomized controlled trials, the case report form (CRF) is the paradigm for the data structure of the registry. A CRF is a formatted listing of data elements that can be presented in paper or electronic formats. Those data elements and data entry options in a CRF are represented in the database schema of the registry by patient-level variables. Defining the registry CRFs and corresponding database schema are the first steps in data collection for a registry. Chapter 4 describes the selection of data elements for a registry.

Two related documents should also be considered part of the database specification: the data dictionary (including data definitions and parameters) and the data validation rules, also known as queries or edit checks. The data dictionary and definitions describe both the data elements and how those data elements are interpreted. The data dictionary contains a detailed description of each variable used by the registry, including the source of the variable, coding information if used, and normal ranges if relevant. For example, the definition of “current smoker” should specify whether “smoker” refers to tobacco or other substances and whether “current” refers to active use or use within a recent time period. Several cardiovascular registries, such as the Get With The Guidelines® Coronary Artery Disease1 program, define “current smoker” as someone who smoked tobacco within the last year.
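A data dictionary entry can be thought of as a structured record describing one variable. The following sketch is illustrative only; the field names, codes, and definition are hypothetical and not drawn from any specific registry:

```python
# Minimal sketch of a data dictionary entry for one registry variable.
# All field names, codes, and the definition text are hypothetical.
current_smoker = {
    "variable": "current_smoker",
    "label": "Current smoker",
    "source": "patient interview or chart abstraction",
    "type": "coded",
    "codes": {1: "Yes", 0: "No", 9: "Unknown"},
    "definition": "Smoked tobacco within the past 12 months",
}

def decode(entry, value):
    """Translate a stored code into its human-readable label."""
    return entry["codes"].get(value, "Invalid code")
```

Keeping the coding information with the variable definition, as here, lets data entry systems and reports translate stored codes consistently.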

Data validation rules refer to the logical checks on data entered into the database against predefined rules for either value ranges (e.g., systolic blood pressure less than 300 mmHg) or logical consistency with respect to other data fields for the same patient; these are described more fully in Section 2.5, “Cleaning Data,” below. While neither registry database structures nor database requirements are standardized, the Clinical Data Interchange Standards Consortium2 is actively working on representative models of data interchange and portability using standardized concepts and formats. Chapter 4 further discusses these models, which are applicable to registries as well as clinical trials.

2.2. Procedures, Personnel, and Data Sources

Data collection procedures need to be carefully considered in planning the operations of a registry. Successful registries depend on a sustainable workflow model that can be integrated into the day-to-day clinical practice of active physicians, nurses, pharmacists, and patients, with minimal disruption. (See Chapter 10.) Programs can benefit tremendously from preliminary input from the health care workers or study coordinators who are likely to be participants.

2.2.1. Pilot Testing

One method of gathering input from likely participants before the full launch of a registry is pilot testing. Whereas feasibility testing, which is discussed in Chapter 2, Section 2.4, focuses on whether a registry should be implemented, pilot testing focuses on how it should be implemented. Piloting can range from testing a subset of the procedures, CRFs, or data capture systems, to a full launch of the registry at a limited subset of sites with a limited number of patients.

The key to effective pilot testing is to conduct it at a point where the results of the pilot can still be used to modify the registry implementation. Through pilot testing, one can assess comprehension, acceptance, feasibility, and other factors that influence how readily the patient registry processes will fit into patient lifestyles and the normal practices of the health care provider.

For example, some data sources may or may not be available for all patients. Chapter 4, Section 5 discusses pilot testing in more detail.

2.2.2. Documentation of Procedures

The data collection procedures for each registry should be clearly defined and described in a detailed manual. The term manual here refers to the reference information in any appropriate form, including hard copy, electronic, or via interactive Web or software-based systems. Although the detail of this manual may vary from registry to registry depending on the intended purpose, the required information generally includes protocols, policies, and procedures; the data collection instrument; and a listing of all the data elements and their full definitions. If the registry has optional fields (i.e., fields that do not have to be completed on every patient), these should be clearly specified.

In addition to patient inclusion and exclusion criteria, the screening process should be specified, as should any documentation to be retained at the site level and any plans for monitoring or auditing of screening practices. If sampling is to be performed, the method or systems used should be explained, and tools should be provided to simplify this process for the sites. The manual should clearly explain how patient identification numbers are created or assigned and how duplicate records should be prevented. Any required training for data collectors should also be described.

If paper CRFs are used, the manual should describe specifically how they are used and which parts of the forms (e.g., two-part or three-part no-carbon-required forms) should be retained, copied, submitted, or archived. If electronic CRFs are used, clear user manuals and instructions should be available. These procedures are an important resource for all personnel involved in the registry (and for external auditors who might be asked to assure the quality of the registry).

The importance of standardizing procedures to ensure that the registry uses uniform and systematic methods for collecting data cannot be overstated. At the same time, some level of customization of data entry methods may be required or permitted to enable the participation of particular sites or subgroups of patients within some practices. As discussed in Chapter 10, if the registry provides payments to sites for participation, then the specific requirements for site payments should be clearly documented, and this information should be provided with the registry documents.

2.2.3. Personnel

All personnel involved in data collection should be identified, and their job descriptions and respective roles in data collection and processing should be described. Examples of such “roles” include patient, physician, data entry personnel, site coordinator, help desk, data manager, and monitor. The necessary documentation or qualification required for any role should be specified in the registry documentation. As an example, some registries require personnel documentation such as a curriculum vitae, protocol signoff, attestation of intent to follow registry procedures, or confirmation of completion of specified training.

2.2.4. Data Sources

The sources of data for a registry may include new information collected from the patient, new or existing information reported by or derived from the clinician and the medical record, and ancillary stores of patient information, such as laboratory results. Since registries for evaluating patient outcomes should employ uniform and systematic methods of data collection, all data-related procedures—including the permitted sources of data; the data elements and their definitions; and the validity, reliability, or other quality requirements for the data collected from each source—should be predetermined and defined for all collectors of data. As described in Section 3, “Quality Assurance,” below, data quality is dependent on the entire chain of data collection and processing. Therefore, the validity and quality of the registry data as a whole ultimately derive from the least, not the most, rigorous link.

In Chapter 6, data sources are classified as primary or secondary, based on the relationship of the data to the registry purpose and protocol. Primary data sources incorporate data collected for direct purposes of the registry (i.e., primarily for the registry). Secondary data sources consist of data originally collected for purposes other than the registry (e.g., standard medical care, insurance claims processing). The sections below incorporate and expand on these definitions.

2.2.5. Patient-Reported Data

Patient-reported data are data specifically collected from the patient for the purposes of the registry rather than interpreted through a clinician or an indirect data source (e.g., laboratory value, pharmacy records). Such data may range from basic demographic information to validated scales of patient-reported outcomes (PROs). From an operational perspective, a wide range of issues should be considered in obtaining data directly from patients. These range from presentation (e.g., font size, language, reading level) to technologies (e.g., paper-and-pencil questionnaires, computer inputs, telephone or voice inputs, or hand-held patient diaries). Mistakes at this level can inadvertently bias patient selection, invalidate certain outcomes, or significantly affect cost. Limiting the access for patient reporting to particular languages or technologies may limit participation. Patients with specific diagnoses may have difficulties with specific technologies (e.g., small font size for visually impaired, paper and pencil for those with rheumatoid arthritis). Other choices, such as providing a PRO instrument in a format or method of delivery that differs from how it was validated (e.g., questionnaire rather than interview), may invalidate the results. For more information on patient-reported outcome development and use, see Chapter 5.

2.2.6. Clinician-Reported Data

Clinician-reported or -derived data can also be divided into primary and secondary subcategories. As an example, specific clinician rating scales (e.g., the National Institutes of Health Stroke Scale)3 may be required for the registry but not routinely captured in clinical encounters. Some variables might be collected directly by the clinician for the registry or obtained from the medical record. Data elements that the clinician must collect directly (e.g., because of a particular definition or need to assess a specific comorbidity that may or may not be routinely present in the medical record) should be specified. These designations are important because they determine who can collect the data for a particular registry or what changes must be made in the procedures the clinician follows in recording a medical record for a patient in a registry. Furthermore, the types of error that arise in registries (discussed in Section 3, “Quality Assurance”) will differ by the degree of use of primary and secondary sources, as well as other factors. As an example, registries that use medical chart abstracters, as discussed in Section 2.2.7 below, may be subject to more interpretive errors.4

2.2.7. Data Abstraction

Data abstraction is the process by which a data collector other than the clinician interacting with the patient extracts clinician-reported data. While physical examination findings, such as height and weight, or laboratory findings, such as white blood cell counts, are straightforward, abstraction usually involves varying degrees of judgment and interpretation.

Clarity of description and standardization of definitions are essential to the assurance of data quality and to the prevention of interpretive errors when using data abstraction. Knowledgeable registry personnel should be designated as resources for the data collectors in the field, and processes should be put in place to allow the data collectors in the field continuous access to these designated registry personnel for questions on specific definitions and clinical situations. Registries that span long periods, such as those intended for surveillance, might be well served by a structure that permits the review of definitions on a periodic basis to ensure the timeliness and completeness of data elements and definitions, and to add new data elements and definitions. A new product or procedure introduced after the start of a registry is a common reason for such an update.

Abstracting data from unformatted hard copy (e.g., a hospital chart) is often an arduous and tedious process, especially if free text is involved, and it usually requires a human reader. The reader, whose qualifications may range from a trained “medical record analyst” or other health professional to an untrained research assistant, may need to decipher illegible handwriting, translate obscure abbreviations and acronyms, and understand the clinical content to sufficiently extract the desired information. Registry personnel should develop formal chart abstraction guidelines, documentation of processes and practical definitions of terms, and coding forms for the analysts and reviewers to use.

Generally, the guidelines include instructions to search for particular types of data that will go into the registry (e.g., specific diagnoses or laboratory results). Often the analyst will be asked to code the data, using either standardized codes from a codebook (e.g., the ICD-9 [International Classification of Diseases, 9th Revision] code) corresponding to a text diagnosis in a chart, or codes that may be unique to the registry (e.g., a severity scale of 1 to 5).

All abstraction and coding instructions must be carefully documented and incorporated into a data dictionary for the registry. Because of the “noise” in unstructured, hard-copy documents (e.g., spurious marks or illegible writing) and the lack of precision in natural language, the clinical data abstracted by different abstracters from the same documents may differ. This is a potential source of error in a registry.

To reduce the potential for this source of error, registries should ensure proper training on the registry protocol and procedures, condition(s), data sources, data collection systems, and most importantly, data definitions and their interpretation. While training should be provided for all registry personnel, it is particularly important for nonclinician data abstracters. Training time depends on the nature of the source (charts or CRFs), complexity of the data, and number of data items. A variety of training methods, from live meetings to online meetings to interactive multimedia recordings, have all been used with success.5 Training often includes test abstractions using sample charts. For some purposes, it is best practice to train abstracters using standardized test charts. Such standardized tests can be further used both to obtain data on the inter-rater reliability of the CRFs, definitions, and coding instructions and to determine whether individual abstracters can perform up to a defined minimum standard for the registry. Registries that rely on medical chart abstraction should consider reporting on the performance characteristics associated with abstraction, such as inter-rater reliability.6 Examining and reporting on intra-rater reliability may also be useful. Some key considerations in standardizing medical chart abstractions are—

  • Standardized materials (e.g., definitions, instructions)
  • Standardized training
  • Testing with standardized charts
  • Reporting of inter-rater reliability
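Inter-rater reliability from a standardized test-chart exercise is commonly quantified with Cohen's kappa, which corrects observed agreement for agreement expected by chance. A minimal sketch for two abstracters coding the same charts (the ratings below are made-up test data):

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters coding the same items.

    Assumes at least one disagreement is possible (chance-expected
    agreement below 1); does not handle that degenerate case.
    """
    n = len(ratings_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance-expected agreement from each rater's marginal frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

A registry might set a minimum kappa on standardized test charts before certifying an abstracter for production data collection.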

2.2.8. Electronic Medical Record

An electronic medical record (EMR) is an electronic record of health-related information on an individual that can be created, gathered, managed, and consulted by authorized clinicians and staff within one health care organization. More complete than an EMR, an electronic health record (EHR) is an electronic record of health-related information on an individual that conforms to nationally recognized interoperability standards and that can be created, managed, and consulted by authorized clinicians and staff across more than one health care organization.7 For the purposes of this discussion, we will refer to the more limited capabilities of the EMR.

The EMR (and EHR) will play an increasingly important role as a source of clinical data for registries. The medical community is currently in a transition period in which the primary repository of a patient's medical record is changing from the traditional hard-copy chart to the EMR. The main function of the EMR is to aggregate all clinical electronic data about a patient into one database, in the same way that a hard-copy medical chart aggregates paper records from various personnel and departments responsible for the care of the patient. Depending on the extent of implementation, the EMR may include patient demographics, diagnoses, procedures, progress notes, orders, flow sheets, medications, and allergies. The primary sources of data for the EMR are the health care providers. Data may be entered into the EMR through keyboards or touch screens in medical offices or at the bedside. In addition, the EMR system is usually interfaced with ancillary systems (discussed below), such as laboratory, pharmacy, radiology, and pathology systems. Ancillary systems, which usually have their own databases, export relevant patient data to the EMR system, which imports the data into its database.

Since EMRs include the majority of clinical data available about a patient, they can be a major source of patient information for a registry. What an EMR usually does not include is registry-specific (primary source) data that are collected separately from hard-copy or electronic forms. In the next several years, suitable EMR system interfaces may be able to present data needed by registries in accordance with registry-specified requirements, either within the EMR (which then populates the registry) or in an electronic data capture system (which then populates the EMR). EMRs already serve as secondary data sources in some registries, and this practice will continue to grow as EMRs become more widely used. In these situations, data may be extracted from the EMR, transformed into registry format, and loaded into the registry, where they will reside in the registry database together with registry-specific data imported from other sources. In a sense, this is similar to medical chart abstraction except that it is performed electronically.

Electronic capture differs from manual medical chart abstraction in two key respects. First, the data are “abstracted” once for all records. In this context, abstraction refers to the mapping and other decisionmaking needed to bring the EMR data into the registry database. It does not eliminate the potential for interpretive errors, as described later in this chapter, but it centralizes that process, making the rules clear and easily reviewed. Second, the data are uploaded electronically, eliminating duplicative data entry, potential errors associated with data reentry, and the related cost of this redundant effort.

When the EMR is used as a data source for a registry, a significant problem occurs when the information needed by the registry is stored in the EMR as free text, rather than codified or structured data. Examples of structured data include ICD-9 diagnoses and laboratory results. In contrast, physician progress notes, consultations, radiology reports, et cetera, are usually dictated and transcribed as narrative free text. While data abstraction of free text derived from an EMR can be done by a medical record analyst, with the increasing use of EMRs, automated methods of data abstraction from free text have been developed. Natural language processing (NLP) is the term for this technology. It allows computers to process and extract information from human language. The goal of NLP is to parse free text into meaningful components based on a set of rules and a vocabulary that enable the software to recognize key words, understand grammatical constructions, and resolve word ambiguities. Those components can be extracted and delivered to the registry along with structured data extracted from the EMR, and both can be stored as structured data in the registry database.
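The vocabulary-driven recognition step described above can be illustrated, in a deliberately toy form, as matching tokens in a free-text note against a known vocabulary. Real NLP systems such as caTIES and i2b2 handle grammar, negation, and ambiguity far beyond this sketch; the drug list here is an invented fragment:

```python
import re

# Toy illustration of vocabulary-driven extraction from free text.
# Production NLP systems are far more sophisticated (grammar, negation,
# word-sense disambiguation); this vocabulary is a made-up fragment.
DRUG_VOCAB = {"warfarin", "metformin", "lisinopril"}

def extract_drugs(note):
    """Return known drug names mentioned in a free-text note, sorted."""
    tokens = re.findall(r"[a-z]+", note.lower())
    return sorted(set(tokens) & DRUG_VOCAB)
```

The extracted terms could then be stored as structured data in the registry database alongside data imported from coded EMR fields.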

An increasing number of NLP software packages are available (e.g., caTIES from the National Cancer Institute,8 i2b2 (Informatics for Integrating Biology and the Bedside),9 and a number of commercial products). However, NLP is still in an early phase of development and cannot yet be used for all-purpose chart abstraction. In general, NLP software operates in specific clinical domains (e.g., radiology, pathology), whose vocabularies have been included in the NLP software's database. Nevertheless, NLP has been used successfully to extract diagnoses and drug names from free text in various clinical settings.

It is anticipated that EMR/EHR use will grow significantly with the incentives provided under the American Recovery and Reinvestment Act of 2009 health information technology provisions. Currently, only a minority of U.S. patients have their data stored in systems that are capable of retrieval at the level of a data element. Furthermore, only a small number of these systems currently store data in structured formats with standardized data definitions for those data elements that are common across different vendors. A significant amount of attention is currently focused on interchange formats between clinical and research systems (e.g., from Health Level Seven [HL-7]10 to Clinical Data Interchange Standards Consortium2 models). Attention is also focused on problems of data syntax and semantics. The adoption of common database structures and open interoperability standards will be critical for future interchange between EHRs and registries. This topic is discussed in depth in Chapter 15.

2.2.9. Other Data Sources

Some of the clinical data used to populate registries may be derived from repositories other than EMRs. Examples of other data sources include billing systems, laboratory databases, and other registries. Chapter 6 discusses the potential uses of other data sources in more detail.

2.3. Data Entry Systems

Once the primary and any secondary data sources for a registry have been identified, the registry team can determine how data will be entered into the registry database. Many techniques and technologies exist for entering or moving data into the registry database, including paper CRFs, direct data entry, facsimile or scanning systems, interactive voice response systems, and electronic CRFs. There are also different models for how quickly those data reach a central repository for cleaning, reviewing, monitoring, or reporting. Each approach has advantages and limitations, and each registry must balance flexibility (the number of options available) with data availability (when the central repository is populated), data validity (whether all methods are equally able to produce clean data), and cost. Appropriate decisions depend on many factors, including the number of data elements, number of sites, location (local preferences that vary by country, language differences, and availability of different technologies), registry duration, followup frequency, and available resources.

2.3.1. Paper CRFs

With paper CRFs, the clinician enters clinical data on the paper form at the time of the clinical encounter, or other data collectors abstract the data from medical records after the clinical encounter. CRFs may include a wide variety of clinical data on each patient gathered from different sources (e.g., medical chart, laboratory, pharmacy) and from multiple patient encounters. Before the data on formatted paper forms are entered into a computer, the forms should be reviewed for completeness, accuracy, and validity. Paper CRFs can be entered into the database by either direct data entry or computerized data entry via scanning systems.

With direct data entry, a computer keyboard is used to enter data into a database. Key entry has a variable error rate depending on personnel, so an assessment of error rate is usually desirable, particularly when a high volume of data entry is performed. Double data entry is a method of increasing the accuracy of manually entered data: two individuals enter the same data independently, discrepancies between the two entries are quantified, and a third person reviews and resolves those discrepancies. With upfront data validation checks on direct data entry, the likelihood of data entry errors significantly decreases. Therefore, the choice of single versus double data entry should be driven by the requirements of the registry for a particular maximal error rate and the ability of each method to achieve that rate in key measures in the particular circumstance. Double data entry, while a standard of practice for registrational trials, may add significant cost. Its use should be guided by the need to reduce an error rate in key measures and the likelihood of accomplishing that by double data entry as opposed to other approaches. In some situations, assessing the data entry error rates by re-entering a sample of the data is sufficient for reporting purposes.
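The adjudication step in double data entry amounts to a field-by-field comparison of the two independent passes. A minimal sketch (field names hypothetical):

```python
# Compare two independent data entry passes over the same paper CRF.
# Returns the fields a third person must adjudicate, with both values.
# Field names are hypothetical.
def find_discrepancies(entry1, entry2):
    """Map each discrepant field to its (first-pass, second-pass) values."""
    fields = set(entry1) | set(entry2)
    return {
        f: (entry1.get(f), entry2.get(f))
        for f in sorted(fields)
        if entry1.get(f) != entry2.get(f)
    }
```

The size of this discrepancy map across a batch of forms also gives the quantified error rate mentioned above.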

With hard-copy structured forms, entering data using a scanner and special software to extract the data from the scanned image is possible. If data are recorded on a form as marks in checkboxes, the scanning software enables the user to map the location of each checkbox to the value of a variable represented by the text item associated with the checkbox, and to determine whether the box is marked. The presence of a mark in a box is converted by the software to its corresponding value, which can then be transmitted to a database for storage. If the form contains hand-printed or typed text or numbers, optical character recognition software is often effective in extracting the printed data from the scanned image. However, the print font must be of high quality to avoid translation errors, and spurious marks on the page can cause errors. Error checking is based on automated parameters specified by the operator of the system for exception handling. The comments on assessing error rates in the section above are applicable for scanning systems as well.

2.3.2. Electronic CRFs

An electronic CRF (eCRF) is defined as an auditable electronic form designed to record information required by the clinical trial protocol to be reported to the sponsor on each trial subject.11 An eCRF allows clinician-reported data to be entered directly into the electronic system by the data collector (the clinician or other data collector). Site personnel in many registries still commonly complete an intermediate hard-copy worksheet representing the CRF and subsequently enter the data into the eCRF. While this approach increases work effort and error rates, it is still in use because it is not yet practical for all electronic data entry to be performed at the bedside, during the clinical encounter, or in the midst of a busy clinical day.

An eCRF may originate on local systems (including those on an individual computer, a local area network server, or a hand-held device) or directly from a central database server via an Internet-based connection or a private network. For registries that exist beyond a single site, the data from the local system must subsequently communicate with a central data system. An eCRF may be presented visually (e.g., computer screen) or aurally (e.g., telephonic data entry, such as interactive voice response systems). Specific circumstances will favor different presentations. For example, in one clozapine patient registry that is otherwise similar to Case Example 24, both pharmacists and physicians can obtain and enter data via a telephone-based interactive voice response system as well as a Web-based system. The option is successful in this scenario because telephone access is ubiquitous in pharmacies and the eCRF is very brief.

A common method of electronic data entry is to use Web-based data entry forms. Such forms may be used by patients, providers, and interviewers to enter data into a local repository. The forms reside on servers, which may be located at the site of the registry or co-located anywhere on the Internet. To access a data entry form, a user on a remote computer with an Internet connection opens a browser window and enters the address of the Web server. Typically, a login screen is displayed and the user enters a user identification and password, provided by personnel responsible for the Web site or repository. Once the server authenticates the user, the data entry form is displayed, and the user can begin entering data. As described in “Cleaning Data” (Section 2.5), many electronic systems can perform data validation checks or edits at the time of data entry. When data entry is complete, the user submits the form, which is sent over the Internet to the Web server.

Smart phones or other mobile devices may also be used to submit data to a server to the extent such transmissions can be done with appropriate information security controls. Mobility has recently become an important attribute for clinical data collection. Software has been developed that enables wireless devices to collect data and transmit them over the Internet to database servers in fixed locations. As wireless technology continues to evolve and data transmission rates increase, these will become more essential data entry devices for patients and clinicians.

2.4. Advantages and Disadvantages of Data Collection Technologies

When the medical record or ancillary data are in electronic format, they may be abstracted to the CRF by a data collector or, in some cases, uploaded electronically to the registry database. The ease of extracting data from electronic systems for use in a registry depends on the design of the interfaces of ancillary and registry systems, and the ability of the EMR or ancillary system software to make the requested data accessible. However, as system vendors increasingly adopt open standards for interoperability, transferring data from one system to another will likely become easier. Many organizations are actively working toward improved standards, including HL7,10 the National eHealth Collaborative,12 the National Institute of Standards and Technology,13 and others. Chapter 15 describes standards and certifications specific to EHR systems.

Electronic interfaces are necessary to move data from one computer to another. If clinical data are entered into a local repository from an eCRF or entered into an EMR, the data must be extracted from the source dataset in the local repository, transformed into the format required by the registry, and loaded into the registry database for permanent storage. This is called an “extract, transform, and load” process. Unless the local repository is designed to be consistent with the registry database in terms of the names of variables and their values, data mapping and transformation can be a complex task. In some cases, manual transfer of the data may be more efficient and less time-consuming than the effort to develop an electronic interface. Emerging open standards can enable data to be transferred from an EHR directly into the registry. This topic is discussed in more detail in Chapter 15.
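The extract, transform, and load sequence described above can be sketched in a few lines. All field names, code values, and mappings below are hypothetical illustrations; a production interface would drive them from the registry's data dictionary:

```python
# Minimal extract-transform-load (ETL) sketch. Field names, codes, and
# mappings are hypothetical, not a standard.

# "Extract": a record as it might arrive from a local EMR repository.
source_record = {"mrn": "000123", "sex": "M", "hba1c": "8.4", "visit": "03/15/2013"}

# "Transform": map local names and values to the registry's data dictionary.
FIELD_MAP = {"mrn": "patient_id", "sex": "gender",
             "hba1c": "hemoglobin_a1c", "visit": "visit_date"}
VALUE_MAP = {"gender": {"M": 1, "F": 2}}       # agreed categorical coding

def transform(record):
    out = {}
    for src, dst in FIELD_MAP.items():
        value = record[src]
        if dst in VALUE_MAP:
            value = VALUE_MAP[dst][value]       # recode categorical values
        elif dst == "hemoglobin_a1c":
            value = float(value)                # text -> numeric
        elif dst == "visit_date":
            m, d, y = value.split("/")          # normalize date to ISO format
            value = f"{y}-{m}-{d}"
        out[dst] = value
    return out

# "Load": registry_record is now ready for insertion into the registry database.
registry_record = transform(source_record)
print(registry_record)
```

Even this toy mapping shows why the chapter calls transformation complex: every divergence in names, codes, or formats between the two systems needs an explicit, documented rule.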

If an interface between a local electronic system and registry system is developed, it is still necessary to communicate to the ancillary system the criteria for retrieval and transmission of a patient record. Typically, the ancillary data are maintained in a relational database, and the system needs to run an SQL (Structured Query Language) query against the database to retrieve the specified information. An SQL query may specify individual patients by an identifier (e.g., a medical record number) or by values or ranges of specific variables (e.g., all patients with hemoglobin A1c over 8 percent). The results of the query are usually stored as a file (e.g., XML, CSV, CDISC ODM) that can be transformed and transferred to the registry system across the interface. A variety of interface protocols may be used to transfer the data.
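As a sketch of the kind of retrieval described, the following uses SQLite to stand in for the site's relational database; the table, columns, and threshold are hypothetical:

```python
import sqlite3

# Hypothetical ancillary database; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE labs (mrn TEXT, hba1c REAL)")
conn.executemany("INSERT INTO labs VALUES (?, ?)",
                 [("A001", 7.2), ("A002", 8.6), ("A003", 9.1)])

# Retrieve all patients whose hemoglobin A1c exceeds a threshold.
rows = conn.execute(
    "SELECT mrn, hba1c FROM labs WHERE hba1c > ? ORDER BY mrn",
    (8.0,)).fetchall()
print(rows)  # the result set would then be exported (e.g., as CSV or XML)
```

The parameterized query (`?` placeholder) is the same pattern whether the criterion is a single medical record number or a value range.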

Because data definitions and formats are not yet nationally standardized, transfer of data from an EMR or ancillary system to a registry database is prone to error. Careful evaluation of the transfer specifications for interpretive or mapping errors is a critical step that the registry coordinating center should verify. Furthermore, a series of test transfers and validation procedures should be performed and documented. Finally, error checking must be part of the transfer process because new formats or other errors not in the test databases may be introduced during actual practice, and these need to be identified and isolated from the registry itself. Even though each piece of data may be accurately transferred, the data may have different representations on the different systems (e.g., value discrepancies such as the meaning of “0” vs. “1,” fixed vs. floating point numbers, date format, integer length, and missing values). In summary, any system used to extract EMR records into registry databases should be validated and should include an interval sampling of transfers to ensure that uploading of this information is consistent over time.
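Transfer-time error checking of the kind described might look like the following sketch, in which records failing representation checks are quarantined for review rather than loaded; the field names and rules are illustrative assumptions:

```python
import re

# Sketch of transfer-time error checking: records that fail representation
# checks are isolated from the registry rather than loaded.

def check_record(rec):
    errors = []
    if rec.get("gender") not in (0, 1):                   # agreed coding: 0/1 only
        errors.append("gender not coded 0/1")
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", rec.get("visit_date", "")):
        errors.append("visit_date not ISO yyyy-mm-dd")
    if not isinstance(rec.get("hemoglobin_a1c"), float):  # fixed vs. floating point
        errors.append("hemoglobin_a1c not floating point")
    return errors

batch = [
    {"gender": 1,   "visit_date": "2013-03-15", "hemoglobin_a1c": 8.4},
    {"gender": "M", "visit_date": "03/15/2013", "hemoglobin_a1c": 8},  # 3 errors
]
loaded = [r for r in batch if not check_record(r)]
quarantined = [(r, check_record(r)) for r in batch if check_record(r)]
print(len(loaded), len(quarantined))
```

Running such checks on every production transfer, not just on the test databases, is what catches the new formats that appear only in actual practice.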

The ancillary system must also notify the registry when an error correction occurs in a record already transferred to the registry. Registry software must be able to receive that notification, flag the erroneous value as invalid, and insert the new, corrected value into its database. Finally, it is important to recognize that the use of an electronic-to-electronic interchange requires not only testing but also validation of the integrity and quality of the data transferred. Few ancillary systems or EMR systems are currently validated to a defined standard. For registries that intend to report data to FDA or to other sponsors or data recipients with similar requirements, including electronic signatures, audit trails, and rigorous system validation, the ways in which the registry interacts with these other systems must be carefully considered.

2.5. Cleaning Data

Data cleaning refers to the correction or amelioration of data problems, including missing values, incorrect or out-of-range values, responses that are logically inconsistent with other responses in the database, and duplicate patient records. While all registries strive for “clean data,” in reality, this is a relative term. How and to what level the data will be cleaned should be addressed upfront in a data management manual that identifies the data elements that are intended to be cleaned, describes the data validation rules or logical checks for out-of-range values, explains how missing values and values that are logically inconsistent will be handled, and discusses how duplicate patient records will be identified and managed.
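As a minimal illustration of identifying duplicate patient records, the following groups records on a hypothetical matching key (surname plus date of birth); real registries typically use more elaborate deterministic or probabilistic matching rules defined in the data management manual:

```python
from collections import defaultdict

# Illustrative duplicate-record check; the matching key is a simplification.
records = [
    {"id": 1, "last": "SMITH", "dob": "1950-01-02"},
    {"id": 2, "last": "JONES", "dob": "1962-07-30"},
    {"id": 3, "last": "SMITH", "dob": "1950-01-02"},  # potential duplicate of id 1
]

by_key = defaultdict(list)
for rec in records:
    by_key[(rec["last"], rec["dob"])].append(rec["id"])

# Any key shared by more than one record is flagged for manual review.
duplicates = {key: ids for key, ids in by_key.items() if len(ids) > 1}
print(duplicates)
```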

2.5.1. Data Management Manual

Data managers should develop formal data review guidelines for the reviewers and data entry personnel to use. The guidelines should include information on how to handle missing data; invalid entries (e.g., multiple selections in a single-choice field, alphabetic data in a numeric field); erroneous entries (e.g., patients of the wrong gender answering gender-based questions); and inconsistent data (e.g., an answer to one question contradicting the answer to another one). The guidelines should also include procedures to attempt to remediate these data problems. For example, with a data error on an interview form, it may be necessary to query the interviewer or the patient, or to refer to other data sources that may be able to resolve the problem. Documentation of any data review activity and remediation efforts, including dates, times, and results of the query, should be maintained.

2.5.2. Automated Data Cleaning

Ideally, automated data checks are preprogrammed into the database for presentation at the time of data entry. These data checks are particularly useful for cleaning data at the site level while the patient or medical record is readily accessible. Even relatively simple edit checks, such as range checks for laboratory values, can have a significant effect on improving the quality of data. Many systems allow for the implementation of more complex data edit checks, and these checks can substantially reduce the amount of subsequent manual data cleaning. A variation of this method is to use data cleaning rules to deactivate certain data fields so that erroneous entries cannot even be made. A combination of these approaches can also be used. For paper-based entry methods, automated data checks are not available at the time the paper CRF is being completed but can be incorporated when the data are later entered into the database.
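A minimal sketch of preprogrammed edit checks might look like this; the fields and ranges are invented for illustration and would in practice come from the registry's data management manual:

```python
# Hypothetical range checks, applied at the moment of data entry.
EDIT_CHECKS = {
    "systolic_bp":    lambda v: 60 <= v <= 260,
    "hemoglobin_a1c": lambda v: 3.0 <= v <= 20.0,
}

def validate_entry(field, value):
    """Return None if the value passes, else a query message for the site."""
    check = EDIT_CHECKS.get(field)
    if check and not check(value):
        return f"{field}={value} is out of the expected range; please verify."
    return None

print(validate_entry("systolic_bp", 400))  # fails the range check
print(validate_entry("systolic_bp", 120))  # passes -> None
```

Because the check fires while the patient or chart is still at hand, the site can correct the value immediately instead of resolving a query weeks later.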

2.5.3. Manual Data Cleaning

Data managers perform manual data checks or queries to review data for unexpected discrepancies. This is the standard approach to cleaning data that are not entered into the database at the site (e.g., for paper CRFs entered via data entry or scanning). By carefully reviewing the data using both data extracts analyzed by algorithms and hand review, data managers identify discrepancies and generate “queries” to send to the sites to resolve. Even eCRF-based data entry with data validation rules may not be fully adequate to ensure data cleaning for certain purposes. Anticipating all potential data discrepancies at the time that the data management manual and edit checks are developed is very difficult. Therefore, even with the use of automated data validation parameters, some manual cleaning is often still performed.

2.5.4. Query Reports

The registry coordinating center should generate, on a periodic basis, query reports that relate to the quality of the data received, based on the data management manual and, for some purposes, additional concurrent review by a data manager. The content of these reports will differ depending on what type of data cleaning is required for the registry purpose and how much automated data cleaning has already been performed. Query reports may include missing data, “out-of-range” data, or data that appear to be inconsistent (e.g., positive pregnancy test for a male patient). They may also identify abnormal trends in data, such as sudden increases or decreases in laboratory tests compared with patient historical averages or clinically established normal ranges. Qualified registry personnel should be responsible for reviewing the abnormal trends with designated site personnel. The most effective approach is for each site to designate a single contact person to whom registry personnel can direct queries and concerns. Depending on the availability of the records and resources at the site to review and respond to queries, resolving all queries can sometimes be a challenge. Creating systematic approaches to maximizing site responsiveness is recommended.
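A query report of the kind described can be sketched as follows; the fields, codes, and the single consistency rule are hypothetical:

```python
# Minimal query report: missing values plus one cross-field consistency rule.
def build_queries(rec):
    queries = []
    for field in ("gender", "pregnancy_test"):
        if rec.get(field) is None:
            queries.append((rec["id"], f"{field} is missing"))
    if rec.get("gender") == "M" and rec.get("pregnancy_test") == "positive":
        queries.append((rec["id"], "positive pregnancy test for a male patient"))
    return queries

records = [
    {"id": "P01", "gender": "M", "pregnancy_test": "positive"},
    {"id": "P02", "gender": "F", "pregnancy_test": None},
]
report = [q for rec in records for q in build_queries(rec)]
for rec_id, message in report:
    print(rec_id, message)  # each line becomes a query sent to the site
```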

2.5.5. Data Tracking

For most registry purposes, tracking of data received (paper CRFs), data entered, data cleaned, and other parameters is an important component of active registry management. By comparing indicators, such as expected with observed rates of patient enrollment, CRF completion, and query rates, the registry coordinating center can identify problems and potentially take corrective action, either at individual sites or across the registry as a whole.
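A simple expected-versus-observed comparison might be sketched as follows; the sites, counts, and threshold are invented:

```python
# Hypothetical enrollment tracking across sites.
sites = {"Site A": {"expected": 50, "observed": 48},
         "Site B": {"expected": 50, "observed": 12}}

flagged = []
for site, n in sites.items():
    ratio = n["observed"] / n["expected"]
    if ratio < 0.5:  # illustrative threshold for corrective action
        flagged.append(site)
        print(f"{site}: enrollment at {ratio:.0%} of expected; investigate")
```

The same comparison applies to CRF completion and query rates; what matters is that the coordinating center reviews the indicators on a defined schedule.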

2.5.6. Coding Data

As further described in Chapter 4, standardized coding dictionaries are an increasingly important tool for aggregating registry data with other databases. As the health information community adopts standards, registries should routinely apply them unless there are specific reasons not to use such standard codes. While such codes should be implemented in the data dictionaries during registry planning, including all codes in the interface is not always possible, and some free text may be entered as a result. When free text data are entered into a registry, recoding these data using standardized dictionaries (e.g., MedDRA, WHODRUG, SNOMED®) may be worthwhile. There is a cost associated with recoding, however, and in general it should be limited to data elements that will be used in analysis or that need to be combined or reconciled with other datasets, such as when a common safety database is maintained across multiple registries and studies.

2.5.7. Storing and Securing Data

When data on a form are entered into a computer for inclusion in a registry, the form itself, as well as a log of the data entered, should be maintained for the regulatory archival period. Data errors may be discovered long after the data have been stored in the registry. The error may have been made by the patient or interviewer on the original form or during the data entry process. Examination of the original form and the data entry log should reveal the source of the error. If the error is on the form, correcting it may require reinterviewing the patient. If the error occurred during data entry, the corrected data should be entered and the registry updated. By then, the erroneous registry data may have been used to generate reports or create cohorts for population studies. Therefore, instead of simply replacing erroneous data with corrected data, the registry system should have the ability to flag data as erroneous without deleting them and to insert the corrected data for subsequent use.
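One way to sketch the flag-and-correct behavior described above is to keep a per-field history, so that reports already generated from the old value remain traceable; the structure shown is an illustration, not a standard schema:

```python
# Each field holds a history of (value, status) pairs; nothing is deleted.
record = {"hemoglobin_a1c": [("8.4", "current")]}

def correct(record, field, new_value):
    history = record[field]
    old_value, _ = history[-1]
    history[-1] = (old_value, "flagged-erroneous")  # keep, but mark invalid
    history.append((new_value, "current"))          # corrected value for future use

correct(record, "hemoglobin_a1c", "7.4")
print(record["hemoglobin_a1c"])
```

An audit-trail table in a relational database (old value, new value, who, when, why) achieves the same end and is what regulated systems typically use.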

Once data are entered into the registry, the registry must be backed up on a regular basis. There are two basic types of backup, and both types should be considered for use as best practice by the registry coordinating center. The first type is real-time disk backup, which is done by the disk storage hardware used by the registry server. The second is a regular (e.g., daily) backup of the registry to removable media (e.g., tape, CD-ROM, DVD). In the first case, as data are stored on disk in the registry server, they are automatically replicated to two or more physical hard drives. In the simplest example, called “mirroring,” registry data are stored on a primary disk and an exact replica is stored on the mirrored disk. If either disk fails, data continue to be stored on the surviving disk until the failed disk is replaced. Such a failure can be completely transparent to the user, who may continue entering and retrieving data from the registry database during the failure. More complex disk backup configurations exist, in which arrays of disks are used to provide protection from single disk failures.

The second type of periodic backup is needed for disaster recovery. Ideally, a daily backup copy of the registry database stored on removable media should be maintained off site. In case of failure of the registry server or disaster that closes the data center, the backup copy can be brought to a functioning server and the registry database restored, with the only potential loss of data being for the interval between the regularly scheduled backups. The lost data can usually be reloaded from local data repositories or re-entered from hard copy. Other advanced and widely available database solutions and disaster recovery techniques may support a “standby” database that can be located at a remote data center. In case of a failure at the primary data center, the standby database can be used, minimizing downtime and preventing data loss.

2.6. Managing Change

As with all other registry processes, the extent of change management will depend on the types of data being collected, the source(s) of the data, and the overall timeframe of the registry. There are two major drivers behind the need for change during the conduct of a registry: internally driven change to refine or improve the registry or the quality of data collected, and externally driven change that comes as a result of changes in the environment in which the registry is being conducted.

Internally driven change is generally focused on changes to data elements or data validation parameters that arise from site feedback, queries, and query trends that may point to a question, definition, or CRF field that was poorly designed or missing. If this is the case, the registry can use the information coming back from sites or data managers to add, delete, or modify the database requirements, CRFs, definitions, or data management manual as required. At times, more substantive changes, such as the addition of new forms or changes to the registry workflow, may be desirable to examine new conditions or outcomes. Externally driven change generally arises in multiyear registries as new information about the disease and/or product under study becomes available, or as new therapies or products are introduced into clinical practice. Change and turnover in registry personnel is another type of change, and one that can be highly disruptive if procedures are not standardized and documented.

A more extensive form of change may occur when a registry either significantly changes its CRFs or changes the underlying database. Longstanding registries address this issue from time to time as information regarding the condition or procedure evolves and data collection forms and definitions require updating. Chapter 14 discusses in more detail the process for making significant modifications to a registry.

Proper management of change is crucial to the maintenance of the registry. A consistent approach to change management, including decisionmaking, documentation, data mapping, and validation, is an important aspect of maintaining the quality of the registry and the validity of the data. While the specific change management processes might depend on the type and nature of the registry, change management in registries that are designed to evaluate patient outcomes requires, at the very least, the following structures and processes:

  • Detailed manual of procedures: As described earlier, a detailed manual that is updated on a regular basis—containing all the registry policies, procedures, and protocols, as well as a complete data dictionary listing all the data elements and their definitions—is vital for the functioning of a registry. The manual is also a crucial component for managing and documenting change management in a registry.
  • Governing body: As described in Chapter 2, Section 6, registries require oversight and advisory bodies for a number of purposes. One of the most important is to manage change on a regular basis. Keeping the registry manual and data definitions up to date is one of the primary responsibilities of this governing body. Large prospective registries, such as the National Surgical Quality Improvement Program, have found it necessary to delegate the updating of data elements and definitions to a special definitions committee.
  • Infrastructure for ongoing training: As mentioned above, change in personnel is a common issue for registries. Specific processes and an infrastructure for training should be available at all times to account for any unanticipated changes and turnover of registry personnel or providers who regularly enter data into the registry.
  • Method to communicate change: Since registries frequently undergo change, there should be a standard approach and timeline for communicating to sites when changes will take place.

In addition to instituting these structures, registries should also plan for change from a budget perspective (Chapter 2) and from an analysis perspective (Chapter 13).

2.7. Using Data for Care Delivery, Coordination, and Quality Improvement

2.7.1. Improving Care

As registries increasingly collect data in electronic format, the time between care delivery and data collection is reduced. This shorter timeframe offers significant opportunities to use registry functionalities to improve care delivery at the patient and population levels. These functionalities (Table 11–1) include the generation of outputs that promote care delivery and coordination at the individual patient level (e.g., decision support, patient reports, reminders, notifications, lists for proactive care, educational content) and the provision of tools that assist with population management, quality improvement, and quality reporting (e.g., risk adjustment, population views, benchmarks, quality report transmissions). A number of registries are designed primarily for these purposes. Several large national registries1, 14-16 have shown large changes in performance during the course of hospital or practice participation in the registry. For example, in one head-to-head study that used hospital data from Hospital Compare, an online database created by the Centers for Medicare & Medicaid Services, patients in hospitals enrolled in the American Heart Association's Get With The Guidelines® Coronary Artery Disease registry, which includes evidence-based reminders and real-time performance measurement reports, fared significantly better in measures of guidelines compliance than those in hospitals not enrolled in the registry.17

Table 11–1. Registry functionalities.

Table 11–1

Registry functionalities.

2.7.2. Special Case: Performance-Linked Access System

A performance-linked access system (PLAS), also known as a restricted access or limited distribution system, is another application of a registry to serve more than an observational goal. Unlike a disease and exposure registry, a PLAS is part of a detailed risk-minimization action plan that sponsors develop as a commitment to enhance the risk-benefit balance of a product when approved for the market. The purpose of a PLAS is to mitigate a certain known drug-associated risk by ensuring that product access is linked to a specific performance measure. Examples include systems that monitor laboratory values, such as white blood cell counts during clozapine administration to prevent severe leukopenia, or routine pregnancy testing during thalidomide administration to prevent in utero exposure to this known teratogenic compound. Additional information on PLAS can be found in FDA's Guidance for Industry: Development and Use of Risk Minimization Action Plans.18

3. Quality Assurance

In determining the utility of a registry for decisionmaking, it is critical to understand the quality of the procedures used to obtain the data and the quality of the data stored in the database. As patient registries that meet sufficient quality criteria (discussed in Chapters 1 and 25) are increasingly seen as important means to generate evidence regarding effectiveness, safety, and quality of care, the quality of the data within a registry must be understood in order to evaluate their suitability for use in decisionmaking. Registry planners should consider how to ensure quality to a level sufficient for the intended purposes (as described below) and how to develop appropriate quality assurance plans for their registries. Those conducting the registry should assess and report on those quality assurance activities.

Methods of quality assurance will vary depending on the intended purpose of the registry. A registry intended to serve as key evidence for decisionmaking19 (e.g., coverage determinations, product safety evaluations, or performance-based payment) will require higher levels of quality assurance than a registry describing the natural history of a disease. Quality assurance activities generally fall under three main categories: (1) quality assurance of data, (2) quality assurance of registry procedures, and (3) quality assurance of computerized systems. Since many registries are large, the level of quality assurance that can be obtained may be limited by budgetary constraints.

To balance the need for sufficient quality assurance with reasonable resource expenditure for a particular purpose, a risk-based approach to quality assurance is highly recommended. A risk-based approach focuses on the most important sources of error or procedural lapses from the perspective of the registry's purpose. Such sources of error should be defined during inception and design phases. As described below, registries with different purposes may be at risk for different sources of error and focus on different practices and levels of assessment. Standardization of methods for particular purposes (e.g., national performance measurement) will likely become more common in the future if results are to be combined or compared between registries.

3.1. Assurance of Data Quality

Structures, processes, policies, and procedures need to be put in place to ascertain the quality of the data in the registry and to guard against several types of errors, including:

  • Errors in interpretation or coding: An example of this type of error would be two abstracters looking for the same data element in a patient's medical record but extracting different data from the same chart. Variations in coding of specific conditions or procedures also fall under the category of interpretive errors. Avoidance or detection of interpretive error includes adequate training on definitions, testing against standard charts, testing and reporting on inter-rater reliability, and re-abstraction.
  • Errors in data entry, transfer, or transformation accuracy: These occur when data are entered into the registry inaccurately—for example, a laboratory value of 2.0 is entered as 20. Avoidance or detection of accuracy errors can be achieved through upfront data quality checks (such as ranges and data validation checks), reentering samples of data to assess for accuracy (with the percent of data to be sampled depending on the study purpose), and rigorous attention to data cleaning.
  • Errors of intention: Examples of intentional distortion of data (often referred to as “gaming”) are inflated reporting of preoperative patient risk in registries that compare risk-adjusted outcomes of surgery, or selecting only cases with good outcomes to report (“cherry-picking”). Avoidance or detection of intentional error can be challenging. Some approaches include checking for consistency of data between sites, assessing screening log information against other sources (e.g., billing data), and performing onsite audits (including monitoring of source records) either at random or “for cause.”

Steps for assuring data quality include:

  • Training: Educate data collectors/abstracters in a structured manner.
  • Data completeness: When possible, provide sites with immediate feedback on issues such as missing or out-of-range values and logical inconsistencies.
  • Data consistency: Compare across sites and over time.
  • Onsite audits for a sample of sites: Review screening logs and procedures and/or samples of data.
  • For-cause audits: Use both predetermined and data-informed methods to identify potential sites at higher suspicion for inaccuracy or intentional errors, such as discrepancies between enrollment and screening logs, narrow data ranges, and overly high or low enrollment.

To further minimize or identify these errors and to ensure the overall quality of the data, the following should be considered.

3.1.1. A Designated Individual Accountable for Data Quality at Each Site

Sites submitting data to a registry should have at least one person who is accountable for the quality of these data, irrespective of whether the person is collecting the data as well. The site coordinator should be fully knowledgeable of all protocols, policies, procedures, and definitions in a registry. The site coordinator should ensure that all site personnel involved in the registry are knowledgeable and that all data transmitted to registry coordinating centers are valid and accurate.

3.1.2. Assessment of Training and Maintenance of Competency of Personnel

Thorough training and documentation of maintenance of competency, for both site and registry personnel, are imperative to the quality of the registry. A detailed and comprehensive operations manual, as described earlier, is crucial for the proper training of all personnel involved in the registry. Routine cognitive testing (surveys) of health care provider knowledge of patient registry requirements and appropriate product use should be performed to monitor maintenance of the knowledge base and compliance with patient registry requirements. Retraining programs should be initiated when survey results provide evidence of lack of knowledge maintenance. All registry training programs should provide means by which the knowledge of the data collectors about their registries and their competence in data collection can be assessed on a regular basis, particularly when changes in procedures or definitions are implemented.

3.1.3. Data Quality Audits

As described above, the level to which registry data will be cleaned is influenced by the objectives of the registry, the type of data being collected (e.g., clinical data vs. economic data), the sources of the data (e.g., primary vs. secondary), and the timeframe of the registry (e.g., 3-month followup vs. 10-year followup). These registry characteristics often affect the types and number of data queries that are generated, both electronically and manually. In addition to identifying missing values, incorrect or out-of-range values, or responses that are logically inconsistent with other responses in the database, specifically trained registry personnel can review the data queries to identify possible error trends and to determine whether additional site training is required. For example, such personnel may identify a specific patient outcome question or eCRF field that is generating a larger than average proportion of queries, either from one site or across all registry sites. Using this information, the registry personnel can conduct targeted followup with the sites to retrain them on the correct interpretation of the outcome question or eCRF field, with the goal of reducing the future query rate on that particular question or field. These types of “training tips” can also be addressed in a registry newsletter as a way to maintain frequent but unobtrusive communication with the registry sites.

If the registry purpose requires more stringent verification of the data being entered into the database by registry participants, registry planners may decide to conduct audits of the registry sites. Like queries discussed above, the audit plan for a specific registry will be influenced by the purpose of the registry, the type of data being collected, the source of the data, and the overall timeframe of the registry. In addition, registry developers must find the appropriate balance between the extensiveness of an audit and the impact on overall registry costs. Based on the objectives of the registry, a registry developer can define specific data fields (e.g., key effectiveness variables or adverse event data) on which the audit can be focused.

An audit may involve examination or verification; it may take place onsite (sometimes called monitoring) or offsite; and it may be extensive or very limited. An audit can be conducted on a random sample of participating sites (e.g., 5 to 20 percent of registry sites); “for cause” (i.e., only when there is an indication of a problem, such as one site being an outlier compared with most others); on a random sample of patients; or using sampling techniques based on geography, practice setting (academic center vs. community hospital), patient enrollment rate, or query rate (a “risk-based” audit strategy).

The approach to auditing the quality of the data should reflect the most significant sources of error with respect to the purpose of the registry. For example, registries used for performance measurement may have a higher risk of exclusion of higher risk patients (“cherry-picking”), and the focus of an audit might be on external sources of data to verify screening log information (e.g., billing data) in addition to data accuracy. (See Case Example 25.) Finally, the timeframe of the registry may help determine the audit plan. A registry with a short followup period (e.g., 3 months) may require only one round of audits at the end of the study, prior to database lock and data analysis. For example, in the OPTIMIZE-HF registry, a data quality audit was performed, based on predetermined criteria, on a 5-percent random sample of the first 10,000 patient records verified against source documents.20 For registries with multiyear followup, registry personnel may conduct site audits every 1 or 2 years for the duration of the registry.

In addition to the site characteristics mentioned above, sites that have undergone significant staffing changes during a multiyear registry should be considered prime audit targets to help confirm adequate training of new personnel and to quickly address possible inter-rater variability. To minimize any impact on the observational nature of the registry, the audit plan should be documented in the registry manual.

Registries that are designed for the evaluation of patient outcomes and the generation of scientific information, and that use medical chart abstracters, should assess inter-rater reliability in data collection with sufficient scientific rigor for their intended purpose(s). For example, one registry that uses chart abstraction extensively has devised and published a detailed system for assessing inter-rater reliability; in addition to requiring that abstracters achieve a certain level of proficiency, a proportion of charts are scheduled for re-abstraction on the basis of predefined criteria. Statistical measures of reliability from such re-abstractions, such as the kappa statistic, are maintained and reported.21
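For illustration, Cohen's kappa statistic for two abstracters' binary ratings of the same charts can be computed as follows; the ratings are invented:

```python
# Cohen's kappa: agreement between two raters, corrected for chance.
def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n)
                   for c in categories)
    return (observed - expected) / (1 - expected)

# Two abstracters classifying eight charts (invented data): 6/8 agree,
# but half that agreement is expected by chance, so kappa is 0.5.
a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
b = ["yes", "no",  "no", "no", "yes", "no", "yes", "yes"]
print(round(cohens_kappa(a, b), 2))
```

Values near 1 indicate near-perfect agreement beyond chance; values near 0 indicate agreement no better than chance, a signal that retraining or clarified definitions are needed.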

Subsequent to audits (onsite or remote), communication of findings with site personnel should be conducted face to face, along with followup written communication of findings and opportunities for improvement. As appropriate to meet registry objectives, the sponsor may request corrective actions from the site. Site compliance may also be enhanced with routine communication of data generated from the patient registry system to the site for reconciliation.

3.2. Registry Procedures and Systems

3.2.1. External Audits of Registry Procedures

If registry developers determine that external audits are necessary to ensure the level of quality for the specific purpose(s) of the registry, these audits should be conducted in accordance with pre-established criteria. Pre-established criteria could include monitoring of sites with high patient enrollment or with prior audit history of findings that require attention, or monitoring could be based on level of site experience, rate of serious adverse event reporting, or identified problems. The registry coordinating center may perform monitoring of a sample of sites, which could be focused on one or several areas. This approach could range from reviewing procedures and interviewing site personnel, to checking screening logs, to monitoring individual case records.

The importance of having a complete and detailed registry manual that describes policies, structures, and procedures cannot be overemphasized in the context of quality assurance of registry procedures. Such a manual serves both as a basis for conducting the audits and as a means of documenting changes emanating from these audits. As with data quality audits, feedback of the findings of registry procedure audits should be communicated to all stakeholders and documented in the registry manual.

3.2.2. Assurance of System Integrity and Security

All aspects of data management processes should fall under a rigorous life-cycle approach to system development and quality management. Each process is clearly defined and documented. The concepts described below are consistent across many software industry standards and health care industry standards (e.g., 21 CFR Part 11, legal security standards), although some specifics may vary. An internal quality assurance function at the registry coordinating center should regularly audit the processes and procedures described. When third parties other than the registry coordinating center perform activities that interact with the registry systems and data, they are typically assessed for risk and are subject to regular audits by the registry coordinating center.

3.2.3. System Development and Validation

All software systems used for patient registries should follow the standard principles of software development, including following one of the standard software development life-cycle (SDLC) models that are well described in the software industry.

In parallel, quality assurance of system development uses approved specifications to create a validation plan for each project. Test cases are created by trained personnel and systematically executed, with results recorded and reviewed. Depending on regulatory requirements, a final validation report is often written and approved. Unresolved product and process issues are maintained and tracked in an issue tracking or CAPA (Corrective Action/Preventive Action) system.

Processes for development and validation should be similarly documented and periodically audited. The information from these audits is captured, summarized, and reviewed with the applicable group, with the aim of ongoing process improvement and quality improvement.

3.3. Security

All registries maintain health information, and therefore security is an important issue. The HIPAA (Health Insurance Portability and Accountability Act of 1996) Security Rule establishes the standards for security for electronic protected health information that must be implemented by health plans, health care clearinghouses, and most health care providers (collectively, “covered entities”), as well as their business associates.22 Therefore, covered entities and business associates that maintain registries with individually identifiable health information in electronic form must implement the technical, administrative, and physical safeguards specified and required by the HIPAA Security Rule with respect to the registry data. In addition, other Federal and State security laws may apply to registry data, depending on who maintains the registry, the type of data maintained, and other circumstances.

Aside from what may be required by applicable laws, this section generally discusses some of the components of a security program. Security is achieved not simply through technology but by clear processes and procedures. Overall responsibility for security is typically assigned. Security procedures are well documented and posted. The documentation is also used to train staff. Some registries may also maintain personal information, such as information needed to contact patients to remind them to gather or submit patient-reported outcome information.

3.3.1. System Security Plan

A system security plan consists of documented policies and standard operating procedures defining the rules of systems, including administrative procedures, physical safeguards, technical security services, technical security mechanisms, electronic signatures, and audit trails, as applicable. The rules delineate roles and responsibilities. Included in the rules are the policies specifying individual accountability for actions, access rights based on the principle of least privilege, and the need for separation of duties. These principles and the accompanying security practices provide the foundation for the confidentiality and integrity of registry data. The rules also detail the consequences associated with noncompliance.

3.3.2. Security Assessment

Clinical data maintained in a registry should be assessed to determine the appropriate level of security. Standard criteria exist for such assessments and are based on the type of data being collected. Part of the validation process is a security assessment of the systems and operating procedures. One of the goals of such an assessment is effective risk management, based on determining possible threats to the system or data and identifying potential vulnerabilities.

3.3.3. Education and Training

All staff members of the registry coordinating center should be trained periodically on aspects of the overall systems, security requirements, and any special requirements of specific patient registries. Individuals should receive training related to their specific job responsibilities, and receipt of the appropriate training should be documented.

3.3.4. Access Rights

Access to systems and data should be based on the principles of least privilege and separation of duties. No individual should be assigned access privileges that exceed job requirements, and no individual should be in a role that includes access rights that would allow circumvention of controls or the repudiation of actions within the system. In all cases, access should be limited to authorized individuals.
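
In practice, these principles are often enforced through role-based permissions. The sketch below is illustrative only; the role names and permissions are hypothetical, and each role is granted nothing beyond its job function (least privilege), with monitoring kept separate from data entry (separation of duties).

```python
# Hypothetical registry roles; permissions are deliberately disjoint so that
# no single role can both enter data and audit it.
ROLE_PERMISSIONS = {
    "site_coordinator": {"create_record", "edit_record"},
    "data_monitor": {"view_record", "run_query", "flag_discrepancy"},
    "system_admin": {"manage_accounts", "restore_backup"},
}

def is_authorized(role, action):
    """Deny by default: an action is allowed only if the role explicitly grants it."""
    return action in ROLE_PERMISSIONS.get(role, set())

print(is_authorized("site_coordinator", "create_record"))   # True
print(is_authorized("site_coordinator", "manage_accounts")) # False
```

The deny-by-default check means that an unrecognized role or a new, unassigned action is refused until someone explicitly grants it, which is the safer failure mode.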

3.3.5. Access Controls

Access controls provide the basis for authentication and logical access to critical systems and data. Since the authenticity, integrity, and auditability of data stored in electronic systems depend on accurate individual authentication, management of electronic signatures (discussed below) is an important topic.

Logical access to systems and computerized data should be controlled in a way that permits only authorized individuals to gain access to the system. This is normally done through a unique access code, such as a unique user ID and password combination that is assigned to the individual whose identity has been verified and whose job responsibilities require such access. The system should require the user to change the password periodically and should detect possible unauthorized access attempts, such as multiple failed logins, and automatically deauthorize the user account if they occur. The identification code can also be an encrypted digital certificate stored on a password-protected device or a biometric identifier that is designed so that it can be used only by the designated individual.
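
The failed-login safeguard described above can be sketched as follows. The thresholds are illustrative assumptions, not values mandated by any standard, and a production system would persist this state and log each event to the audit trail.

```python
import time

MAX_FAILED_ATTEMPTS = 5        # illustrative policy thresholds
LOCKOUT_SECONDS = 30 * 60

class LoginGuard:
    """Track consecutive failed logins per user and lock accounts over the limit."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._failures = {}       # user_id -> consecutive failure count
        self._locked_until = {}   # user_id -> time the lockout expires

    def is_locked(self, user_id):
        until = self._locked_until.get(user_id)
        if until is not None and self._clock() < until:
            return True
        # No lockout, or lockout expired: clear any stale state
        self._locked_until.pop(user_id, None)
        return False

    def record_failure(self, user_id):
        self._failures[user_id] = self._failures.get(user_id, 0) + 1
        if self._failures[user_id] >= MAX_FAILED_ATTEMPTS:
            self._locked_until[user_id] = self._clock() + LOCKOUT_SECONDS

    def record_success(self, user_id):
        # A successful login resets the consecutive-failure count
        self._failures.pop(user_id, None)
```

Injecting the clock as a parameter keeps the lockout logic testable without waiting out real lockout periods.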

Rules should be established for situations in which access credentials are compromised. New password information should be sent to the individual by a secure method.

Intrusion detection and firewalls should be employed on sites accessible to the Internet, with appropriate controls and rules in place to limit access to authorized users. Desktop systems should be equipped with antivirus software, and servers should run the most recent security patches. System security should be reviewed throughout the course of the registry to ensure that management, operational, personnel, and technical controls are functioning properly.

3.3.6. Data Enclaves

With the growth of clinical data and demands for increasing amounts of clinical data by multiple parties and researchers, new approaches to access are evolving. Data enclaves are secure, remote-access systems that allow researchers to share respondents' information in a controlled and confidential manner.23 The data enclave uses statistical, technical, and operational controls at different levels chosen for the specific viewer. This can be useful both for enhancing protection of the data and for enabling certain organizations to access data in compliance with their own organization or agency requirements. Data enclaves also can be used to allow other researchers to access a registry's data in a controlled manner. With the growth of registries and their utility for a number of stakeholders, data enclaves will become increasingly important.

3.3.7. Electronic Signatures

Electronic signatures provide one of the foundations of individual accountability, helping to ensure an accurate change history when used in conjunction with secure, computer-generated, time-stamped audit trails. Most systems use an electronic signature. For registries that report data to FDA, such signatures must meet criteria specified in 21 CFR Part 11 for general signature composition, use, and control (sections 11.100, 11.200, and 11.300). However, even registries that do not have such requirements should view these as reasonable standards. Before an individual is assigned an electronic signature, it is important to verify the person's identity and train the individual in the significance of the electronic signature. In cases where a signature consists of a user ID and a password, both management and technical means should be used to ensure uniqueness and compliance with password construction rules. Password length, character composition, uniqueness, and validity life cycle should be based on industry best practices and guidelines published by the National Institute of Standards and Technology. Passwords used in electronic signatures should abide by the same security and aging constraints as those listed for system access controls.
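
Password construction rules of the kind referenced above are typically enforced at the point of entry. A minimal sketch follows; the thresholds and character-class rules are illustrative assumptions only, and actual policy should be set from current NIST guidance (e.g., SP 800-63B) rather than from this example.

```python
import re

def password_violations(password, min_length=12):
    """Return a list of policy violations; an empty list means the password passes.

    The rules below are illustrative, not NIST-prescribed values.
    """
    violations = []
    if len(password) < min_length:
        violations.append(f"fewer than {min_length} characters")
    if not re.search(r"[A-Z]", password):
        violations.append("no uppercase letter")
    if not re.search(r"[a-z]", password):
        violations.append("no lowercase letter")
    if not re.search(r"\d", password):
        violations.append("no digit")
    if not re.search(r"[^A-Za-z0-9]", password):
        violations.append("no special character")
    return violations

print(password_violations("Registry#2014data"))  # []
```

Returning the full list of violations, rather than a single pass/fail flag, lets the system tell users exactly which rule was not met.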

3.3.8. Validation

Systems that store electronic records (or depend on electronic or handwritten signatures of those records) that are required to be acceptable to FDA must be validated according to the requirements set forth in the 21 CFR Part 11 Final Rule,24 dated March 20, 1997. The rule describes the requirements and controls for electronic systems that are used to fulfill records requirements set forth in agency regulations (often called “predicate rules”) and for any electronic records submitted to the agency. FDA publishes nonbinding guidance documents from time to time that outline its current thinking regarding the scope and application of the regulation. The current guidance document is Guidance for Industry: Part 11, Electronic Records; Electronic Signatures – Scope and Application,25 dated August 2003. Other documents that are useful for determining validation requirements of electronic systems are Guidance for Industry: Computerized Systems Used in Clinical Investigations,26 dated May 2007, and General Principles of Software Validation; Final Guidance for Industry and FDA Staff,27 dated January 11, 2002.

4. Resource Considerations

Costs for registries can be highly variable, depending on the overall goals. Costs are also associated with the total number of sites, the total number of patients, and the geographical reach of the registry program. Each of the elements described in this chapter has an associated cost. Table 11–2 provides a list of some of the activities of the registry coordinating center as an example. Not all registries will require or can afford all of the functions, options, or quality assurance techniques described in this chapter. Registry planners must evaluate benefit versus available resources to determine the most appropriate approach to achieve their goals.

Table 11–2. Data activities performed during registry coordination.


Case Examples for Chapter 11

Case Example 24. Developing a performance-linked access system

Description: The Teva Clozapine Patient Registry is one of several national patient registries for patients taking clozapine. The registry is designed as a performance-linked access system (PLAS) mandated by the U.S. Food and Drug Administration (FDA) to comply with a Risk Evaluation and Mitigation Strategy. The goal is to prevent clozapine rechallenge in patients at risk of developing clozapine-induced agranulocytosis by monitoring laboratory data for signs of leukopenia or granulocytopenia.
Sponsor: Teva Pharmaceuticals USA
Year Started: 1997
Year Ended: Ongoing
No. of Sites: Over 50,000 active physicians and pharmacies
No. of Patients: 57,000 active patients


Clozapine is classified as an atypical antipsychotic and is indicated for patients with severe schizophrenia who fail standard therapy, and for reducing the risk of recurrent suicidal behavior in patients with schizophrenia or schizoaffective disorder. However, clozapine is known to be associated with a risk of agranulocytosis, a potentially life-threatening condition. The primary goal of the registry is to prevent clozapine-induced agranulocytosis. Patients at risk of developing clozapine-induced agranulocytosis are those who have a history of severe leukopenia or granulocytopenia (white blood cell [WBC] count <2,000/mm3 or absolute neutrophil count [ANC] <1,000/mm3).

Because of these potentially serious side effects, FDA requires manufacturers of clozapine to maintain a patient monitoring system. Designed as a PLAS, the registry needs to ensure the eligibility of patients, pharmacies, and physicians; monitor white blood cell (WBC) count and absolute neutrophil count (ANC) reports for low counts; ensure compliance with laboratory report submission timelines; and respond to inquiries and reports of adverse events.

Proposed Solution

The risk of developing agranulocytosis is mitigated by regular hematological monitoring, which is a condition of access to the drug (the “no blood/no drug” requirement). Because there are multiple manufacturers of clozapine, FDA requires each company to share information with the single national non-rechallenge master file (NNRMF). The Teva Clozapine Patient Registry was developed to meet these goals. The core components of the system are a call center, a Web site, and a reminder system. Patients must be enrolled prior to receiving clozapine, and they must be assigned to a dispensing pharmacy and treating physician. After the patient has initiated therapy, a current and acceptable WBC count and ANC value are required prior to dispensing clozapine. Once a patient is enrolled and eligibility is confirmed, a 1-, 2-, or 4-week supply of clozapine can be dispensed, depending on patient experience and the physician's prescription.

Health care professionals are required to submit laboratory reports to the registry based on the patients' monitoring frequency. Patients are monitored weekly for the first 6 months. If there are no low counts, the patient can be monitored every 2 weeks for an additional 6 months. Afterward, if no low counts are detected after continuous therapy, the patient may qualify for monitoring every 4 weeks (depending on the physician's prescription). The registry provides reminders if laboratory data are not submitted according to the schedule. If a low count is identified, registry staff inform the health care providers to make sure that they are aware of the event and appropriate action is taken. If severe leukopenia or granulocytopenia is detected, the patient is posted to the NNRMF to prevent future exposure to the drug.
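
The tiered schedule above can be sketched as a simple lookup. This is a simplification for illustration only: in the registry itself, qualification for less frequent monitoring also depends on the physician's prescription and the product labeling, and a low count triggers provider notification and possible NNRMF posting rather than a simple interval change.

```python
def monitoring_interval_weeks(weeks_on_therapy, low_count_history=False):
    """Illustrative sketch of the tiered laboratory monitoring schedule.

    Assumption (not stated in the source): any low-count history keeps the
    patient on weekly monitoring in this simplified model.
    """
    if low_count_history:
        return 1
    if weeks_on_therapy < 26:
        return 1   # weekly for the first 6 months
    if weeks_on_therapy < 52:
        return 2   # every 2 weeks for the next 6 months
    return 4       # may qualify for every 4 weeks thereafter

print(monitoring_interval_weeks(10))  # 1
print(monitoring_interval_weeks(60))  # 4
```

Encoding the schedule as a function also makes the reminder logic simple: a report is overdue when the time since the last laboratory submission exceeds the current interval.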


Results

Results indicate that the registry is achieving its goal of reducing the risk of agranulocytosis associated with the use of clozapine by serving as an early warning system. By linking access to clozapine to a strict schedule of laboratory data submissions, the sponsor can ensure that only eligible patients are taking the drug. The sponsor is also able to detect low counts, prevent inappropriate rechallenge (or re-exposure) in at-risk patients, and monitor the patient population for any adverse events. This system provides the sponsor with data on the frequency and severity of adverse events while ensuring that only the proper patient population receives the drug.

Key Point

A PLAS can ensure that only appropriate patients receive treatment. A secure, fully functional Web site allows health care professionals to manage their patients electronically. A reminder system permits rapid notification to providers to ensure that appropriate actions are taken when low counts are detected or if laboratory reports are not submitted in a timely manner. A call center with after-hours service ensures 24/7 availability, and data sharing with the NNRMF prevents rechallenge regardless of manufacturer. These systems can also help sponsors monitor the patient population to learn more about adherence, compliance, and the frequency of adverse events.

For More Information

Clozapine Package Insert (2012).

Honigfeld G. The Clozapine National Registry System: forty years of risk management. J Clin Psychiatry Monograph. 1996;14(2):29–32.

Karukin M, Conner J, Lage M. Incidence of leukopenia and agranulocytosis with the use of clozapine: evidence from the Teva Clozapine Patient Registry. Poster presented at the 23rd Annual U.S. Psychiatric and Mental Health Congress, Poster #219; November 20, 2010.

Peck CC. FDA's position on the Clozaril patient management system. Hosp Community Psychiatry. 1990 Aug;41(8):876–7. [PubMed: 2401475].

Case Example 25. Using audits to monitor data quality

Description: The Vascular Study Group of New England (VSGNE) is a voluntary, cooperative group of clinicians, hospital administrators, and research personnel, organized to improve the care of patients with vascular disease. The purpose of the registry is to collect and exchange information to support continuous improvements in the quality, safety, effectiveness, and cost of caring for patients with vascular disease.
Sponsor: Funded by participating institutions. Initial funding was provided by the Centers for Medicare & Medicaid Services.
Year Started: 2002
Year Ended: Ongoing
No. of Sites: 30 hospitals in Connecticut, Rhode Island, Massachusetts, Maine, New Hampshire, and Vermont
No. of Patients: Over 25,000


VSGNE established a registry in 2002 as part of an effort to improve quality of care for patients undergoing carotid endarterectomy, carotid stenting, lower extremity arterial bypass, and open and endovascular repair of abdominal aortic aneurysms. The registry collects more than 120 patient, process, and outcome variables for each procedure at the time of hospitalization, and 1-year results are collected during a followup visit at the treating physician's office. All patients receiving one of the procedures of interest at a participating hospital are eligible for enrollment in the registry.

In considering the areas of greatest risk in evaluating the quality of this registry, the registry developers determined that incomplete enrollment of eligible patients was one major potential area for bias. It was determined that an audit of participating sites, focusing on included versus eligible patients, could reasonably address whether this was a significant issue. However, the group needed to overcome two logistical challenges: (1) the audit had to review thousands of eligible patients at participating hospitals in a timely, cost-effective manner; and (2) the audit could not overburden the hospitals, as they participate in the study voluntarily.

Proposed Solution

The registry team developed a plan to conduct the audit using electronic claims data files from the hospitals. Each hospital was asked to send claims data files for the appropriate time periods and procedures of interest to the registry. The registry team at Dartmouth-Hitchcock Medical Center then matched the claims data to registry enrollment records using a computer-matching process based on ICD-9 (International Classification of Diseases, 9th Revision) codes, with manual review of the patient files that did not match.
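
The matching step can be sketched as a set comparison between patients with an eligible procedure in the claims data and patients enrolled in the registry. The data shapes and ICD-9 codes below are placeholders, not the actual VSGNE file formats or code list; unmatched patients would then go to manual chart review.

```python
# Placeholder ICD-9-CM procedure codes standing in for the procedures of
# interest; the registry's actual code list is not reproduced here.
PROCEDURE_CODES = {"38.12", "39.25", "39.71"}

def audit_enrollment(claims, registry_ids):
    """Compare hospital claims against registry enrollment.

    claims: iterable of (patient_id, procedure_code) pairs from billing data.
    registry_ids: set of patient IDs enrolled in the registry.
    Returns (ids missing from the registry, percent of eligible patients enrolled).
    """
    eligible = {pid for pid, code in claims if code in PROCEDURE_CODES}
    missing = eligible - set(registry_ids)
    pct = 100.0 if not eligible else 100.0 * (1 - len(missing) / len(eligible))
    return missing, pct

# Hypothetical claims: P3's code is not a procedure of interest
claims = [("P1", "38.12"), ("P2", "39.25"), ("P3", "89.52"), ("P4", "39.71")]
missing, pct = audit_enrollment(claims, {"P1", "P2"})
print(sorted(missing))  # ['P4'] would be flagged for followup
```

This kind of remote, claims-based comparison is what let the audit scale to thousands of eligible patients without site visits.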


Results

The first audit, performed in 2003, found that approximately 7 percent of eligible patients had not been enrolled in the registry. Because of concerns that the missing patients may have had different outcomes than the enrolled patients, the registry team asked participating hospitals to complete registry forms for all missing patients. This effort increased the percentage of eligible patients enrolled in the registry to over 99 percent. The team also compared the discharge status of the missing patients and the enrolled patients and found no significant differences in outcomes. The team concluded that the patients had been missed at random and that there were no systematic enrollment issues. Discussions with the hospitals identified the reasons for not enrolling patients as confusion about eligibility requirements, training issues, and questions about informed consent requirements.

Subsequent audits in 2006 and 2008 had similar outcomes, but considerable time was required to reconcile ICD-9 coding differences with procedures in the registry, since ICD-9 codes are not granular for vascular procedures. In 2011, the VSGNE model for regional vascular quality improvement was adopted by the Society for Vascular Surgery as the Vascular Quality Initiative, now a national network of regional quality groups like VSGNE, organized under the umbrella of the Society for Vascular Surgery's patient safety organization. In 2012, the now-nationwide audit mechanism for data completeness switched from ICD-9 codes to physician Current Procedural Terminology (CPT®) claims data, since CPT codes are more precise for specific vascular procedures. Preliminary results in 2012 show more precise matching with registry data using CPT-based claims.

Key Point

For many registries, audits of participating sites are an important tool for ensuring that the data are reliable and valid. However, registries that rely on voluntary site participation must be cautious to avoid overburdening sites during the audit process. A remote audit using readily available electronic files, such as claims files, provided a reasonable assessment of the percentage of eligible patients enrolled in the registry without requiring large amounts of time or resources from participating sites.

For More Information

Cronenwett JL, Likosky DS, Russell MT, et al. A regional registry for quality assurance and improvement: the Vascular Study Group of Northern New England (VSGNNE). J Vasc Surg. 2007;46:1093–1102. [PubMed: 17950568].

Cronenwett JL, Kraiss LW, Cambria RP. The Society for Vascular Surgery Vascular Quality Initiative. J Vasc Surg. 2012;55:1529–37. [PubMed: 22542349].

References for Chapter 11

Clinical Data Interchange Standards Consortium. [May 1, 2013]. http://www
National Institutes of Health. National Institutes of Health Stroke Scale. [May 1, 2013]. http://www.ninds.nih.gov/doctors/NIH_Stroke_Scale.pdf.
Luck J, Peabody JW, Dresselhaus TR, et al. How well does chart abstraction measure quality? A prospective comparison of standardized patients with the medical record. Am J Med. 2000 Jun 1;108(8):642–9. [PubMed: 10856412]
Reisch LM, Fosse JS, Beverly K, et al. Training, quality assurance, and assessment of medical record abstraction in a multisite study. Am J Epidemiol. 2003 Mar 15;157(6):546–51. [PubMed: 12631545]
Neale R, Rokkas P, McClure RJ. Interrater reliability of injury coding in the Queensland Trauma Registry. Emerg Med (Fremantle). 2003 Feb;15(1):38–41. [PubMed: 12656785]
cancer Text Information Extraction System (caTIES). [August 7, 2013]. http://caties [PubMed: 18693990]
Informatics for Integrating Biology and the Bedside. [May 1, 2013]. https://www
Health Level Seven. [May 1, 2013]. http://www
U.S. Food and Drug Administration. Guidance for Industry. E6 Good Clinical Practice: Consolidated Guidance. Apr, 1996. [August 7, 2013]. http://www/Drugs/Guidances/ucm073122.pdf.
National eHealth Collaborative. [May 1, 2013]. http://www
National Institute of Standards and Technology. [May 1, 2013]. http://www
OPTIMIZE-HF. Organized Program to Initiate LifeSaving Treatment in Hospitalized Patients with Heart Failure. [May 1, 2013]. http://www [PubMed: 19006680]
Fonarow GC, Heywood JT, Heidenreich PA, et al. Temporal trends in clinical characteristics, treatments, and outcomes for heart failure hospitalizations, 2002 to 2004: findings from Acute Decompensated Heart Failure National Registry (ADHERE). Am Heart J. 2007 Jun;153(6):1021–8. [PubMed: 17540205]
Lewis WR, Peterson ED, Cannon CP, et al. An organized approach to improvement in guideline adherence for acute myocardial infarction: results with the Get With The Guidelines quality improvement program. Arch Intern Med. 2008 Sep 8;168(16):1813–9. [PMC free article: PMC3086550] [PubMed: 18779470]
U.S. Food and Drug Administration. Guidance for Industry: Development and Use of Risk Minimization Action Plans. [May 1, 2013]. http://www/RegulatoryInformation/Guidances/UCM126830.pdf.
Mangano DT, Tudor IC, Dietzel C, et al. The risk associated with aprotinin in cardiac surgery. N Engl J Med. 2006 Jan 26;354(4):353–65. [PubMed: 16436767]
Gheorghiade M, Abraham WT, Albert NM, et al. Systolic blood pressure at admission, clinical characteristics, and outcomes in patients hospitalized with acute heart failure. JAMA. 2006 Nov 8;296(18):2217–26. [PubMed: 17090768]
Fink AS, Campbell DA Jr., Mentzer RM Jr., et al. The National Surgical Quality Improvement Program in non-veterans administration hospitals: initial demonstration of feasibility. Ann Surg. 2002 Sep;236(3):344–53. discussion 53-4. [PMC free article: PMC1422588] [PubMed: 12192321]
The HIPAA Security Rule: Health Insurance Reform: Security Standards, February 20, 2003. 68 FR 8334. [PubMed: 12596712]
National Institutes of Health. NIH Data Sharing Policy and Implementation Guidance. [May 1, 2013]. http://grants/grants/policy/data_sharing/data_sharing_guidance.htm#enclave.
U.S. Food and Drug Administration. CFR - Code of Federal Regulations Title 21, Part 11: Electronic Records; Electronic Signatures. [May 1, 2013]. http://www.accessdata/cfdocs/cfcfr/CFRSearch.cfm?CFRPart=11&showFR=1.
U.S. Food and Drug Administration. Guidance for Industry Part 11, Electronic Records; Electronic Signatures — Scope and Application. [May 1, 2013]. http://www/RegulatoryInformation/Guidances/ucm125125.pdf.
U.S. Food and Drug Administration. Guidance for Industry Computerized Systems Used in Clinical Investigations. [May 1, 2013]. http://www/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM070266.pdf.
U.S. Food and Drug Administration. General Principles of Software Validation; Final Guidance for Industry and FDA Staff. [May 1, 2013]. http://www/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/UCM085371.pdf.

