Henriksen K, Battles JB, Marks ES, et al., editors. Advances in Patient Safety: From Research to Implementation (Volume 2: Concepts and Methodology). Rockville (MD): Agency for Healthcare Research and Quality (US); 2005 Feb.

Development of a Multipurpose Dataset to Evaluate Potential Medication Errors in Ambulatory Settings

On behalf of the HMO Research Network CERT Patient Safety Investigators.


Ten health maintenance organizations (HMOs) of the HMO Research Network Center for Education and Research on Therapeutics (CERT) participated in a descriptive study of the frequency of potential medication errors in ambulatory settings. This report describes the data development process and the demographic characteristics of the study subjects. As this data resource has served as the basis for seven patient safety studies, it is crucial to describe in detail the rationale for this data development scheme and the demographic and membership attrition information of the study population. Approximately 200,000 health plan members who had pharmacy benefits at any time from January 1, 1999, through June 30, 2001, were selected from each of the 10 HMOs. The study population included all of the health plan members in randomly selected clinical practices of each HMO, resulting in a representative sample of the source population. For each study subject, claims records of drug dispensing, hospitalization, ambulatory visits, and special examinations during the 2.5-year study period were organized in a standard format at each HMO. The source data reside at each HMO, and only de-identified and processed data were combined for final analyses. In addition to supporting patient safety research, this 2-million-subject data source has been used in a number of epidemiology and health services research studies.


Medical errors kill more people each year in the United States than motor vehicle accidents, HIV infection, or breast cancer. They are now clearly recognized as a serious and common problem in the delivery of health care. 1 Errors involving medication use (including mistakes in prescribing, dispensing, administering, and monitoring medications, and errors in systems management) are among the most common types of medical error. 1, 2 While considerable attention has been paid to medical and medication errors in hospitalized patients, remarkably little is known about medication errors in the outpatient setting, where more than 2 billion prescriptions are dispensed each year. 3 Although the MedWatch spontaneous reporting system of the Food and Drug Administration (FDA) includes reports of medical errors among the 300,000 adverse drug event reports it receives each year, 4 investigators cannot use these reports to evaluate the frequency or impact of medication errors in ambulatory care settings. Recognizing this limitation, the FDA has called for collaborative research with networks, such as the HMO Research Network Center for Education and Research on Therapeutics (CERT), that can evaluate error signals generated from the MedWatch reports from a population-based perspective. 5 In response to a request for proposals by the Agency for Healthcare Research and Quality (AHRQ), the HMO Research Network CERT has initiated a research project among health plan members to study the frequency of apparent medication errors in routine ambulatory care. We used existing automated pharmacy dispensing information and automated inpatient and outpatient diagnosis and procedure data, plus limited full-text record review, to assess the frequency of these events. The construction of the study cohort and summary statistics of the cohort are described in this report.


HMO research network

The HMO Research Network is a consortium of health maintenance organization (HMO) research programs. The mission of the HMO Research Network is to encourage high quality, public domain research involving health maintenance and managed care organizations by enhancing the capabilities of individual HMO members, fostering collaborative research, and influencing the national research agenda. As of July 2004, there are 13 member organizations. The HMO Research Network CERT is one of several research collaborations that involve Network members; others include the Cancer Research Network funded by the National Cancer Institute, 6 and the Integrated Delivery Systems Research Network 7 funded by AHRQ. The Vaccine Safety Datalink of the National Immunization Program 8 also consists largely of members of the HMO Research Network.

Ten health plans of the HMO Research Network have participated in this study of medication errors in ambulatory care settings. Together, these organizations are responsible for the health care of approximately 7 million individuals in all regions of the country. This care is provided by more than 59,000 physicians, including nearly 16,000 primary care providers in more than 1,000 sites. These health plans serve diverse populations, with the proportion of black members as high as 33 percent in some HMOs, the proportion of Hispanics as high as 38 percent, and the proportion of Asians as high as 13 percent. All participating health plans serve Medicaid and Medicare beneficiaries, as well as commercially insured members. These plans also represent a variety of managed care organizational models, including staff model, group, network, and independent physicians association (IPA) systems. Some HMOs in this group have up to 69 percent of their members in networks, while others have up to 66 percent in IPAs. Additional information about the HMOs and their members is shown in Table 1.

Table 1. Description of the 10 HMOs (data from 1999).

Database development

Our primary aim was to identify a cohort that was sufficiently large so that frequency of medication errors associated with commonly used drugs in the general population could be assessed with reasonable accuracy. The data have since been used for descriptive epidemiology, drug utilization, and drug safety research as well. We wanted to have equal contribution from each participating HMO to facilitate disguising the identity of the HMO. To accomplish this, we selected from each HMO approximately 200,000 subjects who had ever been a health plan member with pharmacy benefits between January 1, 1999, and June 30, 2001—a total population of 2 million subjects. The sampling unit was a medical practice, and all HMO members, including children and individuals who filled no prescriptions, affiliated with the randomly selected practice were included. Practices with at least 100 HMO members were eligible for selection. We adopted this sampling scheme because this facilitated practice-based interventions and simplified retrieval of full-text medical records for review.
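
The practice-based sampling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the investigators' actual procedure: the function name `sample_practices`, the input layout (a mapping from practice ID to member count), and the stopping rule (add randomly ordered practices until the target is reached) are all hypothetical. The text itself states only that practices with at least 100 HMO members were eligible, that practices were selected at random, and that about 200,000 members were drawn from each HMO.

```python
import random

MIN_PRACTICE_SIZE = 100      # eligibility threshold stated in the text
TARGET_MEMBERS = 200_000     # approximate per-HMO target stated in the text


def sample_practices(practice_sizes, target=TARGET_MEMBERS,
                     min_size=MIN_PRACTICE_SIZE, seed=None):
    """Randomly select whole practices until the member target is reached.

    practice_sizes: dict mapping a practice ID to its number of HMO members.
    Returns (selected practice IDs, total members in those practices).
    The stopping rule is an assumption made for illustration.
    """
    rng = random.Random(seed)
    eligible = [p for p, n in practice_sizes.items() if n >= min_size]
    rng.shuffle(eligible)

    selected, total = [], 0
    for practice in eligible:
        if total >= target:
            break
        selected.append(practice)
        # The practice is the sampling unit: all of its members are included.
        total += practice_sizes[practice]
    return selected, total
```

Because whole practices are added, a rule like this can overshoot the target by up to one practice, so per-HMO totals somewhat above 200,000 would be expected under this sketch.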

Data elements in the research database

For each selected health plan member, demographic, membership, pharmacy dispensing, and health care utilization information were extracted from the HMO databases and stored in the same format in the research unit at each HMO. These data elements are summarized in Table 2.

Table 2. Data elements in the HMO Research Network Patient Safety dataset.

Demographic information included date of birth and gender. In addition, we adopted the geocoding methodology developed by Krieger and colleagues 11 to generate variables that could serve as a proxy measure of the socioeconomic status for each health plan member. Addresses of study subjects were linked to census tracts, and the average of socioeconomic status variables for that particular census tract was assigned to individual subjects. Three U.S. Census variables were chosen: Race (P006), Sex by Educational Attainment Population 25 and Over (P037), and Poverty Status by Age (P087). 11 These variables, poverty status in particular, were chosen as proxies for socioeconomic measures on the basis of research by Krieger and colleagues in a similar population.

Membership information was important because it helped define eligible periods during which events of interest could have occurred. For example, we needed continuous membership information before a dispensing to evaluate whether that dispensing was an initiation of therapy. Similarly, if there was no follow-up laboratory test after a dispensing, membership information could help us ascertain whether the lack of follow-up was due to membership termination or failure to order the test. It was not uncommon for health plan members to switch from one benefit plan to another, and we established a data structure to accommodate the changes of benefit plans for individual health plan members during the 2.5-year study period. Pharmacy benefit and copayment amount are important attributes of benefit plans. We included start and stop dates for enrollment, pharmacy benefit, type of insurance coverage (commercial, Medicare, Medicaid, others), and primary care provider associated with each membership period. We bridged all gaps in membership of 45 days or fewer.

Pharmacy dispensings in the automated databases included the following data elements: National Drug Code (NDC), generic and product name, formulation, dosage, date dispensed, quantity dispensed, days of supply, refill indicator, and prescriber identifier. We used the NDC as the primary identifier for all medications. Some HMO-owned pharmacies assigned HMO-specific codes to represent repackaged products, and we identified the nonstandard codes for drugs of interest.

For health care utilization, we identified all ambulatory visits and hospitalizations among study subjects during the study period and obtained all diagnosis and procedure codes associated with each encounter. Most of the diagnoses (primary and secondary) were coded according to the ICD-9-CM system. The procedures (including laboratory tests, radiology examinations, and other special tests) were usually coded according to the CPT-4 or ICD-9-CM system. Some ambulatory or hospital systems use other coding conventions, and we identified these codes for diseases and procedures of interest.
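
The rule of bridging membership gaps of 45 days or fewer, mentioned above, can be illustrated with a short Python sketch. The function names and the exact definition of a gap (the number of uncovered days between one period's stop date and the next period's start date) are assumptions made for illustration; the text gives only the 45-day threshold.

```python
from datetime import date

MAX_GAP_DAYS = 45  # gaps of 45 days or fewer are bridged, per the text


def bridge_gaps(periods, max_gap_days=MAX_GAP_DAYS):
    """Merge enrollment periods separated by short gaps.

    periods: iterable of (start, stop) date pairs for one member, inclusive.
    A gap is counted as the number of uncovered days between one period's
    stop date and the next period's start date (an assumed definition).
    """
    merged = []
    for start, stop in sorted(periods):
        if merged:
            last_start, last_stop = merged[-1]
            gap = (start - last_stop).days - 1   # uncovered days between periods
            if gap <= max_gap_days:
                # Short gap (or overlap): extend the previous period.
                merged[-1] = (last_start, max(last_stop, stop))
                continue
        merged.append((start, stop))
    return merged


def continuously_enrolled(periods, start, stop):
    """True if a single bridged period covers start..stop inclusive."""
    return any(s <= start and stop <= e for s, e in bridge_gaps(periods))
```

A check such as `continuously_enrolled(periods, lookback_start, dispensing_date)` is one way the "initiation of therapy" question described above could be framed; the lookback window itself would be study-specific.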


Taking into account the proprietary nature of HMO data and Health Insurance Portability and Accountability Act (HIPAA) regulations, we adopted the principle of “minimum necessary data”; source data were retained at each HMO, and no personal-level data left the HMOs unless absolutely necessary. For purely descriptive studies in which the study objectives were to describe the frequency of events, summary tables from each HMO were generated and combined; no personal-level data left the HMOs. When personal-level data moved across HMOs, only data elements needed to support predefined analyses were transferred after appropriate aggregation and de-identification. The following paragraphs in this section describe the de-identification processes used when personal-level information needed to be transferred outside of the HMOs for combination into analytic datasets.

For each study subject, the HMO membership number was used as the unique identifier (ID) that linked the demographic, membership, pharmacy dispensing, and health care utilization information described above. This HMO ID was kept at each HMO and never left the HMO. If personal-level data needed to be transferred outside of the HMO, a random Study ID was generated to replace the HMO ID. The crosswalk between the HMO ID and Study ID was securely stored at each HMO. As this dataset was developed to support multiple studies, the Study IDs were not reused. If a single health plan member had used multiple drugs of interest, that person would have a different Study ID for each research project. For example, if a person received a nonsteroidal anti-inflammatory drug (NSAID) and a thiazolidinedione, that person would have one Study ID for an analysis that evaluated inappropriate comedications with NSAIDs and another Study ID for an analysis that evaluated adherence to liver function monitoring among patients on thiazolidinedione therapy.

We did not transmit dates in personal-level datasets. Date of birth was not included in data files shared outside of the HMOs. Instead, age as of an index event was calculated for each study subject. In this study, the index event was always the dispensing of a drug of interest, for example, the first dispensing of isotretinoin after January 1, 1999. As a study subject's exact age is rarely needed in adult population studies, we assigned study subjects to 5-year or 10-year age groups in the dataset that was shared outside of the HMOs, and we grouped all individuals of older age into a single group. For example, we have used “older than 75,” “older than 80,” or “older than 85” as the oldest age group. After the index event was defined, the occurrence of all other events of interest was specified relative to the index event. For example, we defined a specific time window after the dispensing of a drug of interest; the laboratory test that followed that dispensing was recorded as the number of days after the index dispensing, rather than as the date on which the laboratory test was ordered. While preserving the temporal sequence of events of interest, data prepared in this fashion had no date at the personal level that was shared outside of the HMO.
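
The de-identification steps described above (a fresh Study ID per study, coarse age groups as of the index event, and event timing expressed in days from the index event) can be sketched as follows. This is a minimal illustration, not the SAS code used in the project; every function name, the record layout, and the default band width and top-coding threshold are hypothetical choices.

```python
import secrets
from datetime import date


def new_study_ids(hmo_ids):
    """Build a fresh HMO ID -> Study ID crosswalk for one study.

    Calling this once per study gives a member a different Study ID in each
    study. The crosswalk itself would stay at the HMO.
    """
    crosswalk, used = {}, set()
    for hmo_id in hmo_ids:
        study_id = secrets.token_hex(8)
        while study_id in used:            # guard against an unlikely collision
            study_id = secrets.token_hex(8)
        used.add(study_id)
        crosswalk[hmo_id] = study_id
    return crosswalk


def age_group(birth_date, index_date, width=10, top=75):
    """Return a coarse age band as of the index date, e.g. '40-49' or '>75'.

    The band width and top-coding threshold are illustrative defaults.
    """
    had_birthday = (index_date.month, index_date.day) >= (birth_date.month, birth_date.day)
    age = index_date.year - birth_date.year - (0 if had_birthday else 1)
    if age > top:
        return f">{top}"
    low = (age // width) * width
    high = min(low + width - 1, top)       # keep the last band below the top code
    return f"{low}-{high}"


def days_since_index(event_date, index_date):
    """Express an event as a day offset from the index event."""
    return (event_date - index_date).days


def deidentify(record, study_id):
    """Return a row with no HMO ID, no date of birth, and no calendar dates."""
    return {
        "study_id": study_id,
        "age_group": age_group(record["birth_date"], record["index_date"]),
        "lab_test_day": days_since_index(record["lab_date"], record["index_date"]),
    }
```

In this sketch only the output of `deidentify` would leave the site; the crosswalk returned by `new_study_ids` is what allows local staff to go back from a Study ID to the member's records.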

For drug dispensings, diagnoses, and procedures of interest, we translated the codes into clinically meaningful entities, including therapeutic classes for drugs, diseases for diagnosis codes, and laboratory tests (such as liver function tests) for procedure codes, before personal-level data were transferred outside of the HMOs. This data development process not only ensured the appropriate mapping of codes to clinical constructs, but also added another level of de-identification. For example, in a study of angiotensin-converting enzyme inhibitors (ACEI), datasets were developed such that the investigators would only know about dispensings of ACEI during the study period, without specific knowledge about which particular ACEI was used. For patients with multiple diseases and receiving several drugs, the analytic datasets would include only the drugs, diseases, and/or laboratory tests of interest. Predefined comorbidity and comedications of interest were also included as unique entities or as aggregate comorbidity measures. Other dispensings and health care utilization information were not included. Although the geocoding process described above required access to health plan members' addresses, once the addresses were geocoded into the three socioeconomic variables described above, they were erased from the datasets.

Datasets from individual HMOs were forwarded to a Data Coordinating Center (described below) for quality checks and concatenation. After datasets were combined, HMO identities were disguised by a letter and the crosswalk between the letter and individual HMOs was destroyed. The resulting datasets, with no personal identifier and no dates, were sent to the lead investigator for final analysis. This rigorous de-identification process provided maximal protection of patients' privacy and of proprietary information of each HMO. This data development process has been approved by the Human Subjects Committee of the participating institutions.
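
The translation of codes into clinical entities and the disguising of HMO identities described above can be sketched as follows. This is a hedged illustration only: the dictionary entries are made-up placeholders rather than real NDCs, and the function names, row layout, and use of a shuffled alphabet for site letters are assumptions, not details given in the text.

```python
import random
import string

# Placeholder dictionary: these codes are invented for illustration and are
# not real NDCs.
DRUG_CLASS = {"ndc-0001": "ACEI", "ndc-0002": "ACEI", "ndc-0003": "NSAID"}
CLASSES_OF_INTEREST = {"ACEI"}


def to_analytic_rows(dispensings, drug_class=DRUG_CLASS,
                     of_interest=CLASSES_OF_INTEREST):
    """Keep only dispensings in classes of interest and drop the raw code."""
    rows = []
    for d in dispensings:
        cls = drug_class.get(d["code"])
        if cls in of_interest:
            rows.append({"study_id": d["study_id"], "drug_class": cls,
                         "day": d["day"]})
    return rows


def disguise_sites(datasets, seed=None):
    """Concatenate per-HMO datasets, labelling each site with a random letter.

    datasets: dict mapping an HMO name to its list of rows.
    The name-to-letter mapping is never returned, which mirrors destroying
    the crosswalk after the datasets are combined.
    """
    if len(datasets) > len(string.ascii_uppercase):
        raise ValueError("more sites than available letters")
    rng = random.Random(seed)
    letters = list(string.ascii_uppercase[:len(datasets)])
    rng.shuffle(letters)
    combined = []
    for letter, rows in zip(letters, datasets.values()):
        combined.extend({**row, "site": letter} for row in rows)
    return combined
```

In practice `seed` would be left unset so that the letter assignment cannot be reproduced; it is exposed here only to make the sketch testable.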

Distributed data development

A Data Coordinating Center for the HMO Research Network CERT has been established at the Channing Laboratory, a research institution under the Brigham and Women's Hospital and Harvard Medical School. To ensure consistent implementation of data extraction, detailed instructions were prepared for each HMO to assemble the demographic, membership, drug dispensing, and health care utilization information described above. These data were organized in the same format at each HMO. SAS programs (SAS Institute, Inc., Cary, NC) developed at the Data Coordinating Center were executed at each HMO to verify data integrity and consistency. For each individual study, a series of SAS programs was developed and distributed to participating HMOs. Typically these programs qualified study subjects on membership, demographic, and drug use criteria and then generated study-specific data elements. Executing the same SAS programs at each HMO ensured consistent implementation of the study protocols across multiple HMOs and decreased the programming costs at each HMO. The products were fully de-identified datasets that supported the final analyses for each study.

This data development environment also allowed efficient implementation of manual review of full-text medical records. SAS programs developed centrally to select subjects for medical records review were distributed to each HMO. As the true HMO IDs of the study subjects were kept within the datasets at the HMO, research staff at each HMO could readily identify the medical care provider information and request full-text medical records. Review of records was routinely carried out at the HMO. Abstracted information, entered into standardized forms with the Study ID but not the true HMO ID, was forwarded to the Data Coordinating Center for further analysis, along with copies of the relevant portions of anonymized medical records. The Study ID on the forms allowed linkage of abstracted information with the processed automated information described above.

Comorbidity measures

We have implemented methods to calculate two commonly used comorbidity measures: the Chronic Disease Score and the Deyo version of the Charlson Index. The Chronic Disease Score, a metric based on outpatient utilization of drugs for chronic diseases to represent a study subject's general health status, was developed in one HMO 12 and validated in multiple HMOs in predicting subsequent hospitalizations. 9 The Charlson Index was originally developed with information based on manual abstraction of medical records and then implemented with automated claims data. 10 We have posted the drug dictionary and SAS codes at the AHRQ Patient Safety Web site and have shared them with investigators from all over the world.


Study population

We identified 2,020,037 health plan members who had at least 1 day of health plan membership with pharmacy benefits from January 1, 1999, through June 30, 2001, from the 10 participating HMOs. Age and gender distribution of the study subjects are shown in Table 3. The distribution largely reflects a population made up mostly of employed persons and their family members. The number of subjects from each HMO ranged from 200,000 to 206,865. Almost half of the study subjects (992,239 or 49 percent) had continuous membership throughout the 2.5-year study period, and 1,282,235 (64 percent) had continuous membership throughout 2000. Among the study cohort, total health plan membership counts on January 1, 2000, June 30, 2000, and January 1, 2001, were each approximately 1.5 million, indicating that the number of study subjects who joined the HMO during the study period was roughly equal to the number who terminated health plan membership during the study period. The average length of observation for each study subject at each HMO ranged from 1.64 years to 1.94 years, and the overall average length of observation per study subject was 1.82 years.

Table 3. Age and sex distribution of 2,020,037 study subjects from 10 HMOs.

Health care utilization of the study population

Selected health care utilization parameters are given in Table 4. The study subjects received 34.6 million dispensings during the study period, an average of 17.2 dispensings per person. Most dispensings (94.2 percent) occurred during a membership period in which the health plan member had pharmacy benefits. More than 99 percent of the dispensings had readily identifiable NDCs. Some of the selected health plan members had no record of drug dispensing or encounter with the health plan, and the proportion of health plan members in this category ranged from 6 percent to 17 percent across the 10 HMOs.

Table 4. Frequency of drug dispensings and health care utilization among 2,020,037 health plan members from 10 HMOs from January 1999 through June 2001.

As for the ICD-9 diagnosis codes within the health care utilization file, more than 98 percent of the codes from 7 of the 10 HMOs were readily identifiable according to the standard ICD-9 dictionary; at an 8th HMO, 96 percent of the codes were readily identifiable. Customized procedure codes were much more prevalent among some HMOs. At one HMO, 60 percent of its procedure codes were nonstandard (non-CPT or non-ICD-9). At another HMO, 45 percent of its procedure codes were nonstandard. These are not indicators of data quality, but a reflection of the wide range of data systems and coding schemes being used across a diverse HMO population.

Of all the diagnosis codes within the data sets, 93.5 percent were on a date within the health plan member's membership period. For procedure codes, 92.4 percent were on a date within the study subject's membership period.
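
A proportion like the ones reported above (coded events dated within the member's membership period) could be computed along the following lines. This is a sketch under assumed data structures, with a hypothetical function name; it is not the program used to produce the reported figures.

```python
from datetime import date


def share_within_membership(events, membership):
    """Fraction of coded events dated inside the member's enrollment periods.

    events: list of (member_id, event_date) pairs.
    membership: dict mapping member_id to a list of inclusive (start, stop)
    date pairs. Members without any membership record count as outside.
    """
    if not events:
        return 0.0
    inside = sum(
        any(start <= when <= stop for start, stop in membership.get(member, []))
        for member, when in events
    )
    return inside / len(events)
```

Running the same function separately on diagnosis codes and procedure codes would give the two proportions; whether membership periods are bridged beforehand is a design choice that would change the result slightly.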

Medication safety research

Collectively, the datasets have been referred to as the first Patient Safety dataset of the HMO Research Network CERT, and the data have been used to support seven research projects related to potential medication errors (Table 5). For several research projects on medication errors, medical records have been selected for manual review and validation of the apparent error. In addition, the dataset has been used for four studies that are not directly related to medication errors.

Table 5. Studies using the HMO Research Network CERT Patient Safety dataset.


In this report we described the data development process and selected summary statistics of the HMO Research Network Patient Safety dataset. The extracted data have been found to be of high quality and have been used to support Patient Safety research as well as non-Patient Safety research. In addition to providing data for observational studies to quantify the frequency of apparent medication errors, the same health care delivery systems and the data sources have been used to test intervention strategies to prevent errors.

The limitations of this dataset are those of automated data based on insurance claims. We inferred drug use from drug dispensing records and could not quantify the misclassification arising from imperfect adherence or from use of medications not reimbursed by the health plans. Based on dispensing records, we could not precisely define dosing level for some medications. The Patient Safety dataset will extend its observation period through the end of 2003, providing up to 5 years of longitudinal observation time for a large number (up to 1 million) of health plan members. In addition, another Patient Safety cohort of 2 million health plan members from July 1, 2001, through the end of 2003 will be constructed in the same manner.


An infrastructure to evaluate and prevent potential medication errors in ambulatory care settings within managed care organizations has been developed. These data sources, with large numbers of health plan members and complete capture of drug dispensing and health care utilization data, will serve as the basis for Patient Safety research, health services research, and epidemiology studies.


This project is supported by a grant from AHRQ (U18HS11843). Investigators for the HMO Research Network CERT Patient Safety study are Richard Platt, M.D., M.S. (principal investigator), Department of Ambulatory Care and Prevention, Harvard Medical School, Harvard Pilgrim Health Care, and Channing Laboratory, Brigham and Women's Hospital, and Harvard Medical School; K. Arnold Chan, M.D., Sc.D., Channing Laboratory, Brigham and Women's Hospital, and Harvard Medical School; Jennifer Elston-Lafata, Ph.D., Henry Ford Health System Center for Health Services Research; Robert L. Davis, M.D., M.P.H., Center for Health Studies, GroupHealth Cooperative, and Departments of Epidemiology and Pediatrics, University of Washington; Margaret J. Gunter, Ph.D., Lovelace Clinic Foundation; Jerry H. Gurwitz, M.D., Meyers Primary Care Institute, Fallon Clinic, and University of Massachusetts Medical School; Joseph V. Selby, M.D., M.P.H., Division of Research, Kaiser Permanente Northern California; Michael Maciosek, Ph.D., HealthPartners Research Foundation; Marsha A. Raebel, Pharm.D., Kaiser Permanente Colorado; Dennis Tolsma, M.P.H., Research Department, Kaiser Permanente Georgia; and David H. Smith, Ph.D., Center for Health Research, Kaiser Permanente Northwest.

The investigators thank Parker Pettus, Kimberly Lane, and Michelle Platt of Channing Laboratory, Brigham and Women's Hospital, and Harvard Medical School; Jackie Cernieux and Rachel Kasper of Meyers Primary Care Institute; Julia Hecht of Center for Health Studies, GroupHealth Cooperative; Hugo Xi and Richard Krajenta of Henry Ford Health System Center for Health Services Research; Inna Dashevsky of Harvard Pilgrim Health Care; Dave McClure and Beth Newsome of Kaiser Permanente Colorado; Robert Diseker of the Research Department, Kaiser Permanente Georgia; Connie Uratsu of the Division of Research, Kaiser Permanente Northern California; Xiuhai Yang of Kaiser Permanente Northwest; and Hans Peterson and Melissa Roberts of Lovelace Respiratory Research Institute for their technical assistance in the construction of the Patient Safety dataset.


1. Kohn LT, Corrigan JM, Donaldson MS, editors. To err is human: building a safer health system. A report of the Committee on Quality of Health Care in America, Institute of Medicine. Washington, DC: National Academy Press; 2000.
2. Bates DW, Cullen DJ, Laird N, et al. Incidence of adverse drug events and potential adverse drug events. Implications for prevention. ADE Prevention Study Group. JAMA. 1995;274:29–34. [PubMed: 7791255]
3. National Wholesale Druggists' Association. Industry profile and healthcare factbook. Reston, VA: 1998.
4. Goldman SA, Kennedy DL, Graham DJ, editors. The clinical impact of adverse event reporting. MedWatch Continuing Education Article. Rockville, MD: Food and Drug Administration, October 1996. Available at: http://www/articles/medcont/medcont.htm.
5. Task Force on Risk Management. Managing the risks from medical product use: creating a risk management framework. Report to the FDA Commissioner. Washington, DC: U.S. Department of Health and Human Services, Food and Drug Administration; May 1999.
6. Wagner EH, Brown M, Field TS, et al. Collaborative cancer research across multiple HMOs: the Cancer Research Network. Abstract presented at the 9th Annual HMO Research Network Conference, 2003 April 2; Denver. Available at: http://www.hmoresearchnetwork.org/archives/2003abst/03_pa_50.pdf.
7. Selby J, Fraser I, Gunter M, et al. Results from IDSRN rapid cycle research projects. Abstract presented at the 9th Annual HMO Research Network Conference, 2003 April 2; Denver. Available at: www.hmoresearchnetwork.org/archives/2003abst/03_ca_a4.pdf.
8. Chen RT, DeStefano F, Davis RL, et al. The vaccine safety datalink: immunization research in health maintenance organizations in the USA. Bull WHO. 2000;78:186–94. [PMC free article: PMC2560695] [PubMed: 10743283]
9. Putnam KG, Buist DSM, Fishman P, et al. Chronic disease score as a predictor of subsequent hospitalization: a multiple HMO study. Epidemiol. 2002;13:340–6. [PubMed: 11964937]
10. Charlson ME, Pompei P, Ales KL, et al. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chron Dis. 1987;40:373–83. [PubMed: 3558716]
11. Krieger N, Chen JT, Waterman PD, et al. Geocoding and monitoring U.S. socioeconomic inequalities in mortality and cancer incidence: does choice of area-based measure and geographic level matter? The Public Health Disparities Geocoding Project. Am J Epidemiol. 2002;156:471–82. [PubMed: 12196317]
12. Von Korff M, Wagner EH, Saunders K. A chronic disease score from automated pharmacy data. J Clin Epidemiol. 1992;45:197–203. [PubMed: 1573438]
13. McPhillips HA, Davis RL, Hecht JA, et al. Off-label prescription drug use in children. Abstract presented at the 9th Annual HMO Research Network Conference, 2003 April 1–2; Denver. Available at: http://www.hmoresearchnetwork.org/archives/2003abst/03_pa_32.pdf.
14. Raebel MA, Magid DM, Chester EA, et al. Translating research into practice in real time: optimizing laboratory monitoring at initiation of drug therapy. Abstract presented at the 9th Annual HMO Research Network Conference, 2003 April 1–2; Denver. Available at: http://www.hmoresearchnetwork.org/archives/2003abst/03_pa_37.pdf.
15. Simon SR, Gurwitz JH, Chan KA, et al. Rates of potentially inappropriate medication use among elderly persons in the United States, 2000-2001. J Am Geriatr Soc (in press).

