Case Example 19: Integrating Data From Multiple Sources With Patient ID Matching

Description: KIDSNET is Rhode Island’s computerized registry to track children’s use of preventive health services. The program collects data from multiple sources and uses those data to help providers and public health professionals identify children in need of services. The purpose of the program is to ensure that all children in the State receive appropriate preventive care measures in a timely manner.
Sponsor: State of Rhode Island, Centers for Disease Control and Prevention, and others
Year Started: 1997
Year Ended: Ongoing
No. of Sites: 228 participating practices plus other authorized users
No. of Patients: 289,120


In the 1990s, the Rhode Island Department of Health recognized that its data on children’s health were fragmented and program specific. The State had many children’s health initiatives, such as programs for hearing assessment and lead poisoning prevention, but these programs collected data separately and did not attempt to link the information. This type of fragmented structure is common in public health agencies, as many programs receive funding to fulfill a specific need but no funding to link that information with other programs. Such linkage would benefit the department’s activities, as children who are at risk for one health issue are often at risk for other health issues. By integrating the data, the department would be able to coordinate its services better and serve children more effectively.

To integrate the data from these multiple sources and to allow new data to be entered directly into the program, the department implemented the KIDSNET computerized registry. The registry consolidates data from 11 different sources to provide an overall picture of a child’s use of preventive health care services. The sources are newborn developmental risk screening; the immunization registry; lead screening; hearing assessment; Women, Infants, and Children (WIC); home visiting; early intervention; blood spot screening; foster care; birth defects; and vital records data. The goals of the registry are to monitor and assure the use of preventive health services, provide decision support for immunization administration, give providers reporting capacity to identify children who are behind in services, and provide recall services and quality assurance.

After being launched in 1997, the registry began accumulating data on children who were born in the State or receiving preventive health care services in the State. Some of the 11 data sources entered data directly into the registry, and some of the data sources sent data from another database to the registry. The registry then consolidated data from these 11 sources into a single patient record for each child by matching the records using simple deterministic logic. As the registry began importing records, the system held some records as questionable matches, since it could not determine if the record was new or a match to an existing record. These records required manual review to resolve the issue, which was time consuming, at approximately 3 minutes per record.
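The source does not describe the registry's original deterministic rules, so the following is only a minimal sketch of how such logic might behave. It assumes three hypothetical matching fields (first name, last name, birth date) and a hypothetical rule: full agreement is a match, agreement on all but one field is held as questionable, and anything else is treated as a new record. The names `ChildRecord`, `agreement`, and `classify` are illustrative and do not come from KIDSNET.

```python
from dataclasses import dataclass

# Hypothetical matching fields; the real registry's fields are not given in the source.
MATCH_FIELDS = ("first_name", "last_name", "birth_date")


@dataclass(frozen=True)
class ChildRecord:
    first_name: str
    last_name: str
    birth_date: str  # ISO date string, e.g. "2003-05-14"


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(value.lower().split())


def agreement(a: ChildRecord, b: ChildRecord) -> int:
    """Count how many matching fields agree exactly after normalization."""
    return sum(
        normalize(getattr(a, field)) == normalize(getattr(b, field))
        for field in MATCH_FIELDS
    )


def classify(incoming: ChildRecord, registry: list[ChildRecord]) -> str:
    """Return 'match', 'questionable', or 'new' for an incoming record."""
    best = max((agreement(incoming, existing) for existing in registry), default=0)
    if best == len(MATCH_FIELDS):
        return "match"
    if best == len(MATCH_FIELDS) - 1:
        return "questionable"  # held for manual review
    return "new"


registry = [ChildRecord("Maria", "Lopez", "2003-05-14")]
print(classify(ChildRecord(" maria ", "LOPEZ", "2003-05-14"), registry))
print(classify(ChildRecord("Marie", "Lopez", "2003-05-14"), registry))
print(classify(ChildRecord("John", "Smith", "2001-01-01"), registry))
```

Under a rule like this, a single misspelled name is enough to send a record to the held queue, which illustrates how a backlog of questionable matches can build up when exact comparison is the only tool available.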

Without resources for manual review, the number of records held as questionable matches grew to 48,685 by 2004. Resolving these records manually would have taken an estimated 17 months, and the registry could not staff that task. Meanwhile, the incomplete data resulting from so many held records made the registry less successful at tracking children’s health and less used by providers.

Proposed Solution

To resolve the patient matching problem, the sponsor implemented an automated solution after evaluating several options, including probabilistic and deterministic matching strategies and both commercial and open-source matching software. Because the State had limited funds for the project, it selected an open-source product, Febrl.

A set of rules to process incoming records was developed, and an interface was created for the manual review of questionable records. Using the rules, the software determines the probability of a match for each record. The registry then sets probability thresholds above which a record is considered a certain match and below which a record is considered a new record. All of the records that fall into the middle ground require manual review.
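The two-threshold routing described above can be sketched as follows. This is not Febrl's API or the registry's actual rule set. It is a simplified illustration that substitutes a weighted string-similarity score (computed with Python's standard `difflib`) for a true match probability. The field weights, the threshold values, and the handling of scores exactly at a threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical field weights and thresholds, chosen only for illustration.
FIELD_WEIGHTS = {"first_name": 0.3, "last_name": 0.3, "birth_date": 0.4}
LOWER_THRESHOLD = 0.60  # below this, treat the record as new
UPPER_THRESHOLD = 0.95  # at or above this, treat the record as a certain match


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1], ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(incoming: dict, existing: dict) -> float:
    """Weighted average of per-field similarities (a stand-in for a match probability)."""
    return sum(
        weight * similarity(incoming[field], existing[field])
        for field, weight in FIELD_WEIGHTS.items()
    )


def route(score: float,
          lower: float = LOWER_THRESHOLD,
          upper: float = UPPER_THRESHOLD) -> str:
    """Route a scored record: certain match, new record, or manual review."""
    if score >= upper:
        return "match"
    if score < lower:
        return "new"
    return "manual_review"


existing = {"first_name": "Maria", "last_name": "Lopez", "birth_date": "2003-05-14"}
typo = {"first_name": "Marie", "last_name": "Lopez", "birth_date": "2003-05-14"}
other = {"first_name": "John", "last_name": "Smith", "birth_date": "1999-12-31"}

for candidate in (existing, typo, other):
    score = match_score(candidate, existing)
    print(f"{score:.2f} -> {route(score)}")
```

With these assumed settings, an exact duplicate is merged automatically, a record with a one-letter name difference lands in the middle band for manual review, and an unrelated record is added as new.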


After considerable testing, the new system was launched in spring 2004. Immediately upon implementation, 95 percent of the held records were processed and removed from the holding category, resulting in the addition of approximately 11,000 new patient records to the registry. The new interface for manual review reduced the time to resolve a questionable record from 3 minutes to 40 seconds. With these improvements, the registry now imports 95 percent of the data sent to the database and is able to process the remaining questionable records through the improved interface.
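To put these figures in perspective, the short calculation below compares the manual review effort for the backlog before and after the change. It uses only numbers reported in this case example (48,685 held records, 3 minutes versus 40 seconds per record, 95 percent processed automatically) and assumes that the remaining 5 percent of held records all went to manual review. The source does not state the staffing assumptions behind its 17-month estimate, so the sketch reports reviewer hours only.

```python
HELD_RECORDS = 48_685            # questionable matches held by 2004
OLD_SECONDS_PER_RECORD = 3 * 60  # manual review before the new interface
NEW_SECONDS_PER_RECORD = 40      # manual review with the new interface
AUTO_RESOLVED_SHARE = 0.95       # share of held records processed automatically


def review_hours(records: int, seconds_per_record: int) -> float:
    """Total reviewer time in hours for a given number of records."""
    return records * seconds_per_record / 3600


backlog_hours_before = review_hours(HELD_RECORDS, OLD_SECONDS_PER_RECORD)

# Assumption: every record not processed automatically still needs manual review.
remaining_records = round(HELD_RECORDS * (1 - AUTO_RESOLVED_SHARE))
backlog_hours_after = review_hours(remaining_records, NEW_SECONDS_PER_RECORD)

print(f"Before: {HELD_RECORDS} records, about {backlog_hours_before:.0f} reviewer hours")
print(f"After:  {remaining_records} records, about {backlog_hours_after:.0f} reviewer hours")
```

Under these assumptions the backlog falls from roughly 2,434 reviewer hours to roughly 27, because automation shrinks the queue and the new interface shortens each review.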

Key Point

Many strategies and products exist to deal with matching patients from multiple data sources. Once a product has been selected, careful consideration must be given to the probability thresholds for establishing a match. Setting the threshold for matches too high may result in an unmanageable burden of manual review. However, setting the threshold too low could affect data quality, as records may be merged inappropriately. A careful balance must be found between resources and data quality in order for matching software to help the registry. In addition, matching quality should be monitored over time, as matching rules and probability thresholds may need to be adjusted if the underlying data quality issues change.
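One way to explore this balance is to score a small sample of record pairs whose true status is already known, then count the outcomes for candidate thresholds. The sketch below assumes such a labeled sample exists. The scores and labels shown are invented purely for illustration and are not KIDSNET data, and the function name `evaluate` is hypothetical.

```python
# Invented (score, is_true_match) pairs for illustration only.
SAMPLE = [
    (0.99, True), (0.97, True), (0.93, True), (0.91, False), (0.85, True),
    (0.70, False), (0.65, True), (0.55, False), (0.40, False), (0.30, False),
]


def evaluate(scored_pairs, lower: float, upper: float) -> dict:
    """Count manual reviews, false merges, and missed matches for one threshold pair."""
    manual_review = sum(lower <= score < upper for score, _ in scored_pairs)
    false_merges = sum(score >= upper and not is_match
                       for score, is_match in scored_pairs)
    missed_matches = sum(score < lower and is_match
                         for score, is_match in scored_pairs)
    return {
        "manual_review": manual_review,
        "false_merges": false_merges,
        "missed_matches": missed_matches,
    }


# A strict upper threshold avoids bad merges but sends more records to reviewers.
print(evaluate(SAMPLE, lower=0.60, upper=0.98))
# A looser upper threshold cuts the review queue but merges one non-matching pair.
print(evaluate(SAMPLE, lower=0.60, upper=0.90))
```

Rerunning a check like this periodically on fresh samples is one simple way to monitor matching quality over time and to notice when thresholds need adjusting.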

For More Information

  1. Wild EL, Hastings TM, Gubernick R, et al. Key elements for successful integrated health information systems: lessons learned from the states. J Public Health Manag Pract. 2004;Suppl:S36–S47. [PubMed: 15643357]

From: Chapter 6, Data Sources for Registries

Registries for Evaluating Patient Outcomes: A User's Guide. 2nd edition.
Gliklich RE, Dreyer NA, editors.
