Pharmacoepidemiol Drug Saf. 2001 Aug-Sep;10(5):393-7.

Pattern recognition in health insurance claims databases.

Author information

Epidemiology Division, Ingenix Pharmaceutical Services, Department of Epidemiology, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA.


Information in claims databases resides in data patterns rather than in individual data elements. Finding this information requires new terminology, a willingness to pose questions of form rather than specific hypotheses, and a quality control system that elevates the correctness of data relations above the validity of single facts. The language of claims data is a newspeak of CPT (Current Procedural Terminology), HCPCS (Health Care Financing Administration Common Procedure Coding System), and ICD (International Classification of Diseases) codes, together with NDC (National Drug Code) numbers for pharmaceuticals. The techniques of pattern discovery are really ways of asking the data for classes of relations, and they differ in how much they rely on external information. Sometimes the question is entirely constrained by preceding factors. Other times we may recast the natural history of a disease in claims terms and ask the data to show us the shape of its evolution. We can use highly automated systems to evaluate the relations between prespecified factors, or empirical techniques to search out common relations that we have not specified in advance. Using massive data sets requires that quality control correspond to the nature of the high-level information we derive from large databases.
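The abstract does not specify how the empirical search for unspecified relations is carried out. As a minimal sketch, assuming claims have been reduced to hypothetical (patient_id, service_date, code) triples, one simple way to ask the data for a class of relations ("code X tends to precede code Y") is to count, across patients, ordered pairs of distinct codes and keep those seen in at least a minimum number of patients. This is a swapped-in technique (support counting of ordered pairs, the first step of sequential pattern mining), not the author's method; the function name `frequent_ordered_pairs`, the record layout, and the codes `DX_A`, `RX_B`, `PX_C` are illustrative placeholders, not real CPT, HCPCS, ICD, or NDC values.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations


def frequent_ordered_pairs(
    claims: list[tuple[str, date, str]], min_patients: int
) -> dict[tuple[str, str], int]:
    """Return ordered code pairs (earlier, later) observed in >= min_patients patients.

    Hypothetical sketch: each patient contributes at most once per pair,
    and same-day codes are not treated as ordered.
    """
    # Group each patient's (date, code) events together.
    by_patient: dict[str, list[tuple[date, str]]] = defaultdict(list)
    for pid, day, code in claims:
        by_patient[pid].append((day, code))

    support: dict[tuple[str, str], int] = defaultdict(int)
    for events in by_patient.values():
        events.sort()  # chronological order, so combinations() yields earlier-first pairs
        seen_pairs = set()
        for (d1, c1), (d2, c2) in combinations(events, 2):
            if d1 < d2 and c1 != c2:  # strictly earlier, distinct codes
                seen_pairs.add((c1, c2))
        for pair in seen_pairs:
            support[pair] += 1  # count patients, not occurrences

    return {pair: n for pair, n in support.items() if n >= min_patients}


# Hypothetical toy claims (placeholder codes, not real data).
claims = [
    ("p1", date(2000, 1, 5), "DX_A"),
    ("p1", date(2000, 2, 1), "RX_B"),
    ("p2", date(2000, 3, 1), "DX_A"),
    ("p2", date(2000, 3, 20), "RX_B"),
    ("p2", date(2000, 4, 2), "PX_C"),
    ("p3", date(2000, 5, 1), "RX_B"),
    ("p3", date(2000, 6, 1), "DX_A"),
]

result = frequent_ordered_pairs(claims, min_patients=2)
# result == {('DX_A', 'RX_B'): 2}
```

Counting patients rather than raw occurrences keeps one heavily billed patient from dominating a pattern; a fuller treatment would extend the same support-counting idea to longer sequences and time windows.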

[Indexed for MEDLINE]
