Probabilistic methods in matching census samples to the National Death Index

J Chronic Dis. 1986;39(9):719-34. doi: 10.1016/0021-9681(86)90155-4.

Abstract

The National Death Index (NDI) of the National Center for Health Statistics is a powerful tool for identifying deaths in epidemiologic studies. The NDI will generate a list of possible matches for every input record according to the NDI matching criteria. The task of determining the true match among the possible matches becomes formidable when a large number of records are being investigated. In the National Longitudinal Mortality Study nearly one million Census records are being matched to the NDI, thus requiring an efficient and accurate method to screen out the false positive matches. In a pilot study for the larger mortality follow-up, Census Bureau files containing 226,000 person records were matched to the 1979 NDI. The results of this match were used to develop a probabilistic method to separate the possible matches into categories of true positives, false positives and those of questionable status requiring manual review of the Census record and the death certificate. Of the 5542 possible matches, about one-third were ultimately determined to be true positives and two-thirds false positives. The probabilistic method was validated by replications on subsets of the data and promises to save considerable time in review of records in the large national study of mortality.
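The abstract does not state the scoring rule behind the three-way split. One common way to build such a classifier is Fellegi-Sunter-style record linkage: each compared field contributes a log likelihood ratio based on m = P(agree | true match) and u = P(agree | non-match), the contributions are summed, and two cutoffs divide the candidates into accepted, rejected and manual-review groups. The sketch below is written under that assumption only. The field names, m/u probabilities and cutoffs are hypothetical illustrations and are not taken from the study.

```python
import math

# Hypothetical per-field probabilities (illustrative only, not from the study):
# (m, u) where m = P(field agrees | true match), u = P(field agrees | non-match).
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "birth_year": (0.92, 0.02),
    "sex": (0.98, 0.50),
    "state_of_residence": (0.85, 0.10),
}


def match_weight(agreements: dict[str, bool]) -> float:
    """Sum of log2 likelihood ratios over the compared fields."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PROBS[field]
        if agrees:
            total += math.log2(m / u)  # agreement is evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement counts against
    return total


def classify(weight: float, lower: float, upper: float) -> str:
    """Three-way decision using two hypothetical cutoffs (lower < upper)."""
    if weight >= upper:
        return "true_positive"
    if weight <= lower:
        return "false_positive"
    return "manual_review"


if __name__ == "__main__":
    # Hypothetical candidate pairs returned for one input record.
    candidates = {
        "all_agree": {f: True for f in FIELD_PROBS},
        "all_disagree": {f: False for f in FIELD_PROBS},
        "partial": {"surname": True, "first_name": False, "birth_year": True,
                    "sex": False, "state_of_residence": False},
    }
    for name, agreements in candidates.items():
        w = match_weight(agreements)
        print(f"{name}: weight={w:.2f} -> {classify(w, lower=0.0, upper=10.0)}")
```

In practice the m/u values and the two cutoffs would be estimated from a reviewed sample (as the pilot match provides) and chosen to trade off the size of the manual-review band against misclassification; the values here only show the mechanics.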

MeSH terms

  • Computers
  • Female
  • Humans
  • Longitudinal Studies
  • Male
  • Mathematics
  • Mortality*
  • Pilot Projects
  • Population*
  • Probability
  • Records*