NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Dusetzina SB, Tyree S, Meyer AM, et al. Linking Data for Health Services Research: A Framework and Instructional Guide [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2014 Sep.


Clancy CM, Slutsky JR. Commentary: a progress report on AHRQ’s Effective Health Care Program. Health Serv Res. 2007 Oct;42(5):xi–xix. [PMC free article: PMC2254568] [PubMed: 17850519]
Institute of Medicine. Initial National Priorities for Comparative Effectiveness Research. Washington, DC: National Academies Press; 2009.
Congressional Budget Office. Research on the Comparative Effectiveness of Medical Treatments: Issues and Options for an Expanded Federal Role. Washington, DC: 2007. Pub No. 2975.
Smith S. Preface. Med Care. 2007;45(10 Suppl 2):S1–S2.
Sox HC, Greenfield S. Comparative effectiveness research: a report from the Institute of Medicine. Ann Intern Med. 2009;151(3):203–5. [PubMed: 19567618]
VanLare JM, Conway PH, Sox HC. Five next steps for a new national program for comparative-effectiveness research. N Engl J Med. 2010;362(11):970–3. [PubMed: 20164480]
Bloomrosen M, Detmer D. Advancing the framework: use of health data--a report of a working conference of the American Medical Informatics Association. J Am Med Inform Assoc. 2008 Nov–Dec;15(6):715–22. [PMC free article: PMC2585531] [PubMed: 18755988]
Centers for Disease Control and Prevention. FOA: Enhancing Cancer Registry Data for Comparative Effectiveness Research. Atlanta, GA: 2010.
Stürmer T, Jonsson Funk M, Poole C, et al. Nonexperimental comparative effectiveness research using linked healthcare databases. Epidemiology. 2011;22(3):298–301. [PMC free article: PMC4012640] [PubMed: 21464649]
Institute of Medicine. Engineering a Learning Healthcare System: A Look at the Future: Workshop Summary. Washington, DC: National Academies Press; 2011. [PubMed: 21977540]
Blakely T, Salmond C. Probabilistic record linkage and a method to calculate the positive predictive value. Int J Epidemiol. 2002;31(6):1246–52. [PubMed: 12540730]
Bohensky MA, Jolley D, Sundararajan V, et al. Data linkage: a powerful research tool with potential problems. BMC Health Serv Res. 2010;10:346. [PMC free article: PMC3271236] [PubMed: 21176171]
Howe GR. Use of computerized record linkage in cohort studies. Epidemiol Rev. 1998;20(1):112–21. [PubMed: 9762514]
Lipscomb J, Gotay C, Snyder C. Outcomes Assessment in Cancer: Measures, Methods, and Applications. Cambridge: Cambridge University Press; 2005.
Brookhart MA, Stürmer T, Glynn RJ, et al. Confounding control in healthcare database research: challenges and potential approaches. Med Care. 2010;48(6 Suppl):S114–20. [PMC free article: PMC4024462] [PubMed: 20473199]
Gliklich R, Dreyer N, editors. Registries for Evaluating Patient Outcomes: A User’s Guide. Agency for Healthcare Research and Quality; Rockville, MD: 2007. AHRQ Publication No. 07-EHC001-1.
Jutte DP, Roos LL, Brownell MD. Administrative record linkage as a tool for public health research. Annu Rev Public Health. 2011;32:91–108. [PubMed: 21219160]
Mortensen PB. The untapped potential of case registers and record-linkage studies in psychiatric epidemiology. Epidemiol Rev. 1995;17(1):205–9. [PubMed: 8521938]
Warren JL, Feuer E, Potosky AL, et al. Use of Medicare hospital and physician data to assess breast cancer incidence. Med Care. 1999;37(5):445–56. [PubMed: 10335747]
Hummler HD, Poets C. [Mortality of extremely low birthweight infants - large differences between quality assurance data and the national birth/death registry]. Z Geburtshilfe Neonatol. 2011;215(1):10–17. [PubMed: 21344345]
Li Q, Glynn RJ, Dreyer NA, et al. Validity of claims-based definitions of left ventricular systolic dysfunction in Medicare patients. Pharmacoepidemiol Drug Saf. 2011;20(7):700–8. [PubMed: 21608070]
Setoguchi S, Solomon DH, Glynn RJ, et al. Agreement of diagnosis and its date for hematologic malignancies and solid tumors between Medicare claims and cancer registry data. Cancer Causes Control. 2007;18(5):561–9. [PubMed: 17447148]
Stürmer T, Schneeweiss S, Avorn J, et al. Adjusting effect estimates for unmeasured confounding with validation data using propensity score calibration. Am J Epidemiol. 2005;162(3):279–89. [PMC free article: PMC1444885] [PubMed: 15987725]
Winglee M, Valliant R, Schuren F. A case study in record linkage. Survey Methodol. 2005;31(1):3–11.
Pentecost MJ. HIPAA and the law of unintended consequences. J Am Coll Radiol. 2004;1(3):164–5. [PubMed: 17411551]
Dracup K, Bryan-Brown CW. The law of unintended consequences. Am J Crit Care. 2004;13(2):97–9. [PubMed: 15043236]
Kulynych J, Korn D. The new HIPAA (Health Insurance Portability and Accountability Act of 1996) Medical Privacy Rule: help or hindrance for clinical research? Circulation. 2003;108(8):912–4. [PubMed: 12939240]
Salem DN, Pauker SG. The adverse effects of HIPAA on patient care. N Engl J Med. 2003;349(3):309. [PubMed: 12867622]
Kulynych J, Korn D. The new federal medical-privacy rule. N Engl J Med. 2002;347(15):1133–4. [PubMed: 12374872]
Beebe TJ, Ziegenfuss JY, St Sauver JL, et al. Health Insurance Portability and Accountability Act (HIPAA) authorization and survey nonresponse bias. Med Care. 2011;49(4):365–70. [PMC free article: PMC3179247] [PubMed: 21368682]
Institute of Medicine. Beyond the HIPAA Privacy Rule: Enhancing Privacy, Improving Health Through Research. Washington, DC: National Academies Press; 2009. [PubMed: 20662116]
Bradley CJ, Penberthy L, Devers KJ, et al. Health services research and data linkages: issues, methods, and directions for the future. Health Serv Res. 2010;45(5 Pt 2):1468–88. [PMC free article: PMC2965887] [PubMed: 21054367]
Fellegi IP, Sunter AB. A theory for record linkage. J Am Stat Assoc. 1969;64(328):1183–210.
Safran C, Bloomrosen M, Hammond WE, et al. Toward a national framework for the secondary use of health data: an American Medical Informatics Association White Paper. J Am Med Informat Assoc. 2007;14(1):1–9. [PMC free article: PMC2329823] [PubMed: 17077452]
Hammill BG, Hernandez AF, Peterson ED, et al. Linking inpatient clinical registry data to Medicare claims data using indirect identifiers. Am Heart J. 2009;157(6):995–1000. [PMC free article: PMC2732025] [PubMed: 19464409]
Tromp M, Ravelli AC, Bonsel GJ, et al. Results from simulated data sets: probabilistic record linkage outperforms deterministic record linkage. J Clin Epidemiol. 2011;64(5):565–72. [PubMed: 20952162]
Keyhani S, Woodward M, Federman AD. Physician views on the use of comparative effectiveness research: a national survey. Ann Intern Med. 2010;153(8):551–2. [PubMed: 20956718]
Sox HC. Comparative effectiveness research: a progress report. Ann Intern Med. 2010;153(7):469–72. [PubMed: 20679544]
Sox HC, Helfand M, Grimshaw J, et al. Comparative effectiveness research: challenges for medical journals. J Clin Epidemiol. 2010;63(8):862–4. [PubMed: 20434882]
Newcombe HB, Smith ME, Howe GR, et al. Reliability of computerized versus manual death searches in a study of the health of Eldorado uranium workers. Computers Biol Med. 1983;13(3):157–69. [PubMed: 6617166]
Roos LL, Wajda A. Record linkage strategies. Part 1: estimating information and evaluating approaches. Methods Information Med. 1991;30:117–23. [PubMed: 1857246]
Howe HL, Lake AJ, Shen T. Method to assess identifiability in electronic data files. Am J Epidemiol. 2007;165(5):597–601. [PubMed: 17182982]
Cook LJ, Olson LM, Dean JM. Probabilistic record linkage: relationships between file sizes, identifiers, and match weights. Methods Information Med. 2001;40:196–203. [PubMed: 11501632]
Marsolo K. Approaches to facilitate institutional review board approval of multicenter research studies. Med Care. 2012;50(Suppl):S77–81. [PubMed: 22692264]
Quantin C, Bouzelat H, Allaert FA, et al. Automatic record hash coding and linkage for epidemiological follow-up data confidentiality. Methods Information Med. 1998;37(3):271–7. [PubMed: 9787628]
Quantin C, Bouzelat H, Allaert FA, et al. How to ensure data security of an epidemiological follow up: quality assessment of an anonymous record linkage procedure. Int J Med Informat. 1998;49:117–22. [PubMed: 9723810]
Schneier B. Applied Cryptography, Protocols, Algorithms, and Source Code. Chichester: Wiley; 1994.
Dhir R, Patel AA, Winters S, et al. A multidisciplinary approach to honest broker services for tissue banks and clinical data: a pragmatic and practical model. Cancer. 2008;113(7):1705–15. [PMC free article: PMC2745185] [PubMed: 18683217]
Dwork C. Differential privacy: a survey of results. Theory Applications Models Computation Proc. 2008;4978:1–19.
Fienberg S. Encyclopedia of Social Measurement. Academic Press; 2005. Confidentiality, privacy and disclosure limitation.
Kum HC, Ahalt S, et al. Security and Privacy in Social Networks. Springer; 2012. Privacy preserving data integration using decoupled data.
Kum HC, Krishnamurthy A, Pathak D, et al. Secure Decoupled Linkage (SDLink) System for Building a Social Genome; 2013 IEEE International Conference on Big Data (IEEE BigData 2013); 2013.
Kum HC, Krishnamurthy A, Machanavajjhala A, et al. Privacy preserving interactive record linkage (PPIRL). J Am Med Informat Assoc. 2014;21(2):212–20. [PMC free article: PMC3932473] [PubMed: 24201028]
Hertzman CP, Meagher N, McGrail KM. Privacy by design at Population Data BC: a case study describing the technical, administrative, and physical controls for privacy-sensitive secondary use of personal information for research in the public interest. J Am Med Informat Assoc. 2013;20(1):25–8. [PMC free article: PMC3555322] [PubMed: 22935136]
Gladwell M. The Tipping Point: How Little Things Can Make a Big Difference. Boston: Little, Brown and Company; 2000.
Hall KL, Feng AX, Moser RP, et al. Moving the science of team science forward - collaboration and creativity. Am J Prev Med. 2008;35(2):S243–249. [PMC free article: PMC3321548] [PubMed: 18619406]
Stokols D, Hall KL, Taylor BK, et al. The science of team science - overview of the field and introduction to the supplement. Am J Prev Med. 2008;35(2):S77–89. [PubMed: 18619407]
Roos LL, Wajda A, Nicol JP. The art and science of record linkage methods that work with few identifiers. Comput Biol Med. 1986;16(1):45–57. [PubMed: 3948494]
Levenshtein V. Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady. 1966;10:707–10.
Jaro MA. Advances in record linkage methodology as applied to matching the 1985 Census of Tampa, Florida. J Am Stat Assoc. 1989;84(406):414–20.
Winkler WE. Proceedings of the Section on Survey Research Methods. American Statistical Association; 1990. String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage.
Wajda A, Roos LL. Simplifying record linkage: software and strategy. Comput Biol Med. 1987;17(4):239–48. [PubMed: 3665453]
Randall SM, Ferrante AM, Boyd JH, et al. The effect of data cleaning on record linkage quality. BMC Med Informat Decis Making. 2013;13:64. [PMC free article: PMC3688507] [PubMed: 23739011]
Krewski DA, Wang Y, Bartlett S, et al. The effect of record linkage errors on risk estimates in cohort mortality studies. Survey Methodology. 2005;31(1):13–21.
Newcombe HB, Kennedy JM. Record linkage: making maximum use of the discriminating power of identifying information. Communications of the ACM. 1962;5(11):563–6.
Rogot E, Feinleib M, Ockay KA, et al. On the feasibility of linking census samples to the National Death Index for epidemiologic studies: a progress report. Am J Public Health. 1983 Nov;73(11):1265–9. [PMC free article: PMC1651138] [PubMed: 6625029]
Rogot E, Sorlie P, Johnson NJ. Probabilistic methods in matching census samples to the National Death Index. J Chron Dis. 1986;39(9):719–34. [PubMed: 3734026]
Clark DE. Development of a statewide trauma registry using multiple linked sources of data. Proc Annu Symp Comput Appl Med Care. 1993:654–8. [PMC free article: PMC2850657] [PubMed: 8130556]
Bell RM, Keesey J, Richards T. The urge to merge: linking vital statistics records and Medicaid claims. Med Care. 1994 Oct;32(10):1004–18. [PubMed: 7934268]
Clark DE, Hahn DR. Comparison of probabilistic and deterministic record linkage in the development of a statewide trauma registry. Proc Annu Symp Comput Appl Med Care. 1995:397–401. [PMC free article: PMC2579122] [PubMed: 8563310]
Jamieson E, Roberts J, Browne G. The feasibility and accuracy of anonymized record linkage to estimate shared clientele among three health and social service agencies. Methods Information Med. 1995;34:371–7. [PubMed: 7476469]
Muse AG, Mikl J, Smith PF. Evaluating the quality of anonymous record linkage using deterministic procedures with the New York State AIDS registry and a hospital discharge file. Stat Med. 1995 Mar 15–Apr 15;14(5–7):499–509. [PubMed: 7792444]
Doebbeling BN, Wyant DK, McCoy KD, et al. Linked insurance-tumor registry database for health services research. Med Care. 1999 Nov;37(11):1105–15. [PubMed: 10549613]
Grannis SJ, Overhage JM, Hui S, et al. Analysis of a probabilistic record linkage technique without human review. AMIA 2003 Symposium Proc. 2003:259–63. [PMC free article: PMC1479910] [PubMed: 14728174]
Weiner M, Stump TE, Callahan CM, et al. A practical method of linking data from Medicare claims and a comprehensive electronic medical records system. Int J Med Informat. 2003;71:57–69. [PubMed: 12909159]
Bradley CJ, Given CW, Luo Z, et al. Medicaid, Medicare, and the Michigan Tumor Registry: a linkage strategy. Med Decis Making. 2007 Jul–Aug;27(4):352–63. [PubMed: 17641138]
Jacobs JP, Edwards FH, Shahian DM, et al. Successful linking of the Society of Thoracic Surgeons adult cardiac surgery database to Centers for Medicare and Medicaid Services Medicare data. Ann Thoracic Surg. 2010;90:1150–7. [PubMed: 20868806]
Nadpara PA, Madhavan SS. Linking Medicare, Medicaid, and cancer registry data to study the burden of cancers in West Virginia. Medicare Medicaid Res Rev. 2012;2(4) [PMC free article: PMC4006474] [PubMed: 24800152]
Li B, Quan H, Fong A, et al. Assessing record linkage between health care and vital statistics databases using deterministic methods. BMC Health Serv Res. 2006;6(1):1–10. [PMC free article: PMC1534029] [PubMed: 16597337]
Potosky AL, Riley GF, Lubitz JD, et al. Potential for cancer related health services research using a linked Medicare-tumor registry database. Med Care. 1993;31(8):732–48. [PubMed: 8336512]
Warren JL, Klabunde CN, Schrag D, et al. Overview of the SEER-Medicare data: content, research applications, and generalizability to the United States elderly population. Med Care. 2002 Aug;40(8 Suppl):IV-3–18. [PubMed: 12187163]
National Cancer Institute. SEER-Medicare Program. Search SEER-Medicare Publications. 2011. [Accessed March 4, 2011]. http://healthservices/overview/publications.html.
Warren J, Carpenter WR. Email: Details on SEER-Medicare linkage methods. Aug 20, 2010. (Report Author)
Quantin C, Bouzelat H, Allaert FA, et al. How to ensure data security of an epidemiological follow up: quality assessment of an anonymous record linkage procedure. Int J Med Informat. 1998;49:117–22. [PubMed: 9723810]
Quantin C, Allaert FA, Avillach P, et al. Building application-related patient identifiers: what solution for a European country? Int J Telemed Applications. 2008:1–5. [PMC free article: PMC2288643] [PubMed: 18401447]
Christen P, Goiser K. Quality and complexity measures for data linkage and deduplication. Stud Computational Intelligence. 2007;43:127–51.
Hernandez MA, Stolfo SJ. The merge/purge problem for large databases. Proceedings of the SIGMOD 95 Conference. 1995:127–38.
Belin TR, Rubin DB. A method for calibrating false matches in record linkage. J Am Stat Assoc. 1995;90:694–707.
Dey D, Sarkar S, De P. Entity matching in heterogenous databases: a distance based decision model; Proceedings of the 31st Hawaii International Conference on System Sciences; 1998.
Cochinwala M, Kurien V, Lalk G, et al. Efficient Data Reconciliation. Bellcore. 1998.
Verykios VS, Moustakides GV. A cost optimal decision model for record matching. Workshop on Data Quality: Challenges for Computer Science and Statistics. 2001.
Van Rijsbergen CJ. Information Retrieval: Data Structures and Algorithms. London: Butterworths; 1979.