Eur J Hum Genet. 2016 Apr;24(4):521-8. doi: 10.1038/ejhg.2015.165. Epub 2015 Aug 26.

Harmonising and linking biomedical and clinical data across disparate data archives to enable integrative cross-biobank research.

Author information

1. Department of Medical Epidemiology and Biostatistics, Swedish e-Science Research Centre, Karolinska Institutet, Stockholm, Sweden.
2. Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden.
3. European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, UK.
4. Uniquer Sarl, rue de la Mercerie, Lausanne, Switzerland.
5. Institute for Molecular Medicine Finland, FIMM, University of Helsinki, Biomedicum Helsinki 2U, Helsinki, Finland.
6. Institute of Epidemiology II, Helmholtz Zentrum München, Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Neuherberg, Germany.
7. Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Neuherberg, Germany.
8. Department of Public Health and General Practice, HUNT Research Centre, Norwegian University of Science and Technology, Levanger, Norway.
9. Department of Clinical Sciences, Diabetes and Endocrinology, Lund University, Lund, Sweden.
10. Lund University Diabetes Centre, CRC at Skåne University Hospital, Malmö, Sweden.
11. Estonian Genome Center, University of Tartu, Tartu, Estonia.
12. Institute of Genetic Epidemiology, Helmholtz Zentrum München, Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Neuherberg.
13. McGill University Health Centre, Montreal, Quebec, Canada.
14. Department of Biological Psychology, FGB, VU University, Amsterdam, The Netherlands.
15. Latvian Genome Data Base (LGDB), Latvian Biomedical Research and Study Centre, Ratsupites 1 k-1, Riga, Latvia.
16. BBMRI-ERIC, Neue Stiftingtalstrasse 2/B/6, Graz, Austria.
17. National Institute for Health and Welfare, Helsinki, Finland.
18. University of Jyvaskyla, Jyväskylä, Finland.
19. Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden.
20. Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Churchill Hospital, Headington, Oxford, UK.
21. Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
22. Oxford NIHR Biomedical Research Centre, Churchill Hospital, Headington, Oxford, UK.
23. Department of Epidemiology, Erasmus Medical Center, Rotterdam, The Netherlands.
24. Department of Public Health, Faculty of Medicine, University of Helsinki, Helsinki, Finland.
25. Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK.
26. Department of Genomics of Common Disease, School of Public Health, Imperial College London, London, UK.
27. Division of Epidemiology, Department of Genes and Environment, The Norwegian Institute of Public Health, Oslo, Norway.

Abstract

A wealth of biospecimen samples is stored in modern, globally distributed biobanks. Biomedical researchers worldwide need to be able to combine these resources to improve the power of large-scale studies. A prerequisite for this effort is the ability to search and access, in an integrated manner, phenotypic, clinical and other information about the samples currently stored at biobanks. However, privacy issues, heterogeneous information systems and the lack of agreed-upon vocabularies have made specimen searching across multiple biobanks extremely challenging. We describe three case studies in which we linked samples and sample descriptions to facilitate global searching of available samples for research. The use cases include the ENGAGE (European Network for Genetic and Genomic Epidemiology) consortium comprising at least 39 cohorts, the SUMMIT (surrogate markers for micro- and macro-vascular hard endpoints for innovative diabetes tools) consortium and a pilot for data integration between a Swedish clinical health registry and a biobank. We used the Sample avAILability (SAIL) method for data linking: we first created harmonised variables, and then annotated and made searchable the information on the number of specimens available in individual biobanks for various phenotypic categories. By operating on this categorised availability data, we sidestep many of the privacy obstacles that arise when handling real values. We show that harmonised and annotated records about data availability across disparate biomedical archives provide a key methodological advance in the pre-analysis exchange of information between biobanks, that is, during the project planning phase.
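
The idea of searching availability rather than real values can be illustrated with a minimal Python sketch. It is not the SAIL implementation: the cohort names, the variable names, the mapping table and the functions `to_availability`, `build_index` and `search` are all hypothetical, chosen only to show how cohort-specific variables might be mapped to harmonised names, reduced to available/not-available flags, and then counted per cohort.

```python
# Illustrative sketch only; all names and data below are hypothetical,
# not taken from the SAIL software or from any real cohort.

# Maps each cohort's local variable names to shared, harmonised names.
HARMONISED_NAMES = {
    "cohort_a": {"bmi_kgm2": "bmi", "t2d_status": "type_2_diabetes"},
    "cohort_b": {"BodyMassIndex": "bmi", "diabetes2": "type_2_diabetes"},
}


def to_availability(cohort, record):
    """Replace real values with True/False flags under harmonised names."""
    mapping = HARMONISED_NAMES[cohort]
    return {
        mapping[name]: value is not None
        for name, value in record.items()
        if name in mapping
    }


def build_index(cohorts):
    """Build per-cohort lists of availability flags; real values are dropped."""
    return {
        cohort: [to_availability(cohort, record) for record in records]
        for cohort, records in cohorts.items()
    }


def search(index, required):
    """Count, per cohort, the samples that have every required variable."""
    return {
        cohort: sum(
            all(flags.get(variable, False) for variable in required)
            for flags in samples
        )
        for cohort, samples in index.items()
    }


# Made-up records; None marks a value that was never measured.
cohorts = {
    "cohort_a": [
        {"bmi_kgm2": 27.4, "t2d_status": 1},
        {"bmi_kgm2": None, "t2d_status": 0},
        {"bmi_kgm2": 31.0, "t2d_status": None},
    ],
    "cohort_b": [
        {"BodyMassIndex": 22.1, "diabetes2": 0},
        {"BodyMassIndex": 29.8, "diabetes2": 1},
    ],
}

index = build_index(cohorts)
print(search(index, ["bmi", "type_2_diabetes"]))
```

In this sketch the index holds only flags, so a query returns sample counts per cohort and never exposes a measured value, which is the property the abstract relies on for the project planning phase.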

PMID: 26306643
PMCID: PMC4929882
DOI: 10.1038/ejhg.2015.165
[Indexed for MEDLINE]
Free PMC Article
