BMC Bioinformatics. 2016 Jan 8;17:24. doi: 10.1186/s12859-015-0865-9.

Integrating 400 million variants from 80,000 human samples with extensive annotations: towards a knowledge base to analyze disease cohorts.

Author information

1
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave, Box 1498, New York, 10029, USA. joerg.hakenberg@gmail.com.
2
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave, Box 1498, New York, 10029, USA. ninpy.weiyi@gmail.com.
3
Current affiliation: Illumina, Inc., 451 El Camino Real, Suite 210, Santa Clara, 95050, USA. ninpy.weiyi@gmail.com.
4
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave, Box 1498, New York, 10029, USA. thomas@informatik.hu-berlin.de.
5
Current affiliation: Roche Pharma Research and Early Development, Informatics, Roche Innovation Center New York, 430 East 29th St, New York, 10016, USA. thomas@informatik.hu-berlin.de.
6
Department of Computer Science, Humboldt-Universität zu Berlin, Unter den Linden 6, Berlin, 10099, Germany. ying-chih.wang@mssm.edu.
7
Current affiliation: German Research Centre for Artificial Intelligence (DFKI), Alt Moabit 91c, Berlin, 10559, Germany. ying-chih.wang@mssm.edu.
8
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave, Box 1498, New York, 10029, USA. andrew.uzilov@mssm.edu.
9
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave, Box 1498, New York, 10029, USA. rong.chen@mssm.edu.

Abstract

BACKGROUND:

Data from a plethora of high-throughput sequencing studies is readily available to researchers, providing genetic variants detected in a variety of healthy and disease populations. While each individual cohort helps gain insights into polymorphic and disease-associated variants, a joint perspective can be more powerful in identifying polymorphisms, rare variants, disease associations, genetic burden, somatic variants, and disease mechanisms.

DESCRIPTION:

We have set up a Reference Variant Store (RVS) containing variants observed in a number of large-scale sequencing efforts, such as 1000 Genomes, ExAC, Scripps Wellderly, UK10K; various genotyping studies; and disease association databases. RVS holds extensive annotations pertaining to affected genes, functional impacts, disease associations, and population frequencies. RVS currently stores 400 million distinct variants observed in more than 80,000 human samples.

CONCLUSIONS:

RVS facilitates cross-study analysis to discover novel genetic risk factors, gene-disease associations, potential disease mechanisms, and actionable variants. Due to its large reference populations, RVS can also be employed for variant filtration and gene prioritization.
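
As an illustration of the variant filtration use case, the following minimal sketch keeps only candidate variants that are rare in a reference population. It does not use the RVS interface or data model: the `Variant` type, the `filter_rare` function, the 1% frequency cutoff, and the toy frequencies are all hypothetical, and treating variants absent from the reference as having frequency 0 is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant key: chromosome, position, reference and alternate allele."""
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_rare(candidates, reference_af, max_af=0.01):
    """Keep candidates whose reference allele frequency is at most max_af.

    Variants missing from reference_af are treated as frequency 0.0
    (an assumption of this sketch), so novel variants are kept.
    """
    return [v for v in candidates if reference_af.get(v, 0.0) <= max_af]


# Toy data, invented for illustration only.
common = Variant("1", 1000, "A", "G")
rare = Variant("2", 2000, "C", "T")
novel = Variant("3", 3000, "G", "A")

reference_af = {common: 0.25, rare: 0.001}

kept = filter_rare([common, rare, novel], reference_af)
```

Here `kept` retains the rare and the novel variant and drops the common one. In practice the cutoff would depend on the disease model and on which reference populations are consulted.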

AVAILABILITY:

A web interface to public datasets and annotations in RVS is available at https://rvs.u.hpc.mssm.edu/.

PMID:
26746786
PMCID:
PMC4706706
DOI:
10.1186/s12859-015-0865-9
[Indexed for MEDLINE]
Free PMC Article
