Bioinformatics. 2017 Nov 15;33(22):3511-3517. doi: 10.1093/bioinformatics/btx482.

Reference genome assessment from a population scale perspective: an accurate profile of variability and noise.

Author information

1 Computational Genomics, Principe Felipe Research Centre, Valencia 46012.
2 Estadística e investigación Operativa, Universitat de València, Burjassot 46100.
3 Clinical Bioinformatics Area, Fundación Progreso y Salud, Hospital Virgen del Rocio, Sevilla 46100.
4 Functional Genomics Node (INB), Fundación Progreso y Salud, Hospital Virgen del Rocio, Sevilla 46100.
5 Bioinformatics in Rare Diseases (BiER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Fundación Progreso y Salud, Hospital Virgen del Rocio, Sevilla 46100, Spain.

Abstract

Motivation:

Current plant and animal genomic studies are often based on newly assembled genomes that have not been properly consolidated. In this scenario, misassembled regions can easily lead to false-positive findings. Although quality control scores are included within genotyping protocols, they are usually employed to evaluate individual sample quality rather than reference sequence reliability. We propose a statistical model that combines quality control scores across samples in order to detect incongruent patterns at every genomic region. Our model is inherently robust, since over misassembled regions of the genome common artifact signals are expected to be shared between independent samples.
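
The idea of pooling quality control scores across samples can be illustrated with a minimal sketch. It is not the model proposed in the paper: the function name `flag_regions`, the region labels, the toy failure rates, and the median/MAD outlier rule are all assumptions chosen only to show why a signal shared between independent samples points to the reference rather than to a single sample.

```python
from statistics import median


def flag_regions(scores, k=3.0):
    """Flag regions whose QC failure rate is high in most samples.

    scores: dict mapping a region label to a list of per-sample
            QC failure rates (hypothetical values between 0 and 1).
    k:      how many scaled MADs above the typical region a region
            must lie to be flagged (an illustrative choice).
    """
    # Cross-sample summary: the median is high only when most samples
    # agree, so one bad sample cannot flag a region on its own.
    region_level = {region: median(rates) for region, rates in scores.items()}

    # Robust baseline across regions: median and median absolute deviation.
    values = list(region_level.values())
    centre = median(values)
    mad = median(abs(v - centre) for v in values)

    # 1.4826 rescales the MAD to a standard deviation under normality.
    threshold = centre + k * 1.4826 * mad
    return sorted(r for r, v in region_level.items() if v > threshold)


# Toy data (invented for illustration only).
scores = {
    "chr1:0-1000": [0.02, 0.01, 0.03, 0.02],
    "chr1:1000-2000": [0.01, 0.02, 0.02, 0.01],
    "chr1:2000-3000": [0.60, 0.55, 0.70, 0.65],  # artifact shared by all samples
    "chr1:3000-4000": [0.02, 0.90, 0.01, 0.02],  # one poor sample only
    "chr1:4000-5000": [0.03, 0.02, 0.02, 0.03],
}
flagged = flag_regions(scores)
print(flagged)
```

In this toy input only the region where every sample shows a high failure rate is flagged; the region with a single poor sample is not, which mirrors the distinction between sample quality and reference reliability drawn above.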

Results:

The reliability of our protocol has been extensively tested across different experiments and organisms, with accurate results that improve on state-of-the-art methods. Our analysis demonstrates synergistic relations between quality control scores and allelic variability estimators that improve the detection of misassembled regions, and the model is able to find strong artifact signals even within the human reference assembly. Furthermore, we demonstrate how our model can be trained to properly rank the confidence of a set of candidate variants obtained from new independent samples.

Availability and implementation:

This tool is freely available at http://gitlab.com/carbonell/ces.

Contact:

jcarbonell.cipf@gmail.com or joaquin.dopazo@juntadeandalucia.es.

Supplementary information:

Supplementary data are available at Bioinformatics online.

PMID: 28961772
PMCID: PMC5870781
DOI: 10.1093/bioinformatics/btx482
[Indexed for MEDLINE]
Free PMC Article