Bioinformatics. 2015 Nov 15;31(22):3569-76. doi: 10.1093/bioinformatics/btv435. Epub 2015 Jul 27.

Canonical, stable, general mapping using context schemes.

Author information

1. UC Santa Cruz Genomics Institute, 1156 High Street, Santa Cruz, CA 95064, USA.
2. NYU School of Medicine, 550 First Avenue, New York, NY 10016, USA.

Abstract

MOTIVATION:

Sequence mapping is the cornerstone of modern genomics. However, most existing sequence mapping algorithms are insufficiently general.

RESULTS:

We introduce context schemes: a method that allows the unambiguous recognition of a reference base in a query sequence by testing the query for substrings from an algorithmically defined set. Context schemes only map when there is a unique best mapping, and define this criterion uniformly for all reference bases. Mappings under context schemes can also be made stable, so that extension of the query string (e.g. by increasing read length) will not alter the mapping of previously mapped positions. Context schemes are general in several senses. They natively support the detection of arbitrary complex, novel rearrangements relative to the reference. They can scale over orders of magnitude in query sequence length. Finally, they are trivially extensible to more complex reference structures, such as graphs, that incorporate additional variation. We demonstrate empirically the existence of high-performance context schemes, and present efficient context scheme mapping algorithms.
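
As an illustration of the idea of mapping only when a unique best mapping exists, the following is a minimal toy sketch, not the authors' algorithm. It assumes a simplified rule: a query base is mapped only if every substring of the query that contains it and occurs exactly once in the reference implies the same reference position. The function names `find_all` and `map_query` and the `min_len` parameter are hypothetical and chosen for this example only. The sketch makes no attempt at stability, efficiency, or graph references.

```python
def find_all(reference, s):
    """Return every start index at which s occurs in reference (overlaps allowed)."""
    hits = []
    start = reference.find(s)
    while start != -1:
        hits.append(start)
        start = reference.find(s, start + 1)
    return hits


def map_query(reference, query, min_len=3):
    """Map each query position to a reference position, or None if unmapped.

    Toy rule (an assumption for illustration): collect the reference positions
    implied by every query substring that covers position i, has at least
    min_len characters, and occurs exactly once in the reference. Position i
    is mapped only if exactly one reference position is implied.
    """
    mapping = []
    n = len(query)
    for i in range(n):
        candidates = set()
        for left in range(0, i + 1):
            for right in range(i + 1, n + 1):
                if right - left < min_len:
                    continue  # context too short to be considered
                hits = find_all(reference, query[left:right])
                if len(hits) == 1:
                    # A unique occurrence implies where base i would fall.
                    candidates.add(hits[0] + (i - left))
        mapping.append(candidates.pop() if len(candidates) == 1 else None)
    return mapping


unique_case = map_query("ACGTTGCA", "CGTT")
repeat_case = map_query("ACGACG", "ACG")
print(unique_case)
print(repeat_case)
```

In this toy setting, a query whose contexts are unique in the reference maps base by base, while a query drawn from a repeated region is left unmapped rather than assigned to an arbitrary copy.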

AVAILABILITY AND IMPLEMENTATION:

The software test framework created for this study is available from https://registry.hub.docker.com/u/adamnovak/sequence-graphs/.

CONTACT:

anovak@soe.ucsc.edu

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.

PMID: 26220960
PMCID: PMC4757953
DOI: 10.1093/bioinformatics/btv435
[Indexed for MEDLINE]
Free PMC Article
