Bioinformatics. 2013 Feb 15;29(4):444-50. doi: 10.1093/bioinformatics/btt001. Epub 2013 Jan 7.

MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data.

Author information

Regenerative Medicine Program, Ottawa Hospital Research Institute, K1H 8L6, Ottawa, Canada.

Abstract

MOTIVATION:

Reliable estimation of the mean fragment length for next-generation short-read sequencing data is an important step in sequencing analysis pipelines, most notably because of its impact on the accuracy of the enriched regions identified by peak-calling algorithms. Although many peak-calling algorithms include a fragment-length estimation subroutine, the problem has not been adequately solved, as demonstrated by the variability of the estimates returned by different algorithms.

RESULTS:

In this article, we investigate the use of strand cross-correlation to estimate mean fragment length of single-end data and show that traditional estimation approaches have mixed reliability. We observe that the mappability of different parts of the genome can introduce an artificial bias into cross-correlation computations, resulting in incorrect fragment-length estimates. We propose a new approach, called mappability-sensitive cross-correlation (MaSC), which removes this bias and allows for accurate and reliable fragment-length estimation. We analyze the computational complexity of this approach, and evaluate its performance on a test suite of NGS datasets, demonstrating its superiority to traditional cross-correlation analysis.
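
To make the idea concrete, the following is a minimal Python sketch of strand cross-correlation with an optional mappability mask. It is an illustration under stated assumptions, not the authors' Perl implementation: the function names, the per-position count arrays `fwd` and `rev` (5' read-end counts on each strand), and the boolean arrays `fwd_map` and `rev_map` are all hypothetical. In particular, the sketch assumes per-strand mappability is supplied directly and does not derive reverse-strand mappability from read length. When the masks are given, the correlation at each shift is computed only over position pairs that are mappable on both strands; otherwise it reduces to the plain cross-correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def cross_correlation(fwd, rev, shift, fwd_map=None, rev_map=None):
    """Correlate forward counts at i with reverse counts at i + shift.

    If both masks are given (hypothetical inputs), only pairs where position i
    is mappable on the forward strand and i + shift is mappable on the reverse
    strand contribute, so means and variances are taken over those pairs only.
    """
    use_mask = fwd_map is not None and rev_map is not None
    xs, ys = [], []
    for i in range(len(fwd) - shift):
        j = i + shift
        if use_mask and not (fwd_map[i] and rev_map[j]):
            continue  # skip pairs that are not mappable on both strands
        xs.append(fwd[i])
        ys.append(rev[j])
    return pearson(xs, ys)


def estimate_fragment_length(fwd, rev, max_shift, fwd_map=None, rev_map=None):
    """Return (best_shift, scores) for shifts 0..max_shift."""
    scores = [
        cross_correlation(fwd, rev, d, fwd_map, rev_map)
        for d in range(max_shift + 1)
    ]
    best_shift = max(range(len(scores)), key=scores.__getitem__)
    return best_shift, scores
```

Calling `estimate_fragment_length(fwd, rev, max_shift)` gives the naive estimate, while passing `fwd_map` and `rev_map` as well gives the masked one. This loop costs time proportional to the genome length for every shift, which is acceptable for a toy example but is not meant to reflect the complexity analysis in the article.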

AVAILABILITY:

An open-source Perl implementation of our approach is available at http://www.perkinslab.ca/Software.html.

PMID: 23300135
PMCID: PMC3570216
DOI: 10.1093/bioinformatics/btt001
[Indexed for MEDLINE]
Free PMC Article
