Genome Biol. 2019 Mar 19;20(1):58. doi: 10.1186/s13059-019-1667-6.

Tandem-genotypes: robust detection of tandem repeat expansions from long DNA reads.

Author information

1. Department of Human Genetics, Yokohama City University Graduate School of Medicine, Fukuura 3-9, Kanazawa-ku, Yokohama, 236-0004, Japan. satomits@yokohama-cu.ac.jp.
2. Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-3-26 Aomi, Koto-ku, Tokyo, 135-0064, Japan. mcfrith@edu.k.u-tokyo.ac.jp.
3. Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Chiba, Japan. mcfrith@edu.k.u-tokyo.ac.jp.
4. Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Shinjuku-ku, Tokyo, Japan. mcfrith@edu.k.u-tokyo.ac.jp.
5. Department of Human Genetics, Yokohama City University Graduate School of Medicine, Fukuura 3-9, Kanazawa-ku, Yokohama, 236-0004, Japan.
6. Department of Neurology, University of Occupational and Environmental Health School of Medicine, Kitakyushu, Fukuoka, Japan.
7. Department of Liberal Arts, Faculty of Medicine, Saitama Medical University, Iruma, Saitama, Japan.
8. Department of Bioinformatics and Molecular Neuropathology, Meiji Pharmaceutical University, Kiyose, Tokyo, Japan.
9. Department of Applied Biochemistry, School of Engineering, Tokai University, Hiratsuka, Kanagawa, Japan.

Abstract

Tandemly repeated DNA is highly mutable and causes at least 31 diseases, but it is hard to detect pathogenic repeat expansions genome-wide. Here, we report robust detection of human repeat expansions from careful alignments of long but error-prone (PacBio and nanopore) reads to a reference genome. Our method is robust to systematic sequencing errors, inexact repeats with fuzzy boundaries, and low sequencing coverage. By comparing to healthy controls, we prioritize pathogenic expansions within the top 10 out of 700,000 tandem repeats in whole genome sequencing data. This may help to elucidate the many genetic diseases whose causes remain unknown.

KEYWORDS: Long-read sequencing; Nanopore; PacBio; Repeat diseases; Tandem repeat

PMID: 30890163
PMCID: PMC6425644
DOI: 10.1186/s13059-019-1667-6
[Indexed for MEDLINE]
Free PMC Article
