Am J Hum Genet. 2016 May 5;98(5):919-933. doi: 10.1016/j.ajhg.2016.04.001. Epub 2016 Apr 25.

Population-Scale Sequencing Data Enable Precise Estimates of Y-STR Mutation Rates.

Author information

1. New York Genome Center, New York, NY 10013, USA; Computational and Systems Biology Program, MIT, Cambridge, MA 02139, USA; Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02139, USA.
2. New York Genome Center, New York, NY 10013, USA; Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02139, USA; Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, MA 02139, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA.
3. Program in Biomedical Informatics, Stanford University, Stanford, CA 94305, USA; Department of Genetics, Stanford University, Stanford, CA 94305, USA.
4. Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.
5. New York Genome Center, New York, NY 10013, USA; Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02139, USA; Department of Computer Science, Fu Foundation School of Engineering, Columbia University, New York, NY 10027, USA; Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10032, USA. Electronic address: yaniv@cs.columbia.edu.

Abstract

Short tandem repeats (STRs) are mutation-prone loci that span nearly 1% of the human genome. Previous studies have estimated the mutation rates of highly polymorphic STRs by using capillary electrophoresis and pedigree-based designs. Although this work has provided insights into the mutational dynamics of highly mutable STRs, the mutation rates of most others remain unknown. Here, we harnessed whole-genome sequencing data to estimate the mutation rates of Y chromosome STRs (Y-STRs) with 2-6 bp repeat units that are accessible to Illumina sequencing. We genotyped 4,500 Y-STRs by using data from the 1000 Genomes Project and the Simons Genome Diversity Project. Next, we developed MUTEA, an algorithm that infers STR mutation rates from population-scale data by using a high-resolution SNP-based phylogeny. After extensive intrinsic and extrinsic validations, we harnessed MUTEA to derive mutation-rate estimates for 702 polymorphic STRs by tracing each locus over 222,000 meioses, resulting in the largest collection of Y-STR mutation rates to date. Using our estimates, we identified determinants of STR mutation rates and built a model to predict rates for STRs across the genome. These predictions indicate that the load of de novo STR mutations is at least 75 mutations per generation, rivaling the load of all other known variant types. Finally, we identified Y-STRs with potential applications in forensics and genetic genealogy, assessed the ability to differentiate between the Y chromosomes of father-son pairs, and imputed Y-STR genotypes.

PMID: 27126583
PMCID: PMC4863667
DOI: 10.1016/j.ajhg.2016.04.001
[Indexed for MEDLINE]
Free PMC Article