Nucleic Acids Res. 2019 Jan 8;47(D1):D39-D45. doi: 10.1093/nar/gky969.

CAGm: a repository of germline microsatellite variations in the 1000 genomes project.

Author information

Edward Via College of Osteopathic Medicine, 2265 Kraft Drive, Blacksburg, VA 24060, USA.
Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA.
Arthritis and Clinical Immunology Research Program, Division of Genomics and Data Sciences, Oklahoma Medical Research Foundation, Oklahoma City, OK 73104, USA.
Department of Biochemistry and Molecular Biology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA.
One Health Research Center, Virginia-Maryland College of Veterinary Medicine, 1410 Prices Fork Rd, Blacksburg, VA 24060, USA.
Institute of Evolution, University of Haifa, Abba Khoushy Ave 199, Haifa, 3498838, Israel.
Gibbs Cancer Center & Research Institute, 101 E Wood St., Spartanburg, SC 29303, USA.


The human genome harbors an abundance of repetitive DNA; however, its function continues to be debated. Microsatellites, a class of short tandem repeat, are established as an important source of genetic variation. Array length variants are common among microsatellites and affect gene expression; however, efforts to understand the role and diversity of microsatellite variation have been hampered by several challenges. Without adequate depth, both long-read and short-read sequencing may fail to detect the variants present in a sample; additionally, large sample sizes are needed to reveal the degree of population-level polymorphism. To address these challenges, we present the Comparative Analysis of Germline Microsatellites (CAGm): a database of germline microsatellites from 2529 individuals in the 1000 genomes project. A key novelty of CAGm is the ability to aggregate microsatellite variation by population, ethnicity (super population) and gender. The database provides advanced searching for microsatellites embedded in genes and functional elements. All data can be downloaded as Microsoft Excel spreadsheets. Two use-case scenarios are presented to demonstrate its utility: a mononucleotide (A) microsatellite at the BAT-26 locus and a dinucleotide (CA) microsatellite in the coding region of FGFRL1. CAGm is freely available at
