MACGT: multi-dimensional automated clustering genotyping tool for analysis of microarray-based mini-sequencing data

Bioinformatics. 2006 May 1;22(9):1147-9. doi: 10.1093/bioinformatics/btl080. Epub 2006 Mar 7.

Abstract

Multi-dimensional Automated Clustering Genotyping Tool (MACGT) is a Java application that clusters complex multi-dimensional vector data derived from single nucleotide polymorphism (SNP) genotyping experiments using mini-sequencing-based microarray chemistries such as arrayed primer extension (APEX). Spot intensity output files from microarray experiments across multiple samples are imported into MACGT. The datasets can include four channels of intensity data for each spot, replicate spots for each SNP probe and multiple probe types (APEX and allele-specific APEX probes) on both DNA strands for each SNP. MACGT automatically clusters these multi-dimensional datasets for each SNP across multiple samples. Incorporating additional array datasets from known samples with previously validated SNP genotype calls allows unknown samples to be automatically assigned a genotype based on the clustering, along with numerical measures of confidence for each genotype call. Calling accuracy by MACGT exceeds 98% when applied to genotyping data from APEX microarrays, and can be increased to >99.5% by applying thresholds to the confidence measures.
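The general idea of calling an unknown sample against clusters anchored by validated samples, and then thresholding a confidence measure, can be illustrated with a minimal Python sketch. This is not MACGT's algorithm, which the abstract does not describe: the nearest-centroid rule, the margin-based confidence score, the four-channel intensity values, and all function names (`normalize`, `centroids`, `call_genotype`) are assumptions made purely for illustration.

```python
import math

def normalize(intensities):
    """Scale a vector of channel intensities so its components sum to 1."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("intensities must have a positive sum")
    return [x / total for x in intensities]

def centroids(reference):
    """Average the normalized vectors of validated samples, per genotype."""
    sums, counts = {}, {}
    for genotype, vec in reference:
        v = normalize(vec)
        acc = sums.setdefault(genotype, [0.0] * len(v))
        for i, x in enumerate(v):
            acc[i] += x
        counts[genotype] = counts.get(genotype, 0) + 1
    return {g: [x / counts[g] for x in acc] for g, acc in sums.items()}

def call_genotype(vec, cents, min_confidence=0.5):
    """Return (genotype or None, confidence) for one unknown sample.

    Hypothetical confidence: 1 - d_nearest / d_second_nearest, so a sample
    lying between two clusters scores near 0 and becomes a no-call (None).
    """
    if len(cents) < 2:
        raise ValueError("need at least two genotype clusters")
    v = normalize(vec)
    ranked = sorted((math.dist(v, c), g) for g, c in cents.items())
    (d1, best), (d2, _) = ranked[0], ranked[1]
    confidence = 1.0 - d1 / d2 if d2 > 0 else 0.0
    return (best if confidence >= min_confidence else None), confidence

# Made-up four-channel (A, C, G, T) intensities for one A/G SNP.
reference = [
    ("AA", [900, 20, 30, 50]), ("AA", [850, 30, 40, 80]),
    ("AG", [480, 30, 450, 40]), ("AG", [500, 20, 470, 10]),
    ("GG", [40, 30, 900, 30]), ("GG", [30, 50, 880, 40]),
]
cents = centroids(reference)

clear_call = call_genotype([870, 25, 35, 70], cents)
ambiguous_call = call_genotype([700, 30, 250, 20], cents)
print(clear_call, ambiguous_call)
```

Raising `min_confidence` in this sketch trades call rate for accuracy, which is the same kind of trade-off the abstract reports when thresholds are applied to MACGT's confidence measures.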

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Base Sequence
  • Cluster Analysis*
  • Genotype
  • Molecular Sequence Data
  • Oligonucleotide Array Sequence Analysis / methods*
  • Pattern Recognition, Automated / methods*
  • Polymorphism, Single Nucleotide / genetics*
  • Sequence Alignment / methods
  • Sequence Analysis, DNA / methods*
  • Software*