Bioinformatics. 2013 Jul 1;29(13):1704-5. doi: 10.1093/bioinformatics/btt261. Epub 2013 May 6.

pyGenClean: efficient tool for genetic data clean up before association testing.

Author information

Montreal Heart Institute Research Center, Beaulieu-Saucier Université de Montréal Pharmacogenomics Centre, 5000 Bélanger Street, Montréal, Canada. louis-philippe.lemieux.perreault@umontreal.ca

Abstract

Genetic association studies that use high-throughput genotyping arrays must process large amounts of data, on the order of millions of markers per experiment. The first step of any analysis with genotyping arrays is typically a thorough data clean up and quality control, which removes poor quality genotypes and generates metrics used to select individuals for downstream statistical analysis. We have developed pyGenClean, a bioinformatics tool that facilitates and standardizes the genetic data clean up pipeline for genotyping array data. In conjunction with a source batch-queuing system, the tool minimizes data manipulation errors, accelerates the data clean up process and provides informative plots and metrics to guide decision making for statistical analysis.

AVAILABILITY AND IMPLEMENTATION:

pyGenClean is open source software written in Python 2.7 and is freely available, along with documentation and examples, from http://www.statgen.org.

PMID: 23652425
PMCID: PMC3694635
DOI: 10.1093/bioinformatics/btt261
[Indexed for MEDLINE]
Free PMC Article
