Bioinformatics. 2014 Jun 15;30(12):1759-61. doi: 10.1093/bioinformatics/btu099. Epub 2014 Feb 14.

ADaCGH2: parallelized analysis of (big) CNA data.

Author information

Department of Biochemistry, Universidad Autónoma de Madrid, Instituto de Investigaciones Biomédicas 'Alberto Sols' (UAM-CSIC), 28029 Madrid, Spain.

Abstract

MOTIVATION:

Studies of genomic DNA copy number alteration (CNA) can involve datasets with several million probes and thousands of subjects. Analyzing these data with currently available software (e.g. packages from BioConductor) can be extremely slow and may not be feasible because of memory requirements.

RESULTS:

We have developed a BioConductor package, ADaCGH2, that parallelizes the main segmentation algorithms (using forking on multicore computers, or message passing interface (MPI) and similar mechanisms on clusters of computers) and uses ff objects for reading and storing data. We show examples with 6 million probes per array; we can analyze data that would otherwise not fit in memory and, compared with the non-parallelized versions, achieve speedups of 25 to 40 times on a 64-core machine.
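
The per-array parallelism described above can be illustrated with a minimal sketch. This is not ADaCGH2 code (the package is written in R) and it does not reproduce any of its segmentation algorithms: `segment_one` is a hypothetical toy segmenter, `segment_all` and its parameters are illustrative names, and the disk-backed storage that ff objects provide is not shown. The sketch only assumes that each array can be segmented independently, so the arrays can be handed to a pool of worker processes.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from statistics import mean


def segment_one(log_ratios, jump=0.5):
    """Toy segmenter (hypothetical, not an ADaCGH2 algorithm).

    Starts a new segment whenever two adjacent probes differ by more
    than `jump`, then replaces every probe by the mean of its segment.
    """
    if not log_ratios:
        return []
    segments = [[log_ratios[0]]]
    for prev, cur in zip(log_ratios, log_ratios[1:]):
        if abs(cur - prev) > jump:
            segments.append([cur])      # breakpoint: open a new segment
        else:
            segments[-1].append(cur)    # same segment as the previous probe
    smoothed = []
    for seg in segments:
        smoothed.extend([mean(seg)] * len(seg))
    return smoothed


def segment_all(arrays, executor_factory=ProcessPoolExecutor, workers=4):
    """Segment each array independently, one task per array.

    With ProcessPoolExecutor the tasks run in separate worker processes,
    so they can use several cores; ThreadPoolExecutor can be passed
    instead for a quick single-process check.
    """
    with executor_factory(max_workers=workers) as pool:
        return list(pool.map(segment_one, arrays))
```

When the process pool is used from a script, the call to `segment_all` should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Results come back in the same order as the input arrays.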

AVAILABILITY AND IMPLEMENTATION:

ADaCGH2 is an R package available from BioConductor. Version 2.3.11 or higher is available from the development branch: http://www.bioconductor.org/packages/devel/bioc/html/ADaCGH2.html.

PMID: 24532724
DOI: 10.1093/bioinformatics/btu099
[Indexed for MEDLINE]
Free full text
