BMC Bioinformatics. 2015 Feb 25;16:63. doi: 10.1186/s12859-015-0478-3.

Removing batch effects from purified plasma cell gene expression microarrays with modified ComBat.

Author information

1. Myeloma Institute for Research and Therapy, University of Arkansas for Medical Sciences, Little Rock, AR, USA. CKStein@uams.edu.
2. Cancer Research and Biostatistics, Seattle, WA, USA. pingping@crab.org.
3. Myeloma Institute for Research and Therapy, University of Arkansas for Medical Sciences, Little Rock, AR, USA. EpsteinJoshua@uams.edu.
4. Myeloma Institute for Research and Therapy, University of Arkansas for Medical Sciences, Little Rock, AR, USA. AFBuros@uams.edu.
5. Cancer Research and Biostatistics, Seattle, WA, USA. adamr@crab.org.
6. Cancer Research and Biostatistics, Seattle, WA, USA. johnc@crab.org.
7. Myeloma Institute for Research and Therapy, University of Arkansas for Medical Sciences, Little Rock, AR, USA. GJMorgan@uams.edu.
8. Myeloma Institute for Research and Therapy, University of Arkansas for Medical Sciences, Little Rock, AR, USA. BarlogieBart@uams.edu.

Abstract

BACKGROUND:

Gene expression profiling (GEP) via microarray analysis is a widely used tool for risk assessment and other patient diagnostics in clinical settings. However, non-biological factors such as systematic changes in sample preparation, differences in scanners, and other potential batch effects are often unavoidable in long-term studies and meta-analyses. To reduce the impact of batch effects on microarray data, Johnson, Rabinovic, and Li developed ComBat for use when combining batches of gene expression microarray data. We propose a modification to ComBat that centers data on the location and scale of a pre-determined 'gold-standard' batch. This modified ComBat (M-ComBat) is designed specifically for meta-analysis and batch-effect adjustment when predictive models have been validated and fixed on historical data from a 'gold-standard' batch.
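The idea of centering every batch on a 'gold-standard' batch can be illustrated with a minimal sketch. It is not M-ComBat itself: it omits ComBat's empirical Bayes shrinkage of the batch parameters and any biological covariates, and simply rescales each non-reference batch, gene by gene, to the reference batch's mean and standard deviation. The function name `align_to_reference`, the list-of-genes data layout, and the choice to leave the reference batch untouched are assumptions made for illustration.

```python
from statistics import mean, stdev


def align_to_reference(expr, batches, ref):
    """Simplified location/scale batch adjustment toward a reference batch.

    Hypothetical helper, not the published M-ComBat: no empirical Bayes
    shrinkage and no covariates.

    expr    -- list of genes; each gene is a list of expression values, one per sample
    batches -- batch label for each sample (same order as the values in each gene)
    ref     -- label of the 'gold-standard' batch
    Each batch needs at least two samples with non-constant values per gene.
    """
    labels = sorted(set(batches))
    adjusted = []
    for gene in expr:
        # Per-batch location (mean) and scale (sample standard deviation) for this gene.
        stats = {}
        for b in labels:
            vals = [v for v, lab in zip(gene, batches) if lab == b]
            stats[b] = (mean(vals), stdev(vals))
        ref_mean, ref_sd = stats[ref]
        row = []
        for v, lab in zip(gene, batches):
            if lab == ref:
                # Assumption of this sketch: the gold-standard batch is left unchanged.
                row.append(float(v))
            else:
                b_mean, b_sd = stats[lab]
                # Standardize within the sample's own batch, then map onto the
                # reference batch's location and scale.
                row.append((v - b_mean) / b_sd * ref_sd + ref_mean)
        adjusted.append(row)
    return adjusted


if __name__ == "__main__":
    genes = [[1.0, 2.0, 3.0, 20.0, 22.0, 24.0]]
    labels = ["Old"] * 3 + ["New"] * 3
    print(align_to_reference(genes, labels, ref="Old"))
    # prints [[1.0, 2.0, 3.0, 1.0, 2.0, 3.0]]
```

Because only the non-reference batches move, data stay on the scale on which a fixed model was trained. Standard ComBat instead pulls every batch, including the reference, toward a pooled gene-wise mean and variance.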

RESULTS:

We combined data from the Myeloma Institute for Research and Therapy (MIRT) across two batches ('Old' and 'New' Kit sample preparation) with external data sets from the HOVON-65/GMMG-HD4 and MRC-IX trials into a single combined set, first without transformation and then with both ComBat and M-ComBat transformations. Fixed and validated gene risk signatures developed at MIRT on the Old Kit standard (GEP5, GEP70, and GEP80 risk scores) were compared across these combined data sets. Both ComBat and M-ComBat eliminated all of the probe-level differences caused by systematic batch effects: by ANOVA at a 0.01 q-value threshold, over 98% of untransformed probes differed significantly, compared with zero significant probes after either ComBat or M-ComBat. The mean and distribution of risk scores, as well as the proportion of high-risk subjects identified, agreed more closely with the 'gold-standard' batch under M-ComBat than under ComBat. The performance of risk scores improved overall with either ComBat or M-ComBat; however, in our study, M-ComBat combined with the original, optimal risk cutoffs was better able to identify smaller cohorts of high-risk subjects.
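The probe-level batch test described in the Results (one ANOVA per probe across batches, with a 0.01 q-value threshold) can be sketched in plain Python. Assumptions: the abstract does not say how q-values were computed, so Benjamini-Hochberg adjusted p-values stand in for them here; the F-distribution tail probability is computed with the standard continued-fraction expansion of the regularized incomplete beta function; and all function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the regularized incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) for an F(d1, d2) distribution."""
    return _betai(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def anova_pvalue(groups):
    """One-way ANOVA p-value; groups is a list of lists of values (one list per batch)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0.0:
        # Degenerate case: no within-batch spread at all.
        return 0.0 if ss_between > 0.0 else 1.0
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_sf(f, k - 1, n - k)


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, used here as a stand-in for q-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_batch_affected_probes(expr, batches, q_threshold=0.01):
    """Count probes whose expression differs across batches at the given q-value threshold."""
    labels = sorted(set(batches))
    pvals = []
    for probe in expr:
        groups = [[v for v, lab in zip(probe, batches) if lab == b] for b in labels]
        pvals.append(anova_pvalue(groups))
    return sum(q < q_threshold for q in bh_adjust(pvals))
```

Calling `count_batch_affected_probes` on an untransformed matrix and again on a batch-adjusted one gives the kind of before/after comparison the Results summarize.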

CONCLUSION:

M-ComBat is a practical modification to an accepted method that offers greater power to control the location and scale of batch-effect-adjusted data. M-ComBat allows historical models to function as intended on future samples despite known, often unavoidable, systematic changes to gene expression data.

PMID: 25887219
PMCID: PMC4355992
DOI: 10.1186/s12859-015-0478-3
[Indexed for MEDLINE]