Stat Appl Genet Mol Biol. 2017 Sep 26;16(4):217-242. doi: 10.1515/sagmb-2016-0072.

FC1000: normalized gene expression changes of systematically perturbed human cells.

Abstract

The systematic study of transcriptional responses to genetic and chemical perturbations in human cells is still in its early stages. The largest available dataset to date is the newly released L1000 compendium. With its 1.3 million gene expression profiles of treated human cells, it offers many opportunities for biomedical data mining, but it also poses data normalization challenges on a new scale. We developed a novel and practical approach to obtain accurate estimates of fold change response profiles from L1000, based on the RUV (Remove Unwanted Variation) statistical framework. Extending RUV to a big data setting, we propose an estimation procedure in which an underlying RUV model is tuned by feedback from dataset-specific statistical measures that reflect p-value distributions and internal gene knockdown controls. These metrics, termed evaluation endpoints, are applied to disjoint data splits, and the results are integrated to select an optimal normalization. The procedure reduces bias and noise in the L1000 data, which in turn broadens the potential of this resource for pharmacological and functional genomic analyses. Our pipeline and normalization results are distributed as an R package (nelanderlab.org/FC1000.html).
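The core idea behind negative-control adjustment in the RUV family can be illustrated with a deliberately simplified sketch. It is not the FC1000 pipeline and not the R package's API: the function name `remove_control_factor`, the single-factor assumption, and the use of the control-gene mean as the unwanted factor are all assumptions made for illustration. The sketch estimates one unwanted factor per sample from genes assumed to be unaffected by treatment, then regresses that factor out of every gene.

```python
from statistics import mean


def remove_control_factor(Y, control_idx):
    """Simplified, single-factor negative-control adjustment (hypothetical helper).

    Y is a list of samples, each a list of expression values (one per gene).
    control_idx lists the columns assumed to carry only unwanted variation.
    Returns the gene-centered matrix with the estimated factor regressed out.
    """
    n, g = len(Y), len(Y[0])

    # Center each gene across samples.
    col_means = [mean(Y[i][j] for i in range(n)) for j in range(g)]
    Yc = [[Y[i][j] - col_means[j] for j in range(g)] for i in range(n)]

    # Estimate one unwanted factor per sample as the mean of the control genes.
    w = [mean(Yc[i][j] for j in control_idx) for i in range(n)]
    ww = sum(x * x for x in w)
    if ww == 0:
        return Yc  # no variation in the controls, nothing to remove

    # Least-squares loading of each gene on the factor, then subtract the fit.
    beta = [sum(w[i] * Yc[i][j] for i in range(n)) / ww for j in range(g)]
    return [[Yc[i][j] - beta[j] * w[i] for j in range(g)] for i in range(n)]
```

A full RUV model would typically estimate several factors and include the treatment design in the regression so that wanted variation is not removed; the number of factors is the kind of setting that the evaluation endpoints described above could be used to tune.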

KEYWORDS: gene expression; normalization; p-value inflation; remove unwanted variation

PMID: 28862994
DOI: 10.1515/sagmb-2016-0072
[Indexed for MEDLINE]
