Proc Natl Acad Sci U S A. 2016 Apr 19;113(16):4290-5. doi: 10.1073/pnas.1521171113. Epub 2016 Apr 6.

Stability-driven nonnegative matrix factorization to interpret spatial gene expression and build local gene networks.

Author information

1 Department of Statistics, University of California, Berkeley, CA 94720; Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720.
2 Department of Statistics, University of California, Berkeley, CA 94720; Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720; Walmart Labs, San Bruno, CA 94066.
3 Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720.
4 Department of Statistics, University of California, Berkeley, CA 94720; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720. binyu@stat.berkeley.edu; erwin@fruitfly.org.
5 Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720. binyu@stat.berkeley.edu; erwin@fruitfly.org.

Abstract

Spatial gene expression patterns enable the detection of local covariability and are extremely useful for identifying local gene interactions during normal development. The abundance of spatial expression data in recent years has led to the modeling and analysis of regulatory networks. The inherent complexity of such data makes extracting biological information challenging. We developed staNMF, a method that combines a scalable implementation of nonnegative matrix factorization (NMF) with a new stability-driven model selection criterion. When applied to a set of Drosophila early embryonic spatial gene expression images, one of the largest datasets of its kind, staNMF identified 21 principal patterns (PP). Providing a compact yet biologically interpretable representation of Drosophila expression patterns, PP are comparable to a fate map generated experimentally by laser ablation and show exceptional promise as a data-driven alternative to manual annotations. Our analysis mapped genes to cell-fate programs and assigned putative biological roles to uncharacterized genes. Finally, we used the PP to generate local transcription factor regulatory networks. Spatially local correlation networks were constructed for six PP that span the embryonic anterior-posterior axis. Using a two-tailed 5% cutoff on correlation, we reproduced 10 of the 11 links in the well-studied gap gene network. The performance of PP with the Drosophila data suggests that staNMF provides informative decompositions and constitutes a useful computational lens through which to extract biological insight from complex and often noisy gene expression data.
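The idea of stability-driven model selection can be illustrated with a minimal sketch. It assumes that NMF has already been run several times for each candidate number of patterns K using different random initializations (the factorization step itself is not shown). Each K is then scored by how consistently the learned dictionaries agree across runs, using an Amari-type dissimilarity on Pearson correlations between pattern atoms. The function names (`correlation`, `dissimilarity`, `instability`, `select_k`), the toy data, and this exact dissimilarity are illustrative assumptions, not the authors' staNMF code.

```python
import math
from itertools import combinations


def correlation(a, b):
    """Pearson correlation between two equal-length vectors (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def dissimilarity(d1, d2):
    """Amari-type dissimilarity between two dictionaries of K atoms each (assumed form).

    Each atom is matched to its best-correlated atom in the other dictionary.
    The result is 0 when the dictionaries agree up to a permutation of atoms.
    """
    k = len(d1)
    c = [[correlation(u, v) for v in d2] for u in d1]  # K x K cross-correlation
    best_per_row = sum(max(row) for row in c)
    best_per_col = sum(max(c[i][j] for i in range(k)) for j in range(k))
    return (2 * k - best_per_row - best_per_col) / (2 * k)


def instability(dictionaries):
    """Mean pairwise dissimilarity over dictionaries from differently seeded runs."""
    pairs = list(combinations(dictionaries, 2))
    return sum(dissimilarity(a, b) for a, b in pairs) / len(pairs)


def select_k(runs):
    """Return the K whose dictionaries are most stable across runs.

    `runs` is a hypothetical mapping {K: [dictionary_run_1, dictionary_run_2, ...]},
    where each dictionary is a list of K atoms (e.g. flattened spatial patterns).
    """
    return min(runs, key=lambda k: instability(runs[k]))


# Toy example (hypothetical data): the K=2 runs agree up to atom order,
# while the K=3 runs disagree on one atom.
runs = {
    2: [[[1, 0, 0, 2], [0, 1, 3, 0]],
        [[0, 1, 3, 0], [1, 0, 0, 2]]],
    3: [[[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]],
        [[0, 0, 0, 1], [0, 1, 0, 0], [1, 0, 0, 0]]],
}
print(select_k(runs))  # → 2
```

Lower instability means the factorization recovers the same patterns regardless of initialization. In this toy example the K=2 runs match exactly after reordering atoms, so K=2 is selected over K=3.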

KEYWORDS:

principal patterns; sparse decomposition; spatial gene expression; spatially local networks; stability selection

PMID: 27071099
PMCID: PMC4843452
DOI: 10.1073/pnas.1521171113
[Indexed for MEDLINE]