Cell Syst. 2019 May 22;8(5):380-394.e4. doi: 10.1016/j.cels.2019.04.003.

MultiPLIER: A Transfer Learning Framework for Transcriptomics Reveals Systemic Features of Rare Disease.

Author information

1. Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, USA; Childhood Cancer Data Laboratory, Alex's Lemonade Stand Foundation, Philadelphia, PA, USA.
2. National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, MD, USA.
3. Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, USA.
4. Division of Nephrology, Department of Internal Medicine, Michigan Medicine, Ann Arbor, MI, USA.
5. Division of Nephrology, Department of Internal Medicine, Michigan Medicine, Ann Arbor, MI, USA; Department of Computational Medicine and Bioinformatics, Michigan Medicine, Ann Arbor, MI, USA.
6. Division of Rheumatology and the Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA.
7. Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, USA; Childhood Cancer Data Laboratory, Alex's Lemonade Stand Foundation, Philadelphia, PA, USA; Institute of Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA; Institute of Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA. Electronic address: greenescientist@gmail.com.

Abstract

Most gene expression datasets generated by individual researchers are too small to fully benefit from unsupervised machine-learning methods. In the case of rare diseases, there may be too few cases available, even when multiple studies are combined. To address this challenge, we utilize transfer learning to extract coordinated expression patterns and use learned patterns to analyze small rare disease datasets. We trained a pathway-level information extractor (PLIER) model on a large public data compendium comprising multiple experiments, tissues, and biological conditions and then transferred the model to small datasets in an approach we call MultiPLIER. Models constructed from the public data compendium included features that aligned well to known biological factors and were more comprehensive than those constructed from individual datasets or conditions. When transferred to rare disease datasets, the models describe biological processes related to disease severity more effectively than models trained only on a given dataset.
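
The transfer step described in the abstract can be illustrated with a minimal sketch: learn gene loadings from a large compendium, then reuse those fixed loadings to score a small dataset. Everything in it is an assumption made for illustration. Truncated SVD stands in for PLIER (it is not the PLIER algorithm and uses no pathway priors), the data are synthetic, and the function names `learn_loadings` and `transfer`, the ridge penalty, and all matrix sizes are hypothetical.

```python
import numpy as np

def learn_loadings(Y_large, k):
    """Learn a genes x k loading matrix from a large genes x samples matrix.

    Truncated SVD is used as a stand-in; it is NOT the PLIER model.
    """
    U, _, _ = np.linalg.svd(Y_large, full_matrices=False)
    return U[:, :k]

def transfer(Z, Y_small, ridge=1e-3):
    """Score a new dataset against fixed loadings Z with a ridge solve.

    Solves (Z^T Z + ridge * I) B = Z^T Y_small for the k x samples matrix B.
    """
    k = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + ridge * np.eye(k), Z.T @ Y_small)

# Synthetic data: both datasets share the same underlying gene programs.
rng = np.random.default_rng(0)
genes, k = 200, 5
Z_true = rng.normal(size=(genes, k))
Y_large = Z_true @ rng.normal(size=(k, 1000)) + 0.1 * rng.normal(size=(genes, 1000))
Y_small = Z_true @ rng.normal(size=(k, 12)) + 0.1 * rng.normal(size=(genes, 12))

Z = learn_loadings(Y_large, k)   # trained once on the large compendium
B_small = transfer(Z, Y_small)   # reused on the 12-sample dataset
print(B_small.shape)             # (5, 12)
```

The point of the sketch is that the 12-sample dataset is never used to fit the loadings; it is only projected onto patterns learned from the larger collection.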

KEYWORDS:

genomics; machine learning; medulloblastoma; rare diseases; transcriptomics; transfer learning; unsupervised learning; vasculitis

PMID: 31121115
PMCID: PMC6538307 [Available on 2020-05-22]
DOI: 10.1016/j.cels.2019.04.003