Nat Commun. 2019 Apr 9;10(1):1649. doi: 10.1038/s41467-019-09639-3.

A Bayesian mixture model for clustering droplet-based single-cell transcriptomic data from population studies.

Author information

1. Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, 15261, USA.
2. Department of Health Outcomes Research and Policy, Harrison School of Pharmacy, Auburn University, Auburn, AL, 36849, USA.
3. Division of Pulmonary Medicine, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, PA, 15224, USA.
4. School of Medicine, Tsinghua University, Beijing, 100084, China.
5. Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA.
6. Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15262, USA.
7. Division of Rheumatology and Clinical Immunology, Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15261, USA.
8. School of Medicine, Tulane University, New Orleans, LA, 70112, USA.
9. Tumor Microenvironment Center, UPMC Hillman Cancer Center, Pittsburgh, PA, 15232, USA.
10. Cancer Immunology and Immunotherapy Program, UPMC Hillman Cancer Center, Pittsburgh, PA, 15232, USA.
11. Division of Pulmonary, Allergy and Critical Care Medicine, Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, USA.
12. Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, 15261, USA. yingding@pitt.edu.
13. Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, 44195, USA. hum@ccf.org.
14. Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, 15261, USA. wei.chen@chp.edu.
15. Division of Pulmonary Medicine, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, PA, 15224, USA. wei.chen@chp.edu.

Abstract

The recently developed droplet-based single-cell transcriptome sequencing (scRNA-seq) technology makes it feasible to perform population-scale scRNA-seq studies, in which the transcriptome is measured for tens of thousands of single cells from multiple individuals. Despite advances in clustering methods, few are tailored to population-scale scRNA-seq studies. Here, we develop a Bayesian mixture model for single-cell sequencing (BAMM-SC) to cluster scRNA-seq data from multiple individuals simultaneously. BAMM-SC takes raw count data as input and accounts for data heterogeneity and batch effects among individuals in a unified Bayesian hierarchical model framework. Results from extensive simulation studies and from applications of BAMM-SC to in-house experimental scRNA-seq datasets using blood, lung and skin cells from humans or mice demonstrate that BAMM-SC outperforms existing clustering methods, with considerably improved clustering accuracy, particularly in the presence of heterogeneity among individuals.

PMID: 30967541
PMCID: PMC6456731
DOI: 10.1038/s41467-019-09639-3
[Indexed for MEDLINE]
Free PMC Article
