Bioinformatics. 2014 Jul 1;30(13):1867-75. doi: 10.1093/bioinformatics/btu134. Epub 2014 Mar 10.

Probabilistic PCA of censored data: accounting for uncertainties in the visualization of high-throughput single-cell qPCR data.

Author information

1
Institute of Computational Biology, Helmholtz-Zentrum München, 85764 Neuherberg, Germany, Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research and Wellcome Trust & MRC Cambridge Stem Cell Institute, Cambridge CB2 0XY, UK and Department of Mathematics, TU München, 85748 Garching, Germany.
2
Institute of Computational Biology, Helmholtz-Zentrum München, 85764 Neuherberg, Germany, Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research and Wellcome Trust & MRC Cambridge Stem Cell Institute, Cambridge CB2 0XY, UK and Department of Mathematics, TU München, 85748 Garching, Germany.

Abstract

MOTIVATION:

High-throughput single-cell quantitative real-time polymerase chain reaction (qPCR) is a promising technique that offers new insights into complex cellular processes. However, a PCR reaction can be detected only up to a certain detection limit. A failed reaction could reflect either low or absent expression, so the true expression level is unknown. Because this censoring can affect a high proportion of the data, it is one of the main challenges when dealing with single-cell qPCR data. Principal component analysis (PCA) is an important tool for visualizing the structure of high-dimensional data and for identifying subpopulations of cells. However, it is not yet clear how to perform PCA on censored data. We present a probabilistic approach that accounts for the censoring and evaluate it on two typical single-cell qPCR datasets.
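To make the censoring concrete, one common way to write a likelihood for such data is a Tobit-style censored Gaussian. A detected value contributes its Gaussian density, while a non-detected value contributes only the probability mass below the detection limit. The Python sketch below is an illustration under stated assumptions, not the authors' noise model. It assumes values are on an expression scale where non-detected reactions lie at or below a single lower limit (on a raw Ct scale the inequality would point the other way), and all function names are hypothetical.

```python
import math


def log_norm_cdf(z: float) -> float:
    # log Phi(z) computed via erfc; can underflow to log(0) for extremely negative z
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def log_norm_pdf(y: float, mu: float, sigma: float) -> float:
    # log density of N(mu, sigma^2) evaluated at y
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def censored_loglik(y: float, mu: float, sigma: float, limit: float, censored: bool) -> float:
    """Tobit-style log-likelihood of one measurement (hypothetical helper).

    Detected:     Gaussian log-density at the observed value y.
    Non-detected: log P(Y <= limit); the stored value y is ignored.
    """
    if censored:
        return log_norm_cdf((limit - mu) / sigma)
    return log_norm_pdf(y, mu, sigma)


# Toy usage: a detected value and a non-detected one at the limit.
detected = censored_loglik(1.0, mu=0.0, sigma=1.0, limit=0.0, censored=False)
missing = censored_loglik(0.0, mu=0.0, sigma=1.0, limit=0.0, censored=True)  # log(0.5), about -0.693
```

The key design point is that a non-detect is never treated as if it had been measured at the limit. The model is only told that the true value lies somewhere below it, so a predicted mean far above the limit is penalized, while a mean below it is not.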

RESULTS:

We use the Gaussian process latent variable model (GPLVM) framework to account for censoring by introducing an appropriate noise model and allowing a different kernel for each dimension. We evaluate this approach on two typical qPCR datasets, one of mouse embryonic stem cells and one of blood stem/progenitor cells, by performing linear and non-linear probabilistic PCA. Taking the censoring into account yields a 2D representation that better reflects the known structure of the data: in both datasets, the new approach separates known cell types better, and in one dataset it reveals subpopulations that could not be resolved with standard PCA.
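A censored likelihood term can be plugged into a latent-variable model. The sketch below is a deliberately simplified, hypothetical objective, not the GPLVM itself. It keeps an explicit linear map `W` from low-dimensional latent coordinates `X` to the genes instead of integrating it out under a Gaussian-process prior, and it gives each gene its own noise scale `sigma[j]`. Maximizing it over `X`, `W`, `mu` and `sigma` would give a censoring-aware analogue of linear PCA. The optimizer is omitted, and all names and toy values are illustrative assumptions.

```python
import math
from typing import Sequence


def log_norm_cdf(z: float) -> float:
    # log Phi(z) via erfc; can underflow to log(0) for extremely negative z
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def log_norm_pdf(y: float, mu: float, sigma: float) -> float:
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def censored_lowrank_loglik(
    Y: Sequence[Sequence[float]],  # n x d measurements (cells x genes)
    C: Sequence[Sequence[bool]],   # n x d, True where the reaction was not detected
    X: Sequence[Sequence[float]],  # n x q latent coordinates (q = 2 for a 2D plot)
    W: Sequence[Sequence[float]],  # d x q loadings (hypothetical explicit linear map)
    mu: Sequence[float],           # d per-gene offsets
    sigma: Sequence[float],        # d per-gene noise standard deviations
    limit: float,                  # shared lower detection limit (assumption)
) -> float:
    total = 0.0
    for y_row, c_row, x_i in zip(Y, C, X):
        for j in range(len(mu)):
            # predicted mean for this cell and gene j under the linear model
            m = mu[j] + sum(x_ik * w_jk for x_ik, w_jk in zip(x_i, W[j]))
            if c_row[j]:
                # non-detect: probability that the true value lies below the limit
                total += log_norm_cdf((limit - m) / sigma[j])
            else:
                total += log_norm_pdf(y_row[j], m, sigma[j])
    return total


# Hypothetical toy data: 2 cells, 2 genes, 1 latent dimension.
Y = [[1.0, 0.0], [2.0, 0.0]]
C = [[False, True], [False, False]]
X = [[0.0], [1.0]]
W = [[1.0], [0.0]]
ll = censored_lowrank_loglik(Y, C, X, W, mu=[1.0, 0.0], sigma=[1.0, 1.0], limit=0.0)
```

Changing an entry of `C` from `True` to `False` switches that entry from a tail probability to a density evaluated at the stored value. This is exactly the difference between accounting for censoring and treating non-detects as measured values.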

AVAILABILITY AND IMPLEMENTATION:

The implementation was based on the existing Gaussian process latent variable model toolbox (https://github.com/SheffieldML/GPmat); extensions for noise models and kernels accounting for censoring are available at http://icb.helmholtz-muenchen.de/censgplvm.

PMID:
24618470
PMCID:
PMC4071202
DOI:
10.1093/bioinformatics/btu134
[Indexed for MEDLINE]