TISSUE: uncertainty-calibrated prediction of single-cell spatial transcriptomics improves downstream analyses

Whole-transcriptome spatial profiling of genes at single-cell resolution remains a challenge. To address this limitation, spatial gene expression prediction methods have been developed to infer the spatial expression of unmeasured transcripts, but the quality of these predictions can vary greatly. Here we present TISSUE (Transcript Imputation with Spatial Single-cell Uncertainty Estimation) as a general framework for estimating uncertainty for spatial gene expression predictions and providing uncertainty-aware methods for downstream inference. Across eleven benchmark datasets, TISSUE provides well-calibrated prediction intervals for predicted expression values. Moreover it consistently reduces false discovery rates for differential gene expression analysis, improves clustering and visualization of predicted spatial transcriptomics, and improves the performance of supervised learning models trained on predicted gene expression profiles. Applying TISSUE to a MERFISH spatial transcriptomics dataset of the adult mouse subventricular zone, we identified subtypes within the neural stem cell lineage and developed subtype-specific regional classifiers. TISSUE is publicly available as a flexible wrapper method for existing spatial gene expression prediction methods to assist researchers with implementing uncertainty-aware analyses of spatial transcriptomics data.

: Evidence of gene expression similarity between spatial neighbors. (A) Cosine similarity of gene expression profiles for 250 cells paired with all their neighbors in the TISSUE spatial graph compared to pairings with randomly drawn cells across all eleven spatial transcriptomics datasets. The boxplot corresponds to the quartiles of the cosine similarity measurements. The center line corresponds to median cosine similarity, which was strictly higher in the neighbor-paired comparisons than the random-paired comparisons across all datasets. Whiskers span up to 1.5 times the interquartile range of the metrics and values outside this range are shown as dots. (B) Scatter plots of the cosine similarities of gene expression profiles for 250 cells paired with their neighbors for either the training gene set or the test gene set determined by random train-test split of all genes (50% train, 50% test). Shown are cosine similarity pairs for 10 train-test splits for the two datasets with the most genes. Figure S3: Cell-centric variability and calibration score distributions for individual datasets and prediction methods. (A) Correlation of cell-centric variability and absolute prediction error shown individually for each dataset and prediction method combination computed over 10-fold cross-validation. Log density with added pseudocount (Log1p) is shown by color, with a maximum of 1000 cells and 300 genes sampled from each dataset to provide more uniform representation. (B) Pearson correlation of all cell-centric variability measures obtained for different numbers of neighbors in building the TISSUE spatial graph compared to the default setting of 15 neighbors. Shown are results across six representative datasets. (C) Histograms showing the distribution of Pearson correlations between either gene-wise or cell-wise similarities of prediction errors and similarities of predicted expression values across all spatial transcriptomic datasets. (D) Distribution of TISSUE calibration scores shown individually for each dataset and prediction method combination ((k g , k c ) = (4, 1)). Details on each dataset and prediction method can be found in Methods.

54
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted September 3, 2023. ; Figure S4: Further evaluation of TISSUE prediction intervals. (A) Gene-level calibration curves for TISSUE prediction intervals showing empirical coverage as a function of the specified confidence level across 10-fold cross-validation. Each line corresponds to an independent gene in the spatial transcriptomics dataset. (B-C) Calibration curves for TISSUE prediction intervals showing empirical coverage as a function of the specified confidence level across 10-fold cross-validation (B) under automated setting of (k g , k c ) for stratified grouping; and (C) for two technical replicates of the mouse gastrulation seqFISH dataset with (k g , k c ) = (4, 1). The calibration error is annotated for each prediction method (see Methods). (D-F) Correlation plots across all dataset and prediction method combinations computed over 10-fold cross-validation for (D) the 67% prediction interval width and absolute prediction error, both normalized by the absolute value of the predicted expression; (E) 50% prediction interval width and absolute prediction error; (F) 80% prediction interval width and absolute prediction error. Log density with added pseudocount (Log1p) is shown by color, with a maximum of 1000 cells and 300 genes sampled from each dataset to provide more uniform representation. (G) Calibration curves for TISSUE prediction intervals showing empirical coverage as a function of the specified confidence level across 10-fold cross-validation for the mouse somatosensory cortex osmFISH dataset with different combinations of Sprod de-noising or Sprod-based spatial similarity graph instead of the TISSUE spatial neighbors graph. The calibration error is annotated for each prediction method (see Methods). (H) Correlation plot of 67% prediction interval width with TISSUE spatial neighbors graph with cosine similarity weighting and 67% prediction interval width with Sprod similarity graph and weighting for the mouse somatosensory cortex osmFISH dataset and all prediction methods computed over 10-fold crossvalidation. 55 56 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted September 3, 2023. ; Figure S6: Additional differential gene expression analysis with TISSUE. (A) False discovery rate of differentially expressed genes between cell type or anatomic region labels (one versus all approach) as a function of the p-value threshold for significance and with (k g , k c ) = (4, 1) settings for stratified grouping.(B) False discovery rate of differentially expressed genes between cell type or anatomic region labels (one versus all approach) as a function of the number of discoveries and with automated stratified grouping. (C) False discovery rate of differentially expressed genes between cell type or anatomic region labels (one versus all approach) as a function of the number of discoveries and with (k g , k c ) = (4, 1) settings for stratified grouping for the alternative TISSUE multiple imputation framework using the "greater than" one-sided, two-sample Wilcoxon/Mann-Whitney test. (D) False discovery rate of spatially variable genes as a function of the number of discoveries and with (k g , k c ) = (4, 1) settings for stratified grouping for the alternative TISSUE multiple imputation framework using the SpatialDE test. (E) Correlation plot of the log p-values obtained from the TISSUE multiple imputation t-test framework between two technical replicates of the mouse gastrulation seqFISH dataset.

57
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted September 3, 2023. Figure S7: Uncertainty-aware clustering and label separation with TISSUE-WPCA. (A) Schematic illustration of the weighted principal component analysis (WPCA) pipeline where the inverse TISSUE prediction interval width is used to obtain principal components from WPCA, which are then used for downstream tasks of clustering and label separation. (B) Linear separability measured as the binary classification accuracy of a linear kernel support vector classifier fitted on the two cell clusters in the simulated spatial transcriptomics data as a function of the simulated mix-in proportion. The classifier was trained on the top 15 principal components obtained from the measured gene expression profiles with PCA, predicted gene expression profiles with PCA, and predicted gene expression profiles with TISSUE-WPCA. For TISSUE-WPCA, weights were determined by binarizing the inverse normalized 67% prediction interval width (see Methods). Results were obtained using automated stratified grouping. Bands represent the interquartile range and solid line denotes the median linear separability across 20 simulated datasets. (C) Same as in panel B except with TISSUE-WPCA weighting using the log-transformed inverse normalized 67% prediction interval width. (D) Adjusted rand index (ARI) for k-means clustering (k = 3) on the top 15 principal components obtained from PCA on the predicted expression or TISSUE-WPCA on the predicted gene expression for six real spatial transcriptomics dataset and labe pairings and all prediction methods. P-value was computed using a paired two-sample t-test. The box corresponds to quartiles of the metrics and the whiskers span up to 1.5 times the interquartile range of the metrics.

58
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted September 3, 2023. ; Figure S8: Additional experiments for uncertainty-aware supervised learning, clustering, and visualization. (A-C) Downstream task performance metrics for the three most prominent anatomic region class labels for the mouse somatosensory osmFISH dataset. Shown are metrics for all three prediction methods with automated stratified grouping settings. P-value was computed using a paired two-sample t-test. The box corresponds to quartiles of the metrics and the whiskers span up to 1.5 times the interquartile range of the metrics. (A) Accuracy, F1 score, and ROC-AUC (receiver-operator characteristic area under the curve) metrics for logistic regression models trained on the predicted gene expression, TISSUE-filtered predicted gene expression, or measured gene expression for classification. (B) Adjusted rand index (ARI) for k-means clustering (k = 3) on the top 15 principal components obtained from the predicted gene expression, TISSUEfiltered predicted gene expression, or measured gene expression for classification. (C) Linear separability measured as classification accuracy of linear kernel support vector classifier fitted on the top 15 principal components obtained from the predicted gene expression, TISSUE-filtered predicted gene expression, or measured gene expression for classification. (D) Log2 fold change in improvement of performance metrics using TISSUE-filtered PCA in lieu of regular PCA on predicted expression for supervised learning (logistic regression models trained to predict class labels; Accuracy, F1, ROC-AUC), clustering (k-means clustering on top 15 principal components; Adjusted Rand Index (ARI)), and visualization (linear separability measured as the classification accuracy of a linear kernel support vector classifier fitted on the top 15 principal components) for the top three classes across all dataset and class label combinations. Results were obtained using the 50% prediction interval width for filtering. (E) Same as panel E except with the 80% prediction interval width for filtering.

59
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted September 3, 2023. ; Figure S9: TISSUE is necessary to identify ambiguous NSC lineage subtype. (A) Heatmap of the scaled log-normalized gene expression of original cell type markers in the adult mouse subventricular zone MER-FISH dataset for each of the identified cell type clusters. The Ambiguous cell type cluster in the first row exhibits high expression of qNSC/astrocyte, aNSC/NPC, and neuroblast markers. (B) Additional predicted marker genes for the second ambiguous subcluster are differentially expressed for all qNSC/astrocyte and aNSC/NPC markers under traditional hypothesis testing with two-sample t-test on the predicted gene expression (Predicted). With TISSUE multiple imputation t-test, there are substantially more aNSC/NPC markers that are differentially over-expressed in the ambiguous subcluster (TISSUE), permitting identification of this subcluster as an aNSC/NPC subtype cluster. P-values are shown for all predicted marker genes with significance threshold of Bonferroni-adjusted p < 0.1. (C) Relative proportion of each of the three TISSUE-identified subtypes in the neural stem cell lineage cluster for either the left or right lateral ventricle. (D) Relative proportions of aNSC/NPC and neuroblast populations across the MERFISH dataset and three single-cell RNAseq datasets of the mouse subventricular zone. The qNSC/astrocyte proportions were not compared since they were aggregated with astrocytes of the striatum in the single-cell RNAseq datasets. (E) Spatial visualization of the cells in the neural stem cell lineage cluster colored by dorsal or ventral spatial location labels. (F) Dorsal versus ventral classification performance of TISSUE-filtered penalized logistic regression models and baseline unfiltered penalized logistic regression models evaluated using 10-fold crossvalidation across F1 score, accuracy, area under the receiver-operator curve, and average precision.

60
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted September 3, 2023. ; Figure S10: Computational runtime for TISSUE. (A) Bar plots of total runtimes for spatial gene expression prediction computations over 10 predictions to generate estimated predictions on all calibration genes. Error bars correspond to 95% confidence interval over an outer 10-fold cross-validation. (B) Bar plots of total runtimes for TISSUE prediction interval calculation including computation of cell-centric variability and calibration score sets. Error bars correspond to 95% confidence interval over an outer 10-fold cross-validation.

61
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted September 3, 2023. ; https://doi.org/10. 1101/2023