SpatialExperiment: infrastructure for spatially-resolved transcriptomics data in R using Bioconductor

Abstract Summary SpatialExperiment is a new data infrastructure for storing and accessing spatially-resolved transcriptomics data, implemented within the R/Bioconductor framework, which provides advantages of modularity, interoperability, standardized operations and comprehensive documentation. Here, we demonstrate the structure and user interface with examples from the 10x Genomics Visium and seqFISH platforms, and provide access to example datasets and visualization tools in the STexampleData, TENxVisiumData and ggspavis packages. Availability and implementation The SpatialExperiment, STexampleData, TENxVisiumData and ggspavis packages are available from Bioconductor. The package versions described in this manuscript are available in Bioconductor version 3.15 onwards. Supplementary information Supplementary data are available at Bioinformatics online.


Introduction
Spatially-resolved transcriptomics (SRT) refers to a new set of highthroughput technologies, which measure up to transcriptome-wide gene expression along with the spatial coordinates of the measurements. Technological platforms differ in terms of the number of measured genes (from hundreds to full transcriptome) and spatial resolution (from multiple cells per coordinate to approximately single-cell to sub-cellular). Examples of SRT platforms include Spatial Transcriptomics (Ståhl et al., 2016), 10x Genomics Visium (10x Genomics, 2021a), Slide-seq (Rodriques et al., 2019), Slide-seqV2 (Stickels et al., 2020), sci-Space (Srivatsan et al., 2021), seqFISH (Lubeck et al., 2014;Shah et al., 2016), seqFISHþ (Eng et al., 2019), osmFISH (Codeluppi et al., 2018) and MERFISH (Chen et al., 2015;Moffitt et al., 2016;Xia et al., 2019). These can be classified into spot-based and molecule-based platforms. Spotbased platforms measure transcriptome-wide gene expression at a series of spatial coordinates (spots) on a tissue slide (Spatial Transcriptomics, 10x Genomics Visium, Slide-seq, Slide-seqV2 and sci-Space), while molecule-based platforms detect large sets of distinct individual messenger RNA (mRNA) molecules in situ at up to sub-cellular resolution (seqFISH, seqFISHþ, osmFISH and MERFISH). SRT platforms have been applied to investigate spatial patterns of gene expression in a variety of biological systems, including the human brain (Maynard et al., 2021), mouse brain (Ortiz et al., 2020), cancer (Berglund et al., 2018;Ji et al., 2020) and mouse embryogenesis (Lohoff et al., 2021;Srivatsan et al., 2021). By combining molecular and spatial information, these platforms promise to continue to generate new insights about biological processes that manifest with spatial specificity within tissues.
However, to effectively analyze these data, specialized and robust data infrastructures are required, to facilitate storage, retrieval, subsetting and interfacing with downstream tools. Here, we describe SpatialExperiment, a new data infrastructure developed within the R/Bioconductor framework, which extends the popular SingleCellExperiment (Amezquita et al., 2020) class for single-cell RNA sequencing (scRNA-seq) data to the spatial context, with observations taking place at the level of spots or molecules instead 3128 of cells. Several recent studies have reused or extended existing single-cell infrastructure to store additional spatial information (Lohoff et al., 2021;Maynard et al., 2021). In addition, several comprehensive analysis workflows have been developed using modified single-cell infrastructure adapted for spatial data, including Seurat (Hao et al., 2021), Giotto (Dries et al., 2021) and Squidpy (Palla et al., 2022). However, while each of these workflows enables powerful analyses, it remains difficult for users to combine elements in a modular way, since each workflow relies on a separate infrastructure. There does not yet exist a common, standardized infrastructure for storing and accessing SRT data in R, which would allow users to easily build workflows combining methods and software developed by different groups. A well-designed independent data infrastructure simplifies the work of various users, including developers of downstream analysis methods who can reuse the structure to store inputs and outputs, and analysts who can rely on the structure to connect packages from different developers into analysis pipelines. By working within the Bioconductor framework, we take advantage of long-standing Bioconductor principles of modularity, interoperability, continuous testing and comprehensive documentation (Amezquita et al., 2020;Huber et al., 2015). Furthermore, we can ensure compatibility with existing analysis packages designed for the SingleCellExperiment structure for singlecell data, providing a robust, flexible and user-friendly resource for the research community. In addition to the SpatialExperiment package, we provide the STexampleData and TENxVisiumData packages (example datasets) and ggspavis package (visualization tools) for use in examples, tutorials, demonstrations and teaching.

Results
The SpatialExperiment package provides access to the core data infrastructure (referred to as a class), as well as functions to create, modify and access instances of the class (objects). Objects contain the following components adapted from the existing SingleCellExperiment class: (i) assays, tables of measurement values such as raw and transformed transcript counts (note that within the Bioconductor framework, rows usually correspond to features and columns to observations); (ii) rowData, additional information (metadata) describing the features (e.g. gene IDs and names); (iii) colData, metadata describing the observations (e.g. barcode IDs or cell IDs); and (iv) reducedDims, reduced dimension representations (e.g. principal component analysis) of the measurements. SpatialExperiment objects also contain the following further components to store spatial information: (v) additional metadata stored in colData describing spatial characteristics of the spatial coordinates (spots) or cells (e.g. indicators for whether spots are located within the region overlapping with tissue); (vi) spatialCoords, spatial coordinates associated with each observation (e.g. x and y coordinates on the tissue slide); and, (vii) imgData, image files (e.g. histology images) and information related to the images (e.g. resolution in pixels) (Fig. 1).
Accessor and replacement functions allow each of these components to be extracted or modified. Since SpatialExperiment extends SingleCellExperiment, methods developed for single-cell analyses (Amezquita et al., 2020) [e.g. preprocessing and normalization methods from scater (McCarthy et al., 2017), downstream methods from scran (Lun et al., 2016) and visualization tools from iSEE (Rue-Albrecht et al., 2018)] can be applied to SpatialExperiment objects, treating spots as single cells. Spatial coordinates are stored in spatialCoords as a numeric matrix, allowing these to be provided easily to downstream spatial analysis packages in R outside Bioconductor [e.g. packages from geostatistics such as sp (Pebesma & Bivand, 2005) and sf (Pebesma, 2018)], consistent with reducedDims in SingleCellExperiment. For spot-based data, assays contains a table named counts containing the gene counts, while for molecule-based data, assays may contain two tables named counts and molecules containing total gene counts per cell as well as molecule-level information such as spatial coordinates per molecule [formatted as a BumpyMatrix ]. For datasets that are too large to store in-memory, SpatialExperiment can reuse existing Bioconductor infrastructure for sparse matrices and on-disk data representations through the DelayedArray framework (Pagès et al., 2021). Image information is stored in imgData as a table containing sample IDs, image IDs, any other information such as scaling factors, and the underlying image data. The image data can be stored as either a fully realized in-memory object (for small images), a path to a local file that is loaded into memory on demand (for large images) or a URL to a remotely hosted image that is retrieved on demand. SpatialExperiment objects can be created with a general constructor function, SpatialExperiment() or alternatively with a dedicated constructor function for the 10x Genomics Visium platform, read10xVisium(), which creates an object from the raw input files from the 10x Genomics Visium Space Ranger software (10x Genomics, 2020). For Visium data, colData includes the columns in_tissue, array_row and array_col. Measurements from multiple biological samples can be stored within a single object, and linked across the components by providing unique sample IDs. In addition, we provide the associated data packages STexampleData (example datasets from several platforms) and TENxVisiumData (publicly available Visium datasets provided by 10x Genomics), and the ggspavis package providing visualization functions designed for SpatialExperiment objects (Supplementary Fig. S1 and Supplementary Tables S1 and S2).

Discussion
Standardized data infrastructure for scRNA-seq data [e.g. SingleCellExperiment (Amezquita et al., 2020) within the R/ Bioconductor framework] has greatly streamlined the work of data analysts and downstream method developers. For example, relying on common formats for inputs and outputs from individual packages allows users to connect packages into complete analysis pipelines, and operations such as subsetting by row (gene) or column (barcode or cell) across the entire object helps avoid errors. For single-cell data, this has enabled the development of comprehensive workflows and tutorials (Amezquita et al., 2020;, which are an invaluable resource for new users. Here, we provide a new data infrastructure for SRT data, extending the existing SingleCellExperiment class within the Bioconductor framework. In addition, we provide associated packages containing example datasets (STexampleData and TENxVisiumData) and visualization functions (ggspavis), for use in examples, tutorials, demonstrations and teaching.
Existing alternative infrastructure for SRT data includes object classes provided in the Seurat (Hao et al., 2021) and Giotto (Dries et al., 2021) packages in R, and Squidpy/AnnData (Palla et al., 2022;Virshup et al., 2021) in Python, which provide similar underlying functionality such as storing annotated tables of measurement values and related spatial and image information. Compared to these alternatives, a key advantage of SpatialExperiment is that it has been developed independently of any individual analysis workflow and is compatible with any downstream analysis packages that use the SpatialExperiment or SingleCellExperiment class within Bioconductor. This allows analysts to easily build customized, modular workflows consisting of packages developed by various research groups, including the latest methods (which may not yet have been integrated into published workflows) as well as any of the wide variety of methods for single-cell data that have been released through Bioconductor.
SRT technologies are still in their infancy, and the coming years are likely to see ongoing development of existing platforms as well as the emergence of novel experimental approaches. SpatialExperiment is ideally positioned to be extended to accommodate data from new platforms in the future, e.g. through extensions of the more general underlying SummarizedExperiment (Morgan et al., 2021) or by integrating with MultiAssayExperiment (Ramos et al., 2017) to store measurements from further assay types (transcriptomics, proteomics or spatial immunofluorescence, or epigenomics) or multiple assays from the same spatial coordinates. For example, the SingleCellMultiModal package (Eckenrode et al., 2021) stores MultiAssayExperiment objects containing scRNA-seq and SRT data as SingleCellExperiment and SpatialExperiment objects, respectively. Three-dimensional spatial data (Wang et al., 2018) or data from multiple timepoints could be accommodated within SpatialExperiment by storing additional spatial or temporal coordinates. Datasets that are too large to store in-memory can be stored using existing Bioconductor infrastructure for sparse matrices and on-disk data representations through the DelayedArray framework (Pagès et al., 2021). The ability to store image files within the objects (in-memory, locally or remotely) will assist with correctly keeping track of images in datasets with large numbers of samples, e.g. from consortium efforts. Interoperability between SpatialExperiment and other data formats (e.g. in Python) can also be ensured through the use of existing conversion packages such as zellkonverter (Zappia & Lun, 2021) and LoomExperiment (Morgan & Van Twisk, 2021). SpatialExperiment provides the research community with a robust, flexible and extendable core data infrastructure for SRT data, assisting both method developers and analysts to generate reliable and reproducible biological insights from these platforms.

Data availability
The SpatialExperiment package is available from Bioconductor at https://biocon ductor.org/packages/SpatialExperiment. The STexampleData, TENxVisiumData and ggspavis packages are available from Bioconductor at https://bioconductor. org/packages/STexampleData, https://bioconductor.org/packages/TENxVisium Data and https://bioconductor.org/packages/ggspavis, respectively. The package versions described in this manuscript are available in Bioconductor version 3.15 onwards. Datasets from Supplementary Tables S1 and S2 and Supplementary Figure S1 are available as SpatialExperiment objects from the STexampleData and TENxVisiumData packages, and the full original datasets are available from the sources listed in Supplementary Tables S1 and S2 (10x Genomics, 2021b;Lohoff et al., 2021;Maynard et al., 2021;Pardo et al., 2022). Fig. 1. Overview of the SpatialExperiment class structure, including assays (tables of measurement values), rowData (metadata describing features), reducedDims (reduced dimension representations), colData (non-spatial and spatial metadata describing observations), spatialCoords (spatial coordinates associated with the observations) and imgData (image files and information)