The emergence of spatial transcriptomics as a data modality has opened an unprecedented window into the intricate ballet of cellular communication and collective behavior across diverse biological and tissue contexts. This technology captures simultaneous cell-to-cell interactions and coordinated microscale transcriptional programs that give rise to emergent macroscale tissue phenomena, but its raw measurements can be overwhelmingly opaque without an appropriate analytical translator.
In this repository, we introduce SPATIOME, a platform designed to generate synthetic spatial transcriptomic data. SPATIOME serves as a computational framework for turning opaque spatial transcriptomics data into quantifiable, biologically interpretable measures and visualizations. By drawing on tools from probabilistic inference, machine learning, topology, network theory, and the study of emergent phenomena, the platform provides a testbed for developing and evaluating analysis algorithms against precisely defined ground truths.
Spatial transcriptomics captures spatially resolved gene expression data within tissue sections. Although the technology can record complex biological interactions, the raw data are often difficult to interpret without sophisticated analytical methods. Interdisciplinary efforts have shown that techniques such as UMAP (Uniform Manifold Approximation and Projection) can help bridge the gap between raw spatial data and interpretable biological insights.
Inspired by these interdisciplinary successes, SPATIOME is being developed as a synthetic data generation platform that supports the computational formalization of spatial transcriptomics problems and the development of their solutions. It allows researchers to simulate controlled environments in which algorithms can be rigorously tested and benchmarked.
A simple generative process for spatial transcriptomic data involves:
- Gene Selection: Choose a predefined subset $G$ of RNA molecules (or genes) from the total possible gene set $G_{\text{total}}$, such that $G \subseteq G_{\text{total}}$.
- Data Acquisition: Advanced imaging, microscopy, and molecular barcoding techniques capture and localize these molecules at spatial positions $(x_i, y_i, z_i)$ within a tissue section. The resulting detections can be summarized as the set $\{(x_i, y_i, z_i, g_i)\}_{i=1}^{N_T}$, where:
  - $(x_i, y_i, z_i)$ are the spatial coordinates,
  - $g_i$ is the gene identity,
  - $N_T$ is the total number of detections.
- Regularized Framework: For computational processing, the representation is shifted to a continuous field $g: \mathbb{R}^2 \to \mathbb{R}^n$, where each spatial coordinate $(x, y)$ corresponds to a high-dimensional gene expression vector. The $z$-component is dropped for simplicity.
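The acquisition and regularization steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not SPATIOME's implementation: the helper names `simulate_detections` and `regularize_to_grid` are hypothetical, detections are placed uniformly at random in a box, and the continuous field $g$ is approximated by binning $(x, y)$ onto a regular grid so that each bin holds an $n$-dimensional count vector, with $n = |G|$ assumed.

```python
import numpy as np

def simulate_detections(gene_subset, n_detections, extent=(100.0, 100.0, 10.0), seed=0):
    """Draw N_T detections (x_i, y_i, z_i, g_i).

    Assumption (hypothetical helper): positions are uniform over a box of
    size `extent`, and gene identities are uniform over the chosen subset G.
    """
    rng = np.random.default_rng(seed)
    xyz = rng.uniform(size=(n_detections, 3)) * np.asarray(extent)
    genes = rng.choice(np.asarray(gene_subset), size=n_detections)
    return xyz, genes

def regularize_to_grid(xyz, genes, gene_subset, extent_xy=(100.0, 100.0), bins=(10, 10)):
    """Drop z and bin detections into a grid of n-dimensional count vectors.

    The grid is one discrete stand-in (an assumption) for the field
    g: R^2 -> R^n, here with n = len(gene_subset).
    """
    # Map each gene identity to a column index 0..n-1.
    index = {int(gene): j for j, gene in enumerate(gene_subset)}
    cols = np.array([index[int(gene)] for gene in genes], dtype=int)
    # Keep only (x, y); the z-component is discarded.
    bx = np.clip((xyz[:, 0] / extent_xy[0] * bins[0]).astype(int), 0, bins[0] - 1)
    by = np.clip((xyz[:, 1] / extent_xy[1] * bins[1]).astype(int), 0, bins[1] - 1)
    field = np.zeros((bins[0], bins[1], len(gene_subset)), dtype=np.int64)
    # Unbuffered add so repeated (bin, gene) indices are all counted.
    np.add.at(field, (bx, by, cols), 1)
    return field

# Usage: choose G as a subset of G_total, simulate detections, then regularize.
G_total = np.arange(500)
G = np.sort(np.random.default_rng(1).choice(G_total, size=20, replace=False))
assert set(G.tolist()) <= set(G_total.tolist())
xyz, genes = simulate_detections(G, n_detections=5000)
field = regularize_to_grid(xyz, genes, G)
print(field.shape)  # (10, 10, 20)
print(field.sum())  # 5000
```

Binning is only one way to regularize point detections; kernel smoothing or cell segmentation would give other discretizations of the same field.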
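SPATIOME generates synthetic spatial omics data through the following components:
- A predefined mask image that defines the tissue region in which cells are placed.
- Each cell at a position $(x, y)$ inside the mask is assigned a cell type $c$ with probability $P(c)$, where the cell-type proportions are governed by Dirichlet parameters.
- For each cell type $c$, gene expression vectors are drawn from a distribution parameterized by $\mu_c$ and $\Sigma_c$, where:
  - $\mu_c \in \mathbb{R}^n$ is the mean expression vector,
  - $\Sigma_c$ is the covariance matrix capturing gene-gene correlations and intra-type variability.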
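The components above can be sketched as a small sampler. It is a hedged illustration only: the disk-shaped mask, the helper names `make_disk_mask` and `sample_cells`, the uniform placement of cells over tissue pixels, the position-independent type assignment, and the multivariate normal choice for the per-type expression distribution are all assumptions, not confirmed details of SPATIOME.

```python
import numpy as np

def make_disk_mask(size=64, radius=24):
    """Hypothetical mask image: True marks tissue pixels (a filled disk)."""
    yy, xx = np.mgrid[:size, :size]
    center = (size - 1) / 2
    return (xx - center) ** 2 + (yy - center) ** 2 <= radius ** 2

def sample_cells(mask, n_cells, alpha, means, covs, seed=0):
    """Place cells inside the mask, assign types, and draw expression."""
    rng = np.random.default_rng(seed)
    # Cell positions (row, col): uniform over tissue pixels, jittered within the pixel.
    tissue = np.argwhere(mask)
    picks = rng.integers(0, len(tissue), size=n_cells)
    positions = tissue[picks] + rng.uniform(size=(n_cells, 2))
    # Cell-type proportions P(c) ~ Dirichlet(alpha), then one type per cell.
    p = rng.dirichlet(alpha)
    types = rng.choice(len(alpha), size=n_cells, p=p)
    # Expression: assumed multivariate normal N(mu_c, Sigma_c) for each type.
    expr = np.empty((n_cells, means.shape[1]))
    for c in range(len(alpha)):
        sel = types == c
        k = int(sel.sum())
        if k:
            expr[sel] = rng.multivariate_normal(means[c], covs[c], size=k)
    return positions, types, expr, p

# Usage with 3 cell types and 5 genes.
K, n = 3, 5
means = np.random.default_rng(42).normal(scale=3.0, size=(K, n))
covs = np.stack([0.5 * np.eye(n)] * K)
mask = make_disk_mask()
positions, types, expr, p = sample_cells(mask, 1000, np.ones(K), means, covs)
print(positions.shape, expr.shape)  # (1000, 2) (1000, 5)
```

Because every cell is drawn from a known type with known $\mu_c$ and $\Sigma_c$, the `types` array serves as the ground truth against which clustering or deconvolution methods can be scored.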
The model is controlled by several key parameters:
| Parameter |
|---|
| Number of genes measured per cell ($n$) |
| Number of distinct cell types |
| Total number of cells |
| Dirichlet parameters for cell-type proportions |
| Variation within cell types (diagonal of $\Sigma_c$) |
| Separation between cell types (magnitude of …) |
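One way to turn the parameters in the table into per-type means and covariances is sketched as follows. The `SimParams` container and `type_parameters` helper are hypothetical, and we assume that the separation parameter sets the norm of each mean vector $\mu_c$ (along a random direction) and that the within-type variation fills the diagonal of an otherwise diagonal $\Sigma_c$; SPATIOME may define both differently.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np

@dataclass
class SimParams:
    """Hypothetical container mirroring the parameter table."""
    n_genes: int = 50                    # number of genes measured per cell (n)
    n_types: int = 4                     # number of distinct cell types
    n_cells: int = 2000                  # total number of cells
    alpha: Optional[np.ndarray] = None   # Dirichlet parameters for proportions
    within_var: float = 1.0              # diagonal entries of Sigma_c
    separation: float = 5.0              # assumed norm of each mean vector mu_c

    def __post_init__(self):
        # Default to a flat Dirichlet (equal expected proportions).
        if self.alpha is None:
            self.alpha = np.ones(self.n_types)

def type_parameters(params, seed=0):
    """Return (means, covs) with shapes (K, n) and (K, n, n)."""
    rng = np.random.default_rng(seed)
    # Random unit directions in gene space, scaled to length `separation`.
    directions = rng.normal(size=(params.n_types, params.n_genes))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    means = params.separation * directions
    # Diagonal covariance: independent genes with equal within-type variance.
    covs = np.stack([params.within_var * np.eye(params.n_genes)] * params.n_types)
    return means, covs

params = SimParams(n_genes=10, n_types=3, separation=4.0, within_var=0.25)
means, covs = type_parameters(params)
print(means.shape, covs.shape)  # (3, 10) (3, 10, 10)
```

A purely diagonal $\Sigma_c$ ignores gene-gene correlations; modeling those would require off-diagonal terms, for example a low-rank component added to the diagonal.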
To evaluate the generated data, SPATIOME uses:
- UMAP projections, to visualize gene expression clustering in lower dimensions.
- Spatial plots, to verify that the spatial distribution of cells conforms to the predefined mask.
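Both checks can also be approximated numerically. The sketch uses hypothetical helpers: `fraction_inside_mask` measures how many cells fall on tissue pixels of a boolean mask, and `type_separation_ratio` compares between-type centroid distances to within-type spread in a 2-D embedding. To keep the example dependency-free, it projects with PCA (via NumPy's SVD) as a stand-in for UMAP; with the `umap-learn` package installed, `umap.UMAP(n_components=2).fit_transform(expr)` could replace `pca_2d`.

```python
import numpy as np

def fraction_inside_mask(positions, mask):
    """Share of cells whose (row, col) position lands on a True mask pixel."""
    rows = np.floor(positions[:, 0]).astype(int)
    cols = np.floor(positions[:, 1]).astype(int)
    in_bounds = (rows >= 0) & (rows < mask.shape[0]) & (cols >= 0) & (cols < mask.shape[1])
    inside = np.zeros(len(positions), dtype=bool)
    inside[in_bounds] = mask[rows[in_bounds], cols[in_bounds]]
    return inside.mean()

def pca_2d(expr):
    """Project expression onto its top two principal components."""
    centered = expr - expr.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

def type_separation_ratio(embedding, types):
    """Mean distance between type centroids divided by mean within-type spread."""
    labels = np.unique(types)
    centroids = np.array([embedding[types == c].mean(axis=0) for c in labels])
    within = np.mean([
        np.linalg.norm(embedding[types == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(labels)
    ])
    dists = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    between = dists[np.triu_indices(len(labels), k=1)].mean()
    return between / within

# Toy data: tissue occupies the left half of a 20 x 20 mask; two cell types.
rng = np.random.default_rng(0)
mask = np.zeros((20, 20), dtype=bool)
mask[:, :10] = True
positions = np.column_stack([rng.uniform(0, 20, 500), rng.uniform(0, 10, 500)])
types = rng.integers(0, 2, size=500)
expr = rng.normal(size=(500, 8)) + 10.0 * types[:, None]
embedding = pca_2d(expr)
print(fraction_inside_mask(positions, mask))  # 1.0
print(type_separation_ratio(embedding, types))
```

A mask fraction below 1.0 flags cells placed outside the tissue, and a separation ratio near 1 suggests that cell types overlap in expression space.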
The code section is currently being updated. Please contact the author(s) for a code preview.