Quick Demo: Reference-based simulation for Xenium spatial transcriptomics data

[1]:
import simspace as ss
import pandas as pd
[2]:
# Step 1: Load the reference dataset. We here provide a sample dataset from Xenium human breast tumor dataset.
ref_meta = pd.read_csv('../data/reference_metadata.csv', index_col=0)
ref_omics = pd.read_csv('../data/reference_count.csv', index_col=0)
[3]:
ref_meta.head(3)
[3]:
cell_id x_centroid y_centroid transcript_counts control_probe_counts control_codeword_counts total_counts cell_area nucleus_area Cluster
0 4303 1654.367181 2017.908301 416 0 0 416 490.216250 82.184375 Stromal
1 4304 1669.259509 2026.475952 560 1 0 561 348.741719 110.226406 Invasive_Tumor
2 4305 1660.068158 2040.800818 315 0 0 315 256.668125 65.702344 Invasive_Tumor
[4]:
ref_omics.head(3)
[4]:
4303 4304 4305 4306 4307 4308 4309 4310 4311 4312 ... 76380 76381 76382 76383 76384 76385 76386 76387 76388 76389
ABCC11 0 16 4 5 2 0 1 0 0 0 ... 0 1 2 5 3 7 3 1 1 5
ACTA2 13 2 1 0 2 0 1 3 4 13 ... 5 0 0 1 1 0 2 2 0 0
ACTG2 1 1 1 0 2 2 3 6 2 4 ... 1 2 0 4 1 2 3 1 1 4

3 rows × 2032 columns

[5]:
# Step 2: Fit SimSpace spatial parameters from the reference.
# This step uses the reference metadata and omics data to fit the spatial parameters for the simulation.
# One can do this by using the `ss.fit()` method and some preprocessing, which is shown in details in spaital_fitting.ipynb.
# Here we provide a pre-fitted parameters for the Xenium human breast tumor dataset, which is fitted
# using the `ss.fit()` method with 100 population size and 40 iterations.
# Note that the 'ss.fit()' method may involve randomness, so the results may vary slightly each time it is run.
params = ss.util.load_params('../data/fitted_params.json')
[6]:
# Step 3: Simulate the spatial omics data using the fitted parameters.
# This step uses the fitted parameters to simulate the spatial omics data, just as the reference-free simulation.
sim = ss.util.sim_from_params(
    parameters=params,
    shape=(100, 100),
    custom_neighbor=ss.spatial.generate_offsets(3, method='manhattan'),
    seed=0,
)
# This results is the simulated spatial omics data shown in Fig. 3, with x and y axis switched.
sim.plot(figsize=(5, 5), dpi=100, size=14)
../_images/tutorials_reference_based_Xenium_6_0.png
[6]:
# Step 4 (Optional): Simulate omics data based on the spatial simulation.
# The omics profile can also be fitted from the reference dataset if the reference is available.
# Here we use the `fit_scdesign` method to fit the spatial simulation with the reference dataset,
# which requires the R package 'scDesign3' to be installed. One can find more details in README about the installation.
# The following process may take a while, depending on the size of the reference dataset.
# In this example, it takes about 2 minutes to get the simulation done.
sim.fit_scdesign(
    '../data/reference_count.csv',
    '../data/reference_metadata.csv',
    'Cluster',      # Column in metadata that contains the cell type information
    'x_centroid',   # Column in metadata that contains the x coordinate of the cell centroid
    'y_centroid',   # Column in metadata that contains the y coordinate of the cell centroid
    seed=0,
)
Temporary directory created.
scDesgin fit complete.
[7]:
# Example of plotting a specific gene, e.g., 'CD93'.
sim.plot('CD93', figsize=(5,5), dpi=100)
../_images/tutorials_reference_based_Xenium_8_0.png
[9]:
sim.plot('fitted_celltype', figsize=(7,7), dpi=100)
../_images/tutorials_reference_based_Xenium_9_0.png
[ ]: