Quick Demo: Reference-based simulation for Xenium spatial transcriptomics data

[1]:

import simspace as ss
import pandas as pd

[2]:

# Step 1: Load the reference dataset. Here we provide a sample from the Xenium human breast tumor dataset.
ref_meta = pd.read_csv('../data/reference_metadata.csv', index_col=0)
ref_omics = pd.read_csv('../data/reference_count.csv', index_col=0)

[3]:

ref_meta.head(3)

[3]:

	cell_id	x_centroid	y_centroid	transcript_counts	control_probe_counts	total_counts	cell_area	nucleus_area	Cluster
0	4303	1654.367181	2017.908301	416	0	416	490.216250	82.184375	Stromal
1	4304	1669.259509	2026.475952	560	1	561	348.741719	110.226406	Invasive_Tumor
2	4305	1660.068158	2040.800818	315	0	315	256.668125	65.702344	Invasive_Tumor

[4]:

ref_omics.head(3)

[4]:

	4303	4304	4305	4306	4307	4308	4309	4310	4311	4312	...	76380	76381	76382	76383	76384	76385	76386	76387	76388	76389
ABCC11	0	16	4	5	2	0	1	0	0	0	...	0	1	2	5	3	7	3	1	1	5
ACTA2	13	2	1	0	2	0	1	3	4	13	...	5	0	0	1	1	0	2	2	0	0
ACTG2	1	1	1	0	2	2	3	6	2	4	...	1	2	0	4	1	2	3	1	1	4

3 rows × 2032 columns
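Before fitting, it can help to confirm that the two tables describe the same cells. The previews above suggest that the `cell_id` column of the metadata corresponds to the column labels of the count matrix. The sketch below is a minimal check under that assumption. `check_alignment` is a hypothetical helper, not part of SimSpace, and it compares IDs as strings because CSV header labels are typically parsed as strings while an ID column may be parsed as integers.

```python
def check_alignment(meta_cell_ids, count_columns):
    """Compare metadata cell IDs with count-matrix column labels.

    Hypothetical helper (not part of SimSpace). Both inputs are compared
    as strings. Returns (missing_in_counts, missing_in_meta) as sorted lists.
    """
    meta_ids = {str(c) for c in meta_cell_ids}
    count_ids = {str(c) for c in count_columns}
    missing_in_counts = sorted(meta_ids - count_ids)
    missing_in_meta = sorted(count_ids - meta_ids)
    return missing_in_counts, missing_in_meta


# Toy values taken from the first rows shown in the previews above.
meta_ids = [4303, 4304, 4305]
count_cols = ["4303", "4304", "4305"]
print(check_alignment(meta_ids, count_cols))
```

With the loaded tables, the call would presumably be `check_alignment(ref_meta['cell_id'], ref_omics.columns)`; two empty lists mean every cell appears in both files.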

[5]:

# Step 2: Fit SimSpace spatial parameters from the reference.
# This step uses the reference metadata and omics data to fit the spatial parameters for the simulation.
# This can be done with the `ss.fit()` method plus some preprocessing, as shown in detail in spaital_fitting.ipynb.
# Here we provide pre-fitted parameters for the Xenium human breast tumor dataset, obtained
# with the `ss.fit()` method using a population size of 100 and 40 iterations.
# Note that the `ss.fit()` method may involve randomness, so the results may vary slightly between runs.
params = ss.util.load_params('../data/fitted_params.json')
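To see what a parameter file contains before passing it to the simulator, one option is to list its top-level entries. The sketch below assumes only that the file is plain JSON, as its extension suggests. `summarize_json` is a hypothetical helper, and the keys in the demo file are made up for illustration; they are not the actual SimSpace parameter names.

```python
import json
import os
import tempfile


def summarize_json(path):
    """Return {top-level key: type name} for the JSON document at `path`."""
    with open(path) as fh:
        data = json.load(fh)
    if not isinstance(data, dict):
        # Not a JSON object: report the type of the root value instead.
        return {"<root>": type(data).__name__}
    return {key: type(value).__name__ for key, value in data.items()}


# Demo on a throwaway file with made-up keys (NOT the real parameter names).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_params.json")
    with open(demo_path, "w") as fh:
        json.dump({"example_list": [1, 2, 3], "example_value": 0.5}, fh)
    print(summarize_json(demo_path))
```

Pointing the helper at `'../data/fitted_params.json'` would show which entries the pre-fitted file provides, without assuming anything about their meaning.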

[6]:

# Step 3: Simulate the spatial omics data using the fitted parameters.
# This step uses the fitted parameters to simulate the spatial omics data, just as in the reference-free simulation.
sim = ss.util.sim_from_params(
    parameters=params,
    shape=(100, 100),
    custom_neighbor=ss.spatial.generate_offsets(3, method='manhattan'),
    seed=0,
)
# The result is the simulated spatial omics data shown in Fig. 3, with the x and y axes switched.
sim.plot(figsize=(5, 5), dpi=100, size=14)

../_images/tutorials_reference_based_Xenium_6_0.png
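The `custom_neighbor` argument above is built with `ss.spatial.generate_offsets(3, method='manhattan')`. To illustrate what a Manhattan neighborhood of radius 3 looks like on a grid, here is a small standalone sketch. It is not the SimSpace implementation: `manhattan_offsets` is a hypothetical function, and whether the library includes the center cell or orders the offsets the same way is an assumption not confirmed here.

```python
def manhattan_offsets(radius):
    """List (dx, dy) grid offsets with 0 < |dx| + |dy| <= radius.

    Illustrative only; the center cell (0, 0) is excluded by assumption.
    """
    offsets = []
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            dist = abs(dx) + abs(dy)
            if 0 < dist <= radius:
                offsets.append((dx, dy))
    return offsets


offsets = manhattan_offsets(3)
print(len(offsets))  # 24 neighbors within Manhattan distance 3
```

Compared with a square window of the same radius, the Manhattan neighborhood is diamond-shaped, so diagonal corners such as `(3, 3)` are left out.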

[6]:

# Step 4 (Optional): Simulate omics data based on the spatial simulation.
# If a reference dataset is available, the omics profiles can also be fitted from it.
# Here we use the `fit_scdesign` method to fit the spatial simulation with the reference dataset,
# which requires the R package 'scDesign3' to be installed. See the README for installation details.
# The following process may take a while, depending on the size of the reference dataset.
# In this example, it takes about 2 minutes to get the simulation done.
sim.fit_scdesign(
    '../data/reference_count.csv',
    '../data/reference_metadata.csv',
    'Cluster',      # Column in metadata that contains the cell type information
    'x_centroid',   # Column in metadata that contains the x coordinate of the cell centroid
    'y_centroid',   # Column in metadata that contains the y coordinate of the cell centroid
    seed=0,
)

Temporary directory created.
scDesgin fit complete.
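Because this step depends on an R installation with scDesign3, a quick pre-flight check can save a failed two-minute run. The sketch below assumes R is reachable as `Rscript` on the `PATH`; SimSpace may locate R differently, so treat this as a rough diagnostic rather than a guarantee. `r_package_available` is a hypothetical helper, not part of SimSpace.

```python
import shutil
import subprocess


def r_package_available(package, rscript="Rscript", timeout=60):
    """Return True if `package` can be loaded by the Rscript found on PATH.

    Hypothetical helper: returns False when Rscript is missing, the check
    times out, or R reports that the package namespace cannot be loaded.
    """
    exe = shutil.which(rscript)
    if exe is None:
        return False
    # Exit with status 0 when the namespace loads, 1 otherwise.
    expr = (
        f'quit(status = if (requireNamespace("{package}", quietly = TRUE)) '
        "0 else 1)"
    )
    try:
        result = subprocess.run(
            [exe, "-e", expr], capture_output=True, timeout=timeout
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


print(r_package_available("scDesign3"))
```

A `False` result points to the installation notes in the README before retrying `sim.fit_scdesign(...)`.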

[7]:

# Example of plotting a specific gene, e.g., 'CD93'.
sim.plot('CD93', figsize=(5,5), dpi=100)

../_images/tutorials_reference_based_Xenium_8_0.png

[9]:

sim.plot('fitted_celltype', figsize=(7,7), dpi=100)

../_images/tutorials_reference_based_Xenium_9_0.png
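Beyond visual inspection, one simple sanity check is to compare cell-type proportions between the reference and the simulation. The sketch below works on any two sequences of labels. The reference labels would come from `ref_meta['Cluster']`; how the simulated labels are retrieved from `sim` is not shown in this tutorial, so the simulated labels in the demo are made up. Both functions are hypothetical helpers, not part of SimSpace.

```python
from collections import Counter


def celltype_proportions(labels):
    """Return {label: fraction of cells} for an iterable of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def max_abs_difference(p, q):
    """Largest absolute gap between two proportion dicts over all labels."""
    keys = set(p) | set(q)
    return max((abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys), default=0.0)


# Reference labels from the preview rows above; simulated labels are made up.
ref_labels = ["Stromal", "Invasive_Tumor", "Invasive_Tumor", "Invasive_Tumor"]
sim_labels = ["Stromal", "Stromal", "Invasive_Tumor", "Invasive_Tumor"]

ref_prop = celltype_proportions(ref_labels)
sim_prop = celltype_proportions(sim_labels)
print(ref_prop, sim_prop, max_abs_difference(ref_prop, sim_prop))
```

A small maximum gap suggests the simulated tissue has a composition similar to the reference, though it says nothing about the spatial arrangement.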
