simspace package

Submodules

simspace.core module

class simspace.core.SimSpace(shape=(50, 50), num_states=5, theta=[], num_iterations=5, phi=5, rho=1, grid=[], neighborhood=[], random_seed=1111)[source]

Bases: object

SimSpace is a class for simulating spatial omics data in a 2-D or 3-D grid.

Parameters:
  • shape (tuple) – The shape of the grid. Can be a tuple of two integers for 2-D or three integers for 3-D.

  • num_states (int) – The number of possible states.

  • theta (numpy.ndarray) – The theta matrix.

  • num_iterations (int) – The number of iterations for Gibbs sampling.

  • phi (int or float) – The phi parameter for Gibbs sampling.

  • rho (int or float) – The rho parameter for Gibbs sampling.

  • neighborhood (list, optional) – The custom neighborhood offsets. Defaults to an empty list.

  • random_seed (int, optional) – The random seed. Defaults to 1111.

create_niche(num_niches=3, n_iter=10, theta_niche=None, neighborhood=None)[source]

Apply MRF-based niche creation in a 2-D grid.

Parameters:
  • num_niches (int) – The number of niches to create.

  • n_iter (int) – The number of iterations for the random niche creation.

  • theta_niche (np.ndarray) – The transition matrix for the niche-level MRF model.

  • neighborhood (list) – The list of neighboring cells, generated by spatial.generate_offsets. Default is spatial.generate_offsets(5, ‘manhattan’).

Raises:

ValueError – If the theta_niche is not a 2-D numpy array or if the number of niches exceeds the number of theta matrices provided.

Return type:

None

Notes

This function uses a Markov Random Field (MRF) approach to create spatial niches. It initializes the niche grid with random niche labels and then iteratively updates each label based on its neighbors. If no theta_niche is provided, it defaults to the identity matrix (ones on the diagonal, zeros elsewhere). This will result in a uniform distribution of niches.

Examples

>>> sim.create_niche(num_niches=3, n_iter=6, theta_niche=theta_niche)
create_niche3D(num_niches=3, n_iter=10, theta_niche=None, neighborhood=None)[source]

Apply MRF-based niche creation in a 3-D grid.

Parameters:
  • num_niches (int) – The number of niches to create.

  • n_iter (int) – The number of iterations for the random niche creation.

  • theta_niche (np.ndarray) – The transition matrix for the niche-level MRF model.

  • neighborhood (list) – The list of neighboring cells, generated by spatial.generate_offsets. Default is spatial.generate_offsets(5, ‘manhattan’).

Raises:

ValueError – If self.shape is not a 3-D tuple or if theta_niche is not a 2-D numpy array.

Return type:

None

Examples

>>> sim.create_niche3D(num_niches=3, n_iter=6, theta_niche=theta_niche)
create_omics(bg_ratio=0.2, n_genes=1000, bg_param=(1, 1), marker_param=(5, 2), lr_ratio=0.5, spatial=True, k_neighors=20, spatial_effect=3, se_threshold=1.5)[source]

Create the reference-free omics data using Gamma-Poisson distribution.

Parameters:
  • bg_ratio (float) – The ratio of background genes to total genes. Defaults to 0.2.

  • n_genes (int) – The total number of genes to simulate. Defaults to 1000.

  • bg_param (tuple) – The parameters for the background gene distribution (shape, scale). Defaults to (1, 1).

  • marker_param (tuple) – The parameters for the marker gene distribution (shape, scale). Defaults to (5, 2).

  • lr_ratio (float) – The ratio of ligand-receptor pairs to total genes. Defaults to 0.5.

  • spatial (bool) – Whether to simulate spatial omics data. Defaults to True.

  • k_neighors (int) – The number of neighbors to consider for spatial omics. Defaults to 20.

  • spatial_effect (float) – The spatial effect parameter for spatial omics. Defaults to 3.

  • se_threshold (float) – The threshold for spatial effect. Defaults to 1.5.

Raises:

ValueError – If the bg_ratio is not between 0 and 1, or if the n_genes is not a positive integer.

Return type:

None

Notes

This function simulates omics data based on the provided parameters. It uses the omics module to generate the omics data and metadata. The simulated omics data can be used for further analysis or visualization.

Examples

>>> sim.create_omics(bg_ratio=0.2, n_genes=500)
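
The Gamma-Poisson scheme named above can be sketched in plain Python. This is only an illustration under stated assumptions, not the package's implementation: each gene draws a mean expression level from a Gamma(shape, scale) distribution (bg_param for background genes, marker_param for the rest), and each cell's count is then Poisson-distributed around that mean. The helpers sample_poisson and gamma_poisson_counts are hypothetical names, and spatial effects and ligand-receptor pairs are left out.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def gamma_poisson_counts(n_cells, n_genes, bg_ratio=0.2,
                         bg_param=(1, 1), marker_param=(5, 2), seed=0):
    """Return an n_cells x n_genes nested list of counts (hypothetical helper)."""
    rng = random.Random(seed)
    # ASSUMPTION: the first n_bg genes are the background genes.
    n_bg = int(round(bg_ratio * n_genes))
    # One Gamma-distributed mean per gene.
    means = [
        rng.gammavariate(*(bg_param if g < n_bg else marker_param))
        for g in range(n_genes)
    ]
    # Poisson counts around each gene's mean, drawn independently per cell.
    return [[sample_poisson(rng, m) for m in means] for _ in range(n_cells)]


counts = gamma_poisson_counts(n_cells=4, n_genes=10)
```

Mixing a Gamma-distributed mean with Poisson sampling yields overdispersed (negative-binomial-like) counts, which is the usual reason for choosing this model for expression data.
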
density_sampler(threshold)[source]

Cell density sampler. This sampler will randomly select cells based on the given threshold either uniformly or per state.

Parameters:

threshold (float or list) – The threshold for the density sampler. Should be a list which must match either the number of states or the number of niches in the grid. If only one state or niche is available, it can be a float.

Raises:

ValueError – If the threshold is not a float or list.

Return type:

None

Notes

If a list of thresholds is provided, its length must match the number of states (or niches) in the grid. If a single float is provided, it will be applied uniformly across all states.
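
A minimal sketch of per-state density sampling follows. It assumes that a cell in state s is kept with probability threshold[s]; the source does not spell out how the threshold is interpreted, so treat that reading, and the helper name density_sample, as hypothetical.

```python
import random


def density_sample(states, threshold, seed=0):
    """Return the indices of cells kept after density thinning.

    ASSUMPTION: a cell in state s is kept with probability threshold[s];
    the real method may define the threshold differently.
    """
    rng = random.Random(seed)
    n_states = max(states) + 1
    if isinstance(threshold, (int, float)):
        threshold = [float(threshold)] * n_states  # one value for all states
    elif len(threshold) != n_states:
        raise ValueError("threshold list must have one entry per state")
    return [i for i, s in enumerate(states) if rng.random() < threshold[s]]


states = [0, 0, 1, 1, 2, 2]
kept_all = density_sample(states, 1.0)               # every cell survives
kept_some = density_sample(states, [1.0, 0.0, 1.0])  # state 1 is removed
```
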

fit_scdesign(ref_count_path, ref_meta_path, group_col, spatial_x, spatial_y, seed=0, isreturn=False)[source]

Fit the scdesign model using the reference dataset.

Parameters:
  • ref_count_path (str) – Path to the reference count matrix.

  • ref_meta_path (str) – Path to the reference metadata.

  • group_col (str) – Column name in the metadata for grouping in the simulation.

  • spatial_x (str) – Column name in the metadata for the x-coordinate of spatial data.

  • spatial_y (str) – Column name in the metadata for the y-coordinate of spatial data.

  • seed (int) – Random seed for reproducibility. Defaults to 0.

  • isreturn (bool) – If True, return the simulated count matrix. Defaults to False.

Raises:

ValueError – If the reference dataset is too small or if scdesign_fit returns None for sim_meta or sim_count.

Notes

This function uses the omics module to fit the scdesign model based on the provided reference dataset. It ranks the cell types based on their frequency in the reference metadata and maps them to the simulation metadata. The simulated count matrix is stored in self.omics and can be returned if isreturn is set to True.

Example

>>> sim.fit_scdesign(
... ref_count_path='path/to/ref_count.csv',
... ref_meta_path='path/to/ref_meta.csv',
... group_col='celltype',   # Should match the column in ref_meta
... spatial_x='x_coord',    # Column in ref_meta for x-coordinate
... spatial_y='y_coord',    # Column in ref_meta for y-coordinate
... seed=42)
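
The frequency-rank mapping described in the Notes can be illustrated as below. This is a sketch under the assumption that the most frequent reference cell type is paired with the most frequent simulated state, the second with the second, and so on; rank_map is a hypothetical helper, not a simspace function.

```python
from collections import Counter


def rank_map(ref_labels, sim_states):
    """Map simulated states to reference cell types by frequency rank."""
    ref_ranked = [label for label, _ in Counter(ref_labels).most_common()]
    sim_ranked = [state for state, _ in Counter(sim_states).most_common()]
    # zip stops at the shorter list, so surplus categories stay unmapped.
    return dict(zip(sim_ranked, ref_ranked))


ref = ["T cell"] * 5 + ["B cell"] * 3 + ["Macrophage"]
sim = [2, 2, 2, 2, 0, 0, 1]
mapping = rank_map(ref, sim)
```

Here state 2 (the most common simulated state) is paired with "T cell" (the most common reference type).
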
fit_srtsim(ref_count_path, ref_meta_path, group_col, spatial_x, spatial_y, seed=0, isreturn=False)[source]

Fit the SRTsim model using the reference dataset.

Parameters:
  • ref_count_path (str) – Path to the reference count matrix.

  • ref_meta_path (str) – Path to the reference metadata.

  • group_col (str) – Column name in the metadata for grouping in the simulation.

  • spatial_x (str) – Column name in the metadata for the x-coordinate of spatial data.

  • spatial_y (str) – Column name in the metadata for the y-coordinate of spatial data.

  • seed (int) – Random seed for reproducibility. Defaults to 0.

  • isreturn (bool) – If True, return the simulated count matrix. Defaults to False.

Raises:

ValueError – If the reference dataset is too small or if scdesign_fit returns None for sim_meta or sim_count.

Notes

This function uses the omics module to fit the SRTsim model based on the provided reference dataset. It ranks the cell types based on their frequency in the reference metadata and maps them to their corresponding state ranks.

get_custom_neighbors(i, j, neighborhood)[source]

Get the custom neighbors for a pixel (i, j) in a 2-D grid based on the specified neighborhood.

Parameters:
  • i (int) – Row index of the current pixel.

  • j (int) – Column index of the current pixel.

  • neighborhood (list) – List of neighbor offsets relative to the current pixel.

Returns:

List of custom neighbor indices.

Return type:

list

Examples

>>> sim.get_custom_neighbors(5, 5, neighborhood=spatial.generate_offsets(2, 'manhattan'))
[(4, 5), (6, 5), (5, 4), (5, 6)]
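
Applying a list of offsets to a pixel can be sketched as follows. The sketch assumes that neighbors falling outside the grid are dropped rather than wrapped around; custom_neighbors and its shape argument are hypothetical and only mirror the documented method.

```python
def custom_neighbors(i, j, neighborhood, shape):
    """Return the in-bounds (row, col) neighbors of pixel (i, j)."""
    n_rows, n_cols = shape
    neighbors = []
    for di, dj in neighborhood:
        ni, nj = i + di, j + dj
        # ASSUMPTION: offsets that leave the grid are dropped, not wrapped.
        if 0 <= ni < n_rows and 0 <= nj < n_cols:
            neighbors.append((ni, nj))
    return neighbors


offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
interior = custom_neighbors(5, 5, offsets, shape=(50, 50))
corner = custom_neighbors(0, 0, offsets, shape=(50, 50))
```

An interior pixel keeps all four neighbors, while the corner pixel (0, 0) keeps only the two that lie inside the grid.
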
get_custom_neighbors3D(i, j, k, neighborhood)[source]

Get the custom neighbors for a pixel (i, j, k) in a 3-D grid based on the specified neighborhood.

Parameters:
  • i (int) – Row (X) index of the current pixel.

  • j (int) – Column (Y) index of the current pixel.

  • k (int) – Z index of the current pixel.

  • neighborhood (list) – List of neighbor offsets relative to the current pixel.

Returns:

List of custom neighbor indices.

Return type:

list

gibbs_sampler()[source]

Perform Gibbs sampling to approximate the field grid in a 2-D grid.

Notes

This function uses Gibbs sampling to update the grid cells based on their neighbors.

Examples

>>> sim.gibbs_sampler()
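
For readers unfamiliar with Gibbs sampling on a grid, one sweep can be sketched as below. This is a generic Potts-style illustration, not SimSpace's exact update: it assumes the conditional probability of state s is proportional to exp(sum of theta[s][neighbor_state]) and ignores the phi and rho parameters, whose role the source does not describe. gibbs_sweep is a hypothetical name.

```python
import math
import random


def gibbs_sweep(grid, theta, offsets, rng):
    """Resample every cell once from its conditional distribution."""
    n_rows, n_cols = len(grid), len(grid[0])
    n_states = len(theta)
    for i in range(n_rows):
        for j in range(n_cols):
            # Sum the interaction scores with each in-bounds neighbor.
            energy = [0.0] * n_states
            for di, dj in offsets:
                ni, nj = i + di, j + dj
                if 0 <= ni < n_rows and 0 <= nj < n_cols:
                    for s in range(n_states):
                        energy[s] += theta[s][grid[ni][nj]]
            top = max(energy)  # subtract the max for numerical stability
            weights = [math.exp(e - top) for e in energy]
            grid[i][j] = rng.choices(range(n_states), weights=weights)[0]
    return grid


rng = random.Random(0)
offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
# Strong preference for matching a neighbor's state.
theta = [[50.0 if a == b else 0.0 for b in range(3)] for a in range(3)]
grid = [[0] * 6 for _ in range(6)]
for _ in range(5):
    gibbs_sweep(grid, theta, offsets, rng)
```

With such a strong same-state preference, a grid that starts uniform stays uniform; weaker values of theta let mixed patterns emerge.
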
gibbs_sampler3D()[source]

Perform Gibbs sampling to approximate the field grid in a 3-D grid.

Notes

This function uses Gibbs sampling to update the grid cells based on their neighbors.

Examples

>>> sim = SimSpace(shape=(10, 30, 30), num_states=5, theta=np.random.rand(5, 5), num_iterations=10, random_seed=42)
>>> sim.gibbs_sampler3D()
initialize()[source]

Initialize the grid with random states.

Notes

The grid is initialized with random integers representing states. The shape of the grid is defined by the shape attribute. The random seed is set to ensure reproducibility.

initialize3D()[source]

Initialize the 3D grid with random states.

Raises:

ValueError – If the shape is not a 3-D tuple.

Notes

The grid is initialized with random integers representing states. The shape of the grid is defined by the shape attribute. The random seed is set to ensure reproducibility.

manual_niche(pattern={})[source]

Manually create niches with given patterns in the grid.

Parameters:

pattern (dict) – A dictionary where keys are niche indices and values are tuples containing the shape (‘ellipse’ or ‘rectangle’) and the parameters for that shape:
  • For ‘ellipse’: (center_x, center_y, radius_x, radius_y, angle)

  • For ‘rectangle’: (center_x, center_y, length, width)

Raises:

ValueError – If the pattern is not a valid dictionary or if the shapes are not ‘ellipse’ or ‘rectangle’.

Return type:

None

moran_I()[source]

Calculate the global Moran’s I. Results are stored in self.moran_I_value.

perturbation(step)[source]

Perturb the coordinates of the grid. It adds Gaussian noise to the coordinates of the grid, so the cells are randomly displaced instead of being perfectly aligned to the grid.

Parameters:

step (float | int) – The standard deviation of the Gaussian noise to be added to the coordinates.

Raises:

ValueError – If the step is not a positive number.

Return type:

None

Examples

>>> sim.perturbation(step = 0.2)
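
The jitter described above amounts to adding independent Gaussian noise to every coordinate. A minimal sketch, assuming zero-mean noise and using the hypothetical helper perturb on a plain list of (x, y) tuples:

```python
import random


def perturb(coords, step, seed=0):
    """Add independent Gaussian noise with standard deviation `step`."""
    if not isinstance(step, (int, float)) or step <= 0:
        raise ValueError("step must be a positive number")
    rng = random.Random(seed)
    return [tuple(c + rng.gauss(0.0, step) for c in point) for point in coords]


grid_coords = [(float(i), float(j)) for i in range(3) for j in range(3)]
jittered = perturb(grid_coords, step=0.2)
```

Because the noise is drawn per coordinate, the same idea carries over unchanged to 3-D points.
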
perturbation3D(step)[source]

Perturb the coordinates of the grid. It adds Gaussian noise to the coordinates of the grid, so the cells are randomly displaced instead of being perfectly aligned to the grid.

Parameters:

step (float | int) – The standard deviation of the Gaussian noise to be added to the coordinates.

Raises:

ValueError – If the step is not a positive number.

Return type:

None

Examples

>>> sim.perturbation3D(step = 0.2)
plot(feature='state', figsize=(8, 8), dpi=150, size=20, title=None, save_path=None, legend=True)[source]

Plot SimSpace simulation results using seaborn.

Parameters:
  • feature (str) – The feature to plot. Can be ‘state’, ‘celltype’, or any other feature in the meta or omics data.

  • figsize (tuple) – The size of the figure. Defaults to (8, 8).

  • dpi (int) – The resolution of the figure. Defaults to 150.

  • size (int) – The size of the points in the scatter plot. Defaults to 20.

  • title (str, optional) – The title of the plot.

  • save_path (str, optional) – The path to save the plot. If None, the plot will be shown instead.

  • legend (bool) – Whether to show the legend. Defaults to True.

Raises:

ValueError – If the feature is not found in the meta or omics data.

Return type:

None

Notes

This function uses seaborn to create a scatter plot of the specified feature. If the feature is ‘state’ or any other column in self.meta, it plots that metadata at the cells’ simulated coordinates. Otherwise, it looks for the feature in self.omics.

Example

>>> sim.plot(feature='state', figsize=(5, 5), dpi=300, size=14)
>>> # For cell type visualization after omics fitting
>>> sim.plot(feature='fitted_celltype')
>>> # For a specific omics feature
>>> sim.plot(feature='Gene_1')
plot3D(axis='z', pos=0, figsize=(6, 6), dpi=150, save_path=None)[source]

Plot the 3D grid along a specified axis at a given position.

Parameters:
  • axis (str) – The axis to plot. Can be ‘x’, ‘y’, or ‘z’. Defaults to ‘z’.

  • pos (int) – The position along the specified axis to plot. Defaults to 0.

  • figsize (tuple) – The size of the figure. Defaults to (6, 6).

  • dpi (int) – The resolution of the figure. Defaults to 150.

  • save_path (str, optional) – The path to save the plot. If None, the plot will be shown instead.

Raises:

ValueError – If the specified axis is invalid.

Return type:

None

Notes

This function uses seaborn to create a scatter plot of the specified axis at the given position. The plot will show the distribution of cell annotations in the 3D grid at that position. The color palette used is cc.glasbey as default.

Example

>>> sim.plot3D(axis='z', pos=5)
plot_grid(figsize=(5, 5), dpi=150)[source]

Plot the final grid using seaborn.

Parameters:
  • figsize (tuple) – The size of the figure. Defaults to (5, 5).

  • dpi (int) – The resolution of the figure. Defaults to 150.

Raises:

ValueError – If the grid is not a 2-D numpy array.

Return type:

None

Notes

This function uses seaborn to create a heatmap of the grid. The grid should be a 2-D numpy array where each cell represents a state. The color palette used is cc.glasbey as default.

Example

>>> sim.plot_grid(figsize=(10, 10), dpi=300)
plot_niche(figsize=(5, 5), dpi=150)[source]

Plot the niche in the simulation.

Parameters:
  • figsize (tuple) – The size of the figure. Defaults to (5, 5).

  • dpi (int) – The resolution of the figure. Defaults to 150.

Raises:

ValueError – If the niche is not a 2-D numpy array.

Notes

This function uses seaborn to create a heatmap of the niche. The niche should be a 2-D numpy array where each cell represents a niche class. The color palette used is the default seaborn palette.

Example

>>> sim.plot_niche(figsize=(5, 5), dpi=150)
print(type='wide')[source]

Print the final grid.

Parameters:

type (str) – The type of grid to print. Can be ‘long’ or ‘wide’. Defaults to ‘wide’.

Raises:

ValueError – If the type is not ‘long’ or ‘wide’.

Return type:

None

Notes

This function prints the final grid in its current state. The grid is a 2-D or 3-D numpy array representing the simulated spatial data.

save(path, file_name='simspace.pkl')[source]

Save the grid to a file using pickle.

Parameters:
  • path (str) – The path to save the grid.

  • file_name (str) – The name of the file to save the grid. Defaults to ‘simspace.pkl’.

Raises:

ValueError – If the path or file_name is not a string.

Return type:

None

Example

>>> sim.save('/path/to/save', 'my_simspace.pkl')
update_seed(seed)[source]

Update the random seed.

Parameters:

seed (int) – The new random seed.

Raises:

ValueError – If the seed is not an integer or is negative.

Return type:

None

Example

>>> sim.update_seed(42)

simspace.spatial module

simspace.spatial.calculate_gearys_C(data, coordinates, k=20)[source]

Calculate Geary’s C for a given dataset and spatial weights.

Parameters:
  • data (DataFrame) – pandas DataFrame or Series containing the variable of interest.

  • coordinates (DataFrame) – numpy array or pandas DataFrame containing the spatial coordinates.

  • k (int) – number of nearest neighbors to consider for spatial weights.

Return type:

float

simspace.spatial.calculate_interaction_score(data, coordinates, typelist, k=50, summary='mean', use_knn=True)[source]

Compute a type x type matrix of inter-type distances.

By default uses k-NN distances (fast, memory-safe):
  • For c1 != c2: distances from each c1 cell to its k nearest c2 neighbors.

  • For c1 == c2: distances from each cell to its k nearest other cells of the same type (self-matches removed).

If use_knn=False, computes full pairwise distances:
  • For c1 != c2: all |c1| x |c2| distances (uses scipy.spatial.distance.cdist).

  • For c1 == c2: all unique pairs within the type (uses pdist).

Parameters:
  • data (Series | DataFrame) – pd.Series or single-column pd.DataFrame of categorical labels (cell types).

  • coordinates (DataFrame | ndarray) – (N, d) array of spatial coordinates.

  • typelist (list) – list of types to include in the output matrix (order preserved).

  • k (int) – number of neighbors to consider for k-NN (default: 50).

  • summary (str) – how to summarize the distances (“mean”, “median”, “min”, “max”).

  • use_knn (bool) – whether to use k-NN distances (True, default) or full pairwise distances (False).

Returns:

pd.DataFrame of shape (len(typelist), len(typelist)) with summarized inter-type distances.

Return type:

pandas.DataFrame
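
The k-NN branch described above can be sketched for a single pair of types as follows. This is a brute-force illustration with the "mean" summary only; knn_type_distance is a hypothetical helper, and the real function presumably uses a spatial index rather than sorting all distances.

```python
import math
from statistics import mean


def knn_type_distance(coords, labels, type_a, type_b, k=2):
    """Mean distance from each type_a cell to its k nearest type_b cells."""
    targets = [
        (idx, point)
        for idx, (point, label) in enumerate(zip(coords, labels))
        if label == type_b
    ]
    collected = []
    for idx, (point, label) in enumerate(zip(coords, labels)):
        if label != type_a:
            continue
        # Exclude the cell itself so same-type queries have no self-match.
        dists = sorted(
            math.dist(point, other) for t_idx, other in targets if t_idx != idx
        )
        collected.extend(dists[:k])
    return mean(collected)


coords = [(0, 0), (1, 0), (0, 3), (1, 3)]
labels = ["A", "A", "B", "B"]
a_to_b = knn_type_distance(coords, labels, "A", "B", k=1)
a_to_a = knn_type_distance(coords, labels, "A", "A", k=1)
```

Filling a type x type matrix is then a matter of calling this for every ordered pair in typelist.
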

simspace.spatial.calculate_local_entropy(data, coordinates, k=20)[source]

Calculate the local entropy for a given dataset and spatial coordinates.

Parameters:
  • data (DataFrame) – pandas DataFrame or Series containing the variable of interest.

  • coordinates (DataFrame) – numpy array or pandas DataFrame containing the spatial coordinates.

  • k (int) – number of nearest neighbors to consider for spatial weights.

Returns:

Local entropy values.

Return type:

local_entropy
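
Local entropy can be read as the Shannon entropy of the labels found among each cell's k nearest neighbors. The sketch below assumes base-2 logarithms and excludes the cell itself from its neighborhood; both choices, and the brute-force neighbor search, are assumptions rather than details taken from the package.

```python
import math
from collections import Counter


def local_entropy(coords, labels, k=3):
    """Shannon entropy (bits) of labels among each cell's k nearest neighbors."""
    entropies = []
    for i, point in enumerate(coords):
        # Brute-force k-NN; ASSUMPTION: the cell itself is excluded.
        order = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(point, coords[j]),
        )
        counts = Counter(labels[j] for j in order[:k])
        total = sum(counts.values())
        entropies.append(
            -sum((c / total) * math.log2(c / total) for c in counts.values())
        )
    return entropies


coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
uniform = local_entropy(coords, ["A", "A", "A", "A"], k=2)
mixed = local_entropy(coords, ["A", "B", "A", "B"], k=2)
```

A neighborhood containing a single label has entropy 0, while an even split between two labels gives 1 bit.
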

simspace.spatial.calculate_local_morans_I(data, coordinates, k=20)[source]

Calculate local Moran’s I for a given dataset and spatial weights.

Parameters:
  • data (DataFrame) – pandas DataFrame or Series containing the variable of interest.

  • coordinates (DataFrame) – numpy array or pandas DataFrame containing the spatial coordinates.

  • k (int) – number of nearest neighbors to consider for spatial weights.

Returns:

Local Moran’s I values.

Return type:

local_morans_I

simspace.spatial.calculate_morans_I(data, coordinates, k=5)[source]

Calculate Moran’s I for a given dataset and spatial weights.

Parameters:
  • data (DataFrame) – pandas DataFrame containing the variable of interest.

  • coordinates (DataFrame) – numpy array or pandas DataFrame containing the spatial coordinates. Used for libpysal.cg.KDTree()

  • k (int) – number of nearest neighbors to consider for spatial weights. Default is 5.

Returns:

Moran’s I value.

Return type:

float
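
As a reference for what the statistic measures, global Moran's I with binary k-nearest-neighbor weights is I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z_i is the deviation of value i from the mean and W is the sum of all weights. The sketch below implements that formula directly; it is a brute-force illustration (morans_i is a hypothetical helper), not the libpysal-based routine the package uses.

```python
import math


def morans_i(values, coords, k=2):
    """Global Moran's I with binary k-nearest-neighbor weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross, weight_sum = 0.0, 0
    for i in range(n):
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(coords[i], coords[j]),
        )[:k]
        for j in neighbors:  # w_ij = 1 for the k nearest neighbors, else 0
            cross += dev[i] * dev[j]
            weight_sum += 1
    return (n / weight_sum) * cross / sum(d * d for d in dev)


coords = [(float(i), 0.0) for i in range(6)]
smooth_i = morans_i([0, 1, 2, 3, 4, 5], coords, k=2)
alternating_i = morans_i([0, 1, 0, 1, 0, 1], coords, k=2)
```

A smooth gradient gives a positive value (similar values cluster), while an alternating pattern gives a negative one.
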

simspace.spatial.compute_variogram(coords, labels, cell_types, n_bins=20, max_dist=None)[source]

Compute cross-variograms between all pairs of cell types.

Parameters:
  • coords (ndarray) – numpy array of shape (n_cells, n_dimensions) containing the spatial coordinates.

  • labels (ndarray) – numpy array of shape (n_cells,) containing the cell type labels.

  • cell_types (list) – list of unique cell types to consider.

  • n_bins (int) – number of distance bins to use for the variogram (default is 20).

  • max_dist (float) – maximum distance to consider for the variogram (default is None, which uses half the maximum distance between points).

Returns:

dictionary where keys are tuples of cell type pairs and values are tuples of (bin_centers, gamma).

Return type:

dict

simspace.spatial.generate_offsets(distance, method='manhattan', linear=False)[source]

Generate neighbor offsets based on the specified distance and method.

Parameters:
  • distance (int) – Distance parameter.

  • method (str) – Method to generate offsets (‘manhattan’ or ‘euclidean’).

  • linear (bool) – Whether to include the cell itself in the offsets (default is False).

Returns:

List of generated neighbor offsets.

Return type:

list

Raises:

ValueError – If the distance is not an integer or if the method is not recognized.

Examples

>>> from simspace.spatial import generate_offsets
>>> offsets = generate_offsets(1, 'manhattan')
>>> print(offsets)
[(-1, 0), (0, -1), (0, 1), (1, 0)]
>>> offsets = generate_offsets(2, 'euclidean', linear=True)
>>> print(offsets)
[(-2, 0), (-1, -1), (-1, 0), (-1, 1), (0, -2), (0, -1), (0, 1), (0, 2), (1, -1), (1, 0), (1, 1), (2, 0), (0, 0)]
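
Both documented outputs are consistent with a simple scan over a square window. The sketch below is a hypothetical re-implementation (offsets_2d is not part of simspace) that reproduces the two example outputs above; whether the package builds the list this way is an assumption.

```python
def offsets_2d(distance, method="manhattan", linear=False):
    """Return (dx, dy) offsets within `distance` under the chosen metric."""
    if not isinstance(distance, int):
        raise ValueError("distance must be an integer")
    if method not in ("manhattan", "euclidean"):
        raise ValueError(f"unknown method: {method!r}")
    offsets = []
    for dx in range(-distance, distance + 1):
        for dy in range(-distance, distance + 1):
            if (dx, dy) == (0, 0):
                continue
            if method == "manhattan":
                inside = abs(dx) + abs(dy) <= distance
            else:
                inside = dx * dx + dy * dy <= distance * distance
            if inside:
                offsets.append((dx, dy))
    if linear:
        offsets.append((0, 0))  # the cell itself goes last, as in the example
    return offsets


manhattan_1 = offsets_2d(1, "manhattan")
euclidean_2 = offsets_2d(2, "euclidean", linear=True)
```
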
simspace.spatial.generate_offsets3D(distance, method, linear=False)[source]

Generate 3D neighbor offsets based on the specified distance and method.

Parameters:
  • distance (int) – Distance parameter.

  • method (str) – Method to generate offsets (‘manhattan’ or ‘euclidean’).

  • linear (bool) – Whether to include the cell itself in the offsets (default is False).

Returns:

List of generated neighbor offsets.

Return type:

list

Raises:

ValueError – If the distance is not an integer or if the method is not recognized.

Examples

>>> offsets = generate_offsets3D(3, 'manhattan')
>>> print(offsets)
simspace.spatial.integrate_gearys_C(data, coordinates, typelist)[source]

Calculate Geary’s C for a given dataset and spatial weights.

Parameters:
  • data (DataFrame) – pandas DataFrame containing the variable of interest.

  • coordinates (DataFrame) – numpy array or pandas DataFrame containing the spatial coordinates. Used for libpysal.cg.KDTree()

  • typelist (list) – list of types to calculate Geary’s C for.

Returns:

List of Geary’s C values for each type in typelist.

Return type:

list

Raises:

ValueError – If typelist is empty.

simspace.spatial.integrate_morans_I(data, coordinates, typelist)[source]

Calculate Moran’s I for a given dataset and spatial weights.

Parameters:
  • data (DataFrame) – pandas DataFrame containing the variable of interest.

  • coordinates (DataFrame) – numpy array or pandas DataFrame containing the spatial coordinates. Used for libpysal.cg.KDTree()

  • typelist (list) – list of types to calculate Moran’s I for.

Returns:

List of Moran’s I values for each type in typelist.

Return type:

list

Raises:

ValueError – If typelist is empty.

simspace.spatial.integrate_variogram(data, coordinates, typelist)[source]

Compute the average variogram across all cell type pairs.

Parameters:
  • data (DataFrame) – pandas DataFrame containing the variable of interest.

  • coordinates (DataFrame) – numpy array or pandas DataFrame containing the spatial coordinates.

  • typelist (list) – list of unique cell types to consider.

Returns:

dictionary where keys are tuples of cell type pairs and values are tuples of (bin_centers, gamma).

Return type:

dict

Raises:

ValueError – If typelist is empty.

simspace.spatial.plot_local_entropy(local_entropy, ax=None)[source]

Plot a histogram of local entropy values.

Parameters:
  • local_entropy (ndarray) – Local entropy values.

  • ax (matplotlib.axes.Axes, optional) – matplotlib axis object to plot on.

Returns:

matplotlib axis object.

Return type:

matplotlib.axes.Axes

simspace.spatial.plot_local_morans_I(data, coordinates, local_morans_I, ax=None)[source]

Plot local Moran’s I values on a scatter plot.

Parameters:
  • data (DataFrame) – pandas DataFrame or Series containing the variable of interest.

  • coordinates (DataFrame) – numpy array or pandas DataFrame containing the spatial coordinates.

  • local_morans_I (ndarray) – Local Moran’s I values.

  • ax (matplotlib.axes.Axes, optional) – matplotlib axis object to plot on.

Returns:

matplotlib axis object.

Return type:

matplotlib.axes.Axes

simspace.spatial.spatial_stat(data, coordinates, typelist)[source]

Calculate Moran’s I and local entropy for a given dataset.

Parameters:
  • data (DataFrame) – pandas DataFrame containing the variable of interest.

  • coordinates (DataFrame) – numpy array or pandas DataFrame containing the spatial coordinates.

  • typelist (list) – list of types to calculate Moran’s I for.

Returns:

numpy array containing Moran’s I and local entropy values.

Return type:

numpy.ndarray

simspace.util module

simspace.util.convolve(simspace, kernel, scale=1, conv_type='average')[source]

Convolve the omics data with a kernel by averaging or summing.

Parameters:
  • simspace (object) – SimSpace object containing cell metadata and omics data.

  • kernel (tuple) – Size of the kernel as a tuple (width, height).

  • scale (int) – Scaling factor for the omics data. Defaults to 1.

  • conv_type (str) – Type of convolution to perform (‘average’ or ‘sum’). Defaults to ‘average’.

Returns:

A tuple containing two DataFrames:
  • spot_meta: Metadata for the spots, including their coordinates and state proportions.

  • spot_omics: Omics data for the spots, either averaged or summed based on conv_type.

Return type:

tuple

Examples

>>> simspace = SimSpace(...)  # Initialize your SimSpace object
>>> kernel = (5, 5)
>>> spot_meta, spot_omics = convolve(simspace, kernel, scale=1, conv_type='average')
>>> print(spot_meta.head())
>>> print(spot_omics.head())
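
Conceptually, the convolution groups cells into kernel-sized spots and aggregates their values. The sketch below shows that idea on plain tuples; it assumes spots tile the plane without overlap and handles a single value per cell, and convolve_cells is a hypothetical helper rather than the package's function.

```python
from collections import defaultdict


def convolve_cells(cells, kernel, conv_type="average"):
    """Aggregate per-cell values into kernel-sized spots.

    cells: list of (x, y, value); kernel: (width, height) of one spot.
    """
    if conv_type not in ("average", "sum"):
        raise ValueError("conv_type must be 'average' or 'sum'")
    width, height = kernel
    spots = defaultdict(list)
    for x, y, value in cells:
        # ASSUMPTION: spots tile the plane without overlap.
        spots[(int(x // width), int(y // height))].append(value)
    if conv_type == "sum":
        return {spot: sum(vals) for spot, vals in spots.items()}
    return {spot: sum(vals) / len(vals) for spot, vals in spots.items()}


cells = [(0, 0, 2.0), (1, 1, 4.0), (5, 0, 6.0)]
averaged = convolve_cells(cells, kernel=(5, 5))
summed = convolve_cells(cells, kernel=(5, 5), conv_type="sum")
```

The first two cells fall into spot (0, 0) and the third into spot (1, 0), so only the first spot differs between the two aggregation modes.
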
simspace.util.generate_random_parameters(n_group, n_state, theta=0.8, niche_theta=0.5, density_max=0.4, density_min=0.01, phi_max=5, phi_min=4.4, seed=0)[source]

Generate random parameters for the simulation.

Parameters:
  • n_group (int) – Number of groups.

  • n_state (int) – Number of states.

  • theta (float) – Maximum value for theta. Defaults to 0.8.

  • niche_theta (float) – Maximum value for niche theta. Defaults to 0.5.

  • density_max (float) – Maximum value for density replicates. Defaults to 0.4.

  • density_min (float) – Minimum value for density replicates. Defaults to 0.01.

  • phi_max (float) – Maximum value for phi replicates. Defaults to 5.

  • phi_min (float) – Minimum value for phi replicates. Defaults to 4.4.

  • seed (int) – Random seed for reproducibility. Defaults to 0.

Returns:

Dictionary containing the generated parameters.

Return type:

dict

Notes

  • The function generates random values for niche theta, theta list, density replicates, and phi replicates.

  • The theta values are generated uniformly within specified ranges.

  • The function uses a seed for reproducibility.

simspace.util.load_params(input_file)[source]

Load genetic algorithm parameters from a JSON file.

Parameters:

input_file (str) – Path to the input JSON file.

Returns:

Dictionary containing parameter names and values.

Return type:

dict

Raises:
  • FileNotFoundError – If the input file does not exist.

  • ValueError – If the JSON file cannot be decoded or if the parameters are not in the expected format.

Note

The function expects the JSON file to contain a dictionary with specific keys.

simspace.util.save_params(params, output_file)[source]

Save genetic algorithm parameters to a JSON file with annotations.

Parameters:
  • params (dict) – Dictionary containing parameter names and values.

  • output_file (str) – Path to the output JSON file.

Raises:
  • ValueError – If the parameters are not in the expected format or if required keys are missing.

  • FileNotFoundError – If the output file cannot be created.

Note

The function converts NumPy arrays to lists before saving to ensure compatibility with JSON format. If the ‘theta_list’ contains NumPy arrays, they are converted to lists. If ‘theta_list’ is not a list of lists or NumPy arrays, a ValueError is raised.

Return type:

None

Returns:

None

Example

>>> params = generate_random_parameters(
...     n_group=3,
...     n_state=5,
...     seed=42
... )
>>> save_params(params, 'params.json')
>>> # The saved JSON file will contain the parameters in a structured format.
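
The array-to-list conversion mentioned in the Note can be sketched as follows. The example uses array.array from the standard library as a stand-in for NumPy arrays, since both expose a tolist() method; to_jsonable and the 'seed' key are hypothetical, while 'theta_list' is the key named in the Note.

```python
import json
from array import array


def to_jsonable(value):
    """Recursively turn array-like objects into plain lists."""
    if hasattr(value, "tolist"):  # true for NumPy arrays and array.array
        return to_jsonable(value.tolist())
    if isinstance(value, dict):
        return {key: to_jsonable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(item) for item in value]
    return value


params = {
    "theta_list": [array("d", [0.8, 0.1]), array("d", [0.1, 0.8])],
    "seed": 42,  # hypothetical extra key
}
text = json.dumps(to_jsonable(params), indent=2)
restored = json.loads(text)
```

Converting before calling json.dumps avoids the TypeError that the json module raises for objects it cannot serialize.
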
simspace.util.sim_from_json(input_file, shape, num_iteration, n_iter, custom_neighbor=None, seed=0)[source]

Simulate a SimSpace object from parameters saved in a JSON file.

Parameters:
  • input_file (str) – Path to the input JSON file containing simulation parameters.

  • shape (tuple) – Shape of the SimSpace grid.

  • num_iteration (int) – Number of iterations for the simulation.

  • n_iter (int) – Number of iterations for niche creation.

  • custom_neighbor (list) – Custom neighbor offsets for the simulation. Defaults to None, which uses the default offsets.

  • seed (int) – Random seed for reproducibility. Defaults to 0.

Returns:

A SimSpace object initialized with the parameters from the JSON file.

Return type:

SimSpace

Raises:
  • FileNotFoundError – If the input file does not exist.

  • ValueError – If the JSON file cannot be decoded or if the parameters are not in the expected format.

  • ValueError – If the ‘theta_list’ is not a list of lists or NumPy arrays, or if the size of the theta matrix is invalid.

simspace.util.sim_from_params(parameters, shape=(100, 100), num_iteration=4, n_iter=6, custom_neighbor=None, step=0.2, seed=0)[source]

Simulate a SimSpace object from given parameters. The parameters should be in the format generated by the generate_random_parameters function.

Parameters:
  • parameters (dict) – Dictionary containing the simulation parameters.

  • shape (tuple) – Shape of the SimSpace grid. Defaults to (100, 100).

  • num_iteration (int) – Number of iterations for the simulation. Defaults to 4.

  • n_iter (int) – Number of iterations for niche creation. Defaults to 6.

  • custom_neighbor (list) – Custom neighbor offsets for the simulation. Defaults to None, which uses the default offsets.

  • step (float) – Step size for perturbation. Defaults to 0.2.

  • seed (int) – Random seed for reproducibility. Defaults to 0.

Returns:

A SimSpace object initialized with the given parameters.

Return type:

SimSpace

Raises:

ValueError – If the parameters are not in the expected format or if required keys are missing.

simspace.niche module

simspace.niche.create_ellipse(array_shape, center, radius_x, radius_y, angle)[source]

Create a 2D numpy array with a rotated ellipse shape.

Parameters:
  • array_shape (tuple) – Shape of the 2D array.

  • center (tuple) – Center of the ellipse as a tuple (x, y).

  • radius_x (int) – Radius of the ellipse along the x-axis.

  • radius_y (int) – Radius of the ellipse along the y-axis.

  • angle (float) – Angle of rotation in degrees.

Returns:

2D numpy array with the rotated ellipse shape, where pixels inside the ellipse are set to 1 and others to 0.

Return type:

numpy.ndarray

Examples

>>> import numpy as np
>>> arr = create_ellipse((10, 10), (5, 5), 3, 2, 45)
>>> print(arr.astype(int))
[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 1 1 0 0 0 0]
[0 0 0 1 1 1 1 0 0 0]
[0 0 1 1 1 1 1 1 0 0]
[0 0 1 1 1 1 1 1 0 0]
[0 0 0 1 1 1 1 0 0 0]
[0 0 0 0 1 1 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]
simspace.niche.create_rectangle(array_shape, center, width, height)[source]

Create a 2D numpy array with a rectangle shape.

Parameters:
  • array_shape (tuple) – Shape of the 2D array.

  • center (tuple) – Center of the rectangle as a tuple (x, y).

  • width (int) – Width of the rectangle.

  • height (int) – Height of the rectangle.

Returns:

2D numpy array with the rectangle shape

Return type:

numpy.ndarray

Examples

>>> import numpy as np
>>> arr = create_rectangle((10, 10), (5, 5), 2, 4)
>>> print(arr.astype(int))
[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 1 0 0]
[0 0 0 1 1 1 1 1 0 0]
[0 0 0 1 1 1 1 1 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]
simspace.niche.create_ring(array_shape, center, inner_radius, outer_radius)[source]

Create a 2D numpy array with a ring shape.

Parameters:
  • array_shape (tuple) – Shape of the 2D array.

  • center (tuple) – Center of the ring as a tuple (x, y).

  • inner_radius (int) – Inner radius of the ring.

  • outer_radius (int) – Outer radius of the ring.

Returns:

2D numpy array with the ring shape

Return type:

numpy.ndarray

Examples

>>> import numpy as np
>>> arr = create_ring((10, 10), (5, 5), 2, 4)
>>> print(arr.astype(int))
[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 1 0 0 0 0]
[0 0 0 1 1 1 1 1 0 0]
[0 0 1 1 1 1 1 1 1 0]
[0 0 1 1 0 0 0 1 1 0]
[0 1 1 1 0 0 0 1 1 1]
[0 0 1 1 0 0 0 1 1 0]
[0 0 1 1 1 1 1 1 1 0]
[0 0 0 1 1 1 1 1 0 0]
[0 0 0 0 0 1 0 0 0 0]]
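
The printed example is consistent with marking every pixel whose squared distance from the center lies between inner_radius squared and outer_radius squared, both bounds inclusive. The sketch below applies that rule with nested lists; ring_mask is a hypothetical stand-in, and the inclusive bounds are inferred from the example output rather than from the source code.

```python
def ring_mask(array_shape, center, inner_radius, outer_radius):
    """Return a nested list with 1 inside the ring and 0 elsewhere."""
    n_rows, n_cols = array_shape
    cx, cy = center
    mask = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            d2 = (r - cx) ** 2 + (c - cy) ** 2  # squared distance to center
            row.append(int(inner_radius ** 2 <= d2 <= outer_radius ** 2))
        mask.append(row)
    return mask


mask = ring_mask((10, 10), (5, 5), 2, 4)
```
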
simspace.niche.create_vessel(array_shape, center, length, width)[source]

Create a 2D numpy array with a linear tube shape.

Parameters:
  • array_shape (tuple) – Shape of the 2D array.

  • center (tuple) – Center of the tube as a tuple (x, y).

  • length (int) – Length of the tube.

  • width (int) – Width of the tube.

Returns:

2D numpy array with the tube shape

Return type:

numpy.ndarray

Examples

>>> import numpy as np
>>> from simspace.niche import create_vessel
>>> arr = create_vessel((10, 10), (5, 5), 6, 2)
>>> print(arr.astype(int))
[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 1 1 1 0 0 0]
[0 0 0 0 1 1 1 0 0 0]
[0 0 0 0 1 1 1 0 0 0]
[0 0 0 0 1 1 1 0 0 0]
[0 0 0 0 1 1 1 0 0 0]
[0 0 0 0 1 1 1 0 0 0]
[0 0 0 0 1 1 1 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]

simspace.omics module

simspace.omics.run_splatter(new_meta, ngene=1000, r_script_path=None)[source]

Run the splatter simulation to generate synthetic single-cell RNA-seq data.

Parameters:
  • new_meta (pd.DataFrame) – A DataFrame containing simulated spatial metadata for omics simulation, which is derived from the .meta of the simspace object.

  • ngene (int) – The number of genes to simulate.

  • r_script_path (str) – The path to the R script that performs the splatter simulation. Default is None, which uses the script of simspace package.

Returns:

A tuple containing two DataFrames:
  • splatter_data: The simulated gene expression data.

  • splatter_meta: The metadata of the simulated cells.

Return type:

tuple

Raises:
  • FileNotFoundError – If the R script file does not exist.

  • Exception – If the R script fails to execute or returns an error.

Examples

>>> splatter_data, splatter_meta = run_splatter(sim.meta, ngene=1000)  # sim is a simulated SimSpace object
>>> print(splatter_data.head())
>>> print(splatter_meta.head())
simspace.omics.scdesign_fit(count_path, meta_path, group_col, spatial_x, spatial_y, new_meta, seed=0, r_script_path=None)[source]

Fit the scdesign model to the given reference data and metadata to simulate new omics data.

Parameters:
  • count_path (str) – The path to the count matrix file.

  • meta_path (str) – The path to the metadata file.

  • group_col (str) – The column name in the metadata that contains the grouping information.

  • spatial_x (str) – The column name in the metadata that contains the x-coordinate of the spatial location.

  • spatial_y (str) – The column name in the metadata that contains the y-coordinate of the spatial location.

  • new_meta (pd.DataFrame) – A DataFrame containing simulated spatial metadata for omics simulation, which is derived from the .meta of the simspace object.

  • seed (int) – The random seed for reproducibility.

  • r_script_path (str) – The path to the R script that performs the scDesign fitting. Default is None, which uses the script of simspace package.

Returns:

A tuple containing two DataFrames:
  • sim_data: The simulated gene expression data.

  • sim_meta: The metadata of the simulated cells.

Return type:

tuple

Raises:
  • FileNotFoundError – If the R script file does not exist.

  • Exception – If the R script fails to execute or returns an error.

Examples

>>> sim_data, sim_meta = scdesign_fit("path/to/count.csv",
...                              "path/to/meta.csv",
...                              "group_column",
...                              "x_coordinate",
...                              "y_coordinate",
...                              new_meta=sim.meta,  # sim is a simulated SimSpace object
...                              seed=42)
>>> print(sim_data.head())
>>> print(sim_meta.head())
simspace.omics.simOmics(omics_meta, meta, seed=0)[source]

Simulate the omics data based on the metadata and cell metadata.

Parameters:
  • omics_meta (pd.DataFrame) – The metadata of the omics data, which should contain the following columns:
      - GeneID: The index of the gene.
      - Marker: The cell type of the gene, -1 for background genes.
      - LRindex: The index of the ligand-receptor pair, -1 if the gene is not a ligand or receptor.
      - Type_{cell_type}: The mean expression level of the gene in the corresponding cell type.

  • meta (pd.DataFrame) – The metadata of the cells, which should contain a column named ‘state’ representing the cell types.

  • seed (int) – The random seed for reproducibility.

Returns:

A DataFrame containing the simulated omics data, with columns:
  • Gene_{gene_id}: The expression level of the gene in each cell.

Return type:

pd.DataFrame

Raises:
  • ValueError – If the metadata does not contain the ‘state’ column.

  • TypeError – If the input data is not a DataFrame.

Examples

>>> import pandas as pd
>>> from simspace.omics import simOmics
>>> omics_meta = pd.DataFrame({
...     'GeneID': [0, 1, 2],
...     'Marker': ['A', 'B', -1],
...     'LRindex': [-1, 0, -1],
...     'Type_A': [10, 20, 0],
...     'Type_B': [0, 30, 0]
... })
>>> meta = pd.DataFrame({'state': ['A', 'B']})
>>> omics_data = simOmics(omics_meta, meta, seed=42)
>>> print(omics_data.head())
simspace.omics.simOmicsMeta(meta, n_genes=1000, bg_ratio=0.5, bg_param=(1, 1), marker_param=(5, 2), lr_ratio=0.5, seed=0)[source]

Simulate the metadata of omics data.

Parameters:
  • meta (pd.DataFrame) – The metadata of the cells, which should contain a column named ‘state’ representing the cell types.

  • n_genes (int) – The number of genes to simulate.

  • bg_ratio (float) – The ratio of background genes (non-marker genes) to the total number of genes.

  • bg_param (tuple) – The parameters for the gamma distribution to simulate the mean expression level of background genes.

  • marker_param (tuple) – The parameters for the gamma distribution to simulate the mean expression level of marker genes.

  • lr_ratio (float) – The ratio of ligand-receptor pairs to the total number of marker genes.

  • seed (int) – The random seed for reproducibility.

Returns:

A DataFrame containing the simulated metadata of the omics data, with columns:
  • GeneID: The index of the gene.

  • Marker: The cell type of the gene, -1 for background genes.

  • LRindex: The index of the ligand-receptor pair, -1 if the gene is not a ligand or receptor.

  • Type_{cell_type}: The mean expression level of the gene in the corresponding cell type.

Return type:

pd.DataFrame

Raises:

ValueError – If the background gene ratio or ligand-receptor pair ratio is not between 0 and 1, or if the metadata does not contain the ‘state’ column.

Examples

>>> import pandas as pd
>>> from simspace.omics import simOmicsMeta
>>> meta = pd.DataFrame({'state': ['A', 'B', 'C']})
>>> omics_meta = simOmicsMeta(meta, n_genes=100, bg_ratio=0.3, lr_ratio=0.2, seed=42)
>>> print(omics_meta.head())
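
To make the roles of bg_ratio, bg_param and marker_param concrete, here is a minimal standalone sketch of a gene-metadata table with the documented GeneID, Marker and Type_{cell_type} columns. It is not the simspace implementation: sketch_gene_meta is a hypothetical name, the round-robin marker assignment and the (shape, scale) reading of the gamma parameters are assumptions, and the LRindex column and lr_ratio are left out.

```python
import numpy as np
import pandas as pd

def sketch_gene_meta(states, n_genes=10, bg_ratio=0.5,
                     bg_param=(1, 1), marker_param=(5, 2), seed=0):
    """Hypothetical sketch of a gene-metadata table; not the simspace code."""
    rng = np.random.default_rng(seed)
    n_bg = int(round(n_genes * bg_ratio))      # number of background genes
    n_marker = n_genes - n_bg                  # number of marker genes
    # Assumption: markers are assigned to cell types in round-robin order.
    marker = [-1] * n_bg + [states[i % len(states)] for i in range(n_marker)]
    table = pd.DataFrame({"GeneID": range(n_genes), "Marker": marker})
    for state in states:
        # Assumption: the tuples are (shape, scale) of a gamma distribution.
        mean = rng.gamma(bg_param[0], bg_param[1], size=n_genes)
        is_marker = (table["Marker"] == state).to_numpy()
        # Marker genes draw their mean from the marker distribution instead.
        mean[is_marker] = rng.gamma(marker_param[0], marker_param[1],
                                    size=int(is_marker.sum()))
        table[f"Type_{state}"] = mean
    return table

gene_table = sketch_gene_meta(["A", "B", "C"], n_genes=10, bg_ratio=0.3)
print(gene_table)
```

With n_genes=10 and bg_ratio=0.3, three genes are background (Marker is -1) and the remaining seven are split across the three cell types.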
simspace.omics.simSpatialOmics(gene_data, gene_meta, cell_meta, k_neighors=10, spatial_effect=1.0, se_threshold=1.5, seed=0)[source]

Simulate the spatial omics data.

Parameters:
  • gene_data (pd.DataFrame) – The gene expression data, with cells as rows and genes as columns.

  • gene_meta (pd.DataFrame) – The metadata of the genes, which should contain the following columns:
      - GeneID: The index of the gene.
      - Marker: The cell type of the gene, -1 for background genes.
      - LRindex: The index of the ligand-receptor pair, -1 if the gene is not a ligand or receptor.

  • cell_meta (pd.DataFrame) – The metadata of the cells, which should contain a column named ‘state’ representing the cell types.

  • k_neighors (int) – The number of nearest neighbors to consider for spatial effects.

  • spatial_effect (float) – The factor by which to increase or decrease the expression level based on spatial effects.

  • se_threshold (float) – The threshold for spatial effect application.

  • seed (int) – The random seed for reproducibility.

Returns:

A DataFrame containing the simulated spatial omics data, with cells as rows and genes as columns.

Return type:

pd.DataFrame

Raises:

ValueError – If the spatial effect is not greater than 1, or if the cell metadata does not contain the coordinates or cell types.

Examples

>>> import pandas as pd
>>> from simspace.omics import simSpatialOmics
>>> gene_data = pd.DataFrame({
...     'Gene_0': [10, 20, 30],
...     'Gene_1': [5, 15, 25],
...     'Gene_2': [0, 10, 20]
... }, index=['cell_1', 'cell_2', 'cell_3'])
>>> gene_meta = pd.DataFrame({
...     'GeneID': [0, 1, 2],
...     'Marker': ['A', 'B', -1],
...     'LRindex': [-1, 0, -1]
... })
>>> cell_meta = pd.DataFrame({
...     'state': ['A', 'B', 'A'],
...     'row': [0, 1, 0],
...     'col': [0, 1, 2]
... })
>>> spatial_omics = simSpatialOmics(gene_data, gene_meta, cell_meta, k_neighors=2, spatial_effect=2, seed=42)
>>> print(spatial_omics.head())
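
The k_neighors argument refers to each cell's nearest neighbors in the ‘row’/‘col’ coordinate space. How simspace finds those neighbors and applies spatial_effect is not described here, so the following is only a brute-force sketch of the neighbor lookup itself. k_nearest_neighbors is a hypothetical helper, and Euclidean distance is an assumption.

```python
import numpy as np

def k_nearest_neighbors(coords, k):
    """Return, for each cell, the indices of its k nearest other cells."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all cells (brute force, O(n^2)).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)   # a cell is not its own neighbor
    return np.argsort(dist, axis=1, kind="stable")[:, :k]

# (row, col) positions of four cells lying on one row of the grid.
coords = [(0, 0), (0, 1), (0, 3), (0, 7)]
neighbors = k_nearest_neighbors(coords, k=2)
print(neighbors)
```

For large grids a spatial index (for example a k-d tree) would scale better than the dense distance matrix used here.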
simspace.omics.splatter_fit(count_path, meta_path, group_col, n_cells=2000, r_script_path=None)[source]

Fit the splatter model to the given reference data and metadata to simulate new omics data.

Parameters:
  • count_path (str) – The path to the count matrix file.

  • meta_path (str) – The path to the metadata file.

  • group_col (str) – The column name in the metadata that contains the grouping information.

  • n_cells (int) – The number of cells to simulate. Should match the number of cells in the spatial simulation results.

  • r_script_path (str) – The path to the R script that performs the splatter fitting. Default is None, which uses the script of simspace package.

Returns:

A tuple containing two DataFrames:
  • splatter_data: The simulated gene expression data.

  • splatter_meta: The metadata of the simulated cells.

Return type:

tuple

Raises:
  • FileNotFoundError – If the R script file does not exist.

  • Exception – If the R script fails to execute or returns an error.

Examples

>>> splatter_data, splatter_meta = splatter_fit("path/to/count.csv",
...                                              "path/to/meta.csv",
...                                              "group_column",
...                                              n_cells=2000)
>>> print(splatter_data.head())
>>> print(splatter_meta.head())
simspace.omics.srtsim_fit(count_path, meta_path, group_col='state', spatial_x='x', spatial_y='y', n_rep=1, seed=0, r_script_path=None)[source]

Fit the SRTsim model to the given reference data and metadata to simulate new omics data.

Parameters:
  • count_path (str) – The path to the count matrix file.

  • meta_path (str) – The path to the metadata file.

  • group_col (str) – The column name in the metadata that contains the grouping information. Default is ‘state’.

  • spatial_x (str) – The column name in the metadata that contains the x-coordinate of the spatial location.

  • spatial_y (str) – The column name in the metadata that contains the y-coordinate of the spatial location.

  • n_rep (int) – The number of replicates to simulate. Default is 1. Since SRTsim can only simulate exact same number of cells as the reference, this parameter is used when the number of cells in the reference is less than the number of cells in the spatial simulation results.

  • seed (int) – The random seed for reproducibility. Default is 0.

  • r_script_path (str) – The path to the R script that performs the SRTsim fitting. Default is None, which uses the script of simspace package.

Returns:

A tuple containing two DataFrames:
  • sim_data: The simulated gene expression data.

  • sim_meta: The metadata of the simulated cells.

Return type:

tuple

Raises:
  • FileNotFoundError – If the R script file does not exist.

  • Exception – If the R script fails to execute or returns an error.

Examples

>>> sim_data, sim_meta = srtsim_fit("path/to/count.csv",
...                              "path/to/meta.csv",
...                              group_col='state',
...                              spatial_x='x',
...                              spatial_y='y',
...                              n_rep=1,
...                              seed=42)
>>> print(sim_data.head())
>>> print(sim_meta.head())
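
Because each SRTsim replicate yields as many cells as the reference, one plausible way to pick n_rep is the smallest number of replicates whose combined cell count covers the spatial simulation. The helper below is hypothetical and only illustrates that arithmetic; whether srtsim_fit expects exactly this value is an assumption.

```python
import math

def replicates_needed(n_reference_cells, n_simulated_cells):
    """Hypothetical helper: smallest n_rep covering n_simulated_cells."""
    if n_reference_cells <= 0:
        raise ValueError("n_reference_cells must be positive")
    # Each replicate contributes n_reference_cells cells; never go below 1.
    return max(1, math.ceil(n_simulated_cells / n_reference_cells))

# A reference with 900 cells and a spatial simulation with 2500 cells.
n_rep = replicates_needed(900, 2500)
print(n_rep)
```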

simspace.optimize module

simspace.optimize.spatial_fit(target, population_size=50, generations=20, mutation_rate=0.2, crossover_rate=0.6, shape=(50, 50), n_group=2, n_state=8, custom_neighbor=[(-3, 0), (-2, -1), (-2, 0), (-2, 1), (-1, -2), (-1, -1), (-1, 0), (-1, 1), (-1, 2), (0, -3), (0, -2), (0, -1), (0, 1), (0, 2), (0, 3), (1, -2), (1, -1), (1, 0), (1, 1), (1, 2), (2, -1), (2, 0), (2, 1), (3, 0)], num_iterations=4, n_iter=6, replicate=1, seed=0, parallel=True, verbose=True)[source]

Run an evolutionary algorithm to optimize the simulation parameters against a target vector.

Parameters:
  • target (list) – Target vector to optimize against.

  • population_size (int) – Number of individuals in the population (default is 50).

  • generations (int) – Number of generations to run the algorithm (default is 20).

  • mutation_rate (float) – Probability of mutation for each individual (default is 0.2).

  • crossover_rate (float) – Probability of crossover between individuals (default is 0.6).

  • shape (tuple) – Shape of the simulation grid (default is (50, 50)).

  • n_group (int) – Number of groups in the simulation (default is 2).

  • n_state (int) – Number of states in the simulation (default is 8).

  • custom_neighbor (list) – Custom neighbor offsets for the simulation (default is the 24-offset neighborhood shown in the signature, i.e. all offsets within Manhattan distance 3).

  • num_iterations (int) – Number of iterations for the simulation (default is 4).

  • n_iter (int) – Number of iterations for the simulation (default is 6).

  • replicate (int) – Number of replicates for the simulation (default is 1).

  • seed (int) – Random seed for reproducibility (default is 0).

  • parallel (bool) – Whether to run the fitness evaluation in parallel (default is True).

  • verbose (bool) – Whether to print progress information (default is True).

Returns:

The best solution found during the optimization process.

Return type:

dict

Raises:

ValueError – If the target vector is not a list or if the population size is not a positive integer.
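
The roles of population_size, generations, mutation_rate and crossover_rate can be seen in a generic evolutionary loop. The sketch below is a standalone illustration, not the spatial_fit implementation: evolve is a hypothetical function, individuals are plain float vectors, and fitness is a squared distance to the target so that the example runs without a simulation. In spatial_fit the fitness presumably involves running the spatial simulation for each individual, which is an assumption here.

```python
import random

def evolve(target, population_size=50, generations=20,
           mutation_rate=0.2, crossover_rate=0.6, seed=0):
    """Generic evolutionary search for a vector close to `target`."""
    if population_size < 2:
        raise ValueError("population_size must be at least 2")
    rng = random.Random(seed)
    n = len(target)

    def fitness(ind):
        # Lower is better: squared distance to the target vector.
        return sum((a - b) ** 2 for a, b in zip(ind, target))

    population = [[rng.uniform(0, 1) for _ in range(n)]
                  for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        parents = population[:max(2, population_size // 2)]
        children = [population[0][:]]            # elitism: keep the best
        while len(children) < population_size:
            a, b = rng.sample(parents, 2)
            child = a[:]
            if rng.random() < crossover_rate:    # uniform crossover
                child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            if rng.random() < mutation_rate:     # Gaussian mutation of one gene
                i = rng.randrange(n)
                child[i] += rng.gauss(0, 0.1)
            children.append(child)
        population = children
    best = min(population, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = evolve([0.2, 0.8, 0.5])
print(best, history[-1])
```

Because the best individual is always copied into the next generation, the recorded best fitness never gets worse from one generation to the next.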

simspace.plot module

simspace.plot.plot_gene(coords, feature, size=10, save_path=None, figsize=(6, 6), dpi=200, cmap=None, title=None)[source]

Plot the gene expression level on the spatial coordinates.

Parameters:
  • coords (DataFrame) – DataFrame containing the spatial coordinates with columns ‘col’ and ‘row’.

  • feature (Series) – Series containing the gene expression levels, indexed by the same index as coords.

  • size – Size of the scatter points.

  • save_path – Path to save the figure. If None, the figure will be displayed instead.

  • figsize – Tuple specifying the size of the figure (width, height).

  • dpi – Dots per inch for the figure resolution.

  • cmap – Colormap for the scatter plot. If None, a default colormap will be used.

  • title – Title of the plot. If None, the name of the feature will be used.

Returns:

Displays the scatter plot or saves it to the specified path.

Return type:

None

Raises:
  • ValueError – If coords does not contain ‘col’ and ‘row’ columns, or if feature is not a Series indexed by coords.

  • TypeError – If coords or feature are not of the expected types.
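
The documented error conditions can be checked before plotting. The helper below is a hypothetical pre-flight check that mirrors the Raises section above; it is not part of simspace, and the exact checks plot_gene performs internally may differ.

```python
import pandas as pd

def check_plot_inputs(coords, feature):
    """Hypothetical pre-flight check mirroring the documented Raises section."""
    if not isinstance(coords, pd.DataFrame) or not isinstance(feature, pd.Series):
        raise TypeError("coords must be a DataFrame and feature a Series")
    missing = {"col", "row"} - set(coords.columns)
    if missing:
        raise ValueError(f"coords is missing columns: {sorted(missing)}")
    if not feature.index.equals(coords.index):
        raise ValueError("feature must share its index with coords")

coords = pd.DataFrame({"row": [0, 0, 1], "col": [0, 1, 0]},
                      index=["cell_1", "cell_2", "cell_3"])
feature = pd.Series([3.0, 0.0, 7.5], index=coords.index, name="Gene_0")
check_plot_inputs(coords, feature)   # passes silently for valid inputs
```

Inputs that pass this check have the shape plot_gene(coords, feature) is documented to accept.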

simspace.plot.spatial_pie(SimSpace, spot_meta, kernel=(5, 5), figure_size=(5, 5), dpi=300, save_path=None)[source]

Plot the spatial pie chart of the convolved SimSpace dataset.

Parameters:
  • SimSpace – The SimSpace object containing the spatial data.

  • spot_meta (DataFrame) – DataFrame containing the metadata for each spot, including state proportions. Should have at least three columns: ‘col’ and ‘row’ as the first two columns, and state proportions.

  • kernel (tuple) – The kernel used for convolution, which determines the size of the pie chart.

  • figure_size (tuple) – Tuple specifying the size of the figure (width, height).

  • dpi (int) – Dots per inch for the figure resolution.

  • save_path (str) – Path to save the figure. If None, the figure will be displayed instead.

Returns:

Displays the pie chart or saves it to the specified path.

Return type:

None

Raises:
  • ValueError – If spot_meta does not contain the expected columns (‘col’ and ‘row’ as the first two columns, followed by the state proportions).
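
The spot_meta layout described above (‘col’ and ‘row’ first, then one proportion column per state) can be illustrated with pandas. The sketch below bins cells into kernel-sized spots and normalises state counts per spot. sketch_spot_meta is a hypothetical helper, and using integer bin indices as spot coordinates is an assumption; simspace's own convolution step may produce different coordinates.

```python
import pandas as pd

def sketch_spot_meta(cell_meta, kernel=(5, 5)):
    """Hypothetical sketch: bin cells into kernel-sized spots, one row per spot."""
    binned = cell_meta.assign(
        row=cell_meta["row"] // kernel[0],
        col=cell_meta["col"] // kernel[1],
    )
    # Count cells of each state per spot, then normalise rows to proportions.
    counts = pd.crosstab([binned["col"], binned["row"]], binned["state"])
    props = counts.div(counts.sum(axis=1), axis=0)
    props.columns.name = None
    return props.reset_index()   # 'col' and 'row' become the first two columns

cell_meta = pd.DataFrame({
    "row":   [0, 1, 2, 6, 7],
    "col":   [0, 1, 2, 0, 1],
    "state": ["A", "A", "B", "B", "B"],
})
spot_meta = sketch_spot_meta(cell_meta)
print(spot_meta)
```

Each row of the result describes one spot, and its state columns sum to 1, which is the form a per-spot pie chart needs.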

Module contents