Spot-based Spatial Profiling (ISS, ISH)

This notebook demonstrates the analysis of spot-based spatial transcriptomics data using the exprmat package, with a focus on Xenium in-situ sequencing (ISS) and in-situ hybridization (ISH) platforms. These technologies capture gene expression at single-cell resolution directly within intact tissue sections, preserving spatial context. We work through the complete workflow: reading segmented Xenium outputs, managing large slide images through downscaling, attaching segmentation mask geometries, defining regions of interest for focused analysis, and clustering and annotating cells within the tissue microenvironment.

[1]:
%load_ext autoreload
%autoreload 2
[2]:
import exprmat as em
em.setwd('../../../data')
ver = em.version()
[i] exprmat 0.2.66 / exprmat-db 0.2.66
[i] os: posix (linux)  platform version: 6.8.0-90-generic
[i] loaded configuration from /home/data/yangz/.exprmatrc
[i] current working directory: /home/data/yangz/packages/exprmat/data
[i] current database directory: /home/data/yangz/packages/database (0.2.66)
[i] resident memory: 777.75 MiB
[i] virtual memory: 5.95 GiB

Reading from segmented Xenium outputs

Cell-segmented Xenium output can be read directly to obtain the segmentation mask. For full feature support, we require the Xenium onboard analyser outputs cells.zarr.zip and cell_feature_matrix.zarr.zip, and recommend that the morphology_focus folder be available in the same output folder.

When loading from transcript spots instead, transcripts.parquet (or its Zarr equivalent) and the morphology_focus folder are both required, because segmentation must start from the raw images.
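
Before building the metadata table, it can help to confirm that the expected files are present. The sketch below is only an illustration: the file names come from the paragraphs above, but the helper `check_xenium_dir`, the two mode names, and the split into required and recommended items are assumptions of this example and not part of exprmat. The Zarr alternative to transcripts.parquet is not checked.

```python
from pathlib import Path
import tempfile

# File names are taken from the text above. The grouping into two modes is an
# assumption made for this sketch, not exprmat's own validation logic.
REQUIRED = {
    "segmented": ["cells.zarr.zip", "cell_feature_matrix.zarr.zip"],
    "transcripts": ["transcripts.parquet", "morphology_focus"],
}
RECOMMENDED = {
    "segmented": ["morphology_focus"],
    "transcripts": [],
}

def check_xenium_dir(path, mode="segmented"):
    """Return (missing_required, missing_recommended) for a Xenium output folder."""
    root = Path(path)
    missing_required = [n for n in REQUIRED[mode] if not (root / n).exists()]
    missing_recommended = [n for n in RECOMMENDED[mode] if not (root / n).exists()]
    return missing_required, missing_recommended

# Demonstration on a throwaway directory that only contains cells.zarr.zip.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "cells.zarr.zip").touch()
    demo_required, demo_recommended = check_xenium_dir(tmp, mode="segmented")
    print(demo_required, demo_recommended)
```

In a real session the function would be pointed at the folder passed in `locations`, for example `check_xenium_dir('./xenium')`.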

[ ]:
meta = em.metadata(
    locations = ['./xenium'],
    modality = ['xenium/c'],
    default_taxa = ['mmu'],
    names = ['uninfected'],
    batches = ['b1'],
    groups = ['nc']
)
[4]:
expm = em.experiment(meta, dump = 'expm/xenium')

Xenium's built-in segmentation generally performs well. The segmentation result is loaded into the observation table (.obs). The polygonal cell masks, however, must be retrieved later from the loaded segmentation mask.

[6]:
expm.spatial_cell['uninfected'].obs[['barcode', 'ubc', 'x', 'y', 'cid']]
[6]:
barcode ubc x y cid
uninfected:1 uninfected:aaaagnff-1 uninfected:1 709.756958 1617.774292 27989
uninfected:2 uninfected:aaaahafa-1 uninfected:2 673.264099 1846.041748 28752
uninfected:3 uninfected:aaabgjmo-1 uninfected:3 645.391602 1764.135620 92622
uninfected:4 uninfected:aaabgkok-1 uninfected:4 801.664490 1684.112061 92906
uninfected:5 uninfected:aaablemm-1 uninfected:5 821.919067 2029.243042 111820
... ... ... ... ... ...
uninfected:219069 uninfected:oiiklekf-1 uninfected:219069 5524.763184 3660.229492 3901404325
uninfected:219070 uninfected:oiikmapd-1 uninfected:219070 5502.750977 3662.612793 3901407475
uninfected:219071 uninfected:oiiknmmm-1 uninfected:219071 5503.088379 3651.255127 3901414604
uninfected:219072 uninfected:oiikphad-1 uninfected:219072 5513.437012 3658.035645 3901421315
uninfected:219073 uninfected:oiildplo-1 uninfected:219073 5545.117676 3660.333008 3901439934

219073 rows × 5 columns

Rename the channel names from the default annotation to concise abbreviations. Avoid using forward slashes in channel names, as they are used as path separators within the image store hierarchy.

[8]:
expm.spatial_cell['uninfected'].uns['spatial']['uninfected']['origin']['channels'] = [
    # channel names should not contain the '/' character.
    'nucleus', 'membrane', '18s', 'asma-vim'
]
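
To see why a slash inside a channel name is a problem, consider how an 'image/channel' specifier such as 'lores/nucleus' has to be split. The following sketch is hypothetical: `sanitize_channel` is a helper invented for this example, the vendor-style labels are placeholders that depend on the panel, and how exprmat parses specifiers internally is not shown in this notebook.

```python
def sanitize_channel(name):
    """Lowercase a channel label and replace '/' and spaces with '-'."""
    return name.strip().lower().replace("/", "-").replace(" ", "-")

# Hypothetical vendor-style labels; the real defaults depend on the panel.
defaults = ["DAPI", "AlphaSMA/Vimentin"]
clean = [sanitize_channel(n) for n in defaults]
print(clean)

# A clean name splits into exactly two parts: image and channel.
good_parts = "lores/nucleus".split("/")
print(good_parts)

# A name that itself contains '/' yields three parts, so the boundary
# between image and channel can no longer be recovered reliably.
bad_parts = "origin/AlphaSMA/Vimentin".split("/")
print(bad_parts)
```

Renaming the channels once, right after loading, avoids this ambiguity everywhere channels are referenced later.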
[9]:
expm.spatial_cell.summary(
    run_on_samples = ['uninfected']
)
uninfected
├── mask-nucleus  of shape 40865 ✗ 34148
├── mask  of shape 40865 ✗ 34148
└── origin  of shape 34148 ✗ 40865 ✗ 4
        [0] nucleus   [1] membrane  [2] 18s       [3] asma-vim
[29]:
s = 'uninfected'

The slide image is extremely large. OpenCV cannot load the full-size image at its original resolution (the image size exceeds the maximum it supports). Rendering such a large image in memory is also a bad idea. For whole-slide visualization, first generate a downsampled copy.
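
A quick calculation shows the scale of the problem. The sketch below uses the 'origin' dimensions reported by the summary above and assumes OpenCV's default reader limit of 2**30 pixels (adjustable through the OPENCV_IO_MAX_IMAGE_PIXELS environment variable). The 16-bit sample size is an assumption; adjust it to the actual data type.

```python
# Dimensions of the 'origin' image as reported by summary() above.
height, width, n_channels = 34148, 40865, 4

# Assumed default pixel-count limit of OpenCV's image reader.
OPENCV_DEFAULT_MAX_PIXELS = 1 << 30

pixels = height * width
exceeds_limit = pixels > OPENCV_DEFAULT_MAX_PIXELS

# Assumed storage: 2 bytes per sample (16-bit); adjust for the actual dtype.
bytes_per_sample = 2
full_gib = pixels * n_channels * bytes_per_sample / 2**30

print(f"{pixels:,} pixels per channel, exceeds limit: {exceeds_limit}")
print(f"full-resolution stack: {full_gib:.1f} GiB")
```

Under these assumptions a single channel already exceeds the reader limit, and holding all four channels in memory would take on the order of ten GiB.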

Although the original OME-TIFF file is pyramidal, exprmat loads only the full-resolution level and ignores the others. Scaling to a custom downsample ratio is easy for users to do themselves, so the package does not assume a ratio on their behalf.
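
As a rough illustration of what downsampling by an integer factor involves, the sketch below averages non-overlapping blocks of a toy single-channel image stored as plain lists. This is block averaging, chosen for simplicity; it is not necessarily the interpolation that the package's scale function applies, and `block_mean_downsample` is a name made up for this example.

```python
def block_mean_downsample(image, factor):
    """Average non-overlapping factor x factor blocks of a 2D list of numbers.

    Trailing rows and columns that do not fill a whole block are dropped.
    """
    out_h = len(image) // factor
    out_w = len(image[0]) // factor
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            block = [
                image[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

# Toy 4 x 5 single-channel image; the last column is dropped at factor 2.
toy = [
    [0, 2, 4, 6, 9],
    [2, 4, 6, 8, 9],
    [1, 1, 3, 3, 9],
    [1, 1, 3, 3, 9],
]
small = block_mean_downsample(toy, 2)
print(small)
```

For real slides an array library would do this far more efficiently; the toy version only shows the arithmetic.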

In most places in the package, channels can be specified across images, as in 'lores/nucleus'. The scale function is an exception: its channels must be plain channel names within the source image.

[15]:
expm.spatial_cell.scale(
    run_on_samples = [s],
    source = 'origin',
    destination = 'lores',
    channels = ['nucleus', 'membrane', '18s', 'asma-vim'],
    scale = 1 / 8,  # downscale 8 times
    interpolation = 2,
    xrange = None,
    yrange = None,
)
[16]:
expm.spatial_cell.summary(
    run_on_samples = [s]
)
uninfected
├── mask-nucleus  of shape 40865 ✗ 34148
├── mask  of shape 40865 ✗ 34148
├── origin  of shape 34148 ✗ 40865 ✗ 4[0] nucleus   [1] membrane  [2] 18s       [3] asma-vim
└── lores  of shape 4268 ✗ 5108 ✗ 4
        [0] nucleus   [1] membrane  [2] 18s       [3] asma-vim
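
The shape of the new 'lores' image can be checked by hand. The sketch below assumes fractional pixels are truncated, which is inferred from the summary output rather than from the package's source; `scaled_shape` is a helper defined only for this example.

```python
def scaled_shape(height, width, scale):
    """Output (height, width) after scaling, truncating fractional pixels."""
    return int(height * scale), int(width * scale)

lores_shape = scaled_shape(34148, 40865, 1 / 8)
print(lores_shape)

# Pixel count per channel shrinks roughly with the square of the factor.
reduction = (34148 * 40865) / (lores_shape[0] * lores_shape[1])
print(f"about {reduction:.0f}x fewer pixels per channel")
```

The result agrees with the 4268 by 5108 shape listed for 'lores', and the roughly 64-fold reduction is what makes whole-slide plotting practical.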

Plot the whole slide using the low-resolution image.

[17]:
fig = expm.spatial_cell.plot_spatial(
    run_on_samples = [s],
    channels = ['lores/nucleus', 'lores/membrane'],
    channel_colors = ['cyan', 'red'],
    plot_embeddings = {
        'visible': False,
        'basis': 'spatial',
        'color': 'Foxp3'
    },
    xrange = None, yrange = None,
    figsize = (5, 5)
)
../_images/spatial_a2-spot-based_18_0.png

Attach segmentation mask geometries

Extract polygonal cell boundaries and geometric features (area, circularity) from the segmentation mask. These geometries enable high-quality spatial visualizations where cells are displayed as polygons rather than points, and provide morphological measurements for quality control.

[24]:
expm.spatial_cell[s].obs[['x', 'y', 'area', 'cid', 'segment']]
[24]:
x y area cid segment
uninfected:1 709.756958 1617.774292 64.076721 27989 1
uninfected:2 673.264099 1846.041748 32.060939 28752 2
uninfected:3 645.391602 1764.135620 34.454220 92622 3
uninfected:4 801.664490 1684.112061 38.744064 92906 4
uninfected:5 821.919067 2029.243042 46.781877 111820 5
... ... ... ... ... ...
uninfected:219069 5524.763184 3660.229492 32.422189 3901404325 219069
uninfected:219070 5502.750977 3662.612793 42.582345 3901407475 219070
uninfected:219071 5503.088379 3651.255127 109.232973 3901414604 219071
uninfected:219072 5513.437012 3658.035645 48.859064 3901421315 219072
uninfected:219073 5545.117676 3660.333008 26.913126 3901439934 219073

219073 rows × 5 columns

[30]:
fig = expm.spatial_cell.segment_features(
    run_on_samples = [s],
    mask = 'mask', correspondence = 'segment',
    key_added = 'segment'
)
   ━━━━━━━━━━━━━━━━━━━━━━━ extracting boundaries 219073 / 219073 (01:49 < 00:00)
[i] .obs[x] already exists, not overwriting
[i] .obs[y] already exists, not overwriting
[i] .obs[area] already exists, not overwriting

The contour geometries are now stored in obsm['segment'], providing smoothed and pixellated boundary polygons in WKT text format along with area and circularity measurements for each cell.

[31]:
expm.spatial_cell[s].obsm['segment'][['x', 'y', 'smoothened', 'area', 'circularity']]
[31]:
x y smoothened area circularity
uninfected:1 709.650756 1617.668102 POLYGON ((705.713192136363 1615.8503837393368,... 64.076726 0.776502
uninfected:2 673.157839 1845.935533 POLYGON ((669.1634172631815 1847.8998013044245... 32.060941 0.763162
uninfected:3 645.285392 1764.029447 POLYGON ((641.1133245675701 1763.9629098112594... 34.454223 0.777441
uninfected:4 801.558221 1684.005884 POLYGON ((797.9381864241273 1681.5127657720818... 38.744067 0.751477
uninfected:5 821.812826 2029.136981 POLYGON ((816.4256953593247 2030.6499904416107... 46.781881 0.713766
... ... ... ... ... ...
uninfected:219069 5524.667373 3660.123599 POLYGON ((5521.601009752113 3659.462960789782,... 32.422191 0.937874
uninfected:219070 5502.674307 3662.506674 POLYGON ((5499.288392528724 3660.9504645551947... 42.582349 0.929488
uninfected:219071 5503.058012 3651.142187 POLYGON ((5497.163493014302 3652.4500410076103... 109.232982 0.851031
uninfected:219072 5513.371747 3657.932404 POLYGON ((5509.488503573921 3655.4254599734063... 48.859068 0.877650
uninfected:219073 5545.011341 3660.226791 POLYGON ((5542.638415334609 3658.6129427722244... 26.913128 0.861219

219073 rows × 5 columns
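
To make the geometry columns more concrete, the sketch below parses a simple WKT polygon and computes its area and circularity using only the standard library. Two assumptions apply: circularity is taken to be the common isoperimetric quotient 4πA/P² (the notebook does not state which definition is used), and the minimal parser handles only single-ring POLYGON strings. In practice a geometry library such as shapely would be the more robust way to read WKT.

```python
import math
import re

def parse_wkt_polygon(wkt):
    """Return the exterior ring of a simple 'POLYGON ((x y, ...))' string."""
    match = re.match(r"\s*POLYGON\s*\(\(([^()]*)\)", wkt)
    if match is None:
        raise ValueError("not a simple POLYGON string")
    ring = []
    for pair in match.group(1).split(","):
        x, y = pair.split()
        ring.append((float(x), float(y)))
    return ring

def polygon_area(ring):
    """Shoelace formula; works for open or closed rings."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

def polygon_perimeter(ring):
    """Sum of edge lengths, closing the ring if it is open."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:] + ring[:1]))

def circularity(ring):
    """4 * pi * area / perimeter**2: 1.0 for a circle, lower for other shapes."""
    return 4.0 * math.pi * polygon_area(ring) / polygon_perimeter(ring) ** 2

square = parse_wkt_polygon("POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))")
print(polygon_area(square), round(circularity(square), 4))
```

Under this definition a square scores π/4, so values close to 1 indicate nearly round cells and lower values indicate elongated or irregular outlines.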

Region of interest

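Creating a region of interest (ROI) restricts the analysis to a small subset of the data. We first preview the candidate region at full resolution, then create the ROI with the same coordinate ranges.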

[32]:
fig = expm.spatial_cell.plot_spatial(
    run_on_samples = [s],
    channels = ['origin/nucleus', 'origin/membrane'],
    channel_colors = ['cyan', 'red'],
    channel_intensities = [1, 5],
    plot_embeddings = {
        'visible': False,
        'basis': 'spatial',
        'color': 'Foxp3'
    },
    xrange = (2500, 3000),
    yrange = (1500, 2000),
    figsize = (5, 5)
)
../_images/spatial_a2-spot-based_25_0.png
[33]:
expm.spatial_cell.roi(
    run_on_samples = [s],
    destination = 'roi-villi',
    scale = 1,
    xrange = (2500, 3000),
    yrange = (1500, 2000),
)
[i] created subset [roi-villi] from sample [uninfected]
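
Conceptually, a rectangular ROI keeps the cells that fall inside the given coordinate ranges. The sketch below illustrates this with a few centroids copied from the tables in this notebook. Selecting by centroid and treating the bounds as inclusive are both assumptions of this example; the roi function may use a different rule, such as mask overlap.

```python
def in_roi(x, y, xrange, yrange):
    """True when a centroid lies inside the rectangle (bounds inclusive)."""
    return xrange[0] <= x <= xrange[1] and yrange[0] <= y <= yrange[1]

# Centroids copied from the tables in this notebook.
cells = {
    "uninfected:1": (709.756958, 1617.774292),
    "uninfected:881": (2646.919922, 1639.002686),
    "uninfected:205143": (2720.382568, 1999.192993),
    "uninfected:219073": (5545.117676, 3660.333008),
}

selected = [
    name
    for name, (x, y) in cells.items()
    if in_roi(x, y, xrange=(2500, 3000), yrange=(1500, 2000))
]
print(selected)
```

Only the two cells whose centroids lie within x 2500 to 3000 and y 1500 to 2000 are retained.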

[34]:
s = 'roi-villi'
[35]:
expm.spatial_cell.view(s)
annotated data of size 1759 × 451
    obs : sample <o> <c/sample> batch <o> <c/batch> group <o> <c> modality <o> <c/modality>
          taxa <o> <c/taxa> barcode <o> <o> ubc <o> <o> x <f64> <f/coordinate/x>
          y <f64> <f/coordinate/y> area <f64> <f> x.nucleus <f64> <f> y.nucleus <f64> <f>
          area.nucleus <f64> <f> z <f64> <f> n.nucleus <f64> <f> cid <ui32> <i> cid.tag <ui32> <i>
          segment <i64> <i> pixellated <o> <o/boundary> smoothened <o> <o/boundary>
          circularity <f64> <f>
    var : chr <o> <c/chromosome> start <i64> <i> end <i64> <i> strand <o> <c/strand> id <o> <o>
          subtype <o> <c/gsubtype> gene <o> <o/gene> tlen <f64> <i/tlen> cdslen <i64> <i/cdslen>
          assembly <o> <c> uid <o> <o/ugene>
   obsm : spatial <arr:f64(2)> <f/coordinate:2d/embedding> segment <df> <f>
    uns : spatial <spatial>
[36]:
expm.spatial_cell[s].obs[['x', 'y', 'area', 'cid', 'segment']]
[36]:
x y area cid segment
uninfected:881 2646.919922 1639.002686 101.150004 15688195 881
uninfected:882 2645.441406 1624.791382 133.481880 15689492 882
uninfected:883 2646.001709 1633.072510 122.825004 15700555 883
uninfected:884 2641.999756 1644.344604 42.762970 15707889 884
uninfected:885 2650.737305 1649.690552 89.635160 15725245 885
... ... ... ... ... ...
uninfected:205042 2534.047607 1797.848389 48.046252 3655955456 205042
uninfected:205043 2541.711426 1793.143555 14.450001 3655969666 205043
uninfected:205044 2534.185303 1790.220459 64.663752 3655989773 205044
uninfected:205142 2720.277100 1994.548340 67.644065 3657693161 205142
uninfected:205143 2720.382568 1999.192993 10.250469 3657720339 205143

1759 rows × 5 columns

[37]:
fig = expm.spatial_cell.plot_spatial(
    run_on_samples = [s],
    channels = [
        'origin/nucleus',
        'origin/membrane',
        'origin/18s',
        'origin/asma-vim'
    ],
    channel_colors = [
        "#6b6bff",
        '#ff0000',
        '#ffff00',
        '#00ff00'
    ],
    plot_embeddings = {
        'visible': True,
        'basis': 'spatial',
        'color': 'cid',
        'ticks': True,
        'cmap': 'set3'
    },
    plot_cells = {
        'visible': True,
        # plot cell boundary
        'key_boundary': 'smoothened',
        'color': 'cid',
        'subset': None,
        'alpha': 0.3,
        'filled': False,
        'palette': 'turbo',
        'legend': False,
    },
    xrange = (2600, 2850),
    yrange = (1500, 2000),
    ticks = True,
    figsize = (6, 9), dpi = 100,
)
../_images/spatial_a2-spot-based_30_0.png

Clustering and annotation

After preparing the ROI with segmentation geometries and a downscaled image, we perform standard single-cell-style analysis on the Xenium cells: log-normalization, HVG selection, PCA, KNN graph construction, Leiden clustering, and UMAP embedding. The resulting clusters are visualized both in UMAP space and overlaid on the tissue image to reveal the spatial organization of transcriptional populations.

[ ]:
expm.spatial_cell.log_normalize(
    run_on_samples = [s],
    key_norm = 'norm',
    key_lognorm = 'lognorm'
)
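
For readers unfamiliar with log-normalization, the sketch below shows the usual two steps on a single cell: scale the counts to a fixed total, then apply log(1 + x). The target sum of 10,000 is a common convention assumed here; the scaling factor used by the call above is not shown in this notebook, and this `log_normalize` is a standalone illustration rather than the package function.

```python
import math

def log_normalize(counts, target_sum=1e4):
    """Scale one cell's counts to target_sum, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        zeros = [0.0] * len(counts)
        return zeros, list(zeros)
    norm = [c * target_sum / total for c in counts]
    lognorm = [math.log1p(v) for v in norm]
    return norm, lognorm

# One toy cell with four genes.
norm, lognorm = log_normalize([0, 3, 1, 6])
print([round(v, 1) for v in norm])
print([round(v, 3) for v in lognorm])
```

The two return values mirror the idea of keeping both a normalized and a log-normalized layer, as the `key_norm` and `key_lognorm` arguments suggest.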
[39]:
expm.spatial_cell.select_hvg(
    run_on_samples = [s],
    key_lognorm = 'lognorm',
    method = 'vst',
    dest = 'vst',
    n_top_genes = 200
)
[40]:
expm.spatial_cell.scale_pca(
    run_on_samples = [s],
    hvg = 'vst.hvg',
    key_lognorm = 'lognorm',
    key_scaled = 'scaled',
    key_added = 'pca', n_comps = 35,
    keep_sparse = True,
    random_state = 42,
    svd_solver = 'arpack'
)
[41]:
expm.spatial_cell.knn(
    run_on_samples = [s],
    use_rep = 'pca',
    n_comps = None,
    n_neighbors = 30,
    knn = True,
    method = "umap",
    transformer = None,
    metric = "euclidean",
    metric_kwds = {},
    random_state = 42,
    key_added = 'neighbors',
    use_gpu = True
)
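
The neighbor graph links each cell to its closest cells in PCA space. The sketch below is an exact brute-force version on toy coordinates, meant only to show the idea. It is not the method configured in the call above, and the function name `knn_brute_force` and the sample points are invented for this example.

```python
import math

def knn_brute_force(points, k):
    """For each point, return indices of its k nearest other points (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

# Toy 2D "PCA" coordinates forming two well-separated groups.
pcs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0), (5.0, 5.3)]
graph = knn_brute_force(pcs, k=2)
print(graph)
```

Each point's neighbors stay within its own group, which is the structure that community detection such as Leiden later turns into clusters. Brute force scales quadratically with the number of cells, so it is suitable only for small examples like this one.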
[42]:
expm.spatial_cell.leiden(
    run_on_samples = [s],
    resolution = 0.5,
    restrict_to = None,
    random_state = 42,
    key_added = 'leiden',
    adjacency = None,
    directed = None,
    use_weights = True,
    n_iterations = 2,
    partition_type = None,
    neighbors_key = None,
    obsp = None,
    flavor = 'igraph',
    use_gpu = True
)
[43]:
expm.spatial_cell.umap(
    run_on_samples = [s],
    min_dist = 0.3,
    spread = 3,
    n_components = 2,
    maxiter = 2000,
    alpha = 1,
    gamma = 1,
    negative_sample_rate = 5,
    init_pos = "random",
    random_state = 42,
    a = None, b = None,
    key_added = 'umap',
    neighbors_key = "neighbors",
    use_gpu = True
)
[44]:
fig = expm.spatial_cell.plot_embedding(
    run_on_samples = ['roi-villi'],
    basis = 'umap', color = 'leiden',
    sort = True, figsize = (3, 3), dpi = 100, legend = False
)
../_images/spatial_a2-spot-based_38_0.png
[47]:
fig = expm.spatial_cell.plot_spatial(
    run_on_samples = ['roi-villi'],
    channels = [
        'origin/nucleus',
        'origin/membrane',
        'origin/18s',
        'origin/asma-vim'
    ],
    channel_colors = [
        "#6b6bff",
        '#ff0000',
        '#ffff00',
        '#00ff00'
    ],
    plot_embeddings = {
        'visible': False,
        'basis': 'spatial',
        'color': 'leiden',
        'ticks': True,
        'cmap': 'turbo',
        'legend': False,
        'annotate': False,
    },
    plot_cells = {
        'visible': True,
        # plot cell boundary
        'key_boundary': 'smoothened',
        'color': 'leiden',
        'subset': None,
        'alpha': 0.3,
        'filled': True,
        'palette': 'turbo',
        'legend': False,
    },
    xrange = (2600, 2850),
    yrange = (1500, 2000),
    ticks = False,
    figsize = (6, 9), dpi = 100,
)
../_images/spatial_a2-spot-based_39_0.png

Save experiment

[49]:
expm.save()
[i] saving individual samples. (pass `save_samples = False` to skip)

   ━━━━━━━━━━━━━━━━━━━━━━━ modality [spatial-cell]     2 / 2     (00:20 < 00:00)