Deconvolution#

Spot-based spatial transcriptomics platforms measure gene expression at the tissue level, where each spot may contain a mixture of different cell types. Deconvolution methods infer the cellular composition of each spot by integrating single-cell RNA-seq reference data with the spatial expression profiles. This notebook demonstrates spatial deconvolution using the exprmat package with Tangram, a deep learning-based method that learns a spatial mapping from single-cell reference data to spatial coordinates. We cover the preparation of a single-cell reference dataset with cell type annotations, learning the spatial mapping matrix, projecting cell type abundances onto the tissue, and imputing gene expression to fill in genes not captured by the spatial assay.

[1]:
%load_ext autoreload
%autoreload 2
[2]:
import exprmat as em
em.setwd('../../../data')
ver = em.version()
[i] exprmat 0.2.66 / exprmat-db 0.2.66
[i] os: posix (linux)  platform version: 6.8.0-90-generic
[i] loaded configuration from /home/data/yangz/.exprmatrc
[i] current working directory: /home/data/yangz/packages/exprmat/data
[i] current database directory: /home/data/yangz/packages/database (0.2.66)
[i] resident memory: 776.46 MiB
[i] virtual memory: 5.95 GiB

[3]:
expm = em.load_experiment('expm/visium')
   ━━━━━━━━━━━━━━━━━━━━━━━━━━ loading samples          8 / 8     (00:03 < 00:00)

Visiummeasure gene expression across tissue spots, but they suffer from two key limitations: (i) limited gene coverage and (ii) spot-level resolution

Tangram addresses both issues through a two-step imputation approach:

  1. Learning a spatial mapping (tangram_map_cells) — A mapping matrix is learned that aligns single-cell RNA-seq reference cells (or clusters) to spatial spots, constrained by shared gene expression and optionally spatial density priors.

  2. Projecting gene expression (tangram_project_genes) — Using the learned mapping, the full transcriptome of the single-cell reference is projected onto the spatial coordinates. This “imputes” expression for all genes (not just those captured by the spatial assay) at each spot, effectively filling in missing data and leveraging the high gene coverage of scRNA-seq.

First, we need a reference single cell RNA seq data of the same tissue origin to provide the insight of possible diversity of cells in the scenario. For the Visium detections for human HNSCC, we also use scRNA data of HNSCC. The data is already preprocessed to contain log normalized counts in .X

[4]:
sc = em.ad.read_h5ad('visium/single.h5ad')

Give a quick annotation, the process omitted.

[7]:
import exprmat.reader.rna as rna
rna.rna_leiden(sc, None, None, resolution = 1.0)

mapping = {
    'Epi1': ['2', '8'],
    'Epi2': ['14', '19', '7', '9', '15'],
    'Fib': ['13'],
    'SMC': ['4', '3', '22'],
    'Ma': ['17'],
    'Mye': ['20'],
    'Neu': ['16'],
    'T4': ['5', '6'],
    'T8': ['10'],
    'Pl': ['18'],
    'LE': ['21'],
    'EC': ['0', '1', '12', '23'],
    'B': ['11']
}

revmap = {}
for k, v in mapping.items():
    for x in v: revmap[x] = k

sc.obs['cell.type'] = [revmap[x] for x in sc.obs['leiden']]
sc.obs['cell.type'] = sc.obs['cell.type'].astype('category')
fig = rna.rna_plot_embedding(
    sc, None, None,
    basis = 'umap', color = 'cell.type',
    legend = False, annotate_style = 'text'
)
../_images/spatial_g1-deconvolution_9_0.png

For example, we identified two epithelial components (One KRT17+ TP63+, and the other KRT8+ EPCAM+). We want to find mappings of these two subset into spatial coordinates and morphology.

[13]:
fig = rna.rna_plot_multiple_embedding(
    sc, None, None, ncols = 3, figsize = (9, 3),
    basis = 'umap', features = ['KRT17', 'TP63', 'EPCAM'],
    legend = False, annotate_style = 'text'
)
../_images/spatial_g1-deconvolution_11_0.png

For the building of spatial mappings, we use spatially variable genes merged from all clusters.

[ ]:
sample = 'p2-t5'
expm.spatial_bin.get_markers(
    run_on_samples = [sample],
    slot = 'spagcn.markers',
    group_name = '17'
)[sample][['gene', 'pct.in', 'pct.out', 'fc', 'q', 'log10.q']]
[i] fetched diff `17` over `rest` (18 genes)

gene pct.in pct.out fc q log10.q
rna:hsa:g13826 LUM 1.000000 0.996835 2.302415 6.437318e-78 77.191295
rna:hsa:g22861 HERPUD1 0.997691 0.995253 2.107135 6.339839e-77 76.197922
rna:hsa:g53680 SFRP4 0.852194 0.485759 2.073682 2.236636e-110 109.650405
rna:hsa:g15444 POSTN 0.946882 0.783228 1.945651 3.066666e-95 94.513333
rna:hsa:g57703 TP53INP1 0.997691 0.992089 1.802479 7.859763e-72 71.104591
rna:hsa:g59915 ASPN 0.967667 0.799051 1.755815 1.035827e-91 90.984713
rna:hsa:g57902 CTHRC1 0.963048 0.806962 1.665744 3.148795e-84 83.501856
rna:hsa:g6228 VIM 1.000000 0.995253 1.655484 1.246817e-58 57.904197
rna:hsa:g4692 BTG2 0.997691 0.995253 1.605542 1.204427e-58 57.919219
rna:hsa:g62036 PIM2 0.995381 0.985759 1.579871 1.728512e-58 57.762328
rna:hsa:g45757 SFRP2 0.933025 0.912975 1.579464 7.928045e-74 73.100834
rna:hsa:g35422 IGFBP5 0.956120 0.943038 1.578879 2.130125e-68 67.671595
rna:hsa:g58420 PTP4A3 1.000000 0.985759 1.565828 1.001160e-74 73.999496
rna:hsa:g39969 FBLN1 0.935335 0.881329 1.562546 6.522984e-76 75.185554
rna:hsa:g52137 CCN2 0.995381 0.988924 1.557651 3.224739e-44 43.491505
rna:hsa:g35923 COL6A3 0.990762 0.973101 1.552273 5.541407e-72 71.256380
rna:hsa:g44744 CXCL13 0.956120 0.941456 1.518375 1.463188e-57 56.834700
rna:hsa:g34966 COL5A2 0.997691 0.977848 1.515412 1.668277e-65 64.777732
[15]:
clusters = ['7', '17', '18', '19', '20', '21']
genes = []
for c in clusters:
    # get marker genes
    genes += expm.spatial_bin.get_markers(
        run_on_samples = [sample],
        slot = 'spagcn.markers',
        group_name = c
    )[sample].index.tolist()

genes = list(set(genes))
[i] fetched diff `7` over `rest` (0 genes)
[i] fetched diff `17` over `rest` (18 genes)
[i] fetched diff `18` over `rest` (9 genes)
[i] fetched diff `19` over `rest` (35 genes)
[i] fetched diff `20` over `rest` (54 genes)
[i] fetched diff `21` over `rest` (104 genes)

[18]:
expm.spatial_bin.view(sample)
annotated data of size 1470 × 35554
    obs : sample <cat> batch <cat> group <cat> modality <cat> taxa <cat> barcode <o> ubc <o>
          in.tissue <i64> row <i64> col <i64> y <i64> x <i64> leiden <cat> spagcn.domain <cat>
          spagcn.refined <cat> imagerow <f64> imagecol <f64> deepst.domain <cat>
          deepst.refined <cat> integrated <cat> metagene.17 <f64>
    var : chr <cat> start <i64> end <i64> strand <cat> id <o> subtype <cat> gene <cat> tlen <f64>
          cdslen <i64> assembly <cat> uid <o> vst.means <f64> vst.vars <f64> vst.vars.norm <f64>
          vst.hvg.rank <f32> vst.hvg <bool> spatialde.padj <f64> spatialde.svg <bool>
          spatialde.rank <f64>
 layers : counts <f32> lognorm <f32> norm <f32>
   obsm : adjacent <arr:f64(35554)> augmented.genes <arr:f64(35554)> deepst.embedding <arr:f32(28)>
          knn <arr:i32(30)> knn.d <arr:f64(30)> pca <arr:f64(35)> spatial <arr:i64(2)>
          spatial.array <arr:i64(2)> umap <arr:f32(2)> weights <arr:f64>
   varm : pca <arr:f64(35)>
   obsp : connectivities <csr:f32> distances <csr:f64>
    uns : deepst.domain leiden neighbors pca spagcn.markers spatial umap
[25]:
expm.spatial_bin.tangram_map(
    run_on_samples = [sample],
    adata_sc = sc, # the scrna reference.
    genes = genes, # spatially variable genes to use.
    mode = 'clusters', # use cluster-level mapping.
    cluster_label = 'cell.type',
    device = 'cuda',
    random_state = 42,
    verbose = True,
    density_prior = 'rna_count_based',
    key_added = 'tangram',
    layer = 'counts'
)
[i] 216 training genes saved to uns["training_genes"].
[i] 18353 overlapping genes saved to uns["overlap_genes"].
[i] uniform density prior saved to obs["uniform_density"].
[i] rna-count based density prior saved to obs["rna_count_based_density"].
[i] allocating tensors for mapping.
[i] training with 216 genes, density prior rna_count_based, mode clusters.
[i] printing scores every 100 epochs.
[i] gene-voxel score: 0.315, cell densities reg: 0.378
[i] gene-voxel score: 0.637, cell densities reg: 0.012
[i] gene-voxel score: 0.639, cell densities reg: 0.011
[i] gene-voxel score: 0.640, cell densities reg: 0.011
[i] gene-voxel score: 0.640, cell densities reg: 0.011
[i] gene-voxel score: 0.640, cell densities reg: 0.011
[i] gene-voxel score: 0.640, cell densities reg: 0.011
[i] gene-voxel score: 0.640, cell densities reg: 0.011
[i] gene-voxel score: 0.640, cell densities reg: 0.011
[i] gene-voxel score: 0.641, cell densities reg: 0.011
[i] saving results.

[28]:
expm.spatial_bin[sample].obsm['tangram'].iloc[:, :4]
[28]:
EC Epi2 T4 SMC
p2-t5:1 0.000413 0.000008 0.002090 1.661525e-05
p2-t5:2 0.000003 0.000002 0.000001 1.645054e-03
p2-t5:3 0.000376 0.001533 0.000006 1.243685e-06
p2-t5:4 0.001505 0.001084 0.000135 1.841247e-07
p2-t5:5 0.000211 0.000463 0.000003 5.065746e-06
... ... ... ... ...
p2-t5:1466 0.000092 0.000046 0.000009 2.760589e-06
p2-t5:1467 0.001473 0.000005 0.003435 1.010608e-03
p2-t5:1468 0.000674 0.000010 0.000149 8.719371e-04
p2-t5:1469 0.000284 0.000086 0.001662 2.218365e-04
p2-t5:1470 0.001869 0.000451 0.000011 3.911006e-06

1470 rows × 4 columns

[30]:
expm.spatial_bin[sample].obs['ep1'] = expm.spatial_bin[sample].obsm['tangram']['Epi1']
expm.spatial_bin[sample].obs['ep2'] = expm.spatial_bin[sample].obsm['tangram']['Epi2']
[40]:
fig = expm.spatial_bin.plot_spatial(
    run_on_samples = [sample],
    channels = ['hires/r', 'hires/g', 'hires/b'],
    plot_embeddings = {
        'visible': True,
        'basis': 'spatial',
        'color': 'ep1',
        'ptsize': 15,
        'annotate_style': 'text',
        'vmax': 0.005,
    },
    xrange = (3000, 13000),
    yrange = (5000, 15000),
    figsize = (5, 5),
)
../_images/spatial_g1-deconvolution_19_0.png
[36]:
fig = expm.spatial_bin.plot_spatial(
    run_on_samples = [sample],
    channels = ['hires/r', 'hires/g', 'hires/b'],
    plot_embeddings = {
        'visible': True,
        'basis': 'spatial',
        'color': 'KRT17',
        'ptsize': 15,
        'annotate_style': 'text'
    },
    xrange = (3000, 13000),
    yrange = (5000, 15000),
    figsize = (5, 5)
)
../_images/spatial_g1-deconvolution_20_0.png
[41]:
fig = expm.spatial_bin.plot_spatial(
    run_on_samples = [sample],
    channels = ['hires/r', 'hires/g', 'hires/b'],
    plot_embeddings = {
        'visible': True,
        'basis': 'spatial',
        'color': 'ep2',
        'ptsize': 15,
        'annotate_style': 'text',
        'vmax': 0.005,
    },
    xrange = (3000, 13000),
    yrange = (5000, 15000),
    figsize = (5, 5)
)
../_images/spatial_g1-deconvolution_21_0.png

Imputation#

Using paired single-cell data, we can impute the expression matrices to rule out the extensive sparsity of spatial techniques.

[42]:
expm.spatial_bin.tangram_impute(
    run_on_samples = [sample],
    adata_sc = sc,
    cluster_label = 'cell.type',
    scale = True,
    map_key = 'tangram',
    layer = 'imputed'
)

KRT8 expression before imputation

[43]:
fig = expm.spatial_bin.plot_spatial(
    run_on_samples = [sample],
    channels = ['hires/r', 'hires/g', 'hires/b'],
    plot_embeddings = {
        'visible': True,
        'basis': 'spatial',
        'color': 'KRT8',
        'ptsize': 15,
        'annotate_style': 'text'
    },
    xrange = (3000, 13000),
    yrange = (5000, 15000),
    figsize = (5, 5)
)
../_images/spatial_g1-deconvolution_25_0.png

KRT8 expression after imputation

[44]:
fig = expm.spatial_bin.plot_spatial(
    run_on_samples = [sample],
    channels = ['hires/r', 'hires/g', 'hires/b'],
    plot_embeddings = {
        'visible': True,
        'basis': 'spatial',
        'slot': 'imputed',
        'color': 'KRT8',
        'ptsize': 15,
        'annotate_style': 'text'
    },
    xrange = (3000, 13000),
    yrange = (5000, 15000),
    figsize = (5, 5)
)
../_images/spatial_g1-deconvolution_27_0.png
[45]:
expm.spatial_bin.view(sample)
annotated data of size 1470 × 35554
    obs : sample <cat> batch <cat> group <cat> modality <cat> taxa <cat> barcode <o> ubc <o>
          in.tissue <i64> row <i64> col <i64> y <i64> x <i64> leiden <cat> spagcn.domain <cat>
          spagcn.refined <cat> imagerow <f64> imagecol <f64> deepst.domain <cat>
          deepst.refined <cat> integrated <cat> metagene.17 <f64> ep1 <f32> ep2 <f32>
    var : chr <cat> start <i64> end <i64> strand <cat> id <o> subtype <cat> gene <cat> tlen <f64>
          cdslen <i64> assembly <cat> uid <o> vst.means <f64> vst.vars <f64> vst.vars.norm <f64>
          vst.hvg.rank <f32> vst.hvg <bool> spatialde.padj <f64> spatialde.svg <bool>
          spatialde.rank <f64>
 layers : counts <f32> lognorm <f32> norm <f32> imputed <f32>
   obsm : adjacent <arr:f64(35554)> augmented.genes <arr:f64(35554)> deepst.embedding <arr:f32(28)>
          knn <arr:i32(30)> knn.d <arr:f64(30)> pca <arr:f64(35)> spatial <arr:i64(2)>
          spatial.array <arr:i64(2)> umap <arr:f32(2)> weights <arr:f64> tangram <df>
   varm : pca <arr:f64(35)>
   obsp : connectivities <csr:f32> distances <csr:f64>
    uns : deepst.domain leiden neighbors pca spagcn.markers spatial umap tangram
[46]:
expm.save()
[i] main dataset write to expm/visium/integrated.h5mu
[i] saving individual samples. (pass `save_samples = False` to skip)

   ━━━━━━━━━━━━━━━━━━━━━━━━ modality [spatial-bin]     8 / 8     (00:08 < 00:00)