Profiling Clusters#

After clustering single-cell RNA-seq data into discrete populations, the next essential step is to profile each cluster by identifying its defining molecular characteristics. This notebook demonstrates how to use the exprmat package for comprehensive cluster profiling, including the identification of cluster-specific marker genes through differential expression testing, visualization of marker gene expression patterns across clusters, and functional annotation of cluster signatures. These profiling approaches facilitate biological interpretation of clustering results, enabling the assignment of cell type identities and the discovery of novel or rare subpopulations within heterogeneous tissues.

The following methods can be used to describe and characterize cluster features:

  • Finding marker genes, which serves as the foundation for all other methods. Marker genes are expressed at particularly high or low levels in a cluster of interest. Once you have them, you can examine them directly and look for known signatures. For well-established cell type classifications, manual identification using prior knowledge is often the quickest and most convenient approach, although it carries the risk of misannotation.

  • Gene set enrichment analysis, which uses gene sets associated with known functions to test whether marker genes appear in a gene set more or less often than expected by chance. A significant result suggests that the cluster’s characteristics are related to the functional annotation of that gene set. This provides an automated annotation method, but you must check that the gene sets meet the necessary criteria. Such methods include over-representation analysis (ORA), gene set enrichment analysis (GSEA), gene set variation analysis (GSVA), and basic rank-score methods.

  • Homology-based annotation. If the species you are studying lacks well-annotated gene sets, or if the cell type to be identified is not well defined, you may be able to borrow knowledge from other species. Be aware that, because evolutionary relationships are complex, sequence-homologous genes may no longer perform their original functions.
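
The homology-based approach in the last bullet can be pictured as a lookup through an ortholog table. The following is a minimal sketch and not part of exprmat: the `ORTHOLOGS` table and the `map_orthologs` helper are hypothetical, `GeneX` and `GeneY` are placeholder names, and a real analysis would take the mapping from an orthology database. Genes without a one-to-one counterpart are reported separately instead of being guessed.

```python
# Hypothetical mouse -> human ortholog table, for illustration only.
# A gene may map to zero, one, or several genes in the other species.
ORTHOLOGS = {
    "S100a8": ["S100A8"],
    "S100a9": ["S100A9"],
    "Mpo": ["MPO"],
    "GeneX": ["GENEX1", "GENEX2"],  # placeholder one-to-many mapping
}

def map_orthologs(genes, table):
    """Split genes into one-to-one mappings and ambiguous or missing ones."""
    mapped, unresolved = {}, []
    for gene in genes:
        targets = table.get(gene, [])
        if len(targets) == 1:
            mapped[gene] = targets[0]
        else:
            unresolved.append(gene)
    return mapped, unresolved

mapped, unresolved = map_orthologs(["S100a8", "Mpo", "GeneX", "GeneY"], ORTHOLOGS)
print(mapped)      # one-to-one mappings that can be transferred
print(unresolved)  # genes that need manual review
```

Keeping the unresolved genes visible makes it easier to judge how much of a signature actually survived the transfer between species.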

We will load the data directly from the integrated dataset:

[1]:
%load_ext autoreload
%autoreload 2
[2]:
import exprmat as em
# set working directory
em.setwd('../../../data')
ver = em.version()
[i] exprmat 0.2.66 / exprmat-db 0.2.66
[i] os: posix (linux)  platform version: 6.8.0-90-generic
[i] loaded configuration from /home/data/yangz/.exprmatrc
[i] current working directory: /home/data/yangz/packages/exprmat/data
[i] current database directory: /home/data/yangz/packages/database (0.2.66)
[i] resident memory: 774.93 MiB
[i] virtual memory: 5.95 GiB

[3]:
expm = em.load_experiment('expm/scrna', load_samples = False, load_subset = 'mono-neutro')
[!] samples are not dumped in the experiment directory.

[5]:
print(expm)
annotated data of size 9754 × 19651
subset mono-neutro of size 9754 × 19651
contains modalities: rna

 modality [rna]
    obs : sample <cat> <c/sample> batch <cat> <c/batch> group <cat> <c> modality <cat> <c/modality>
          taxa <cat> <c/taxa> barcode <o> <o> ubc <o> <o> n.umi <f64> <i> n.genes <i64> <i>
          n.mito <f64> <f> n.ribo <f64> <f> pct.mito <f64> <f> pct.ribo <f64> <f>
          filter <bool> <bool> score.doublet <f64> <f> score.doublet.se <f64> <f>
          is.doublet <bool> <bool> qc <bool> <bool/qc> leiden <cat> <c> sc3.5 <cat> <c>
          sc3.10 <cat> <c> sc3.20 <cat> <c> sc3.30 <cat> <c> cell.type <cat> <c>
          kde.umap <f64> <f/kde> psbulk <cat>
    var : chr <cat> <c/chromosome> start <i64> <i> end <i64> <i> strand <cat> <c/strand> id <o> <o>
          subtype <cat> <c/gsubtype> gene <cat> <o/gene> tlen <f64> <i/tlen> cdslen <i64> <i/cdslen>
          assembly <cat> <c> uid <o> <o/ugene> vst.hvg <bool> <bool/hvg> vst.all.means <f64> <f>
          vst.all.vars <f64> <f> vst.all.vars.norm <f64> <f> vst.all.hvg.rank <f32> <f>
          vst.all.hvg <bool> <bool>
 layers : counts <f32> <i/counts> norm <f32> <f>
   obsm : cnmf.10 <df> <f/embedding/usage> harmony <arr:f32(35)> <f> knn <arr:i32(100)> <i/knni>
          knn.d <arr:f32(100)> <f/knnd> pca <arr:f64(35)> <f/embedding/pca>
          umap <arr:f32(2)> <f/embedding>
   varm : cnmf.10 <arr:f64(10)> <f/weights> cnmf.coef.10 <arr:f64(10)> <f/usage-coef>
          pca <arr:f64(35)> <f/weights>
   obsp : connectivities <csr:f32> <f/connectivity> distances <csr:f32> <f/distance>
    uns : cell.type.colors cell.type_colors cnmf <cnmf> cnmf.args <o>
          cnmf.density.10 <cnmf-density> cnmf.dist.10 <f/connectivity> cnmf.stats <cnmf-stats>
          commands <system> kde.umap <kde-stats> leiden <o> leiden.colors <o> markers <markers>
          neighbors <knn> pca <dict> sc3.10.colors <o> sc3.20.colors <o> sc3.30.colors <o>
          sc3.5.colors <o> slots <system> umap <o>

[*] samples not loaded from disk.

[6]:
fig = expm.rna.plot_multiple_embedding(
    basis = 'umap', features = [
        'Ly6a', 'F13a1', 'Flt3', 'Irf8',
        'Csf1r', 'Vcan', 'Ly6g', 'cell.type'
    ], ncols = 4,
    sort = True, figsize = (10, 5), dpi = 100, legend = False,
    annotate_style = 'text', annotate_fontsize = 8, ptsize = 2
)
../_images/scrna_e1-profile-cluster_6_0.png
[7]:
fig = expm.rna.plot_embedding(
    basis = 'umap', color = 'cell.type',
    legend = False,
    annotate = True, annotate_style = 'text', annotate_fontsize = 8,
    contour_plot = False,
    sort = True, figsize = (2.5, 2.5), dpi = 100,
    run_on_splits = True, split_key = 'sample', split_selection = ['normal', 'niche']
)
../_images/scrna_e1-profile-cluster_7_0.png
../_images/scrna_e1-profile-cluster_7_1.png

Finding marker genes#

The markers routine identifies genes that are differentially expressed in one subpopulation relative to all other cells (reference = 'rest') or relative to another specified subpopulation.

[8]:
expm.rna.markers(
    groupby = 'cell.type',
    mask_var = None,
    groups = ['Neu'],
    reference = 'rest',
    n_genes = None, rankby_abs = False, pts = True,
    key_added = 'deg.c7',
    method = 't-test',
    corr_method = 'benjamini-hochberg',
    tie_correct = False,
    gene_symbol = 'gene',
    layer = 'X'
)
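
To make the quantities behind this call more concrete, here is a minimal sketch of three of them on toy numbers: the fraction of expressing cells, a log2 fold change, and the Benjamini-Hochberg correction named in `corr_method`. The helper names are hypothetical, and the exact fold-change formula used by exprmat is not stated in this notebook, so the ratio of means with a small pseudocount is an assumption.

```python
import math

def pct_expressing(values):
    """Fraction of cells with non-zero expression."""
    return sum(v > 0 for v in values) / len(values)

def log2_fold_change(group, rest, pseudocount=1e-9):
    """log2 ratio of mean expression (assumed formula, see lead-in)."""
    mean_g = sum(group) / len(group)
    mean_r = sum(rest) / len(rest)
    return math.log2((mean_g + pseudocount) / (mean_r + pseudocount))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

group = [4.0, 2.0, 0.0, 2.0]  # toy expression of one gene in the cluster
rest = [1.0, 0.0, 0.0, 1.0]   # toy expression in all other cells
lfc = log2_fold_change(group, rest)
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])  # toy p-values for four genes

print(pct_expressing(group), pct_expressing(rest))  # 0.75 0.5
print(lfc)  # very close to 2.0, since the means are 2.0 and 0.5
print(q)
```
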
[11]:
expm.rna.get_markers(
    slot = 'deg.c7',
    min_pct = 0.5,
    max_pct_reference = 0.75,
    max_q = 0.05,
    min_lfc = 1.0, max_lfc = 25,
    remove_zero_pval = False
)[['names', 'lfc', 'q', 'pct', 'pct.reference', 'log10.q', 'gene']]
[i] fetched diff `Neu` over `rest` (302 genes)

[11]:
     names           lfc       q              pct       pct.reference  log10.q     gene
0    rna:mmu:g34454  5.000824  0.000000e+00   0.968536  0.358268       300.000000  Mmp9
2    rna:mmu:g1350   5.083699  0.000000e+00   0.917238  0.192585       300.000000  Cxcr2
3    rna:mmu:g48106  3.973207  0.000000e+00   0.974351  0.429462       300.000000  Mxd1
4    rna:mmu:g33362  4.648287  0.000000e+00   0.925291  0.252953       300.000000  Hdc
5    rna:mmu:g31928  4.757430  0.000000e+00   0.780644  0.116142       300.000000  Dhrs9
...  ...             ...       ...            ...       ...            ...         ...
544  rna:mmu:g25070  1.034380  6.239966e-111  0.637041  0.597441       110.204818  Pim1
559  rna:mmu:g1820   1.089561  1.149853e-104  0.634954  0.393701       103.939358  Agap1
584  rna:mmu:g57010  1.001317  3.344159e-99   0.560394  0.471129       98.475713   Nfat5
609  rna:mmu:g4166   1.012059  6.729340e-94   0.520131  0.377625       93.172028   Hsd11b1
629  rna:mmu:g60743  1.384094  1.305682e-89   0.681032  0.672572       88.884163   Ltf

302 rows × 7 columns
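
The thresholds passed to get_markers can be read as simple row filters. The sketch below applies them to two rows copied from the table above plus one hypothetical gene (`GeneX`). It assumes inclusive comparisons, which the notebook does not confirm. It also reproduces the log10.q column, which in the table appears to be -log10(q) capped at 300 when q underflows to zero.

```python
import math

def neg_log10_capped(q, cap=300.0):
    """-log10(q), capped so that q == 0 does not produce infinity."""
    return cap if q <= 0 else min(cap, -math.log10(q))

def passes(row, min_pct=0.5, max_pct_reference=0.75,
           max_q=0.05, min_lfc=1.0, max_lfc=25):
    """Apply the thresholds from the get_markers call (inclusive, assumed)."""
    return (row["pct"] >= min_pct
            and row["pct.reference"] <= max_pct_reference
            and row["q"] <= max_q
            and min_lfc <= row["lfc"] <= max_lfc)

rows = [
    # Two rows copied from the table above.
    {"gene": "Mmp9", "lfc": 5.000824, "q": 0.0,
     "pct": 0.968536, "pct.reference": 0.358268},
    {"gene": "Pim1", "lfc": 1.034380, "q": 6.239966e-111,
     "pct": 0.637041, "pct.reference": 0.597441},
    # Hypothetical gene: significant, but expressed in too few cluster cells.
    {"gene": "GeneX", "lfc": 2.5, "q": 1e-20,
     "pct": 0.30, "pct.reference": 0.05},
]

kept = [r["gene"] for r in rows if passes(r)]
print(kept)                   # ['Mmp9', 'Pim1']
print(neg_log10_capped(0.0))  # 300.0
print(neg_log10_capped(6.239966e-111))
```

Raising min_pct or lowering max_pct_reference trades sensitivity for specificity: fewer genes pass, but those that do are more exclusive to the cluster.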

Over-representation analysis#

Over-representation analysis uses contingency table tests to determine whether a gene set appears more frequently in the marker gene list than expected by random chance.

[12]:
expm.rna.enrich_ora(
    taxa = 'mmu',
    de_slot = 'deg.c7', group_name = None,
    use_abs_lfc = True, min_abs_lfc = 1, max_abs_lfc = 25,
    key_added = 'ora.c7',
    gene_sets = 'bp',
    identifier = 'entrez', # the bp database contains gene names as ENTREZ
    opa_cutoff = 0.05,
)
[i] fetched diff `Neu` over `rest` (14078 genes)
[i] fetched 10675 genes differentially expressed.
[i] with a background of 18040 observed genes.

[15]:
fig = expm.rna.plot_ora_dotplot(
    slot = 'ora.c7', max_fdr = 1, max_p = 0.05,
    top_term = 10, terms = None, # draw all terms
    colour = 'fdr', cmap = 'wyj', figsize = (6, 3), cutoff = 1, ptsize = 5,

    # customizing the formatting rule of the y axis
    formatter = lambda x: x.replace('GOBP_', '').replace('_', ' ').capitalize(),
    title = 'GO Biological Process Enrichment (ORA)'
)
[i] retreived 10 terms for plotting.

../_images/scrna_e1-profile-cluster_14_1.png
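
As a rough illustration of the contingency-table idea, the sketch below computes a one-sided hypergeometric p-value, which is equivalent to a one-sided Fisher's exact test. Whether exprmat uses exactly this test is an assumption, the `hypergeom_sf` helper is hypothetical, and the numbers are toy values rather than the counts reported in the log above.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k): probability of finding at least k gene-set members
    among n marker genes drawn from N background genes, K of which
    belong to the gene set."""
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Toy numbers: 20 background genes, 5 of them in the gene set,
# 6 marker genes of which 4 fall in the gene set.
p = hypergeom_sf(k=4, N=20, K=5, n=6)
print(p)
```

The choice of background (N) matters: restricting it to genes actually observed in the dataset, as the log line about observed genes suggests, avoids inflating significance with genes that could never have been detected.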

Gene set enrichment analysis#

GSEA ranks all genes by their log fold change between two groups and scores each gene set by where its members fall in that ranking, so no significance cutoff on individual genes is required.

[16]:
expm.rna.enrich_gsea(
    taxa = 'mmu',
    de_slot = 'deg.c7', group_name = None,
    key_added = 'gsea.c7',
    gene_sets = 'kegg',
    identifier = 'entrez'
)
[i] fetched diff `Neu` over `rest` (14078 genes)
[i] fetched 14078 preranked genes by logfc.

2026-05-11 22:17:24,363 [WARNING] Duplicated values found in preranked stats: 0.01% of genes
The order of those genes will be arbitrary, which may produce unexpected results.
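
The warning above concerns ties in the ranking statistic. A quick way to see which genes are affected is to count repeated values before running the analysis. This is a minimal sketch with a hypothetical helper and toy values. The percentage printed in the warning may be computed differently by the underlying library.

```python
from collections import Counter

def tied_genes(stats):
    """Return the fraction of genes sharing their statistic with another
    gene, together with the sorted names of those genes."""
    counts = Counter(stats.values())
    tied = sorted(g for g, v in stats.items() if counts[v] > 1)
    return len(tied) / len(stats), tied

# Toy ranking statistic (log fold change) for four placeholder genes.
toy_stats = {"GeneA": 2.0, "GeneB": 1.0, "GeneC": 1.0, "GeneD": -0.5}
fraction, tied = tied_genes(toy_stats)
print(fraction, tied)  # 0.5 ['GeneB', 'GeneC']
```
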
[17]:
expm.rna.get_gsea(slot = 'gsea.c7')
[17]:
    name                                                      es        nes  p         fwerp  fdr       tag
11  Taurine and hypotaurine metabolism                  0.689581   1.708751  0.045714  0.197  0.111763  4/6
10  Taste transduction                                  0.394177   1.501916  0.040816  0.493  0.120216  11/29
38  Retinol metabolism                                  0.452057   1.502266  0.027778  0.493  0.180089  9/21
17  Virion - Human immunodeficiency virus              -0.855095  -1.733915  0.009357  0.153  0.180526  6/7
2   Virion - Flavivirus and Alphavirus                 -0.809305  -1.629260  0.020457  0.545  0.466613  6/7
0   Alcoholism                                         -0.511439  -1.477416  0.000000  0.988  0.569829  51/137
40  Basal cell carcinoma                               -0.595072  -1.505842  0.022175  0.962  0.601898  10/29
3   Cytoskeleton in muscle cells                       -0.517264  -1.481652  0.002008  0.985  0.603226  45/129
16  Hedgehog signaling pathway                         -0.586859  -1.517746  0.014644  0.949  0.613821  9/35
35  Breast cancer                                      -0.526433  -1.489142  0.001004  0.978  0.630820  32/94
9   DNA replication                                    -0.551276  -1.431964  0.044421  1.000  0.641675  24/34
37  Proteoglycans in cancer                            -0.498521  -1.435232  0.002002  1.000  0.671578  45/147
6   Transcriptional misregulation in cancer            -0.499654  -1.443439  0.003000  0.997  0.672297  34/140
23  Biosynthesis of unsaturated fatty acids            -0.625432  -1.524325  0.021459  0.937  0.680898  8/21
21  ECM-receptor interaction                           -0.550174  -1.450612  0.025694  0.995  0.683717  28/43
7   Cytokine-cytokine receptor interaction             -0.472292  -1.365499  0.001001  1.000  0.687468  68/158
19  Renal cell carcinoma                               -0.497755  -1.360844  0.044898  1.000  0.687874  9/61
24  Virion - Ebolavirus, Lyssavirus and Morbillivirus  -0.700165  -1.541392  0.038976  0.895  0.699155  6/12
41  Focal adhesion                                     -0.480401  -1.372642  0.009027  1.000  0.704115  30/140
5   Hippo signaling pathway                            -0.488351  -1.378083  0.017051  1.000  0.704133  20/96
25  Systemic lupus erythematosus                       -0.553177  -1.564574  0.000000  0.820  0.706464  56/103
32  EGFR tyrosine kinase inhibitor resistance          -0.498124  -1.366051  0.032587  1.000  0.713456  17/69
4   PPAR signaling pathway                             -0.516196  -1.380970  0.041879  1.000  0.720114  17/48
26  Ras signaling pathway                              -0.478279  -1.382024  0.004008  1.000  0.750822  35/162
8   Gastric cancer                                     -0.494137  -1.387209  0.009082  1.000  0.755987  29/91
39  PI3K-Akt signaling pathway                         -0.453286  -1.331012  0.002000  1.000  0.763638  51/231
29  Cell adhesion molecules                            -0.472574  -1.325629  0.029029  1.000  0.772963  52/100
13  Rap1 signaling pathway                             -0.461357  -1.334247  0.008000  1.000  0.798483  33/161
33  Pathways in cancer                                 -0.416376  -1.237094  0.008000  1.000  0.800584  123/386
12  Calcium signaling pathway                          -0.448758  -1.291423  0.018018  1.000  0.807508  58/153
14  Ribosome                                           -0.444357  -1.259575  0.043043  1.000  0.836399  105/128
28  MAPK signaling pathway                             -0.421247  -1.228006  0.027000  1.000  0.839601  48/224
[19]:
fig = expm.rna.plot_gsea_dotplot(
    slot = 'gsea.c7', max_fdr = 1, max_p = 0.05,
    top_term = 10, terms = None, # draw all terms
    colour = 'p', cmap = 'turbo', figsize = (6, 3), cutoff = 1, ptsize = 5,

    # customizing the formatting rule of the y axis
    formatter = lambda x: x,
    title = 'KEGG Enrichment (GSEA)'
)
[i] retreived 10 terms for plotting.

../_images/scrna_e1-profile-cluster_18_1.png
[22]:
fig = expm.rna.plot_gsea_leading_edge(
    slot = 'gsea.c7',
    terms = 'Systemic lupus erythematosus',
    figsize = (4, 4),
    title = None,
)
../_images/scrna_e1-profile-cluster_19_0.png
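
The es column in the table above is a running-sum statistic. The following is a minimal sketch of the classic weighted running sum on a toy ranking, assuming the standard GSEA formulation. It is not taken from exprmat, the `enrichment_score` helper is hypothetical, and it leaves out the permutation step that yields nes, p, and fdr.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Running-sum enrichment score.

    `ranked` is a list of (gene, statistic) pairs sorted from the most
    positive to the most negative statistic.
    """
    members = set(gene_set)
    hit_weights = [abs(stat) ** weight if gene in members else 0.0
                   for gene, stat in ranked]
    hit_total = sum(hit_weights)
    n_miss = sum(1 for gene, _ in ranked if gene not in members)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must partially overlap the ranked list")
    running, best = 0.0, 0.0
    for (gene, _), w in zip(ranked, hit_weights):
        # Step up at gene-set members, step down at every other gene.
        running += w / hit_total if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the largest deviation from zero
    return best

# Toy ranking of six placeholder genes by log fold change.
ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0),
          ("D", -1.0), ("E", -2.0), ("F", -3.0)]
es_top = enrichment_score(ranked, {"A", "B"})     # set at the top of the list
es_bottom = enrichment_score(ranked, {"E", "F"})  # set at the bottom
print(es_top, es_bottom)
```

A positive score means the set is concentrated among genes with higher expression in the group of interest, and a negative score means it is concentrated among genes with lower expression.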

Single-cell gene set scoring#

Single-cell scoring functions assess the enrichment of specific gene sets in individual cells. score_genes is a scanpy-compatible implementation; alternative algorithms are available as score_aucell, score_ulm, and score_gsva. In the runs below, score_genes writes an obs column named score.{geneset} (here score.neu), while score_ulm stores its scores in obsm['score.ulm'] and the corresponding p-values in obsm['padj.ulm'].

[27]:
expm.rna.score_genes(
    taxa = 'mmu',
    gene_sets = {
        'neu': ['S100a8', 'S100a9', 'Mpo'],
    },
    identifier = 'gene', # can be 'gene', 'uppercase', 'entrez', and 'ugene'
    lognorm = 'X',
    random_state = 42,
)
[29]:
expm['rna'].obs[['score.neu']]
[29]:
score.neu
distal:2 3.103644
distal:3 3.853550
distal:4 4.015083
distal:8 3.794824
distal:9 3.864962
... ...
normal:4657 0.530043
normal:4658 4.420076
normal:4660 4.017634
normal:4661 4.039845
normal:4662 4.271536

9754 rows × 1 columns

[31]:
fig = expm.rna.plot_embedding(
    basis = 'umap', color = 'score.neu',
    sort = True, figsize = (3, 3), dpi = 100, legend = False,
    annotate_style = 'text', annotate_fontsize = 8, ptsize = 2
)
../_images/scrna_e1-profile-cluster_23_0.png
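
The idea behind a scanpy-style gene score can be sketched as the mean expression of the gene set minus the mean expression of a control set. The version below is a simplification under stated assumptions: controls are drawn at random from all other genes, whereas scanpy samples them from matching expression bins. The `score_cell` helper, the `Ctrl*` gene names, and the expression values are hypothetical.

```python
import random

def score_cell(expr, gene_set, n_ctrl=2, seed=42):
    """Mean expression of the gene set minus the mean of control genes.

    `expr` maps gene name -> normalized expression in one cell.
    """
    rng = random.Random(seed)
    # Sorting makes the draw reproducible: with the same seed and gene
    # list, every cell is scored against the same control genes.
    candidates = sorted(g for g in expr if g not in gene_set)
    controls = rng.sample(candidates, n_ctrl)
    set_mean = sum(expr[g] for g in gene_set) / len(gene_set)
    ctrl_mean = sum(expr[g] for g in controls) / n_ctrl
    return set_mean - ctrl_mean

neu_set = ["S100a8", "S100a9", "Mpo"]
high_cell = {"S100a8": 5.0, "S100a9": 4.0, "Mpo": 3.0,
             "Ctrl1": 1.0, "Ctrl2": 1.0, "Ctrl3": 1.0}
low_cell = {"S100a8": 0.0, "S100a9": 0.0, "Mpo": 0.0,
            "Ctrl1": 1.0, "Ctrl2": 1.0, "Ctrl3": 1.0}

print(score_cell(high_cell, neu_set))  # 3.0
print(score_cell(low_cell, neu_set))   # -1.0
```

Subtracting a control mean makes the score less sensitive to overall sequencing depth than the raw gene-set mean.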
[12]:
expm.rna.score_ulm(
    taxa = 'mmu',
    gene_sets = {
        'neu': ['S100a8', 'S100a9', 'Mpo'],
    },
    identifier = 'gene', # can be 'gene', 'uppercase', 'entrez', and 'ugene'
    lognorm = 'X',
    tmin = 0, # for small gene sets
)
[10]:
expm['rna'].obsm['score.ulm']
[10]:
neu
distal:2 14.599044
distal:3 19.084107
distal:4 21.918440
distal:8 19.078472
distal:9 20.233251
... ...
normal:4657 4.259452
normal:4658 22.427184
normal:4660 19.939915
normal:4661 20.791861
normal:4662 20.143090

9754 rows × 1 columns

[14]:
print(expm)
annotated data of size 9754 × 19651
subset mono-neutro of size 9754 × 19651
contains modalities: rna

 modality [rna]
    obs : sample <cat> <c/sample> batch <cat> <c/batch> group <cat> <c> modality <cat> <c/modality>
          taxa <cat> <c/taxa> barcode <o> <o> ubc <o> <o> n.umi <f64> <i> n.genes <i64> <i>
          n.mito <f64> <f> n.ribo <f64> <f> pct.mito <f64> <f> pct.ribo <f64> <f>
          filter <bool> <bool> score.doublet <f64> <f> score.doublet.se <f64> <f>
          is.doublet <bool> <bool> qc <bool> <bool/qc> leiden <cat> <c> sc3.5 <cat> <c>
          sc3.10 <cat> <c> sc3.20 <cat> <c> sc3.30 <cat> <c> cell.type <cat> <c>
          kde.umap <f64> <f/kde> psbulk <cat> <o> score.neu <f64> <f/coordinate/score>
    var : chr <cat> <c/chromosome> start <i64> <i> end <i64> <i> strand <cat> <c/strand> id <o> <o>
          subtype <cat> <c/gsubtype> gene <cat> <o/gene> tlen <f64> <i/tlen> cdslen <i64> <i/cdslen>
          assembly <cat> <c> uid <o> <o/ugene> vst.hvg <bool> <bool/hvg> vst.all.means <f64> <f>
          vst.all.vars <f64> <f> vst.all.vars.norm <f64> <f> vst.all.hvg.rank <f32> <f>
          vst.all.hvg <bool> <bool>
 layers : counts <f32> <i/counts> norm <f32> <f>
   obsm : cnmf.10 <df> <f/embedding/usage> harmony <arr:f32(35)> <f> knn <arr:i32(100)> <i/knni>
          knn.d <arr:f32(100)> <f/knnd> pca <arr:f64(35)> <f/embedding/pca>
          umap <arr:f32(2)> <f/embedding> score.ulm <df> <f/score-matrix>
          padj.ulm <df> <f/score-pval>
   varm : cnmf.10 <arr:f64(10)> <f/weights> cnmf.coef.10 <arr:f64(10)> <f/usage-coef>
          pca <arr:f64(35)> <f/weights>
   obsp : connectivities <csr:f32> <f/connectivity> distances <csr:f32> <f/distance>
    uns : cell.type.colors <o> cell.type_colors <o> cnmf <cnmf> cnmf.args <o>
          cnmf.density.10 <cnmf-density> cnmf.dist.10 <f/connectivity> cnmf.stats <cnmf-stats>
          commands <system> kde.umap <kde-stats> leiden <o> leiden.colors <o> markers <markers>
          neighbors <knn> pca <dict> sc3.10.colors <o> sc3.20.colors <o> sc3.30.colors <o>
          sc3.5.colors <o> slots <system> umap <o> ulm <dict/scoring/score-ulm>

[*] samples not loaded from disk.

ULM scoring performs well on small gene sets.

[13]:
fig = expm.rna.plot_embedding(
    basis = 'umap', color = 'score.neu',
    sort = True, figsize = (3, 3), dpi = 100, legend = False,
    annotate_style = 'text', annotate_fontsize = 8, ptsize = 2
)
../_images/scrna_e1-profile-cluster_28_0.png
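
For intuition, a univariate linear model (ULM) score can be sketched as the t-statistic of the slope when a cell's expression values are regressed on gene-set membership. This follows the usual description of ULM in decoupler-style tools. Whether exprmat implements it in exactly this way is an assumption, and the `ulm_score` helper, the `Ctrl*` genes, and the toy values are hypothetical.

```python
import math

def ulm_score(expr, gene_set):
    """t-statistic of the slope in: expression ~ intercept + membership."""
    genes = list(expr)
    x = [1.0 if g in gene_set else 0.0 for g in genes]  # membership weights
    y = [expr[g] for g in genes]
    n = len(genes)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope / se

toy_set = {"S100a8", "S100a9"}
high_cell = {"S100a8": 5.0, "S100a9": 3.0,
             "Ctrl1": 1.0, "Ctrl2": 0.0, "Ctrl3": 1.0, "Ctrl4": 0.0}
flat_cell = {"S100a8": 0.0, "S100a9": 1.0,
             "Ctrl1": 1.0, "Ctrl2": 0.0, "Ctrl3": 1.0, "Ctrl4": 0.0}

t_high = ulm_score(high_cell, toy_set)
t_flat = ulm_score(flat_cell, toy_set)
print(t_high, t_flat)
```

Because the statistic uses every gene in the cell as a data point, it remains defined even when the gene set itself contains only a few genes.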

GSVA is less robust on such small gene sets.

[15]:
expm.rna.score_gsva(
    taxa = 'mmu',
    gene_sets = {
        'neu': ['S100a8', 'S100a9', 'Mpo'],
    },
    identifier = 'gene', # can be 'gene', 'uppercase', 'entrez', and 'ugene'
    lognorm = 'X',
    tmin = 0, # for small gene sets
)
[16]:
fig = expm.rna.plot_embedding(
    basis = 'umap', color = 'score.neu',
    sort = True, figsize = (3, 3), dpi = 100, legend = False,
    annotate_style = 'text', annotate_fontsize = 8, ptsize = 2
)
../_images/scrna_e1-profile-cluster_31_0.png
[17]:
expm.rna.score_aucell(
    taxa = 'mmu',
    gene_sets = {
        'neu': ['S100a8', 'S100a9', 'Mpo'],
    },
    identifier = 'gene', # can be 'gene', 'uppercase', 'entrez', and 'ugene'
    lognorm = 'X',
    tmin = 0, # for small gene sets
)
[ ]:
fig = expm.rna.plot_embedding(
    basis = 'umap', color = 'score.neu',
    sort = True, figsize = (3, 3), dpi = 100, legend = False,
    annotate_style = 'text', annotate_fontsize = 8, ptsize = 2
)
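
AUCell-style scoring works on within-cell gene ranks rather than expression values. The sketch below is a simplified, hypothetical version: it ranks the genes of one cell, accumulates a recovery curve over the top-ranked genes, and normalizes by the best achievable area. The `aucell_score` helper, the `top_fraction` value, and the toy cells are assumptions for illustration, and published implementations differ in normalization details.

```python
def aucell_score(expr, gene_set, top_fraction=0.5):
    """Normalized area under the recovery curve within the top-ranked genes."""
    ranked = sorted(expr, key=lambda g: expr[g], reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    members = set(gene_set)
    hits, area = 0, 0
    for gene in ranked[:n_top]:
        if gene in members:
            hits += 1
        area += hits  # height of the recovery curve at this rank
    # Best case: every set gene sits at the very top of the ranking.
    k = min(len(members), n_top)
    max_area = k * (k + 1) // 2 + k * (n_top - k)
    return area / max_area

neu_set = ["S100a8", "S100a9", "Mpo"]
high_cell = {"S100a8": 6.0, "S100a9": 5.0, "Mpo": 4.0,
             "Ctrl1": 3.0, "Ctrl2": 2.0, "Ctrl3": 1.0}
mixed_cell = {"S100a8": 6.0, "Ctrl1": 5.0, "S100a9": 4.0,
              "Ctrl2": 3.0, "Mpo": 2.0, "Ctrl3": 1.0}
low_cell = {"Ctrl1": 6.0, "Ctrl2": 5.0, "Ctrl3": 4.0,
            "S100a8": 3.0, "S100a9": 2.0, "Mpo": 1.0}

print(aucell_score(high_cell, neu_set))   # 1.0
print(aucell_score(mixed_cell, neu_set))
print(aucell_score(low_cell, neu_set))    # 0.0
```
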

Differential gene expression between groups#

Data can be split based on two categorical variables to visualize group differences in gene expression.

[23]:
fig = expm.rna.plot_expression_bar(
    gene = 'S100a8', slot = 'X', group = 'cell.type', split = 'sample',
    selected_groups = None, selected_splits = ['niche', 'normal'], palette = ['red', 'black'],
    figsize = (5, 2), dpi = 100, style = 'violin',
    violin_kwargs = { 'split': True, 'inner': None }
)
[i] Neu, p = 0.000, D niche over normal
[i] MDP, p = 0.360, U niche over normal
[i] MM, p = 0.000, D niche over normal
[i] Mo, p = 0.113, U niche over normal
[i] DCp, p = 0.053, D niche over normal
[i] Prog, p = 0.726, U niche over normal
[i] iMac, p = 0.674, U niche over normal

../_images/scrna_e1-profile-cluster_35_1.png
[24]:
fig = expm.rna.plot_expression_bar(
    gene = 'S100a8', slot = 'X', group = 'cell.type', split = 'sample',
    selected_groups = None, selected_splits = ['niche', 'normal'], palette = ['red', 'black'],
    figsize = (5, 2), dpi = 100, style = 'box',
    violin_kwargs = { 'split': True, 'inner': None }
)
[i] Neu, p = 0.000, D niche over normal
[i] MDP, p = 0.360, U niche over normal
[i] MM, p = 0.000, D niche over normal
[i] Mo, p = 0.113, U niche over normal
[i] DCp, p = 0.053, D niche over normal
[i] Prog, p = 0.726, U niche over normal
[i] iMac, p = 0.674, U niche over normal

../_images/scrna_e1-profile-cluster_36_1.png
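
The per-group log lines above can be mimicked on toy data by splitting cells on two keys and comparing the two splits within each group. The notebook does not say which statistical test plot_expression_bar uses, so the sketch below swaps in an exact permutation test on the difference in means. Reading U and D as "up" and "down" is also an assumption, and the helper name and values are hypothetical.

```python
from itertools import combinations

def exact_permutation_test(a, b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = a + b
    observed = sum(a) / len(a) - sum(b) / len(b)
    total = sum(pooled)
    extreme, count = 0, 0
    # Enumerate every way of assigning len(a) cells to the first split.
    for idx in combinations(range(len(pooled)), len(a)):
        s = sum(pooled[i] for i in idx)
        diff = s / len(a) - (total - s) / len(b)
        count += 1
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / count

# Toy expression of one gene: (cell type, sample) -> values per cell.
cells = {
    ("Neu", "niche"): [1.0, 2.0, 3.0],
    ("Neu", "normal"): [7.0, 8.0, 9.0],
    ("Mo", "niche"): [2.0, 5.0, 3.0],
    ("Mo", "normal"): [3.0, 2.0, 4.0],
}

results = {}
for cell_type in ("Neu", "Mo"):
    diff, p = exact_permutation_test(cells[(cell_type, "niche")],
                                     cells[(cell_type, "normal")])
    direction = "U" if diff > 0 else "D"
    results[cell_type] = (direction, p)
    print(f"{cell_type}, p = {p:.3f}, {direction} niche over normal")
```

Exhaustive enumeration is only feasible for a handful of cells. With thousands of cells per group, a rank-based or sampled permutation test would be the practical choice.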

Saving the dataset#

Finally, save the changes we made.

[25]:
print(expm)
annotated data of size 9754 × 19651
subset mono-neutro of size 9754 × 19651
contains modalities: rna

 modality [rna]
    obs : sample <cat> <c/sample> batch <cat> <c/batch> group <cat> <c> modality <cat> <c/modality>
          taxa <cat> <c/taxa> barcode <o> <o> ubc <o> <o> n.umi <f64> <i> n.genes <i64> <i>
          n.mito <f64> <f> n.ribo <f64> <f> pct.mito <f64> <f> pct.ribo <f64> <f>
          filter <bool> <bool> score.doublet <f64> <f> score.doublet.se <f64> <f>
          is.doublet <bool> <bool> qc <bool> <bool/qc> leiden <cat> <c> sc3.5 <cat> <c>
          sc3.10 <cat> <c> sc3.20 <cat> <c> sc3.30 <cat> <c> cell.type <cat> <c>
          kde.umap <f64> <f/kde> psbulk <cat>
    var : chr <cat> <c/chromosome> start <i64> <i> end <i64> <i> strand <cat> <c/strand> id <o> <o>
          subtype <cat> <c/gsubtype> gene <cat> <o/gene> tlen <f64> <i/tlen> cdslen <i64> <i/cdslen>
          assembly <cat> <c> uid <o> <o/ugene> vst.hvg <bool> <bool/hvg> vst.all.means <f64> <f>
          vst.all.vars <f64> <f> vst.all.vars.norm <f64> <f> vst.all.hvg.rank <f32> <f>
          vst.all.hvg <bool> <bool>
 layers : counts <f32> <i/counts> norm <f32> <f>
   obsm : cnmf.10 <df> <f/embedding/usage> harmony <arr:f32(35)> <f> knn <arr:i32(100)> <i/knni>
          knn.d <arr:f32(100)> <f/knnd> pca <arr:f64(35)> <f/embedding/pca>
          umap <arr:f32(2)> <f/embedding>
   varm : cnmf.10 <arr:f64(10)> <f/weights> cnmf.coef.10 <arr:f64(10)> <f/usage-coef>
          pca <arr:f64(35)> <f/weights>
   obsp : connectivities <csr:f32> <f/connectivity> distances <csr:f32> <f/distance>
    uns : cell.type.colors cell.type_colors cnmf <cnmf> cnmf.args <o>
          cnmf.density.10 <cnmf-density> cnmf.dist.10 <f/connectivity> cnmf.stats <cnmf-stats>
          commands <system> kde.umap <kde-stats> leiden <o> leiden.colors <o> markers <markers>
          neighbors <knn> pca <dict> sc3.10.colors <o> sc3.20.colors <o> sc3.30.colors <o>
          sc3.5.colors <o> slots <system> umap <o> deg.c7 <markers> ora.c7 <ora> gsea.c7 <gsea>

[*] samples not loaded from disk.

[26]:
em.memory()
[i] resident memory: 1.94 GiB
[i] virtual memory: 18.14 GiB