Composition Analysis#

Analysis of cellular composition and modules

[1]:

%load_ext autoreload
%autoreload 2

[2]:

import exprmat as em
# set working directory
em.setwd('../../../data')
ver = em.version()

[i] exprmat 0.2.66 / exprmat-db 0.2.66
[i] os: posix (linux)  platform version: 6.8.0-90-generic
[i] loaded configuration from /home/data/yangz/.exprmatrc
[i] current working directory: /home/data/yangz/packages/exprmat/data
[i] current database directory: /home/data/yangz/packages/database (0.2.66)
[i] resident memory: 778.86 MiB
[i] virtual memory: 5.95 GiB

[ ]:

expm = em.load_experiment('... hidden ...', load_samples = False)

[!] samples are not dumped in the experiment directory.

scCODA#

scCODA offers Bayesian estimate of compositional change across conditions with a pre-defined cluster that assumes no change between conditions. scCODA requires duplicates in experiment design

[8]:

expm.rna.sccoda(
    major = 'sample', minor = 'cell.type',
    categories = ['group'], agg = lambda x: x.value_counts().idxmax(),
    formula = 'group', num_samples = 10000, num_warmup = 1000, seed = 42,
    key_added = 'sccoda'
)

[i] automatic reference selection: reference cell type set to Mast
[i] zero counts encountered in data! Added a pseudocount of 0.5.

[9]:

expm['rna'].uns['sccoda.effects'].reset_index()

[9]:

	index	Covariate	Cell Type	Final Parameter	HDI 3%	HDI 97%	SD	Inclusion probability	Expected Sample	log2-fold change
0	0	groupT.ln-neg	B	2.419516	1.329	3.477	0.559	1.0000	3718.928437	3.385593
1	1	groupT.ln-neg	EC	0.000000	-1.969	0.312	0.598	0.5165	201.476927	-0.105031
2	2	groupT.ln-neg	Epi	0.000000	-2.353	0.008	0.754	0.8335	626.203168	-0.105031
3	3	groupT.ln-neg	Fib/SMC	-1.501326	-2.680	0.001	0.847	0.8570	79.467573	-2.270986
4	4	groupT.ln-neg	Mast	0.000000	0.000	0.000	0.000	0.0000	175.155625	-0.105031
5	5	groupT.ln-neg	Mye	-1.480736	-2.662	0.001	0.839	0.8475	134.685051	-2.241281
6	6	groupT.ln-neg	Neu	-2.839706	-4.069	-1.675	0.631	1.0000	126.972284	-4.201861
7	7	groupT.ln-neg	Plasma	0.000000	-1.790	0.350	0.538	0.4855	314.402769	-0.105031
8	8	groupT.ln-neg	T	0.000000	-0.029	1.723	0.571	0.6337	3280.249834	-0.105031
9	9	groupT.ln-pos	B	3.494247	2.555	4.426	0.488	1.0000	2587.151433	2.862070
10	10	groupT.ln-pos	EC	0.000000	-1.567	0.755	0.432	0.3674	47.849719	-2.179063
11	11	groupT.ln-pos	Epi	0.000000	-1.743	0.568	0.485	0.4071	148.719986	-2.179063
12	12	groupT.ln-pos	Fib/SMC	0.000000	-2.269	0.251	0.694	0.5753	84.695735	-2.179063
13	13	groupT.ln-pos	Mast	0.000000	0.000	0.000	0.000	0.0000	41.598547	-2.179063
14	14	groupT.ln-pos	Mye	0.000000	-2.226	0.097	0.718	0.6206	140.620567	-2.179063
15	15	groupT.ln-pos	Neu	-2.071768	-3.152	-0.922	0.585	0.9985	64.994181	-5.167992
16	16	groupT.ln-pos	Plasma	0.000000	-1.211	0.851	0.302	0.2940	74.669018	-2.179063
17	17	groupT.ln-pos	T	1.948464	1.087	2.776	0.443	1.0000	5467.242481	0.631977

CoVarNet#

CoVarNet provides a standardized method for cellular module calling. It normalize subtypes according to major types (that may have similar loss ratio during sample preparation), and constructs correlation network for cellular compositions

[10]:

expm.rna.annotate(
    cluster = 'cell.type.fine',
    mapping = {
        'B': ['B.a', 'B.ae', 'B.gc', 'B.m', 'B.n', 'B.n.str', 'B.str', 'PC/PB', 'PB.pre'],
        'Fib': ['CAF.Pi15', 'CAF.Wnt', 'Peri', 'SMC', 'apCAF', 'cCAF', 'iCAF', 'pCAF'],
        'T': ['MAIT', 'T.dn', 'T4', 'T4.fh', 'T4.m.Klf2/6', 'T4.m.Runx1', 'T4.n', 'T4.ph', 'T4.reg', 'T4.reg.a', 'T4.cm',
              'T8.n', 'T8.cm', 'T8.em', 'T8.ex', 'T8.Zbtb46'],
        'Mye': ['Mo', 'Mo.Vegf', 'Mac.C1q', 'Mac.Spp1', 'cDC1/2', 'cDC3', 'moDC', 'pDC']
    }
)

cell.type
T         53594
B         29837
Neu        8539
Epi        3229
Mye        3174
Fib        2251
NK/ILC     1612
EC          972
Mast        682
Name: count, dtype: int64

[ ]:

expm.rna.covar_module_cophenetic(
    col_sample = 'sample', col_major = 'cell.type', col_fine = 'cell.type.fine',
    nmf_clusters = [2, 3, 4, 5, 6, 7, 8, 9, 10], n_run = 30, max_iter = 300,
    key_added = 'covarnet'
)

[17]:

expm['rna'].uns['covarnet.coph']

[17]:

	cluster	rss	evar	cophenetic
0	2	35.222438	0.648837	0.999879
1	3	28.013027	0.720714	0.996009
2	4	23.347266	0.767231	0.998115
3	5	19.959535	0.801006	0.997059
4	6	17.024616	0.830267	0.998060
5	7	14.863218	0.851816	0.993559
6	8	13.080866	0.869586	0.984591
7	9	11.259840	0.887741	0.998228
8	10	10.143420	0.898871	0.995887

[18]:

from exprmat.plotting.utils import line
fig = line(
    expm['rna'].uns['covarnet.coph']['cluster'], expm['rna'].uns['covarnet.coph']['cophenetic'],
    xlabel = 'Clusters', ylabel = 'Cophenetic'
)

../_images/scrna_n1-composition_11_0.png

[ ]:

expm.rna.covar_module(
    col_sample = 'sample', col_major = 'cell.type', col_fine = 'cell.type.fine',
    nmf_clusters = 6, n_run = 10, max_iter = 300,
    key_added = 'covarnet'
)

[20]:

print(expm)

annotated data of size 103890 × 30216
integrated dataset of size 103890 × 30216
contains modalities: rna

 modality [rna]
    obs : sample <cat> batch <cat> group <cat> modality <cat> taxa <cat> patient <cat> barcode <o>
          ubc <o> n.umi <f64> n.genes <i64> n.mito <f64> n.ribo <f64> pct.mito <f64> pct.ribo <f64>
          filter <bool> score.doublet <f64> score.doublet.se <f64> is.doublet <bool> qc <bool>
          kde.group <f64> leiden <cat> cell.type <cat> <c> cell.type.fine <cat>
    var : chr <cat> start <i64> end <i64> strand <cat> id <o> subtype <cat> gene <cat> tlen <f64>
          cdslen <i64> assembly <cat> uid <o> vst.means <f64> vst.vars <f64> vst.vars.norm <f64>
          vst.hvg.rank <f32> vst.hvg <bool>
 layers : ambiguous <f64> counts <f32> spliced <f64> unspliced <f64>
   obsm : harmony <arr:f64(20)> knn.d.nn <arr:f32(30)> knn.d.nn.harmony <arr:f32(30)>
          knn.nn <arr:i32(30)> knn.nn.harmony <arr:i32(30)> pca <arr:f64(50)> pca.20 <arr:f64(20)>
          scvi <arr:f32(30)> umap <arr:f32(2)> umap.harmony <arr:f32(2)>
   varm : pca <arr:f64(50)>
   obsp : connectivities.nn <csr:f32> connectivities.nn.harmony <csr:f32> distances.nn <csr:f32>
          distances.nn.harmony <csr:f32>
    uns : cell.type.colors cell.type.fine.colors deg group.colors kde.group leiden leiden.colors nn
          nn.harmony pca scvi umap umap.harmony sccoda.intercepts sccoda.effects commands <system>
          slots <system> covarnet.coph covarnet.corr covarnet <covarnet> covarnet.comp
          covarnet.usage

[*] samples not loaded from disk.

[21]:

fig = expm.rna.plot_covarnet(
    key = 'covarnet', node_size = 500, cmap = 'category20c', cmap_edge = 'reds/r',
    figsize = (7, 7), dpi = 80
)

../_images/scrna_n1-composition_14_0.png

[23]:

fig = expm.rna.plot_covarnet_components(
    key = 'covarnet', figsize = (2.3, 10)
)

../_images/scrna_n1-composition_15_0.png

[27]:

fig = expm.rna.plot_covarnet_usages(
    key = 'covarnet', figsize = (3.7, 2.3)
)

../_images/scrna_n1-composition_16_0.png