Help System
The exprmat package ships a built-in help system, accessible through the em.help() function and the list_available_tools() accessor methods. This notebook shows how to use it to explore available functions, understand their parameters, and discover what each modality supports. The help system is integrated with the introspective pipeline framework, so you can inspect a tool's contract (required inputs, outputs, and parameter specifications) without leaving the Python environment.
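To give a feel for what an introspective help function does, here is a minimal sketch built only on the standard-library `inspect` module. It is not exprmat's implementation: `describe` and `demo_tool` are hypothetical names invented for this illustration, and the real em.help() output is richer than this.

```python
import inspect


def describe(obj):
    """Return a short plain-text summary: qualified name, signature, first docstring line.

    Hypothetical helper for illustration only; not part of exprmat.
    """
    name = getattr(obj, "__qualname__", repr(obj))
    module = getattr(obj, "__module__", "")
    try:
        signature = str(inspect.signature(obj))
    except (TypeError, ValueError):
        # Some builtins do not expose a signature.
        signature = "(...)"
    doc = inspect.getdoc(obj) or ""
    summary = doc.splitlines()[0] if doc else "(no documentation)"
    return f"{module}.{name}{signature}\n{summary}"


def demo_tool(adata, key_added="scvi", n_comps=30):
    """Compute a toy embedding."""
    return None


print(describe(demo_tool))
```

The same two ingredients, a signature and a docstring, are what the help output in the cells below is organized around.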
[2]:
import exprmat as em
[7]:
em.help(em.experiment)
experiment
exprmat.reader.experiment
experiment(
meta: exprmat.reader.metadata.metadata,
dump = '.',
parallel = False,
eccentric = None,
save_simultaneously = True,
version = 39,
subset = None,
mudata = None,
modalities = {},
attachments = {}
)
Create the experiment class. Parameters that are not documented below are reserved for internal use; do not set them when creating the experiment unless you know what you are doing. Typically, you should only initialize an experiment with a metadata table, and load on-disk dumps via load_experiment.
import exprmat as em
expm = em.experiment(meta, dump = 'dir', save_simultaneously = True)
expm.save()
or load an existing experiment via
expm = em.load_experiment('dir')
Parameters
meta : exprmat.metadata - The metadata table that instructs the construction of the experiment.
eccentric : None | Callable = None - A short translator function for cases where the gene names in the source matrix are not one of (1) Unversioned ENSEMBL index, (2) Gene names, (3) Exprmat Unique Gene Index. For example, raw data giving variable names such as 'ENSMUSG000000001.23' will not be recognized by the built-in reference database for M. musculus, and you should pass eccentric = lambda x: x.split('.')[0]
dump : str = '.' - Location to save the experiment when save() is called. You can change the save location explicitly through save(fpath), but it is better to give a default here. See save_simultaneously below.
save_simultaneously : bool = True - Write each sample to disk as soon as it is transformed into the exprmat-compatible format. When your experiment contains many samples, it is better to leave this set to True, since samples that were already saved are skipped when loading restarts after an interruption.
parallel : bool | int = False - False turns off parallelization when loading samples; an integer sets the number of workers used to load data simultaneously. (Better turned off when there are fewer than 6 samples.)
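The eccentric translator described above is just a function from a raw variable name to a recognizable one. A minimal sketch of the version-stripping case follows, written as a named function equivalent to the lambda in the docstring. The identifiers in `raw_names` are illustrative examples, and `strip_ensembl_version` is a name chosen here, not something exprmat provides.

```python
def strip_ensembl_version(gene_id: str) -> str:
    """Drop everything from the first '.' onward, e.g. a trailing '.<version>' suffix."""
    return gene_id.split(".")[0]


# Illustrative names only: two versioned ENSEMBL-style IDs and one plain gene name.
raw_names = ["ENSMUSG00000000001.23", "ENSMUSG00000000028.16", "Actb"]
print([strip_ensembl_version(n) for n in raw_names])
# → ['ENSMUSG00000000001', 'ENSMUSG00000000028', 'Actb']
```

Note that this translator truncates any name at its first dot, so it suits sources that use versioned ENSEMBL identifiers throughout; gene symbols that legitimately contain a dot would be shortened as well.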
[9]:
from exprmat.reader.rna.reduction import rna_scvi
em.help(rna_scvi)
rna_scvi
exprmat.reader.rna.reduction
rna_scvi(
adata,
sample_name,
key_added = 'scvi',
batch = None,
n_comps = 30,
hvg = 'vst.hvg',
key_counts = 'counts',
savepath = '',
seeding = False,
imputed_label_key = 'cell.type.imputed',
**kwargs
)
Compute a scVI or scANVI latent space representation. Trains a variational autoencoder on the HVG count matrix, optionally with semi-supervised imputation of cell labels (scANVI).
Parameters
run_on_samples : bool | list[str] | str, default False - Samples to operate on (see run_rna_qc for semantics).
key_added : str, default 'scvi' - Key under which the latent coordinates are stored in obsm.
batch : str | None, default None - adata.obs column to use as the batch key.
n_comps : int, default 30 - Dimensionality of the latent space (n_latent in scvi-tools).
hvg : str, default 'vst.hvg' - adata.var column that flags highly variable genes used for model training.
key_counts : str, default 'counts' - Layer containing raw integer counts.
seeding : bool, default False - When True, runs scANVI instead of scVI (requires label_key in kwargs to specify the partial labels column).
imputed_label_key : str, default 'cell.type.imputed' - adata.obs key to store scANVI-imputed cell-type labels. Only used when seeding=True.
label_key : str - Only required when seeding=True. adata.obs column containing the known partial cell-type labels for scANVI.
Model arguments
n_hidden : int = 128 - Number of nodes per hidden layer in the scVI encoder and decoder networks.
n_layers : int = 3 - Number of hidden layers in the scVI encoder and decoder networks.
dropout_rate : float = 0.1 - Dropout rate for the scVI encoder and decoder.
dispersion : Literal['gene', 'gene-batch', 'gene-label', 'gene-cell'] = "gene" - Dispersion parameterization for the scVI likelihood. See scvi-tools documentation for details.
gene_likelihood : Literal['zinb', 'nb', 'poisson', 'normal'] = "zinb" - Likelihood for modeling gene expression in scVI. See scvi-tools documentation for details.
use_observed_lib_size : bool = True - Whether to use the observed library size for each cell or to model it as a latent variable in scVI.
latent_distribution : Literal['normal', 'ln'] = "normal" - Distribution of the scVI latent space: normal, or logistic-normal ('ln', a normal transformed by softmax in scvi-tools).
Trainer arguments
max_epochs : int | None = None - Maximum number of training epochs for the scVI model. If None, uses scvi-tools default.
accelerator : str = "auto" - Accelerator to use for training. Options are "cpu", "cuda", or "auto".
devices : int | list[int] | str = "auto" - Devices to use for training. Options are an integer, a list of integers, or "auto".
train_size : float | None = None - Proportion of the dataset to use for training. If None, uses scvi-tools default.
validation_size : float | None = None - Proportion of the dataset to use for validation. If None, uses scvi-tools default.
shuffle_set_split : bool = True - Whether to shuffle the dataset before splitting into training and validation sets.
load_sparse_tensor : bool = False - Whether to load the data as a sparse tensor.
batch_size : int = 128 - Batch size for training.
early_stopping : bool = False - Whether to use early stopping during training.
datasplitter_kwargs : dict | None = None - Additional arguments for the data splitter.
plan_kwargs : dict | None = None - Additional arguments for the training plan.
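To make the interaction of train_size, validation_size, and shuffle_set_split concrete, here is a minimal sketch of a fraction-based split. It illustrates the general idea only; the rounding rule, the handling of validation_size = None (taken here as "everything not used for training"), the default train_size of 0.9, and the `split_indices` helper itself are assumptions for this example, not the code that exprmat or scvi-tools runs.

```python
import math
import random


def split_indices(n_cells, train_size=0.9, validation_size=None, shuffle=True, seed=0):
    """Partition cell indices into (train, validation, unused) lists.

    Hypothetical helper; the defaults and rounding are assumptions for illustration.
    """
    if not 0.0 < train_size <= 1.0:
        raise ValueError("train_size must be in (0, 1]")
    n_train = math.ceil(train_size * n_cells)
    if validation_size is None:
        # Assumed behavior: the validation set takes whatever is left over.
        n_val = n_cells - n_train
    else:
        n_val = math.floor(validation_size * n_cells)
        if n_train + n_val > n_cells:
            raise ValueError("train_size + validation_size exceeds 1")
    indices = list(range(n_cells))
    if shuffle:
        # A seeded generator keeps the split reproducible.
        random.Random(seed).shuffle(indices)
    val = indices[:n_val]
    train = indices[n_val:n_val + n_train]
    unused = indices[n_val + n_train:]
    return train, val, unused


train, val, unused = split_indices(1000, train_size=0.8, validation_size=0.1)
print(len(train), len(val), len(unused))
# → 800 100 100
```

When the two fractions sum to less than 1, the leftover cells take part in neither training nor validation, which is worth keeping in mind when setting both explicitly.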
Seeding arguments
Example
An example call, as used by the Human Lung Cell Atlas developers to accommodate scVI for very large-scale data integration:
rna.scvi(
    n_hidden = 128,
    n_layers = 3,
    dropout_rate = 0.1,
    dispersion = "gene",
    gene_likelihood = "zinb",
    max_epochs = 500,
    key_added = 'scvi', batch = 'batch', n_comps = 30,
    hvg = 'vst.hvg', key_counts = 'counts',
    early_stopping = True,
    scvi_train_kwargs = {
        'early_stopping_monitor': 'elbo_validation',
        'early_stopping_min_delta': 0.0,
        'early_stopping_patience': 9,
        'early_stopping_warmup_epochs': 0,
        'early_stopping_mode': 'min',
    },
    plan_kwargs = {
        'reduce_lr_on_plateau': True,
        'lr_patience': 8,
        'lr_factor': 0.1,
    },
    seeding = False,
)
- Adds the scVI latent representation to adata.obsm[key_added].
- If seeding = True, also adds scANVI-imputed labels to adata.obs[imputed_label_key].
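The early-stopping settings in the example call (monitoring a validation metric in 'min' mode with a patience of 9 and a min_delta of 0.0) follow the usual patience-based rule: training stops once the monitored value has failed to improve for `patience` consecutive checks. The sketch below shows that rule in plain Python. `early_stopping_epoch` is a hypothetical helper written for this illustration, the loss values are made up, and the warm-up setting is left out for brevity; the actual behavior is determined by the training backend.

```python
def early_stopping_epoch(losses, patience=9, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None if it never does.

    'min' mode: an epoch counts as an improvement only if the monitored
    value drops below the best value seen so far by more than min_delta.
    """
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


# Made-up validation losses: improvement stalls after the third epoch.
history = [10.0, 9.0, 8.0, 8.0, 8.0, 8.0, 7.0]
print(early_stopping_epoch(history, patience=3))
# → 5
print(early_stopping_epoch(history, patience=9))
# → None
```

A larger patience tolerates longer plateaus, which pairs naturally with the learning-rate reduction in plan_kwargs: with lr_patience below the early-stopping patience, the learning rate is lowered before training is abandoned.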