scvi-tools Deep Learning Skill

This skill provides guidance for deep learning-based single-cell analysis using scvi-tools, a library of probabilistic generative models for single-cell omics (scVI, scANVI, totalVI and others) built primarily on PyTorch.

How to Use This Skill

Identify the appropriate workflow from the model/workflow tables below
Read the corresponding reference file for detailed steps and code
Use scripts in scripts/ to avoid rewriting common code
For installation or GPU issues, consult references/environment_setup.md
For debugging, consult references/troubleshooting.md

When to Use This Skill

When scvi-tools, scVI, scANVI, or related models are mentioned
When deep learning-based batch correction or integration is needed
When working with multi-modal data (CITE-seq, multiome)
When reference mapping or label transfer is required
When analyzing ATAC-seq or spatial transcriptomics data
When learning latent representations of single-cell data

Model Selection Guide

Data Type	Model	Primary Use Case
scRNA-seq	scVI	Unsupervised integration, DE, imputation
scRNA-seq + labels	scANVI	Label transfer, semi-supervised integration
CITE-seq (RNA+protein)	totalVI	Multi-modal integration, protein denoising
scATAC-seq	PeakVI	Chromatin accessibility analysis
Multiome (RNA+ATAC)	MultiVI	Joint modality analysis
Spatial + scRNA reference	DestVI	Cell type deconvolution
RNA velocity	veloVI	Transcriptional dynamics
Cross-technology	sysVI	System-level batch correction
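To go from the table to code, the sketch below maps each model name to the class that implements it. The `scvi.model` paths for scVI, scANVI, totalVI, PeakVI, MultiVI and DestVI are part of the scvi-tools public API; the `scvi.external` locations for veloVI and sysVI are an assumption based on recent releases (veloVI has also shipped as the standalone `velovi` package), so check them against your installed version. `MODEL_CLASSES` and `load_model_class` are illustrative names, not part of scvi-tools.

```python
import importlib

# Hypothetical registry: model name from the table -> dotted path of its class.
# The scvi.external entries are assumptions; verify them for your scvi-tools version.
MODEL_CLASSES = {
    "scVI": "scvi.model.SCVI",
    "scANVI": "scvi.model.SCANVI",
    "totalVI": "scvi.model.TOTALVI",
    "PeakVI": "scvi.model.PEAKVI",
    "MultiVI": "scvi.model.MULTIVI",
    "DestVI": "scvi.model.DestVI",
    "veloVI": "scvi.external.VELOVI",
    "sysVI": "scvi.external.SysVI",
}


def load_model_class(name):
    """Import and return the class registered for a model name."""
    module_path, _, class_name = MODEL_CLASSES[name].rpartition(".")
    # Import lazily, so scvi-tools is only needed when a class is actually requested
    module = importlib.import_module(module_path)
    return getattr(module, class_name)
```

For example, `SCVI = load_model_class("scVI")` returns `scvi.model.SCVI`, on which you then call `setup_anndata` and construct the model. Note that DestVI is not trained on its own: it is built from a `scvi.model.CondSCVI` model fitted on the single-cell reference, so the spatial workflow involves two classes.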

Workflow Reference Files

Workflow	Reference File	Description
Environment Setup	`references/environment_setup.md`	Installation, GPU, version info
Data Preparation	`references/data_preparation.md`	Formatting data for any model
scRNA Integration	`references/scrna_integration.md`	scVI/scANVI batch correction
ATAC-seq Analysis	`references/atac_peakvi.md`	PeakVI for accessibility
CITE-seq Analysis	`references/citeseq_totalvi.md`	totalVI for protein+RNA
Multiome Analysis	`references/multiome_multivi.md`	MultiVI for RNA+ATAC
Spatial Deconvolution	`references/spatial_deconvolution.md`	DestVI spatial analysis
Label Transfer	`references/label_transfer.md`	scANVI reference mapping
scArches Mapping	`references/scarches_mapping.md`	Query-to-reference mapping
Batch Correction	`references/batch_correction_sysvi.md`	Advanced batch methods
RNA Velocity	`references/rna_velocity_velovi.md`	veloVI dynamics
Troubleshooting	`references/troubleshooting.md`	Common issues and solutions

CLI Scripts

Modular scripts for common workflows. Chain together or modify as needed.

Pipeline Scripts

Script	Purpose	Usage
`prepare_data.py`	QC, filter, HVG selection	`python scripts/prepare_data.py raw.h5ad prepared.h5ad --batch-key batch`
`train_model.py`	Train any scvi-tools model	`python scripts/train_model.py prepared.h5ad results/ --model scvi`
`cluster_embed.py`	Neighbors, UMAP, Leiden	`python scripts/cluster_embed.py adata.h5ad results/`
`differential_expression.py`	DE analysis	`python scripts/differential_expression.py model/ adata.h5ad de.csv --groupby leiden`
`transfer_labels.py`	Label transfer with scANVI	`python scripts/transfer_labels.py ref_model/ query.h5ad results/`
`integrate_datasets.py`	Multi-dataset integration	`python scripts/integrate_datasets.py results/ data1.h5ad data2.h5ad`
`validate_adata.py`	Check data compatibility	`python scripts/validate_adata.py data.h5ad --batch-key batch`

Example Workflow

# 1. Validate input data
python scripts/validate_adata.py raw.h5ad --batch-key batch --suggest

# 2. Prepare data (QC, HVG selection)
python scripts/prepare_data.py raw.h5ad prepared.h5ad --batch-key batch --n-hvgs 2000

# 3. Train model
python scripts/train_model.py prepared.h5ad results/ --model scvi --batch-key batch

# 4. Cluster and visualize
python scripts/cluster_embed.py results/adata_trained.h5ad results/ --resolution 0.8

# 5. Differential expression
python scripts/differential_expression.py results/model results/adata_clustered.h5ad results/de.csv --groupby leiden
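If you prefer to drive the same five steps from Python (for example from a notebook or a larger pipeline), a minimal `subprocess` sketch is below. It reuses only the commands and flags shown above; `build_pipeline`, `run_pipeline` and the `dry_run` switch are hypothetical helpers, and the intermediate filenames (`adata_trained.h5ad`, `adata_clustered.h5ad`, `model`) are taken from the example rather than from the scripts' own documentation.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def build_pipeline(raw, outdir="results", batch_key="batch", prepared="prepared.h5ad"):
    """Return argv lists mirroring the five-step example workflow, in order."""
    out = Path(outdir)
    py = sys.executable  # run the scripts with the current interpreter
    return [
        # 1. Validate input data
        [py, "scripts/validate_adata.py", raw, "--batch-key", batch_key, "--suggest"],
        # 2. Prepare data (QC, HVG selection)
        [py, "scripts/prepare_data.py", raw, prepared,
         "--batch-key", batch_key, "--n-hvgs", "2000"],
        # 3. Train model
        [py, "scripts/train_model.py", prepared, str(out),
         "--model", "scvi", "--batch-key", batch_key],
        # 4. Cluster and visualize
        [py, "scripts/cluster_embed.py", str(out / "adata_trained.h5ad"), str(out),
         "--resolution", "0.8"],
        # 5. Differential expression
        [py, "scripts/differential_expression.py", str(out / "model"),
         str(out / "adata_clustered.h5ad"), str(out / "de.csv"), "--groupby", "leiden"],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, stopping the chain at the first failure
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline(build_pipeline("raw.h5ad"))` only prints the commands; pass `dry_run=False` once the printed plan looks right.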

Python Utilities

`scripts/model_utils.py` provides importable functions for custom workflows:

Function	Purpose
`prepare_adata()`	Data preparation (QC, HVG, layer setup)
`train_scvi()`	Train scVI or scANVI
`evaluate_integration()`	Compute integration metrics
`get_marker_genes()`	Extract DE markers
`save_results()`	Save model, data, plots
`auto_select_model()`	Suggest best model
`quick_clustering()`	Neighbors + UMAP + Leiden
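Because `scripts/` is not an installed package, one way to use these functions from a notebook is to load `model_utils.py` by path and print each function's signature before calling it, rather than guessing argument names. The sketch assumes only that the file exists at `scripts/model_utils.py`; `load_module` and `describe_functions` are illustrative helpers, not part of the skill.

```python
import importlib.util
import inspect
import sys
from pathlib import Path


def load_module(path, name="model_utils"):
    """Import a Python file by path without adding its folder to sys.path."""
    spec = importlib.util.spec_from_file_location(name, Path(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # register before executing, as the importlib docs recipe does
    spec.loader.exec_module(module)
    return module


def describe_functions(module):
    """Map each public function defined in the module itself to its signature."""
    return {
        name: str(inspect.signature(obj))
        for name, obj in inspect.getmembers(module, inspect.isfunction)
        # skip private helpers and functions the module merely imported
        if not name.startswith("_") and obj.__module__ == module.__name__
    }
```

Usage: `utils = load_module("scripts/model_utils.py")`, then `for name, sig in describe_functions(utils).items(): print(name + sig)` lists the real arguments of `train_scvi()`, `quick_clustering()` and the rest.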

Critical Requirements

Raw counts required: scvi-tools models expect raw, unnormalized counts (integers for scRNA-seq), so keep a copy before normalizing and register that layer

import scvi

adata.layers["counts"] = adata.X.copy()  # copy raw counts before any normalization
scvi.model.SCVI.setup_anndata(adata, layer="counts")
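scvi-tools typically warns when the registered data do not look like counts, but it is cheaper to catch normalized or log-transformed input before training. The helper below is a heuristic sketch (the name `looks_like_raw_counts` and the sample size are assumptions): it inspects a sample of stored values and reports whether all of them are non-negative integers. It handles dense NumPy arrays and SciPy sparse matrices, which keep their stored values in `.data`.

```python
import numpy as np


def looks_like_raw_counts(X, n_values=100_000):
    """Heuristic: True if a sample of stored values are all non-negative integers."""
    # SciPy sparse matrices hold their non-zero entries in .data;
    # dense arrays are flattened instead
    values = X.data if hasattr(X, "tocsr") else np.asarray(X).ravel()
    sample = np.asarray(values[:n_values], dtype=np.float64)
    if sample.size == 0:
        return True  # nothing stored, e.g. an all-zero sparse matrix
    return bool(np.all(sample >= 0) and np.all(sample == np.round(sample)))
```

For example, `assert looks_like_raw_counts(adata.layers["counts"])` before `setup_anndata` fails fast if the layer was filled after `sc.pp.normalize_total` or `sc.pp.log1p`.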

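HVG selection: Use 2,000 to 4,000 highly variable genes, selected on the raw counts layer

import scanpy as sc
# flavor="seurat_v3" models raw counts (hence layer="counts") and requires the scikit-misc package
sc.pp.highly_variable_genes(adata, n_top_genes=2000, batch_key="batch", layer="counts", flavor="seurat_v3")
adata = adata[:, adata.var["highly_variable"]].copy()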

Batch information: Specify batch_key for integration

scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="batch")
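Before calling `setup_anndata`, it is worth confirming that the batch column has no missing labels and no very small batches, since a batch with only a handful of cells gives the model little signal for correcting that batch. A stdlib-only sketch follows; `summarize_batches` and the `min_cells=50` threshold are illustrative assumptions, not scvi-tools defaults.

```python
import math
from collections import Counter


def summarize_batches(labels, min_cells=50):
    """Count cells per batch and flag missing labels and small batches."""
    counts = Counter()
    n_missing = 0
    for label in labels:
        # Treat None and float NaN (how pandas usually yields missing values) as missing
        if label is None or (isinstance(label, float) and math.isnan(label)):
            n_missing += 1
        else:
            counts[str(label)] += 1
    small = sorted(batch for batch, n in counts.items() if n < min_cells)
    return {"cells_per_batch": dict(counts), "n_missing": n_missing, "small_batches": small}
```

Usage: `summarize_batches(adata.obs["batch"])`; a non-zero `n_missing` or a non-empty `small_batches` is a reason to fix or merge labels before training.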

Quick Decision Tree

Need to integrate scRNA-seq data?
├── Have cell type labels? → scANVI (references/label_transfer.md)
└── No labels? → scVI (references/scrna_integration.md)

Have multi-modal data?
├── CITE-seq (RNA + protein)? → totalVI (references/citeseq_totalvi.md)
├── Multiome (RNA + ATAC)? → MultiVI (references/multiome_multivi.md)
└── scATAC-seq only? → PeakVI (references/atac_peakvi.md)

Have spatial data?
└── Need cell type deconvolution? → DestVI (references/spatial_deconvolution.md)

Have pre-trained reference model?
└── Map query to reference? → scArches (references/scarches_mapping.md)

Need RNA velocity?
└── veloVI (references/rna_velocity_velovi.md)

Strong cross-technology batch effects?
└── sysVI (references/batch_correction_sysvi.md)
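The decision tree can also be encoded as a small function, which is handy when an agent needs to pick a reference file programmatically. This sketch mirrors the branches above; the argument names and the order in which the scRNA-seq questions are checked (velocity, then pre-trained reference, then cross-technology effects, then labels) are assumptions, since the tree itself does not rank them.

```python
def suggest_model(
    data_type="scrna",
    has_labels=False,
    has_reference_model=False,
    needs_velocity=False,
    strong_cross_tech_batch=False,
):
    """Return (model, reference file) following the quick decision tree."""
    by_modality = {
        "citeseq": ("totalVI", "references/citeseq_totalvi.md"),
        "multiome": ("MultiVI", "references/multiome_multivi.md"),
        "scatac": ("PeakVI", "references/atac_peakvi.md"),
        "spatial": ("DestVI", "references/spatial_deconvolution.md"),
    }
    if data_type in by_modality:
        return by_modality[data_type]
    if data_type != "scrna":
        raise ValueError(f"unknown data_type: {data_type!r}")
    # scRNA-seq branches; the precedence of these checks is an assumption
    if needs_velocity:
        return ("veloVI", "references/rna_velocity_velovi.md")
    if has_reference_model:
        return ("scArches", "references/scarches_mapping.md")
    if strong_cross_tech_batch:
        return ("sysVI", "references/batch_correction_sysvi.md")
    if has_labels:
        return ("scANVI", "references/label_transfer.md")
    return ("scVI", "references/scrna_integration.md")
```

For instance, `suggest_model(has_labels=True)` returns `("scANVI", "references/label_transfer.md")`.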
