Usage
Overview
In this section, we provide a high-level overview of the CDR-g workflow.
As input, CDR-g requires a pre-prepared anndata object. Genes can be filtered based on variance or count criteria to reduce computation time. The condition of interest should be a column in the anndata.obs dataframe.
CDR uses data from the count matrix (anndata.X) to construct co-expression matrices. Count data should be scaled and log-transformed.
The two steps below will (1) run the CDR-g analysis to produce gene expression programs and (2) perform single cell enrichment on each gene expression program recovered by CDR-g. The output of CDR-g is a dictionary of gene lists, with each list representing a gene expression program which varies between the conditions of interest.
from pycdr.pycdr import run_CDR_analysis
from pycdr.perm import calculate_enrichment
run_CDR_analysis(anndata_object, condition_of_interest)
calculate_enrichment(anndata_object)
Example (snakemake) workflows
Three example snakemake workflows are provided in a separate repository. These workflows generate the results and describe preprocessing steps for each dataset in the manuscript. These CDR-g analyses use the visualisation and preprocessing functions provided in other single cell packages.
To run the full workflows, please install scanpy, bbknn (to allow dataset integration) and enrichment_utils (a simple wrapper around goatools to allow enrichment analysis on anndata objects analysed by CDR-g).
pip install snakemake scanpy[leiden] bbknn enrichment_utils
Warning
The workflows download large datasets from GEO.
Walkthrough analysis
An annotated example is provided here: Monocyte dataset