DemoTape is a computational demultiplexing method for targeted single-cell DNA sequencing (scDNA-seq) data, specifically MissionBio Tapestri data. It is based on a distance metric between individual cells at single-nucleotide polymorphism (SNP) loci.
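To make the idea of a cell-to-cell distance at SNP loci concrete, here is a minimal toy sketch. It is not DemoTape's actual metric: the function `cell_distance`, the mean absolute difference of variant allele frequencies (VAFs), and the toy values are all assumptions chosen for illustration. Loci that dropped out in either cell are skipped.

```python
from itertools import combinations


def cell_distance(vaf_a, vaf_b):
    """Mean absolute VAF difference over loci genotyped in both cells.

    Returns None when the two cells share no genotyped locus.
    (Illustrative metric only, not the one implemented in DemoTape.)
    """
    shared = [(a, b) for a, b in zip(vaf_a, vaf_b)
              if a is not None and b is not None]
    if not shared:
        return None
    return sum(abs(a - b) for a, b in shared) / len(shared)


# Toy VAFs at four SNP loci; None marks a locus without a genotype call.
# cell_1/cell_2 mimic one sample, cell_3/cell_4 another (made-up values).
cells = {
    "cell_1": [0.0, 0.5, 1.0, None],
    "cell_2": [0.0, 0.5, None, 0.0],
    "cell_3": [1.0, 0.0, 0.5, 1.0],
    "cell_4": [1.0, None, 0.5, 1.0],
}

# Pairwise distances between all cells.
distances = {
    (i, j): cell_distance(cells[i], cells[j])
    for i, j in combinations(cells, 2)
}
for pair, dist in distances.items():
    print(pair, dist)
```

In this toy data, cells from the same sample end up closer to each other than to cells from the other sample, which is the property a clustering step could exploit to separate samples.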
The corresponding preprint can be found here.
- Python >=v3.X
- Snakemake >=8.X
All other requirements are installed automatically via Snakemake in separate conda envs.
The following resources need to be provided:
- The annotation file (.bed) for the Tapestri panel used
- The dbSNP file (.bed or .txt) for the reference genome used (e.g., hg19)
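Before starting a run, it can help to sanity-check a .bed resource. The sketch below assumes a plain tab-separated BED layout (chromosome, start, end, optional name); the helper `read_bed_intervals` and the example intervals are hypothetical and not part of DemoTape.

```python
import io


def read_bed_intervals(handle):
    """Yield (chrom, start, end, name) tuples from a BED-formatted text stream."""
    for line_no, line in enumerate(handle, start=1):
        line = line.strip()
        # Skip blank lines, comments, and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected at least 3 columns")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start < 0 or end < start:
            raise ValueError(f"line {line_no}: invalid interval {start}-{end}")
        name = fields[3] if len(fields) > 3 else None
        yield chrom, start, end, name


# Made-up example content standing in for a panel annotation file.
example = io.StringIO(
    "chr1\t100\t300\tAMPL_1\n"
    "chr2\t5000\t5200\n"
)
intervals = list(read_bed_intervals(example))
print(intervals)
```

To check a real file, pass an open file handle instead of the `io.StringIO` object.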
To run only DemoTape, you can run:
python workflow/scripts/run_demoTape.py -i <VARIANTS.VCF> -n <NO_SAMPLES>
where <VARIANTS.VCF> is the .csv file produced by the MissionBio Mosaic Pipeline.
Alternatively, starting from the loom file, you can first run:
python workflow/scripts/mosaic_preprocessing.py -i <INPUT.LOOM>
This preprocessing step is also what happens when the whole DemoTape pipeline is run.
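When several multiplexed runs need to be processed, the two script invocations above can be assembled programmatically. The following is a minimal sketch that only builds and prints the command lines; it uses only the `-i` and `-n` flags shown above. The helper names and the file names in `runs` are hypothetical.

```python
import shlex


def preprocessing_command(loom_file):
    """Command line for the loom preprocessing script."""
    return ["python", "workflow/scripts/mosaic_preprocessing.py",
            "-i", str(loom_file)]


def demotape_command(variants_file, n_samples):
    """Command line for running DemoTape on one variants file."""
    return ["python", "workflow/scripts/run_demoTape.py",
            "-i", str(variants_file), "-n", str(n_samples)]


# Hypothetical inputs: variants file -> number of multiplexed samples.
runs = {"runA_variants.csv": 3, "runB_variants.csv": 4}

for variants_file, n_samples in runs.items():
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(demotape_command(variants_file, n_samples)))
```

To execute a command instead of printing it, the list can be passed to `subprocess.run(cmd, check=True)` from the repository root.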
The whole DemoTape analysis pipeline can be executed via:
snakemake \
    -s workflow/Snakefile_analysis \
    -j 500 \
    --configfile configs/MS1_analysis.yaml \
    --executor slurm \
    --rerun-incomplete \
    --drop-metadata \
    -k \
    --use-conda
The executor needs to be adjusted according to the running environment (local/HPC).
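One way to switch between an HPC and a local run is to assemble the Snakemake call from a small helper. This is a sketch under two assumptions: omitting `--executor` falls back to Snakemake's default local execution, and a smaller `-j` value is appropriate on a local machine. The helper `snakemake_command` is hypothetical; all flags are taken from the command above.

```python
import shlex


def snakemake_command(snakefile, configfile, jobs, executor=None):
    """Build the Snakemake command line; executor=None means a local run."""
    cmd = ["snakemake", "-s", snakefile, "-j", str(jobs),
           "--configfile", configfile]
    if executor is not None:
        # Only pass --executor when a cluster backend such as slurm is wanted.
        cmd += ["--executor", executor]
    cmd += ["--rerun-incomplete", "--drop-metadata", "-k", "--use-conda"]
    return cmd


hpc_cmd = snakemake_command("workflow/Snakefile_analysis",
                            "configs/MS1_analysis.yaml", 500, executor="slurm")
local_cmd = snakemake_command("workflow/Snakefile_analysis",
                              "configs/MS1_analysis.yaml", 4)

print(shlex.join(hpc_cmd))
print(shlex.join(local_cmd))
```

The same helper applies to other Snakefiles in the repository by changing the `snakefile` and `configfile` arguments.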
In the config file, the following variables need to be specified:
analysis:
  specific:
    input-dir: <INPUT_DIR>
    output-dir: <OUTPUT_DIR>
  general:
    panel_annotation: resources/<ANNOTATED_TAPESTRI_PANEL>.bed
output:
  prefix: <PREFIX>
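A quick check that all required variables are set can catch configuration mistakes before the pipeline starts. The sketch below assumes the nesting suggested by the key names (`specific` and `general` under `analysis`, `prefix` under `output`) and works on an already loaded config dictionary. The helper `missing_config_keys` and the example values are hypothetical.

```python
# Required keys as paths through the nested config (assumed nesting).
REQUIRED_KEYS = [
    ("analysis", "specific", "input-dir"),
    ("analysis", "specific", "output-dir"),
    ("analysis", "general", "panel_annotation"),
    ("output", "prefix"),
]


def missing_config_keys(config, required=REQUIRED_KEYS):
    """Return dotted paths of required keys that are absent or empty."""
    missing = []
    for path in required:
        node = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
        else:
            # Key exists; treat an empty (None) value as missing too.
            if node is None:
                missing.append(".".join(path))
    return missing


# Example config with made-up values, as it might look after loading the YAML.
config = {
    "analysis": {
        "specific": {"input-dir": "data/looms", "output-dir": "results"},
        "general": {"panel_annotation": "resources/panel.annotated.bed"},
    },
    "output": {"prefix": "demo"},
}
print(missing_config_keys(config))
```

The dictionary could be obtained from the YAML file with, for example, `yaml.safe_load` from PyYAML, which is not part of the standard library.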
The Tapestri panel file can be annotated (i.e., gene names assigned to loci) via BED Annotation.
Additionally, to run downstream analysis with BnpC or COMPASS, the corresponding software needs to be downloaded and its py/exe files need to be specified.
To run the simulation pipeline, execute:
snakemake \
    -s workflow/Snakefile_simulations \
    -j 500 \
    --configfile configs/simulations.yaml \
    --executor slurm \
    --rerun-incomplete \
    --drop-metadata \
    -k \
    --use-conda
where input-looms, as well as the exe files for souporcell and scSplit, need to be adjusted.