This directory contains worked examples demonstrating vHold's annotation capabilities.
| # | Name | Purpose | Proteins | Status |
|---|---|---|---|---|
| 1 | SARS-CoV-2 | Pipeline validation | 18 | ✅ Complete |
| 2 | Remote Homology | Demonstrate annotation at <30% identity | TBD | 🔨 In Progress |
**Case Study 1: SARS-CoV-2.** Purpose: validate that the vHold pipeline functions correctly using well-characterized proteins.
Key Results:
- 10/18 proteins annotated (55.6%)
- 100% accuracy on structural proteins (S, N, M, E)
- 7/10 proteins with cross-database consensus
Limitations: SARS-CoV-2 is too well-studied to demonstrate vHold's remote homology detection capabilities. This serves as a validation benchmark, not a discovery demonstration.
**Case Study 2: Remote Homology.** Purpose: demonstrate vHold's ability to annotate divergent viral proteins where sequence-based methods (BLAST/DIAMOND) fail.
Key Insight: Foldseek already reports sequence identity for every hit. We can stratify results by identity bins without running BLAST:
| Identity Bin | Range | BLAST Status |
|---|---|---|
| easy | >50% | Works fine |
| moderate | 30-50% | Marginal |
| remote | 20-30% | Fails |
| twilight | <20% | Structure only |
Approach:
- Analyze any vHold run by identity stratification
- Count successful annotations in each bin
- Proteins annotated at <30% identity demonstrate vHold's unique value
Analysis Script: `remote_homology/analyze_identity.py`
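The stratification approach above can be sketched in a few lines of Python. This is an illustrative sketch, not the contents of `analyze_identity.py`; it assumes the input is a list of percent-identity values (e.g. parsed from a Foldseek tabular output, where the identity column and its scale depend on the output format requested).

```python
from collections import Counter

# Identity bins from the table above (lower-inclusive, upper-exclusive).
BINS = [
    ("twilight", 0.0, 20.0),    # structure only
    ("remote",   20.0, 30.0),   # BLAST fails
    ("moderate", 30.0, 50.0),   # BLAST marginal
    ("easy",     50.0, 100.01), # BLAST works fine
]

def bin_for_identity(pident: float) -> str:
    """Map a percent-identity value to its difficulty bin."""
    for name, lo, hi in BINS:
        if lo <= pident < hi:
            return name
    raise ValueError(f"identity out of range: {pident}")

def stratify(identities):
    """Count annotated hits per identity bin.

    `identities` is an iterable of percent identities for
    successfully annotated proteins (an assumed input shape).
    """
    return Counter(bin_for_identity(p) for p in identities)

if __name__ == "__main__":
    counts = stratify([12.5, 24.0, 28.3, 41.0, 67.8])
    print(dict(counts))
```

Annotations falling in the `remote` and `twilight` bins are the ones that make the case for structure-based search.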
```shell
# Case Study 1: SARS-CoV-2
cd case_studies/sars_cov_2
python run_case_study.py -o results/ --device cuda

# View results
cat results/case_study_report.md
```

Each case study should include:
- Input FASTA - Protein sequences to annotate
- Ground truth - Known functions for evaluation (JSON format)
- Run script - Automated execution and evaluation
- Documentation - README with methods, results, and interpretation
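To illustrate how the ground-truth JSON and the evaluation step fit together, here is a minimal sketch. The schema (protein ID mapped to a list of expected function keywords) and the keyword-matching scoring rule are assumptions for illustration, not vHold's actual format:

```python
import json

# Hypothetical ground-truth schema: protein ID -> expected keywords.
EXAMPLE_GROUND_TRUTH = json.loads("""
{
  "S": ["spike glycoprotein"],
  "N": ["nucleocapsid"]
}
""")

def evaluate(predictions, ground_truth):
    """Fraction of proteins whose predicted annotation contains
    any expected keyword (case-insensitive substring match)."""
    correct = 0
    for pid, expected in ground_truth.items():
        pred = predictions.get(pid, "").lower()
        if any(kw.lower() in pred for kw in expected):
            correct += 1
    return correct / len(ground_truth)

if __name__ == "__main__":
    preds = {"S": "Spike glycoprotein S1/S2", "N": "unknown"}
    print(evaluate(preds, EXAMPLE_GROUND_TRUTH))
```

A real run script would load `ground_truth.json` from disk and report per-protein results alongside the aggregate score.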
Template structure:

```
case_studies/new_study/
├── README.md
├── input.fasta
├── ground_truth.json
├── run_case_study.py
└── results/
```