abrollab/Structure-Based_Virtual_Screening_Workflow_v1.0

Structure-Based_Virtual_Screening_Workflow_v1
Dylan CapittiFenton ⚛️
Last Updated: 3/30/26

This repository contains the files listed below. Each entry gives a brief description of the script's use, its input files, and an example run command. Please note that dependencies must be installed before running the provided commands.

1) rmsf_backbone.tcl
	- Calculates the protein backbone root mean square fluctuation of MD simulation replicates.
	- Requires MD simulation topology and trajectory file(s).
	- Before running in the AMBER directory, edit "topfile", "trajfiles", and "outfile" to point to your files
	- Run: vmd -dispdev text -e rmsf_backbone.tcl
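
For reference, the per-atom RMSF is the square root of the time-averaged squared deviation of an atom from its mean position. The sketch below illustrates that formula in plain Python. It is not taken from rmsf_backbone.tcl: the function name `rmsf` and the toy coordinates are hypothetical, and it assumes the frames have already been aligned to a common reference.

```python
import math

def rmsf(frames):
    """Per-atom RMSF; each frame is a list of (x, y, z) tuples, one per atom.

    Assumption: frames are already aligned (fitted) to a common reference.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Mean position of atom i over all frames.
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared deviation of atom i from its mean position.
        msd = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result

# Toy two-frame trajectory: atom 0 is fixed, atom 1 moves along x.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(rmsf(toy))
```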

2) cpptraj.in
	- Clusters provided structures (MD frames or PDB files) using DBSCAN.
	- Requires topology and trajectory file(s), or PDBs.
	- Before running in the AMBER directory, edit parm and trajin to point to your files
		- Note: for PDB clustering, add every PDB as a trajin entry and reuse one of them as the parm file (it is listed twice)
		- Note: for PDB clustering, remove hydrogen atoms before clustering
	- Before running, alter the residue counts (e.g. replace 333 with your protein's length)
	- Optionally alter the dbscan minpoints and epsilon values
	- Run: cpptraj -i cpptraj.in
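
To show what the two DBSCAN parameters control, here is a minimal plain-Python sketch of the textbook algorithm run on a precomputed distance matrix. It is an illustration only and not cpptraj's implementation: the function name `dbscan` and the toy distances are hypothetical, the argument names simply mirror the cpptraj keywords, and a point is counted among its own neighbors, which may differ from cpptraj's convention.

```python
def dbscan(dist, epsilon, minpoints):
    """Cluster labels from a symmetric distance matrix; -1 marks noise."""
    n = len(dist)
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        neighbors = [q for q in range(n) if dist[p][q] <= epsilon]
        if len(neighbors) < minpoints:
            labels[p] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1  # p is a core point: start a new cluster
        labels[p] = cluster
        queue = [q for q in neighbors if q != p]
        while queue:
            q = queue.pop()
            if labels[q] == -1:
                labels[q] = cluster  # former noise becomes a border point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = [r for r in range(n) if dist[q][r] <= epsilon]
            if len(q_neighbors) >= minpoints:
                queue.extend(q_neighbors)  # q is also a core point: keep expanding
    return labels

# Toy 1-D "structures": two tight groups and one outlier.
values = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 9.0]
dist = [[abs(a - b) for b in values] for a in values]
labels = dbscan(dist, epsilon=0.15, minpoints=2)
print(labels)
```

A larger epsilon merges nearby groups into one cluster, while a larger minpoints turns sparse groups into noise.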

3) generate_glide_grids.py
	- Generates Glide grids for all structures with the center defined by provided residues.
	- Requires prepared protein structure files (*.maegz).
	- Note: by default script searches parent directory for all structure files
	- Prior to running alter variables under "USER‐CONFIGURABLE PARAMETERS"
	- Run: $SCHRODINGER/run python3 generate_glide_grids.py

4) enrichment_analysis.py
	- Calculates early enrichment metrics for competing Glide grids.
	- Requires Glide VSW subjob output CSV files and a ligand activity/library CSV file.
		- The library CSV needs LIGAND_ID and ACTIVITY columns, with actives indicated by 1 and decoys by 0
	- Optimized to run in a Python virtual environment (venv) with dependencies installed
		python3 -m venv venv
		source venv/bin/activate
		pip install pandas scikit-learn matplotlib
	- Run: ./enrichment_analysis.py -d . -l ligand_library.csv -o enrichment_v1 --missing_score 0.0
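
One common early enrichment metric is the enrichment factor (EF): the active rate in the top-ranked fraction divided by the active rate in the whole library. The sketch below is a hedged illustration of that definition and not the code in enrichment_analysis.py. The function name and toy data are hypothetical, and it assumes that a lower (more negative) docking score ranks better and that a ligand with no docked pose receives the fallback score 0.0, mirroring --missing_score 0.0.

```python
def enrichment_factor(scores, is_active, fraction):
    """EF at the given top fraction; lower (more negative) score ranks first."""
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    n_total = len(ranked)
    n_top = max(1, round(n_total * fraction))  # size of the top-ranked slice
    actives_total = sum(is_active)
    actives_top = sum(active for _, active in ranked[:n_top])
    # Active rate in the top slice relative to the active rate overall.
    return (actives_top / n_top) / (actives_total / n_total)

# Toy library: 10 ligands, 2 actives (ACTIVITY = 1), 8 decoys (ACTIVITY = 0).
# The last ligand has no pose and carries the assumed fallback score 0.0.
scores = [-9.5, -8.0, -7.5, -7.0, -6.5, -6.0, -5.5, -5.0, -4.5, 0.0]
is_active = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
ef_20 = enrichment_factor(scores, is_active, 0.20)
print(ef_20)
```

An EF of 1 corresponds to random ranking, so values above 1 in the top fraction indicate that a grid places actives ahead of decoys.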

5) vsw_filtering_clustering.py
	- Takes ranked Glide hits and processes them through various filters.
	- Requires Glide VSW subjob output CSV files.
	- Optionally alter the top percentile retained (-p) and the Tanimoto clustering distance cut-off (-c)
	- Optimized to run in a Conda virtual environment (vls-pipeline) with dependencies installed
		- Installed packages (from the conda-forge channel): python=3.10 numpy=1.24 pandas rdkit
	- Run: python vsw_filtering_clustering.py --vsw-dir . -p 1 -c 0.4 -o results_processed_v1.csv
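
As a rough picture of what the -c distance cut-off means, the sketch below computes Tanimoto distances between fingerprints represented as sets of on-bits and groups them with a simple greedy leader scheme. Both function names and the toy fingerprints are hypothetical, and the leader scheme is a stand-in: the clustering algorithm and fingerprint type actually used by vsw_filtering_clustering.py may differ.

```python
def tanimoto_distance(a, b):
    """1 - |A & B| / |A | B| for two fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union

def leader_cluster(fingerprints, cutoff):
    """Greedy leader clustering (illustrative stand-in).

    Each fingerprint joins the first cluster whose leader lies within the
    cutoff distance; otherwise it starts a new cluster as its leader.
    """
    clusters = []  # each cluster is a list of indices; the first is the leader
    for i, fp in enumerate(fingerprints):
        for members in clusters:
            if tanimoto_distance(fp, fingerprints[members[0]]) <= cutoff:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Toy fingerprints: two pairs of similar ligands.
fps = [{1, 2, 3, 4}, {1, 2, 3, 4, 5}, {7, 8, 9}, {7, 8, 9, 10}]
clusters = leader_cluster(fps, cutoff=0.4)
print(clusters)
```

A smaller cut-off yields more, tighter clusters, so fewer near-duplicate chemotypes share a cluster.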

6) analog_search_biosolveit.sh
	- Generates 300 analogs per ligand (100 analogs per tool).
	- Requires SMILES CSV and ligand space file (*.space)
	- Before running, edit the FT_BIN, SL_BIN, and SM_BIN paths to point to your program executables
	- Optionally alter the number of hits retained (TOPN), and the similarity cut-off for each tool under "Config"
	- Run: ./analog_search_biosolveit.sh seed_ligands.smi analog_search.space

7) generate_glide_lids.py
	- Generates 2D ligand interaction diagrams for Glide docked poses
	- Requires a Glide pose viewer output file (*.maegz)
	- You may need to update the ligand ID recognition scheme (the script is currently designed to output analogs only)
	- Run: $SCHRODINGER/run generate_glide_lids.py Glide_vsw-OUT_1_pv.maegz LID_pngs --top 100 --offscreen

8) dock_score_distribution.py
	- Scans Glide subjob CSV output files and generates a dock score histogram data file.
	- Optionally alter bin-width (1.0 or 0.5 recommended)
	- Run: python3 dock_score_distribution.py --vsw-dir . --score-col r_i_docking_score --dedupe-by title --dedupe-policy best --bin-width 1.0 --counts-out dockscore_counts.csv --debug-report dockscore_debug.csv
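
The two steps implied by the options above (deduplicate by title, then bin by score) can be sketched as follows. This is an assumption-laden illustration and not the script's code: the function names and rows are hypothetical, "best" is taken to mean the most negative score per title, and each bin is labelled by its lower edge.

```python
import math
from collections import Counter

def best_score_per_title(rows):
    """Keep the most negative score for each title (assumed 'best' policy)."""
    best = {}
    for title, score in rows:
        if title not in best or score < best[title]:
            best[title] = score
    return best

def histogram(scores, bin_width):
    """Count scores per bin; each bin is keyed by its lower edge."""
    counts = Counter(math.floor(s / bin_width) * bin_width for s in scores)
    return dict(sorted(counts.items()))

# Toy rows of (title, docking score); lig1 appears in two subjob files.
rows = [("lig1", -7.2), ("lig1", -8.4), ("lig2", -7.9), ("lig3", -5.1)]
best = best_score_per_title(rows)
counts = histogram(best.values(), bin_width=1.0)
print(counts)
```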

9) vsw_analog_fraction_improved.py
	- Calculates fraction improved from actives and associated analog dock scores.
	- Requires Glide subjob CSV output files and a list of the analog seeds for comparison (inputs-csv)
		- CSV file automatically generated by script 6 above
	- Run: python3 vsw_analog_fraction_improved.py --vsw-dir . --inputs-csv inputs.csv --inputs-skip-rows 0 --subjobs-glob "**/*vsw-DOCK*.csv" --score-col r_i_docking_score --tools "spacelight,spacemacs,ftrees" --details-csv fraction_improved_details_dedup.csv
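
A minimal sketch of the fraction-improved idea, assuming it means the share of a seed's analogs whose docking score is more negative than the seed's own score. The function name and the scores are hypothetical; only the tool names come from the command above.

```python
def fraction_improved(seed_score, analog_scores):
    """Fraction of analogs that dock better (more negative) than the seed."""
    if not analog_scores:
        return 0.0
    improved = sum(1 for s in analog_scores if s < seed_score)
    return improved / len(analog_scores)

# Toy data: one seed ligand and its analogs, grouped by the tool that made them.
seed_score = -7.0
analogs_by_tool = {
    "spacelight": [-7.5, -6.8],
    "spacemacs": [-8.1, -7.2, -6.0],
    "ftrees": [-6.9],
}
fractions = {
    tool: fraction_improved(seed_score, scores)
    for tool, scores in analogs_by_tool.items()
}
print(fractions)
```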

10) vsw_best_analog.py
	- Calculates the best dock score improvement from actives and associated best analog dock scores.
	- Requires Glide subjob CSV output files and the list of analog seeds (inputs-csv), as for script 9.
	- Run: python3 vsw_best_analog.py --vsw-dir . --inputs-csv inputs.csv --inputs-skip-rows 0 --subjobs-glob "**/*vsw-DOCK*.csv" --score-col r_i_docking_score --out-csv out_v1.csv
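
The best-analog comparison can be pictured as the seed's score minus the most negative analog score. The sketch below assumes that sign convention, so a positive value means the best analog docks better than its seed; the function name and the numbers are hypothetical and not taken from vsw_best_analog.py.

```python
def best_improvement(seed_score, analog_scores):
    """Return (best analog score, seed score minus best analog score)."""
    best = min(analog_scores)  # most negative score is treated as best
    return best, seed_score - best

# Toy data: one seed and three analog docking scores.
best, delta = best_improvement(-7.0, [-7.5, -8.1, -6.0])
print(best, round(delta, 2))
```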

11) calculate_molecular_properties.py
	- Takes SMILES strings and outputs associated chemical properties.
	- Requires SMILES CSV and RDKit to be installed
	- Run: python3 calculate_molecular_properties.py ligs.csv -o ligs_props.csv -s SMILES

12) centerofmass.py
	- Calculates ligand center of mass for all ligands in the working directory using PyMOL
	- Requires ligand structure files (*.mol2)
	- Run: pymol -cq -r centerofmass.py
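
For readers without PyMOL at hand, the quantity being computed is the mass-weighted average of the atomic coordinates. The sketch below shows that calculation in plain Python; the function name, the small mass table, and the toy atoms are illustrative assumptions and do not reproduce centerofmass.py or its mol2 parsing.

```python
# Approximate standard atomic masses for a few common elements (illustrative subset).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

def center_of_mass(atoms):
    """Mass-weighted mean position; atoms is a list of (element, (x, y, z))."""
    total = sum(ATOMIC_MASS[element] for element, _ in atoms)
    return tuple(
        sum(ATOMIC_MASS[element] * xyz[k] for element, xyz in atoms) / total
        for k in range(3)
    )

# Toy ligand: two carbon atoms 2 units apart along x.
com = center_of_mass([("C", (0.0, 0.0, 0.0)), ("C", (2.0, 0.0, 0.0))])
print(com)
```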
