abrollab/Structure-Based_Virtual_Screening_Workflow_v1.0
Structure-Based_Virtual_Screening_Workflow_v1

Dylan CapittiFenton ⚛️

Last Updated: 3/30/26

This repository contains the following files. Each script is listed below with a brief description of its use, its input files, and an example run command. Please note that dependencies must be installed before running the provided commands.

1) rmsf_backbone.tcl
   - Calculates the protein backbone root mean square fluctuation (RMSF) of MD simulation replicates.
   - Requires MD simulation topology and trajectory file(s).
   - Before running in the AMBER directory, edit "topfile", "trajfiles", and "outfile" to point to your files.
   - Run: `vmd -dispdev text -e rmsf_backbone.tcl`

2) cpptraj.in
   - Clusters the provided structures (MD frames or PDB files) using DBSCAN.
   - Requires topology and trajectory file(s), or PDB files.
   - Before running in the AMBER directory, edit parm and trajin to point to your files.
   - Note: for PDB clustering, add every PDB as a trajin file and select one of them (a duplicate) as parm.
   - Note: for PDB clustering, remove hydrogen atoms before clustering.
   - Before running, alter the residue counts (i.e. replace 333 with your protein's length).
   - Optionally alter the dbscan minpoints and epsilon values.
   - Run: `cpptraj -i cpptraj.in`

3) generate_glide_grids.py
   - Generates Glide grids for all structures, with the grid center defined by the provided residues.
   - Requires prepared protein structure files (`*.maegz`).
   - Note: by default the script searches the parent directory for all structure files.
   - Before running, alter the variables under "USER‐CONFIGURABLE PARAMETERS".
   - Run: `$SCHRODINGER/run python3 generate_glide_grids.py`

4) enrichment_analysis.py
   - Calculates early enrichment metrics for competing Glide grids.
   - Requires Glide VSW subjob output CSV files and a ligand activity/library CSV file.
   - The library CSV must contain LIGAND_ID and ACTIVITY columns, with actives indicated by 1 and decoys by 0.
   - Optimized to run in a Python virtual environment (venv) with dependencies installed:
     - `python3 -m venv venv`
     - `source venv/bin/activate`
     - `pip install pandas scikit-learn matplotlib`
   - Run: `./enrichment_analysis.py -d . -l ligand_library.csv -o enrichment_v1 --missing_score 0.0`

5) vsw_filtering_clustering.py
   - Takes ranked Glide hits and processes them through various filters.
   - Requires Glide VSW subjob output CSV files.
   - Optionally alter the top percentile retained (-p) and the Tanimoto clustering distance cut-off (-c).
   - Optimized to run in a Conda virtual environment (vls-pipeline) with dependencies installed.
   - Installed: `python=3.10 numpy=1.24 pandas rdkit -c conda-forge`
   - Run: `python vsw_filtering_clustering.py --vsw-dir . -p 1 -c 0.4 -o results_processed_v1.csv`

6) analog_search_biosolveit.sh
   - Generates 300 analogs per ligand (100 analogs per tool).
   - Requires a SMILES CSV and a ligand space file (`*.space`).
   - Before running, edit the FT_BIN, SL_BIN, and SM_BIN paths to point to your program files.
   - Optionally alter the number of hits retained (TOPN) and the similarity cut-off for each tool under "Config".
   - Run: `./analog_search_biosolveit.sh seed_ligands.smi analog_search.space`

7) generate_glide_lids.py
   - Generates 2D ligand interaction diagrams for Glide docked poses.
   - Requires a Glide pose viewer output file (`*.maegz`).
   - You may need to update the ligand ID recognition scheme (currently designed to output analogs only).
   - Run: `$SCHRODINGER/run generate_glide_lids.py Glide_vsw-OUT_1_pv.maegz LID_pngs --top 100 --offscreen`

8) dock_score_distribution.py
   - Scans Glide subjob CSV output files and generates a dock score histogram data file.
   - Optionally alter the bin width (1.0 or 0.5 recommended).
   - Run: `python3 dock_score_distribution.py --vsw-dir . --score-col r_i_docking_score --dedupe-by title --dedupe-policy best --bin-width 1.0 --counts-out dockscore_counts.csv --debug-report dockscore_debug.csv`

9) vsw_analog_fraction_improved.py
   - Calculates the fraction improved from the dock scores of actives and their associated analogs.
   - Requires Glide subjob CSV output files and a list of the analog seeds for comparison (inputs-csv).
   - The inputs CSV file is generated automatically by script 6 above.
   - Run: `python3 vsw_analog_fraction_improved.py --vsw-dir . --inputs-csv inputs.csv --inputs-skip-rows 0 --subjobs-glob "**/*vsw-DOCK*.csv" --score-col r_i_docking_score --tools "spacelight,spacemacs,ftrees" --details-csv fraction_improved_details_dedup.csv`

10) vsw_best_analog.py
    - Calculates the best dock score improvement from the dock scores of actives and their associated best analogs.
    - Requires Glide subjob CSV output files.
    - Run: `python3 vsw_best_analog.py --vsw-dir . --inputs-csv inputs.csv --inputs-skip-rows 0 --subjobs-glob "**/*vsw-DOCK*.csv" --score-col r_i_docking_score --out-csv out_v1.csv`

11) calculate_molecular_properties.py
    - Takes SMILES strings and outputs the associated chemical properties.
    - Requires a SMILES CSV and an RDKit installation.
    - Run: `python3 calculate_molecular_properties.py ligs.csv -o ligs_props.csv -s SMILES`

12) centerofmass.py
    - Calculates the ligand center of mass for all ligands in the working directory using PyMOL.
    - Requires ligand structure files (`*.mol2`).
    - Run: `pymol -cq -r centerofmass.py`
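
As a reference for what "early enrichment" means in script 4 above, here is a minimal sketch of one common metric, the enrichment factor (EF) in the top fraction of a ranked list. It is not taken from enrichment_analysis.py, and the script may compute different or additional metrics. The function name `enrichment_factor`, its parameters, and the toy data are hypothetical. The sketch assumes that lower (more negative) dock scores are better and that undocked ligands have already been given a placeholder score, as the `--missing_score 0.0` flag suggests.

```python
import math


def enrichment_factor(scores, activities, fraction=0.01):
    """Enrichment factor in the top `fraction` of a score-ranked list.

    scores: dock scores, where lower (more negative) is assumed to be better.
    activities: 1 for actives, 0 for decoys, in the same order as scores.
    """
    if len(scores) != len(activities):
        raise ValueError("scores and activities must have the same length")
    n_total = len(scores)
    n_actives = sum(activities)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one ligand and at least one active")

    # Rank by ascending score so the best-scoring ligands come first.
    ranked = sorted(zip(scores, activities), key=lambda pair: pair[0])

    # Size of the "early" subset; always keep at least one ligand.
    n_top = max(1, math.ceil(fraction * n_total))
    actives_in_top = sum(activity for _, activity in ranked[:n_top])

    # Active rate in the top subset divided by the active rate overall.
    return (actives_in_top / n_top) / (n_actives / n_total)


# Hypothetical toy data: 10 ligands, 2 actives, one undocked ligand scored 0.0.
scores = [-9.1, -8.7, -7.9, -7.5, -7.2, -6.8, -6.1, -5.5, -4.9, 0.0]
activities = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

ef_20 = enrichment_factor(scores, activities, fraction=0.2)
print(f"EF20% = {ef_20:.1f}")  # prints "EF20% = 2.5"
```

An EF above 1 means actives are concentrated near the top of the ranking relative to random selection. When comparing competing Glide grids, the same ligand library and the same fraction should be used for every grid.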