data

PDB files for training and testing on AlphaFold2 predictions datset.

As the feature data is extremely large (can reach over ~1GB for a single sequence) and final PDB outputs are not sufficient for training, it would be more applicable for users to generate such features using the scripts provide in the EnQA repo. To generate the required data for training, we provide the fasta format sequence for our training data and they can be used as input with the following command by AlphaFold2:

python3 docker/run_docker.py \
 --fasta_paths=T1050.fasta \
 --max_template_date=2020-05-14 \
 --model_preset=monomer \
 --db_preset=full_dbs \
 --data_dir=$DOWNLOAD_DIR

We also provide the native PDBs which are required for generating the labels.

To generate the labels and features when you have the predicted results from AlphaFold and the corresponding native PDBs, run process.py with the following command:

python3 process.py --input example/model/6KYTP/test_model.pdb --label_pdb example/label/6KYTP.pdb --output outputs/processed --alphafold_prediction example/alphafold_prediction/6KYTP/

Code in train.py now provides a basic framework to train the EGNN_full model with Pytorch, After all feature files for training and validation are generated, suppose the processed features files(in .pt format) are saved in path/to/train/ and path/to/validation/, here is an example to train the model:

python3 train.py --train path/to/train/ --validation path/to/validation/ --output outputs/ --epochs 60

The training script can be modified to include the callbacks, logging and monitoring for the training procedure fit for user’s environment. Note that while EnQA can be called on PDBs with multiple chains, the training was performed in monomer datasets, and support for complexes would be added in the future.

Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md
af2_test.txt		af2_test.txt
af2_train.txt		af2_train.txt
loader.py		loader.py
process_alphafold.py		process_alphafold.py
process_label.py		process_label.py
seqs.tar.gz		seqs.tar.gz

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

PDB files for training and testing on AlphaFold2 predictions datset.

FilesExpand file tree

data

Directory actions

More options

Directory actions

More options

Latest commit

History

data

Folders and files

parent directory

README.md

PDB files for training and testing on AlphaFold2 predictions datset.