patho-detection-plm

Pathogenicity detection using protein language models

Abstract

This project presents a machine learning model that analyzes human protein sequences and classifies single amino-acid polymorphisms as either benign or pathogenic. Trained on curated datasets of known benign and pathogenic variants, the model is intended as a tool for R&D researchers, medical geneticists, and clinicians engaged in embryo screening or infant genetic testing. Its key functionality is detecting mutations within amino acid sequences and predicting their potential pathogenic impact. The model uses the Hugging Face transformers library with PyTorch for sequence analysis, and Weights & Biases for performance tracking and visualization. This approach offers a scalable, data-driven method to assist in genetic screening and early diagnosis, with potential applications in precision medicine and healthcare research.
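To make "single amino-acid polymorphism" concrete, here is a minimal, hypothetical sketch (not code from this repository) of how a substitution can be written in the common `A4G` notation (reference residue, 1-based position, alternate residue), applied to a reference sequence, and recovered by comparing two aligned sequences. The helper names and the toy sequence are illustrative assumptions; indels are deliberately not handled.

```python
# Hypothetical illustration only; this is not the project's implementation.
from typing import List

# The 20 standard amino acids in one-letter code.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def apply_substitution(reference: str, change: str) -> str:
    """Apply one 'A4G'-style substitution (ref residue, 1-based position, alt residue)."""
    ref_aa, position, alt_aa = change[0], int(change[1:-1]), change[-1]
    if reference[position - 1] != ref_aa:
        raise ValueError(f"reference has {reference[position - 1]} at {position}, not {ref_aa}")
    if alt_aa not in STANDARD_AMINO_ACIDS:
        raise ValueError(f"unknown amino acid: {alt_aa}")
    return reference[: position - 1] + alt_aa + reference[position:]


def find_substitutions(reference: str, variant: str) -> List[str]:
    """List substitutions between two equal-length (aligned, indel-free) sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length (indels are not handled)")
    return [
        f"{ref_aa}{position}{alt_aa}"
        for position, (ref_aa, alt_aa) in enumerate(zip(reference, variant), start=1)
        if ref_aa != alt_aa
    ]


wild_type = "MKTAYIAKQR"  # made-up toy sequence
mutant = apply_substitution(wild_type, "A4G")
print(mutant)  # MKTGYIAKQR
print(find_substitutions(wild_type, mutant))  # ['A4G']
```

A classifier such as the one described above would then score the variant sequence (or the wild-type/variant pair); how this project encodes that input is not shown here.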

Installation

To install with conda, run:

conda create -n demo python
conda activate demo
pip install -r Demo/requirements.txt

Quick Start and Usage

# To see predictions for a pre-selected set of protein sequences, run:
python ./Demo/Demo.py

# To predict pathogenicity for your own sequences, run the following and enter them into the console when prompted, separated by commas:
python ./Demo/Demo.py --interact

# To choose one of our trained models, set <model_type> to esm-fz, esm-ft, esm-fz+mf, or esm-ft+mf:
python ./Demo/Demo.py --model_type <model_type>

# To run inference on an available GPU:
python ./Demo/Demo.py --device gpu
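The `--interact` mode expects sequences typed as one comma-separated line. The sketch below shows one plausible way to parse and validate such input; it is a hypothetical helper, not the code in `Demo/Demo.py`, and the rule of accepting only the 20 standard one-letter amino-acid codes (case-insensitive, whitespace ignored) is an assumption.

```python
# Hypothetical parser for comma-separated protein sequences; not taken from Demo.py.
from typing import List

# Assumption: only the 20 standard one-letter amino-acid codes are accepted.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def parse_sequences(raw: str) -> List[str]:
    """Split comma-separated input into uppercase sequences, rejecting invalid residues."""
    sequences = []
    for item in raw.split(","):
        seq = "".join(item.split()).upper()  # drop all whitespace, normalize case
        if not seq:
            continue  # tolerate trailing or doubled commas
        invalid = sorted(set(seq) - STANDARD_AMINO_ACIDS)
        if invalid:
            raise ValueError(f"non-standard residues {invalid} in {seq!r}")
        sequences.append(seq)
    return sequences


print(parse_sequences("MKTAYIAKQR, mktgyiakqr"))  # ['MKTAYIAKQR', 'MKTGYIAKQR']
```

Validating before inference gives the user an immediate, specific error instead of a failure deep inside tokenization; if the real demo accepts ambiguity codes such as `X`, the allowed set would need to grow accordingly.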

Organization

This repo is organized into:

Data, datasets used to train the model.
Demo, a notebook and CLI to run existing models.
Models, the trained models run by the demo.
Training, the script defining the model classes and the training procedure.

Contribute

Contributions are welcome! If you'd like to contribute, please open an issue or submit a pull request. See CONTRIBUTING.md for the contribution guidelines.

Support

If you have any issues or need help, please open an issue or contact the project maintainers.

License

This project is licensed under the MIT License.