About the Project

Meet AmPy, an LLM specialized in antibiotic treatment and antimicrobial resistance, built on Google DeepMind's lightweight Gemma 3 model.

Inspiration

Antibiotic resistance is a growing global health threat, which makes rapid and accurate treatment decisions critical for patient outcomes. Traditional lab-based resistance testing is slow and resource-intensive. AmPy aims to be a user-friendly platform that helps physicians quickly choose effective treatments based on bacterial genomic data. We were inspired to build a tool that uses bacterial genome data and machine learning to instantly predict resistance profiles for multiple antibiotics, empowering clinicians to make informed choices in real time. AmPy explains the reasoning and evidence behind its treatment recommendations and can hold thorough discussions with physicians about the specific cases they are treating!

What We Learned

Bioinformatics: We learned how to process and represent genome sequences using k-mer extraction, a technique that breaks DNA into short substrings for feature engineering.
Machine Learning: We explored multi-label classification, model evaluation (AUC, AUPRC), and feature importance for interpretability.
Data Engineering: We tackled large-scale data streaming, merging genome and phenotype data, and handling missing/imbalanced labels.
Model Deployment: We experimented with user interfaces (Gradio) for real-world accessibility.
LLM Deployment: We trained our LLM, AmPy, on the World Health Organization's antibiotic recommendation documentation. AmPy is built on Google's open-weights Gemma 3 model.

How We Built It

Data Preparation: We streamed bacterial genome data and resistance labels, sampled genomes, and merged them for analysis.
Feature Extraction: We used k-mer extraction and TF-IDF vectorization to convert DNA sequences into machine-readable features.
Model Training: For each antibiotic, we trained a separate XGBoost classifier on a stratified train/test split and evaluated it with AUC and AUPRC.
Evaluation: We summarized model performance, identified top-performing antibiotics, and extracted the most predictive k-mers.
Prediction & UI: We built a pipeline to predict resistance for any genome and deployed a Gradio interface for user-friendly access.
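The stratified split in the Model Training step can be sketched in plain Python as below. This is a minimal illustration, not the project's code: the function name `stratified_split`, the 20% test fraction, the fixed seed, and the 1 = resistant / 0 = susceptible encoding are assumptions. It groups genome indices by their label for one antibiotic, shuffles each group, and carves the same fraction out of each group, so a rare class (for example, few resistant isolates) still appears in both the training and test sets.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Split sample indices so each label keeps roughly the same share in train and test.

    labels: one label per genome for a single antibiotic
    (assumed encoding: 1 = resistant, 0 = susceptible).
    """
    # Group sample indices by their label.
    by_label: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx: List[int] = []
    test_idx: List[int] = []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set (at least one sample).
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels for one antibiotic: 80 susceptible and 20 resistant genomes.
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx), sum(labels[i] for i in test_idx))  # 80 20 4
```

In practice, scikit-learn's `train_test_split(..., stratify=y)` does the same job; a per-antibiotic training loop would call it once for each target's label vector.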

Challenges

Data Imbalance: Many antibiotics had few resistant or susceptible samples, requiring careful selection of trainable targets.
Computational Load: Genome data is massive; we optimized by sampling, limiting k-mer features, and using efficient vectorization.
Interpretability: Making predictions explainable for clinicians was key, so we included feature importance and confidence scores.
Integration: Merging streaming genome data with static label files and handling missing data was non-trivial.
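The target-selection and missing-label issues above can be handled with a simple filter: ignore missing labels, count resistant and susceptible genomes per antibiotic, and only train a model when both classes have enough examples. A minimal sketch, assuming labels are stored as `"R"`, `"S"`, or `None` (missing); the function name `select_trainable_targets`, the drug names, and the threshold of 10 per class are hypothetical, not values from the project.

```python
from typing import Dict, List, Optional, Tuple


def select_trainable_targets(
    labels: Dict[str, List[Optional[str]]], min_per_class: int = 10
) -> Dict[str, Tuple[int, int]]:
    """Return {antibiotic: (n_resistant, n_susceptible)} for antibiotics worth training.

    labels maps each antibiotic to one entry per genome: "R", "S", or None when missing.
    The min_per_class threshold is an illustrative assumption.
    """
    trainable: Dict[str, Tuple[int, int]] = {}
    for antibiotic, values in labels.items():
        # None entries (missing phenotypes) match neither test, so they are not counted.
        n_resistant = sum(1 for v in values if v == "R")
        n_susceptible = sum(1 for v in values if v == "S")
        if n_resistant >= min_per_class and n_susceptible >= min_per_class:
            trainable[antibiotic] = (n_resistant, n_susceptible)
    return trainable


# Hypothetical label table: drug_b has too few resistant genomes to train on.
labels = {
    "drug_a": ["R"] * 12 + ["S"] * 30 + [None] * 5,
    "drug_b": ["R"] * 2 + ["S"] * 40 + [None] * 5,
}
print(select_trainable_targets(labels))  # {'drug_a': (12, 30)}
```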

Math & Algorithms

We used k-mer extraction: $$ \text{For a sequence } S, \text{ k-mers are } \{\, S_i = S[i:i+k] \mid 0 \leq i \leq |S| - k \,\} $$
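The set definition above translates directly into a list comprehension over sliding windows. A minimal sketch; the function name `extract_kmers` and the example sequence are illustrative. Overlapping windows are kept, so a sequence of length |S| yields |S| - k + 1 k-mers (none if |S| < k).

```python
from typing import List


def extract_kmers(sequence: str, k: int) -> List[str]:
    """Return every overlapping k-mer S[i:i+k] for 0 <= i <= len(S) - k."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # range() is empty when the sequence is shorter than k, so no k-mers are produced.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


print(extract_kmers("ACGTAC", 3))  # ['ACG', 'CGT', 'GTA', 'TAC']
```

The feature space grows quickly with k: over the four-letter DNA alphabet there are 4^k possible k-mers (4^8 = 65,536 for k = 8), which is why limiting the number of k-mer features matters for computational load.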

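TF-IDF vectorization: $$ \text{TF-IDF}(t, d) = \text{tf}(t, d) \times \log\left(\frac{N}{\text{df}(t)}\right) $$

Here, tf(t, d) is how often k-mer t occurs in genome d, df(t) is the number of genomes that contain t, and N is the total number of genomes.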
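The TF-IDF formula can be checked by hand with a few lines of standard-library Python. This is a minimal sketch of the textbook formula exactly as written, not the project's vectorizer; the function name `tfidf` and the toy genomes are illustrative, and raw counts are assumed for tf.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(documents: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each k-mer t in each genome d by tf(t, d) * log(N / df(t))."""
    n_docs = len(documents)
    # tf(t, d): raw count of k-mer t in genome d.
    term_counts = [Counter(doc) for doc in documents]
    # df(t): number of genomes that contain k-mer t at least once.
    doc_freq: Counter = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())
    return [
        {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in counts.items()}
        for counts in term_counts
    ]


# Two toy "genomes", each already split into 3-mers.
genomes = [["ACG", "CGT", "CGT"], ["ACG", "TTT"]]
weights = tfidf(genomes)
# ACG occurs in every genome, so its weight is 0.0; CGT gets 2 * ln 2 (about 1.386).
print(weights[0])
```

A k-mer present in every genome gets weight 0, so shared sequence contributes nothing. Note that scikit-learn's `TfidfVectorizer` does not reproduce these numbers by default: with `smooth_idf=True` it uses ln((1 + N) / (1 + df(t))) + 1 as the IDF, and it L2-normalizes each row (`norm="l2"`).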

Model evaluation: $$ AUC = \int_0^1 TPR(FPR^{-1}(x)) dx $$
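In practice the integral is rarely computed directly. The area under the ROC curve equals the probability that a randomly chosen resistant genome receives a higher score than a randomly chosen susceptible one (ties count as one half), and AUPRC is commonly estimated as average precision: precision summed over thresholds, weighted by each increase in recall (the definition scikit-learn's `average_precision_score` uses). The sketch below implements both from scratch for clarity; the function names are illustrative, labels are assumed to be 1 = resistant and 0 = susceptible, and the pairwise AUC loop is O(P·N), so it only suits small evaluation sets.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUPRC estimate: sum over thresholds of (recall gain) * precision."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y, s))            # 0.75
print(average_precision(y, s))  # about 0.833
```

For antibiotics with few resistant genomes, AUPRC is usually the more informative of the two, because a random classifier's average precision sits near the resistant fraction rather than at 0.5.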

Final Thoughts

We built a scalable, explainable, and fast pipeline for multi-antibiotic resistance prediction from genome data. This project showed us the power of combining bioinformatics and AI to address urgent clinical needs. We hope our tool can help accelerate effective treatment and combat the rise of superbugs.

GITHUB: https://github.com/hackbio-ca/genome-based-antibiotic-recommendation

Built With

gradio
hugging-face-datasets
matplotlib
numpy
pandas
pickle
python
scikit-learn
seaborn
tfidfvectorizer
tqdm
xgboost

Submitted to

Toronto Bioinformatics Hackathon 2025

Created by

Worked on ML

Tara Sangra
Masoud Karimi
Selina Zarzour
I'm a computer engineering student at the University of Toronto, keen on solving real-world problems using technology.
raghavbh5588 Bhargava
Tala Abdelmaguid

Updates

Selina Zarzour started this project — Sep 21, 2025 12:55 PM EDT
