Screenshots
- Home Page
- File Upload Page
- Results Overview
- Antibiogram
- SHAP Data for Ampicillin
- SHAP Data for Ceftriaxone
- Genome Map
- Information on Trimethoprim-resistant dihydrofolate reductase
- Information on Class A β-lactamase
- Resistance Mechanism Diagram for dfrA17
- Resistance Mechanism Diagram for gyrA-S83L
- General Data Statistics
- Statistics for trimethoprim/sulfamethoxazole
- Recommendations Tab
Inspiration
Antibiotic resistance directly kills an estimated 1.27 million people per year, more than HIV/AIDS or malaria. When a patient arrives with a bacterial infection, doctors often have to prescribe antibiotics without knowing which ones will actually work, because lab results take 48-72 hours. For sepsis patients, every hour on the wrong antibiotic increases mortality by an estimated 4-8%. We built bacter.ai to close that gap.
What it does
bacter.ai takes a raw bacterial DNA sequence (FASTA file) and predicts resistance or susceptibility across 10 major antibiotics in seconds. It outputs a full antibiogram showing which drugs will work, which won't, and why, with explainable AI highlighting the specific DNA patterns driving each prediction. Doctors get actionable treatment guidance before traditional lab results are even available.
How we built it
We sourced 3,030 lab-verified E. coli genomes and their corresponding antibiotic resistance test results from the PATRIC/BV-BRC database, a public NIH-funded repository of bacterial genome data.
Each genome, roughly 5 million characters of raw DNA, was converted into a numerical fingerprint through a process called k-mer frequency extraction. We slide a window of 6 characters across the entire DNA sequence and count how often each of the 4,096 possible 6-letter DNA patterns appears, then normalize by genome length. This produces a compact numerical representation that captures the genetic content of the entire genome, including resistance genes, mutations, and regulatory elements.
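As a minimal sketch of the k-mer frequency extraction described above (not the project's actual code): the function name `kmer_frequencies`, the fixed `ACGT` ordering, and the choice to skip windows containing ambiguous bases such as `N` are assumptions made for illustration. Counts are normalized by the number of counted windows, which for a full genome is very close to the genome length.

```python
from itertools import product

K = 6
ALPHABET = "ACGT"
# Fixed ordering of all 4**6 = 4,096 possible 6-mers, so every genome
# maps onto the same feature layout.
KMER_INDEX = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=K))}


def kmer_frequencies(sequence: str) -> list[float]:
    """Return a 4,096-element vector of normalized 6-mer frequencies."""
    seq = sequence.upper()
    counts = [0] * len(KMER_INDEX)
    total = 0
    # Slide a window of K characters across the sequence, one base at a time.
    for start in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[start:start + K])
        if idx is not None:  # assumption: windows with N or other symbols are skipped
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]
```

Calling `kmer_frequencies` on each genome's sequence gives one fixed-length row per genome, which is the shape a tabular classifier expects.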
We trained 10 independent XGBoost classifiers, one per antibiotic. XGBoost is a gradient-boosted decision tree ensemble: it builds 200 decision trees sequentially, and each new tree corrects the mistakes of the previous ones by fitting to their residual errors (the gradient of the loss). Each tree splits on k-mer frequency thresholds to separate resistant from susceptible genomes. The model minimizes binary cross-entropy loss and uses L2 regularization to prevent overfitting.
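For reference, the training objective described above can be written in the standard regularized form used for gradient-boosted trees. The symbols are notation introduced here, not taken from the project: $x_i$ is the k-mer vector of genome $i$, $y_i \in \{0, 1\}$ its resistance label, $f_m$ the $m$-th tree, $T_m$ its number of leaves, $w_m$ its leaf weights, and $\gamma$, $\lambda$ the regularization strengths (whose values the write-up does not state).

```latex
\hat{y}_i = \sum_{m=1}^{200} f_m(x_i), \qquad
p_i = \frac{1}{1 + e^{-\hat{y}_i}}

\mathcal{L} = -\sum_{i=1}^{n} \Bigl[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \Bigr]
            + \sum_{m=1}^{200} \Bigl( \gamma T_m + \tfrac{1}{2} \lambda \lVert w_m \rVert^2 \Bigr)
```

The first sum is the binary cross-entropy loss; the second is the complexity penalty, whose $\lambda$ term is the L2 regularization on leaf weights.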
We validated every model with 5-fold stratified cross-validation, ensuring every prediction is made on data the model never saw during training. We computed 95% bootstrap confidence intervals over 1,000 resampling iterations to quantify the reliability of our accuracy estimates.
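The two validation steps above can be sketched with the standard library alone. This is an illustrative sketch, not the project's code: the helper names `stratified_folds` and `bootstrap_accuracy_ci` are hypothetical, the interval is a simple percentile bootstrap, and model fitting is left out.

```python
import random


def stratified_folds(labels, n_folds=5, seed=0):
    """Split sample indices into n_folds folds with similar class ratios."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal each class's indices round-robin so every fold gets its share.
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


def bootstrap_accuracy_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for accuracy."""
    rng = random.Random(seed)
    n = len(y_true)
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    scores = []
    for _ in range(n_boot):
        # Resample predictions with replacement and recompute accuracy.
        sample = [correct[rng.randrange(n)] for _ in range(n)]
        scores.append(sum(sample) / n)
    scores.sort()
    lower = scores[int((alpha / 2) * n_boot)]
    upper = scores[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Each fold serves once as the test set while the other four are used for training, so every genome receives exactly one out-of-fold prediction; those pooled predictions are what `bootstrap_accuracy_ci` would be run on.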
To make predictions explainable, we integrated SHAP (SHapley Additive exPlanations), which decomposes each prediction into individual feature contributions, showing exactly which DNA patterns pushed the prediction toward resistant or susceptible.
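To illustrate what a Shapley decomposition means, here is a brute-force sketch on a toy model. It is not how the SHAP library computes values for tree ensembles (that uses much faster tree-specific algorithms), and `shapley_values`, `toy_score`, and the all-zero baseline are hypothetical names and choices made for this example.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for predict(x), relative to a baseline input.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2**n, so this is only usable for tiny inputs.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(features):
    # Toy stand-in for a model: one additive term plus one interaction.
    return 2.0 * features[0] + features[1] * features[2]


contributions = shapley_values(toy_score, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

The contributions always sum to the difference between the prediction and the baseline prediction, which is the additive property that lets each k-mer's push toward resistant or susceptible be shown as a bar in a chart.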
The backend serves real-time predictions through a Flask REST API. The frontend, built in React with D3.js, visualizes results through an interactive antibiogram, a circular genome map showing resistance gene locations, a verification panel comparing predictions to lab results, and SHAP-based model explanation charts.
Challenges we ran into
- Downloading and processing thousands of genome sequences from the BV-BRC API: each genome is approximately 5 MB of raw DNA text, and 9,255 out of 12,627 genomes failed to download due to API timeouts and missing records
- Only a fraction of downloaded genomes had matching lab-tested resistance data for our target antibiotics, limiting our usable training set to 3,030 genomes out of 4,693 downloaded
- Some antibiotics had severe class imbalance: gentamicin had only 13% resistant samples, making it hard for the model to learn resistance patterns without biasing toward the majority class
- Converting raw MIC (minimum inhibitory concentration) lab values into binary resistant/susceptible labels using CLSI clinical breakpoint standards
- Keeping backend models, API endpoints, and frontend components in sync across development branches while everything was continuously changing under a 36-hour deadline
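Two of the challenges above, turning MIC values into binary labels and coping with class imbalance, can be sketched as follows. The breakpoint numbers here are placeholders for a made-up drug, not CLSI values, and the helper names are hypothetical. Dropping intermediate results and reweighting the positive class are assumptions; the write-up does not say how either was handled. The negative-to-positive ratio is the value commonly passed to XGBoost's `scale_pos_weight` parameter.

```python
# Placeholder breakpoints in µg/mL, for illustration only.
# Real values must come from the current CLSI tables for each drug.
BREAKPOINTS = {
    "drug_a": {"susceptible_max": 1.0, "resistant_min": 4.0},
}


def label_from_mic(drug, mic):
    """Map an MIC to 1 (resistant), 0 (susceptible), or None (intermediate)."""
    bp = BREAKPOINTS[drug]
    if mic >= bp["resistant_min"]:
        return 1
    if mic <= bp["susceptible_max"]:
        return 0
    return None  # assumption: intermediate results are excluded from training


def positive_class_weight(labels):
    """Ratio of susceptible (0) to resistant (1) samples."""
    resistant = sum(1 for y in labels if y == 1)
    susceptible = sum(1 for y in labels if y == 0)
    return susceptible / resistant if resistant else 1.0
```

With 13% resistant samples, as reported for gentamicin, the ratio comes out to roughly 6.7, meaning each resistant genome would count about as much as seven susceptible ones during training.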
Accomplishments that we're proud of
- Trained 10 working antibiotic resistance classifiers from scratch in under 36 hours
- Achieved 85% accuracy on ciprofloxacin with 94.6% sensitivity, and 81% on ceftriaxone with balanced sensitivity and specificity, using only raw genomic sequence data
- Built a verification system proving predictions work on genomes the model has never seen, with actual lab results shown side by side
- Created a complete end-to-end pipeline (raw DNA upload, k-mer extraction, real-time prediction, explainable results), all running in seconds
- Processed 243,000 lab-verified AMR phenotype records and 4,693 bacterial genome sequences to build our training dataset
What we learned
- How antibiotic resistance works at the molecular level - resistance genes like TEM-1 beta-lactamases and gyrA mutations, horizontal gene transfer via plasmids, and why different antibiotics fail through fundamentally different biochemical mechanisms
- That feature engineering matters more than model complexity - XGBoost on well-crafted k-mer features matches or outperforms neural networks on this task, according to published benchmarks
- The critical importance of rigorous validation - cross-validation, bootstrap confidence intervals, and blind held-out test sets are what separate a real ML tool from a demo that memorized its training data
- How to build and coordinate a full-stack ML application under extreme time pressure, from data acquisition through model training to deployment
What's next for bacter.ai
- Train species-specific models for Staphylococcus aureus, Klebsiella pneumoniae, and Pseudomonas aeruginosa using the same pipeline
- Add hyperparameter tuning and ensemble methods to push accuracy above 90%
- Integrate with portable DNA sequencers (Oxford Nanopore MinION) for point-of-care deployment directly in hospitals
- Validate on hospital-specific datasets to account for regional variation in resistance patterns
- Build a clinician-facing report generator with treatment recommendations ranked by predicted efficacy