The aim of this project is to predict gene type directly from a raw DNA sequence using classical machine learning models, convolutional neural networks, and a custom nucleotide-level Transformer, while also exploring practical deployment of machine learning services with Docker Compose, Kubernetes (locally with Kind), and cloud infrastructure (AWS EKS).
A gene is a defined segment of DNA that contains the nucleotide sequence (A, T, C, and G) encoding the instructions for producing a functional product, most commonly a protein. The nucleotide sequence determines the amino acid sequence, which in turn specifies a protein’s structure and function. Genes typically begin with a start codon and end with a stop codon, which together define the boundaries of the coding region.
Genes can be broadly categorized into protein-coding genes, which are translated into proteins, and non-coding genes, which give rise to functional RNA molecules. Although most of the genome does not encode protein, non-coding RNAs carry out diverse regulatory and structural roles without being translated. They are often classified by size into small non-coding RNAs (e.g., snRNAs) and long non-coding RNAs (lncRNAs).
Gene types (such as protein-coding genes, non-coding RNAs, and pseudogenes) and their corresponding DNA sequences were obtained from the NCBI Gene Database via the following Kaggle dataset: https://www.kaggle.com/datasets/harshvardhan21/dna-sequence-prediction.
The dataset contains:
- Gene symbols and textual descriptions
- Gene functional types
- DNA nucleotide sequences represented as A, C, G, T
- Predefined train / validation / test splits
Each sequence consists of a variable number of nucleotides and is capped at a maximum length of 1024 bp for transformer training.
To systematically evaluate different learning paradigms for gene-type prediction, the following models were implemented and compared:
- Classical machine learning (baseline)
- 6-mer generation from nucleotide sequences
- k-mer frequency counting
- Classification using:
- Logistic Regression
- XGBoost
- This baseline captures local motif statistics but ignores long-range dependencies.
- Deep learning with convolutions
- 1D Convolutional Neural Network (CNN)
- One-hot encoded DNA input
- It captures local sequence patterns and motifs.
- Nucleotide-level transformer
- A custom, lightweight, encoder-only Transformer trained from scratch for genomic sequence classification.
- Architecture:
- Nucleotide-level, encoder-only Transformer
- 3 encoder layers
- Hidden dimension (d_model) of 192
- 6 attention heads
- Feed-forward hidden dimension of 512
- Dropout 0.2
- Early stopping based on validation macro-F1
- Max sequence length (max_len): 1024 to cover all sequences without truncation
- Tokenization: one token per nucleotide
- Vocabulary: ["A", "C", "G", "T", "N", "[PAD]", "[CLS]"] (plus optionally [UNK])
- Each sequence is encoded as [CLS] + tokens + [PAD] up to max_len
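The encoding scheme above can be sketched in a few lines of Python. This is a minimal illustration only; the exact token-to-id mapping used during training is an assumption.

```python
# Minimal sketch of nucleotide-level tokenization (illustrative;
# the exact token-to-id mapping used in training may differ).
VOCAB = {"[PAD]": 0, "[CLS]": 1, "A": 2, "C": 3, "G": 4, "T": 5, "N": 6}

def encode(sequence: str, max_len: int = 1024) -> list[int]:
    """Encode a DNA sequence as [CLS] + one token per nucleotide + [PAD]s."""
    ids = [VOCAB["[CLS]"]]
    # Unknown characters fall back to "N" (an [UNK] token could be used instead).
    ids += [VOCAB.get(base, VOCAB["N"]) for base in sequence[: max_len - 1]]
    ids += [VOCAB["[PAD]"]] * (max_len - len(ids))  # pad up to max_len
    return ids

print(encode("ACGT", max_len=8))  # [1, 2, 3, 4, 5, 0, 0, 0]
```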
ML | Data Science
🧠 numpy • pandas • scikit-learn • PyTorch • XGBoost • CNN • transformer-encoder
📊 seaborn • matplotlib
Backend
🌐 FastAPI
🚀 gunicorn
MLOps | Deployment
🐳 Docker • Docker Compose
📦 BentoML
⚙️ ONNX Runtime
☁️ Kind • AWS Kubernetes (EKS)
The public Kubernetes endpoint shown below was used during development and testing on AWS EKS. To avoid unnecessary cloud costs, the EKS cluster and external load balancer are not kept running continuously. As a result, the endpoint may be unavailable at the time of reading. All inference steps can be fully reproduced locally using Docker Compose or local Kubernetes (Kind), as described below.
Example request in Python (EKS deployment, illustrative):
import requests

url = "http://a180c213e0dc94bf3b569493738d5e85-1066055406.eu-west-1.elb.amazonaws.com/predict"
# request JSON
data = {'sequence': 'CTTTCTGCCGCCATCTTGCTTCCGCGTTCCCTGCACAAAATGCCGGGCGAAGCCACAGAAACCGTCCCTGCTACAGAGCAGGAGTTGCCGCAGTCCCAGGCTGAGACAGGGTCTGGAACAGCATCTGATAGTGGTGAATCAGTACCAGGGATTGAAGAACAGGATTCCACCCAGACCACCACACAAAAAGCCTGGCTGGTGGCAGCAGCTGAAATTGATGAAGAACCAGTCGGTAAAGCAAAACAGAGTCGGAGTGAAAAGAGGGCACGGAAGGCTATGTCCAAACTGGGTCTTCTACAGGTTACAGGAGTTACTAGAGTCACTATCTGGAAATCTAAGAATATCCTCTTTGT'}
result = requests.post(url, json=data).json()
print(result)
Example response:
{
"predictions": {
"BIOLOGICAL_REGION": 0.00,
"OTHER": 1.57,
"PROTEIN_CODING": 89.0,
"PSEUDO": 0.10,
"ncRNA": 8.56,
"snoRNA": 0.75,
"tRNA": 0.02
},
"top_class": "PROTEIN_CODING",
"top_probability": 89.0
}
After evaluating multiple modeling approaches, a custom Transformer encoder was selected as the final model due to its superior performance in gene-type classification. The trained PyTorch model was exported to ONNX and packaged as a BentoML service using ONNX Runtime, exposing a stable HTTP /predict endpoint.
The service can be run locally with Docker Compose for reproducible end-to-end testing and deployed unchanged to AWS EKS via Amazon ECR, where it operates as a standard Kubernetes workload. Local Kubernetes deployment using Kind is also described to mirror the cloud setup during development.
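The percent-style scores in the example response above come from a softmax over the model's raw output logits. A minimal sketch of that postprocessing step follows; the class list matches the response, but the logit values are made up for illustration:

```python
import math

CLASSES = ["BIOLOGICAL_REGION", "OTHER", "PROTEIN_CODING",
           "PSEUDO", "ncRNA", "snoRNA", "tRNA"]

def postprocess(logits: list[float]) -> dict:
    """Softmax the logits and report per-class percentages plus the top class."""
    exps = [math.exp(x - max(logits)) for x in logits]  # subtract max for stability
    total = sum(exps)
    probs = {c: round(100 * e / total, 2) for c, e in zip(CLASSES, exps)}
    top = max(probs, key=probs.get)
    return {"predictions": probs, "top_class": top, "top_probability": probs[top]}

# Hypothetical logits, for illustration only.
result = postprocess([-2.0, 1.0, 5.0, -1.0, 3.0, 0.5, -3.0])
print(result["top_class"])  # PROTEIN_CODING
```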
To sum up, this project is deployed and tested incrementally across multiple environments, starting from local development and progressing to cloud-native model serving:
1. Local deployment with Docker Compose (a gateway service + a model service): A multi-container setup consisting of a FastAPI gateway service and a BentoML-based model service running ONNX Runtime.
2. Kubernetes-based deployment with ONNX Runtime
- 2A. AWS EKS: Deployment on Amazon Elastic Kubernetes Service (EKS)
- 2B. Local Kubernetes with Kind: A local Kubernetes setup using Kind to mirror the EKS deployment for development and testing.
Organization of the files in the repository:
Genetype-classifier-api/
├── notebooks/
│ ├── 01_EDA.ipynb # Data exploration: class balance, sequence length, cleaning checks
│ ├── 02_baseline_kmers.ipynb # Baseline model using k-mers of the input (e.g., Logistic Regression / XGBoost)
│ ├── 02_baseline_CNN.ipynb # Baseline deep model: CNN sequence classifier
│ ├── 02_transformer.ipynb # Transformer-encoder training + evaluation
│ └── 03_model_comparison.ipynb # Model comparison across baselines and transformer, metrics + plots
│
├── data/
│ ├── train.csv # Training split (raw/processed sequences + labels)
│ ├── validation.csv # Validation split
│ ├── test.csv # Test split
│ ├── train_classimb.csv # Training split with class imbalance variant
│ ├── validation_classimb.csv # Validation split with class imbalance variant
│ └── test_classimb.csv # Test split with class imbalance variant
│
├── models/
│ ├── transformer_model.pkl # Serialized Transformer model (training artifact / checkpoint)
│ ├── transformer_classifier.onnx # Exported ONNX model for optimized inference
│ ├── transformer_classifier.onnx.data # External tensor weights for ONNX (large model data)
│ ├── transformer_report.txt # Evaluation report (metrics, per-class performance)
│ ├── cnn_model.pkl # Serialized CNN baseline model
│ ├── cnn_report.txt # CNN evaluation report
│ ├── logistic_regression_model.pkl # Baseline classical model artifact
│ ├── logistic_regression_report.txt # Baseline evaluation report
│ ├── xboost_model.pkl # XGBoost baseline model artifact
│ └── xboost_report.txt # XGBoost evaluation report
│
├── service/
│ ├── app.py # FastAPI application (inference endpoint(s))
│ ├── Dockerfile_k8s # Service image build for Kubernetes deployment
│ ├── test.py # Local service tests / smoke tests
│ └── __init__.py # Package marker
│
├── docker_compose/
│ ├── bentofile.yaml # BentoML build configuration for model packaging
│ ├── service.py # Bento service definition (runner, API adapters, etc.)
│ ├── gateway.py # Gateway / routing layer (fronts model service)
│ ├── docker-compose.yaml # Local multi-container setup (gateway + model service)
│ ├── image-model.dockerfile # Docker build for model service container
│ ├── image-gateway.dockerfile # Docker build for gateway container
│ └── __init__.py # Package marker
│
├── eks/
│ ├── eks-config.yaml # EKS cluster configuration (cluster, nodegroups, region, VPC settings)
│ ├── model-deployment.yaml # Kubernetes Deployment for model service
│ ├── model-service.yaml # Kubernetes Service for model service (ClusterIP, ports)
│ ├── gateway-deployment.yaml # Kubernetes Deployment for gateway
│ ├── gateway-service.yaml # Kubernetes Service for gateway (LoadBalancer / ingress entrypoint)
│ └── test.py # EKS-related tests / helper script(s)
│
├── k8s/
│ ├── deployment.yaml # Kubernetes Deployment for the inference service
│ ├── service.yaml # Kubernetes Service exposing the API
│ ├── hpa.yaml # Horizontal Pod Autoscaler for scaling under load
│ └── load_test.py # Load testing script for the deployed service
│
├── plots/ # Training curves and confusion matrices for the ML models
├── img/ # README assets (logo/illustrations)
├── pyproject.toml # Python project config + dependencies
├── uv.lock # Locked dependency versions (uv)
└── README.md # Project overview, setup, usage, deployment notes
Docker Compose provides a simple and deterministic way to run multi-container applications on a single host. In this project, it is used to validate the full inference stack end-to-end (gateway routing, model loading, request/response schema, and container networking) in a production-like environment without Kubernetes. This same container-first approach transfers cleanly to AWS EKS later.
The deployment uses a lightweight FastAPI gateway for request routing and a BentoML model service for inference. BentoML is an open-source framework designed to package model execution code, runtime dependencies, and serving logic into a reproducible, containerized service. In this project, BentoML exposes the ONNX Runtime inference pipeline as a production-grade HTTP API, decoupling model training from deployment and runtime concerns.
Architecture overview. This deployment contains two services:
- Gateway service (FastAPI, port 9696)
- Receives HTTP requests from the client (e.g., website or curl).
- Performs lightweight request validation (optional).
- Forwards the request to the model service via an internal Docker network.
- Model service (BentoML + ONNX Runtime, port 3000)
- Loads the exported ONNX model at startup.
- Runs domain-specific sequence preprocessing (tokenization, attention mask, padding).
- Executes inference via ONNX Runtime.
- Applies postprocessing (softmax, class mapping) and returns a JSON response.
Run the model service locally (without Docker)
This is useful for validating ONNX inference and the BentoML endpoint before containerizing.
bentoml serve docker_compose.service:GeneFunctionService --port 3000
Test in another terminal:
curl -X POST http://localhost:3000/predict -H "Content-Type: application/json" -d '{"sequence":"AACGGCTC"}'
Build and run the model service in Docker
Build/run locally:
# Build the model image (run from repository root):
docker build -t genetype-bento:001 -f docker_compose/image-model.dockerfile .
# Run the container:
docker run --rm -p 3000:3000 genetype-bento:001
Test it:
curl -X POST http://localhost:3000/predict \
-H "Content-Type: application/json" \
-d '{"sequence":"AACGGCTC"}'
Build the gateway image
The gateway forwards requests to the model service using:
MODEL_URL=http://onnx-model:3000/predict
Build the gateway image (run from repository root; adjust the Dockerfile path if needed):
docker build -t genetype-gateway:001 -f docker_compose/image-gateway.dockerfile .
Run the full stack with Docker Compose
Start both containers and their internal network:
docker-compose -f docker_compose/docker-compose.yaml up --build
Test end-to-end via the gateway:
curl -X POST http://localhost:9696/predict \
-H "Content-Type: application/json" \
-d '{"sequence":"AACGGCTC"}'
You should receive the JSON response produced by BentoML (predictions + top_class + top_probability).
Useful commands:
- docker-compose up: run Docker Compose
- docker-compose up -d: run Docker Compose in detached mode
- docker ps: list the running containers
- docker-compose down: stop Docker Compose
Here, I'll create an Elastic Kubernetes Service (EKS) cluster on Amazon using the CLI, publish images to ECR, and configure kubectl.
What needs to be installed locally:
- aws CLI configured for your AWS account (command line tool for working with AWS services)
- kubectl (it manages Kubernetes objects within your Amazon EKS clusters)
- eksctl CLI (it interacts with AWS to create, modify, and delete Amazon EKS clusters)
- helm (popular package manager for kubernetes)
- access to an EKS cluster (existing or newly created)
To create and manage a cluster on EKS, I'll use the CLI tool eksctl, which can be downloaded from the eksctl releases page. Next, let's follow these steps:
1. In the eks folder, create the EKS config file eks-config.yaml:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: genetype-ml-eks
  region: eu-west-1
nodeGroups: # for our case, we need only one node group (CPU)
  - name: ng-m5-xlarge
    instanceType: m5.xlarge
    desiredCapacity: 1
Create the EKS cluster:
eksctl create cluster -f eks-config.yaml
2. Push local BentoML images to Amazon ECR
Create aws ecr repository for eks cluster (one-time):
aws ecr create-repository --repository-name genetype-bento-images --region eu-west-1
Bash commands to run in the terminal to push Docker images to the ECR repository:
# Registry URI
ACCOUNT_ID=XXXXXXXXXX
REGION=eu-west-1
REGISTRY_NAME=genetype-bento-images
PREFIX=${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REGISTRY_NAME}
# Tag local docker images to remote tag
GATEWAY_LOCAL=genetype-gateway:001 # gateway service
GATEWAY_REMOTE=${PREFIX}:genetype-gateway-001 # notice the ':' is replaced with '-' before 001
docker tag ${GATEWAY_LOCAL} ${GATEWAY_REMOTE}
MODEL_LOCAL=genetype-bento:001 # ml model
MODEL_REMOTE=${PREFIX}:genetype-bento-001 # same thing ':' is replaced with '-' before genetype-bento
docker tag ${MODEL_LOCAL} ${MODEL_REMOTE}
# Push tagged docker images
docker push ${MODEL_REMOTE}
docker push ${GATEWAY_REMOTE}
Note: log in to ECR before pushing (with AWS CLI v2; the older `aws ecr get-login` command is deprecated):
aws ecr get-login-password --region ${REGION} | docker login --username AWS --password-stdin ${PREFIX}
Push the model image first, then the gateway image.
Get the URIs of these images:
echo ${MODEL_REMOTE}
echo ${GATEWAY_REMOTE}
and add them to model-deployment.yaml and gateway-deployment.yaml respectively.
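The tag-rewriting convention above (the local ':001' suffix becomes '-001' inside the single ECR repository tag) can be captured in a small helper. Illustrative only; the account ID below is a made-up placeholder:

```python
def ecr_remote_tag(local_image: str, account_id: str, region: str,
                   repo: str = "genetype-bento-images") -> str:
    """Map a local image tag like 'genetype-gateway:001' to its ECR tag.

    The ':' before the version is replaced with '-', because both images
    live in one ECR repository and are distinguished by tag alone.
    """
    name, _, version = local_image.partition(":")
    prefix = f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}"
    return f"{prefix}:{name}-{version}"

# Hypothetical 12-digit account ID, for illustration only.
print(ecr_remote_tag("genetype-gateway:001", "123456789012", "eu-west-1"))
# 123456789012.dkr.ecr.eu-west-1.amazonaws.com/genetype-bento-images:genetype-gateway-001
```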
Apply all the yaml config files to remote node coming from eks (kubectl get nodes):
kubectl apply -f model-deployment.yaml
kubectl apply -f model-service.yaml
kubectl apply -f gateway-deployment.yaml
kubectl apply -f gateway-service.yaml
Testing the deployment pods and services should give us predictions.
Check nodes:
kubectl get nodes
My example output:
NAME                                          STATUS   ROLES    AGE   VERSION
ip-192-168-24-82.eu-west-1.compute.internal   Ready    <none>   15m   v1.32.9-eks-ecaa3a6
kubectl get service
My example output:
NAME             TYPE           CLUSTER-IP      EXTERNAL-IP                                                               PORT(S)        AGE
gateway          LoadBalancer   10.100.212.63   a180c213e0dc94bf3b569493738d5e85-1066055406.eu-west-1.elb.amazonaws.com   80:31910/TCP   5m15s
genetype-bento   ClusterIP      10.100.68.77    <none>                                                                    8500/TCP       5m20s
kubernetes       ClusterIP      10.100.0.1      <none>                                                                    443/TCP        24m
Executing kubectl get service gives us the external address, which we need to add to test.py as the access URL for predictions (e.g., url = 'http://a3399e***-5180***.ap-south-123.elb.amazonaws.com/predict').
Note: this load balancer is now open to everyone! Remember to restrict it.
Test it!
python3 eks/test.py
And the output:
{
"predictions": {
"BIOLOGICAL_REGION": 0.00,
"OTHER": 1.57,
"PROTEIN_CODING": 89.0,
"PSEUDO": 0.10,
"ncRNA": 8.56,
"snoRNA": 0.75,
"tRNA": 0.02
},
"top_class": "PROTEIN_CODING",
"top_probability": 89.0
}
To delete the remote cluster:
eksctl delete cluster --name genetype-ml-eks
Install kubectl on Linux:
cd
mkdir bin && cd bin
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
cd
export PATH="${PATH}:${HOME}/bin"
# add it to .bashrc
# verify installation
which kubectl
kubectl version --client
Install Kind. Kind (Kubernetes in Docker) allows you to run Kubernetes clusters locally using Docker containers.
curl -Lo ${HOME}/bin/kind https://kind.sigs.k8s.io/dl/v0.20.0/kind-linux-amd64
chmod +x ${HOME}/bin/kind
Install uv:
pip install uv
Create a Kind Cluster. Let's create a local Kubernetes cluster:
kind create cluster --name seq2genetype
This command will:
- Create a single-node Kubernetes cluster
- Configure kubectl to use this cluster
- Take a few minutes on first run (downloads images)
Verify the cluster is running:
kubectl cluster-info
kubectl get nodes
We should see one node in "Ready" status.
We will use the gene-type Transformer model trained earlier, already converted to ONNX format. The model predicts one of the gene-type classes from a raw DNA sequence.
Download the ONNX model:
mkdir service
cd service
# Use the raw URLs (the GitHub blob pages return HTML, not the model files):
wget https://raw.githubusercontent.com/katwre/Genetype-classifier-api/main/models/transformer_classifier.onnx -O transformer_classifier.onnx
wget https://raw.githubusercontent.com/katwre/Genetype-classifier-api/main/models/transformer_classifier.onnx.data -O transformer_classifier.onnx.data
Here, I'll create a FastAPI application that serves the ONNX model for inference (prediction).
Initialize the project with uv:
uv init
rm main.py
uv add fastapi uvicorn onnxruntime numpy requests
FastAPI application was created in service/app.py. To test locally, run the service:
uv run uvicorn service.app:app --host 0.0.0.0 --port 8080 --reload
Open http://localhost:8080/docs to see the API documentation.
And then test the service with:
curl -X POST http://localhost:8080/predict \
-H "Content-Type: application/json" \
-d '{"sequence": "ACGTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTA"}'
It's going to return:
{"predictions":{"BIOLOGICAL_REGION":0.04,"OTHER":14.28,"PROTEIN_CODING":0.0,"PSEUDO":0.9,"ncRNA":77.85,"snoRNA":6.84,"tRNA":0.08},"top_class":"ncRNA","top_probability":77.85}
Let's containerize the application.
Build the image:
docker build -t gene-type-classifier:v1 .
Test the container locally:
docker run -it --rm -p 8080:8080 gene-type-classifier:v1
In another terminal, run the test script:
curl -X POST http://localhost:8080/predict \
-H "Content-Type: application/json" \
-d '{"sequence": "ACGTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTA"}'
Kind clusters run in Docker, so they can't access images from the local Docker daemon by default. I need to load the image into Kind.
kind load docker-image gene-type-classifier:v1 --name seq2genetype
Understanding Kubernetes resources:
- Pod: The smallest deployable unit in Kubernetes (one or more containers)
- Deployment: Manages a set of identical Pods, handles updates and scaling
- Service: Exposes Pods to network traffic
- HPA (Horizontal Pod Autoscaler): Automatically scales Pods based on metrics
Key configuration in k8s/deployment.yaml:
- replicas: 2 - Run 2 copies of our service
- imagePullPolicy: Never - Use local image (don't pull from registry)
- resources - Memory and CPU limits/requests
- livenessProbe - Restart container if unhealthy
- readinessProbe - Only send traffic when ready
Deploy it:
kubectl apply -f k8s/deployment.yaml
Check the deployment:
kubectl get deployments
kubectl get pods
kubectl describe deployment gene-type-classifier
View logs:
kubectl logs -l app=gene-type-classifier --tail=20
Key configuration in k8s/service.yaml:
- type: NodePort - Expose on a static port on each node
- nodePort: 30080 - Accessible on port 30080 from host
- selector - Routes traffic to Pods with matching labels
Deploy it:
kubectl apply -f k8s/service.yaml
Check the service:
kubectl get services
kubectl describe service gene-type-classifier
With NodePort, the service is accessible on localhost:30080:
Check the health endpoint:
curl http://localhost:30080/health
Our Kind cluster is not configured for NodePort, so that won't work. We don't really need it for local testing, so let's use a quick fix: kubectl port-forward. It starts a temporary TCP tunnel from our local machine to the Kubernetes Service (mapping local port 30080 to service port 8080). Visually: curl localhost:30080 -> kubectl (port-forward) -> Kubernetes API -> Service (gene-type-classifier) -> Pod :8080.
kubectl port-forward service/gene-type-classifier 30080:8080
Now it's accessible on port 30080:
curl http://localhost:30080/health
When we deploy to EKS or another cloud Kubernetes service, this won't be a problem: an Elastic Load Balancer will expose the service externally.
Kubernetes can automatically scale your application based on CPU or memory usage. First, we need metrics-server for HPA to work. Install it with kubectl:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
For Kind, we need to patch metrics-server to work without TLS:
kubectl patch -n kube-system deployment metrics-server --type=json -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
Wait for metrics-server to be ready:
kubectl get deployment metrics-server -n kube-system
We should see something like this:
NAME READY UP-TO-DATE AVAILABLE AGE
metrics-server 1/1 1 1 72s
The file k8s/hpa.yaml contains a configuration to:
- Scale between 2 and 10 replicas
- Target 50% CPU utilization
- Automatically add or remove Pods based on load
To deploy HPA:
kubectl apply -f k8s/hpa.yaml
Check HPA status:
kubectl get hpa
kubectl describe hpa gene-type-classifier-hpa
Generate load to trigger autoscaling. We can use a simple load test (see load_test.py).
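load_test.py is not reproduced here, but a simple load generator can look like the sketch below (stdlib only; the URL, request count, and concurrency are assumptions — adjust them to your setup):

```python
import concurrent.futures
import json
import math
import time
import urllib.request

URL = "http://localhost:30080/predict"  # assumed endpoint (via port-forward)
PAYLOAD = json.dumps({"sequence": "AACGGCTC"}).encode()

def one_request(_):
    """Send a single POST and return its latency in seconds."""
    req = urllib.request.Request(
        URL, data=PAYLOAD, headers={"Content-Type": "application/json"})
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return time.perf_counter() - start

def percentile(values, p):
    """p-th percentile (nearest-rank) of a list of latencies."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(p / 100 * len(ordered)) - 1)]

def run_load_test(n=500, workers=20):
    """Fire n requests with a thread pool to drive CPU usage up."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one_request, range(n)))

# Uncomment to run against a live deployment:
# lats = run_load_test()
# print(f"p50={percentile(lats, 50):.3f}s  p95={percentile(lats, 95):.3f}s")
```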
First, check that you can access the endpoint:
curl http://localhost:30080/health
Run the test:
uv run python k8s/load_test.py
While running the load test, watch the HPA in another terminal:
kubectl get hpa -w
We should see the number of replicas increase as CPU usage rises.
Check Pods:
kubectl get pods -w
If we make changes to the code:
- Rebuild the image with a new tag:
docker build -t gene-type-classifier:v2 .
- Load to Kind:
kind load docker-image gene-type-classifier:v2 --name seq2genetype
- Update deployment:
kubectl set image deployment/gene-type-classifier gene-type-classifier=gene-type-classifier:v2
Or update the YAML file and apply:
kubectl apply -f k8s/deployment.yaml
Watch the rollout:
kubectl rollout status deployment/gene-type-classifier
Scale to 5 replicas:
kubectl scale deployment gene-type-classifier --replicas=5
All Pods:
kubectl logs -l app=gene-type-classifier --tail=50
Specific Pod:
kubectl logs <pod-name>
Follow logs:
kubectl logs -f -l app=gene-type-classifier
Describe resources:
kubectl describe deployment gene-type-classifier
kubectl describe pod <pod-name>
kubectl describe service gene-type-classifier
Get events:
kubectl get events --sort-by='.lastTimestamp'
Execute commands in a Pod:
kubectl exec -it <pod-name> -- /bin/bash
Delete the deployment and service:
kubectl delete -f k8s/deployment.yaml
kubectl delete -f k8s/service.yaml
kubectl delete -f k8s/hpa.yaml
Or delete everything at once:
kubectl delete all -l app=gene-type-classifier
kubectl delete hpa gene-type-classifier-hpa
Delete the Kind cluster:
kind delete cluster --name seq2genetype



