I'm an MLOps Engineer focused on designing scalable, cloud-native inference architectures and automated CI/CD pipelines. I specialize in transforming experimental models into production-ready microservices using Kubernetes, Docker, and AWS.
I am currently architecting robust ML systems with a focus on:
- Orchestration: Managing containerized applications with Kubernetes.
- MLOps & GitOps: End-to-end pipeline automation, monitoring, and continuous delivery using ArgoCD.
- Microservices: Decoupling monolithic ML code into scalable FastAPI services.
- Infrastructure: Configuring AWS (EC2, VPC, IAM, ECR) and Linux environments for high availability.
1. End-to-End Fashion AI Recommender (H&M Project)
- Cloud-Native MLOps & IaC: Architected a decoupled deployment by provisioning AWS S3 storage via Terraform, keeping heavy model artifacts out of Docker images. Optimized the model with INT8 Dynamic Quantization (ONNX), achieving a ~75% reduction in footprint (100MB+ to 23MB), and lazy-loaded it from S3 at runtime via Boto3.
- Architecture & GitOps: Orchestrated a highly available Kubernetes environment managed declaratively via Helm and ArgoCD. Automated zero-downtime rollouts and utilized multi-stage builds to reduce image sizes by 54%.
- High-Performance API & Reliability: Built an asynchronous Redis caching layer, cutting inference latency to <2ms. Validated robustness via Locust, sustaining 805 RPS (~2.9M requests/hour) under 2,000 concurrent users with a 0% error rate.
- Vector Search Engine: Integrated Qdrant as a high-throughput vector database for semantic similarity search, tuning its persistent storage for millisecond-latency recommendation retrieval.
- DevOps & Observability: Automated a full CI/CD pipeline via GitHub Actions for AWS EC2 deployments. Implemented an observability stack (Prometheus & Grafana) tracking real-time P99 latency and system stability. Enforced DevSecOps practices (Trivy, Black, isort) and ensured reliability with Pytest.
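The S3 lazy-loading idea from the first bullet can be sketched as below. `fetch` here stands in for the real work (a Boto3 `download_file` call followed by loading an ONNX Runtime session); that substitution is an assumption for illustration, not the project's actual code.

```python
class LazyModel:
    """Defer the expensive model download until the first prediction.

    `fetch` is any zero-argument callable that returns a ready-to-use model.
    In the deployed service this would be a Boto3 S3 download plus an
    ONNX Runtime session load (hypothetical wiring, shown as a comment).
    """

    def __init__(self, fetch):
        self._fetch = fetch
        self._model = None

    def predict(self, features):
        if self._model is None:          # first call pays the S3 download cost
            self._model = self._fetch()  # e.g. s3.download_file(...) then ort.InferenceSession(...)
        return self._model(features)
```

Because the weights live in S3 rather than in the image, the Docker image stays slim and the download cost is paid once per container, on first use.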
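The read-through caching pattern behind the Redis layer can be shown with an in-process dict using TTL semantics similar to Redis `SETEX`; swapping the dict for a `redis.Redis` client is the assumed production step, not shown here.

```python
import time


class TTLCache:
    """Tiny read-through cache with per-key expiry (Redis SETEX-style)."""

    def __init__(self, ttl_seconds=60.0):
        self._ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                       # cache hit: skip inference entirely
        value = compute()                       # cache miss: run the model
        self._store[key] = (now + self._ttl, value)
        return value
```

The sub-2ms figure comes from hits skipping inference altogether; the TTL bounds how stale a cached recommendation can get.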
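Conceptually, the semantic similarity search Qdrant serves reduces to nearest-neighbour retrieval over embeddings. A brute-force cosine-similarity sketch (pure Python, no Qdrant client, hypothetical item data) illustrates the core operation:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def top_k(query, items, k=3):
    """items: list of (item_id, embedding); returns the k most similar ids."""
    scored = sorted(items, key=lambda it: cosine(query, it[1]), reverse=True)
    return [item_id for item_id, _ in scored[:k]]
```

Qdrant replaces this O(n) scan with indexed approximate search, which is what makes millisecond retrieval feasible at catalogue scale.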
2. Cloud-Native Inference Engine (NYC Taxi Project)
- Architecture: Deployed a decoupled system on AWS EC2 with zero-trust security via IAM Roles and custom Security Groups.
- High-Performance: Achieved an ~85x latency reduction (~281ms to ~3ms) and sustained ~290 RPS by migrating to ONNX Runtime and integrating Redis caching.
- DevSecOps & CI/CD: Established a GitHub Actions pipeline with Trivy for vulnerability scanning. Reduced production Docker image sizes by 68% (2.05GB to 650MB) via multi-stage builds.
- Model Engineering: Engineered a resource-efficient Random Forest model (size reduced by 97%, from 1.2GB to 33MB) and tracked experiments with MLflow; the smaller model resolved critical Kubernetes OOM errors.
- Reliability: Validated system robustness via Locust stress testing (1000 concurrent connections, 0% error rate) and implemented real-time monitoring with Prometheus & Grafana.
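The P99 latency tracked across both projects (served by Prometheus & Grafana in production) reduces to a percentile over observed request durations. A stdlib-only sketch using the nearest-rank method:

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile, e.g. p=99 for P99 latency (samples in ms)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # nearest-rank method: ceil(p/100 * n), clamped to a valid 1-based rank
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]
```

In practice Prometheus estimates this from histogram buckets rather than raw samples, trading exactness for bounded memory; the sketch shows the exact-computation baseline.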