I build GenAI and machine learning systems that run in production — not just models in notebooks.
I'm currently finishing my Master's in Data Science at Illinois Institute of Technology, and my work centers on building end-to-end AI systems across:
- Retrieval-augmented generation (RAG)
- Multi-agent LLM systems
- LLM evaluation and quality pipelines
- Recommender systems and ranking
- MLOps, deployment, and observability
- Real-time inference APIs
What I enjoy most is building the full system around the model: retrieval, routing, evaluation, serving, monitoring, and iteration.
I'm especially interested in problems where:
- LLMs need strong retrieval, grounding, and routing
- models are part of a larger production system
- latency, reliability, and monitoring matter
- evaluation is treated as a first-class part of the stack
- engineering decisions matter as much as model choice
LangGraph FastAPI ChromaDB PostgreSQL Redis Prometheus Grafana Kubernetes MCP
- 12-agent LangGraph graph with supervisor routing: deterministic rules first, LLM classifier fallback for low-signal queries
- Hybrid RAG pipeline: ChromaDB vector search + BM25 lexical ranking + Reciprocal Rank Fusion + LLM reranker
- Knowledge graph agent using NetworkX for relationship-style local queries
- Graph-native Human-in-the-Loop (HITL): Streamlit Approve/Reject buttons for web and recency queries
- MCP tool bridge for web search and calculator with automatic local fallback
- TTL caching on Redis with in-memory fallback; dual-layer persistence via LangGraph PostgreSQL checkpointer + conversation store
- Per-user auth, CI/CD via GitHub Actions, Kubernetes manifests, and Prometheus/Grafana monitoring included
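The hybrid retrieval step above fuses the vector and BM25 rankings with Reciprocal Rank Fusion. A minimal sketch of RRF in plain Python (the doc IDs and the `k=60` damping constant are illustrative, not the project's actual values):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it
    appears in; k dampens the dominance of top-ranked items.
    """
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a vector-search ranking with a BM25 ranking
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

In the full pipeline the fused list would then go to the LLM reranker; RRF is attractive here because it needs no score normalization across the two retrievers.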
LangGraph FastAPI OpenAI PostgreSQL GitHub Actions API Prometheus Docker
- Autonomous agent that detects failed GitHub Actions runs, diagnoses the failure, generates a code fix, and opens a PR — without human intervention
- LangGraph state machine with eight explicit stages: ingest → triage → diagnose → reproduce → patch → validate → evaluate → PR or escalate
- Two trigger modes: automatic via a GitHub webhook on `workflow_run` events, and manual via a Streamlit operator console
- Patch generation uses the OpenAI Responses API with strict JSON-schema output; a heuristic fallback covers local/demo scenarios
- Every fix must pass lint and tests in a disposable isolated workspace before a PR is opened
- Guardrails enforce allowlisted paths only, block secret-like files, cap patch size, and escalate low-confidence results
- Six Prometheus metrics: success rate, first-patch pass rate, MTTR, escalation rate, false-positive PR rate, retry count
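A rough sketch of what the guardrail check before opening a PR can look like. The allowlist patterns, secret patterns, size cap, and confidence threshold below are assumptions for illustration, not the agent's real configuration:

```python
import fnmatch

# Illustrative guardrail config (assumed values, not the real ones)
ALLOWED_PATHS = ["src/*", "tests/*"]
SECRET_PATTERNS = ["*.env", "*secret*", "*.pem"]
MAX_PATCH_LINES = 200
MIN_CONFIDENCE = 0.7

def patch_allowed(changed_files, patch_lines, confidence):
    """Return (ok, reason): enforce the path allowlist, block
    secret-like files, cap patch size, and escalate low-confidence
    results instead of opening a PR."""
    for path in changed_files:
        if not any(fnmatch.fnmatch(path, p) for p in ALLOWED_PATHS):
            return False, f"path not allowlisted: {path}"
        if any(fnmatch.fnmatch(path, p) for p in SECRET_PATTERNS):
            return False, f"secret-like file blocked: {path}"
    if patch_lines > MAX_PATCH_LINES:
        return False, "patch too large"
    if confidence < MIN_CONFIDENCE:
        return False, "low confidence, escalate to human"
    return True, "ok"
```

The key design point is that every rejection returns a reason, so the escalation path (and the Prometheus counters) can record *why* a fix was blocked.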
LangChain Elasticsearch FAISS BERT MLflow DVC PostgreSQL Docker AWS EKS
- Hybrid RAG pipeline combining BM25 + SentenceTransformer dense embeddings fused via Reciprocal Rank Fusion over Elasticsearch
- DistilBERT extractive QA model for direct answer extraction from financial documents
- LangChain agent workflows for context-aware multi-step document reasoning
- Automated evaluation pipeline logging Hit Rate, MRR, and latency per query to MLflow
- PostgreSQL captures user interactions and satisfaction signals for downstream analysis
- Containerized with Docker, deployed via GitHub Actions CI/CD to AWS EKS with full Kubernetes manifests
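The per-query Hit Rate and MRR logged to MLflow can be sketched like this (data shapes are assumptions; in the real pipeline each value would be logged per query run):

```python
def hit_rate_and_mrr(results, ground_truth, k=10):
    """Compute Hit Rate@k and MRR over a set of queries.

    results: {query: ranked list of retrieved doc IDs}
    ground_truth: {query: the single relevant doc ID}
    """
    hits, rr_sum = 0, 0.0
    for query, ranked in results.items():
        relevant = ground_truth[query]
        if relevant in ranked[:k]:
            hits += 1
            # Reciprocal rank: 1 / (1-based position of the hit)
            rr_sum += 1.0 / (ranked.index(relevant) + 1)
    n = len(results)
    return hits / n, rr_sum / n

# Toy example: one hit at rank 2, one miss
hit_rate, mrr = hit_rate_and_mrr(
    {"q1": ["a", "b"], "q2": ["c"]},
    {"q1": "b", "q2": "x"},
)
```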
OpenAI Cohen's Kappa Fleiss' Kappa SQLite AWS S3 Streamlit
- Production-grade pipeline for validating annotation consistency and evaluating LLM output quality on QA datasets
- Inter-annotator agreement scoring using both Cohen's Kappa and Fleiss' Kappa across multiple annotators
- Schema validation layer flags malformed or inconsistent annotation records before scoring
- LLM-as-judge scoring via OpenAI API to evaluate response quality at scale
- All results are logged to SQLite, with AWS S3 used for artifact and report storage
- Streamlit dashboard surfaces agreement metrics, judge scores, dataset quality summaries, and evaluation trends
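For the two-annotator case, Cohen's Kappa can be computed with a minimal pure-Python sketch like the one below (the real pipeline also applies Fleiss' Kappa when there are more than two annotators):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance.

    kappa = (p_observed - p_expected) / (1 - p_expected), where
    p_expected comes from each annotator's label frequencies.
    """
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

Perfect agreement yields 1.0, chance-level agreement yields 0.0, so thresholding kappa gives a natural place to flag low-quality annotation batches.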
QLoRA PEFT Mistral-7B Hugging Face Dual Adapters Instruction Tuning
- Fine-tuned Mistral-7B under resource constraints using QLoRA: 4-bit quantization + LoRA adapters on limited GPU memory
- Two separate adapter sets trained: one for conceptual explanation mode, one for code generation mode
- Runtime routing layer selects the correct adapter based on query intent at inference time
- Instruction-tuned for data science notebook workflows — explanation, debugging, and code generation tasks
- Focus on reliable, consistent outputs for technical use cases rather than benchmark-only performance
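The runtime adapter-routing layer could look like the simple intent heuristic below; the keyword list and adapter names are purely illustrative assumptions, not the project's actual router:

```python
# Hypothetical intent router for the dual-adapter setup: a lightweight
# keyword heuristic picks the code-generation adapter or the
# conceptual-explanation adapter before inference.
CODE_SIGNALS = ("write", "implement", "fix", "code", "function", "debug")

def select_adapter(query: str) -> str:
    q = query.lower()
    if any(signal in q for signal in CODE_SIGNALS):
        return "code-generation-adapter"
    return "concept-explanation-adapter"
```

In practice the chosen name would be passed to the PEFT model's adapter-switching call before generating, so one base Mistral-7B serves both modes.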
Multi-Stage Two-Tower Recommender
TensorFlow Recommenders FAISS FastAPI MLflow DVC Airflow A/B Testing Prometheus
AutoML LangGraph Assistant
LangGraph ChromaDB OpenAI GPT-4o MLflow DVC AWS EC2/S3 MCP Docker
ECG Anomaly Detection — LSTM AutoEncoder
TensorFlow Keras LSTM AutoEncoder Unsupervised 97.93% accuracy
Brain Tumor Segmentation
PyTorch ResNeXt50-UNet Streamlit LGG MRI
LLM / GenAI
LangGraph LangChain LlamaIndex RAG Agentic AI QLoRA PEFT MCP LlamaGuard Vector Search Hybrid Retrieval Reranking
ML / Retrieval / Ranking
PyTorch TensorFlow scikit-learn FAISS BM25 XGBoost MLflow DVC
Backend / Infra
FastAPI Streamlit PostgreSQL Redis Docker Kubernetes AWS Azure
Monitoring / Ops
Prometheus Grafana GitHub Actions
I'm looking for entry-level roles in:
- Machine Learning Engineering
- Applied AI / GenAI
- Data Science with strong ML systems focus
I'm authorized to work in the U.S. on F-1 OPT.
- Email: [email protected]
- LinkedIn: linkedin.com/in/theepankumar
- Portfolio: theepan-portfolio.netlify.app
Outside of work, I like trekking and baking. One clears the head, the other feeds it.
