Senior AI/ML Engineer with 4+ years of production experience building end-to-end AI systems, from RAG pipelines and multi-agent architectures to real-time voice agents and GPU-optimized inference services.
Currently based in Dammam, Saudi Arabia.
I build AI systems that actually ship to production, not just prototypes. My work spans the full lifecycle: architecture, training, optimization, deployment, and monitoring.
- LLMs & Generative AI: RAG pipelines, multi-agent systems (LangGraph, CrewAI), LLM fine-tuning (LoRA, QLoRA), prompt engineering
- MLOps & Infrastructure: CI/CD for ML (GitHub Actions), Docker, AWS (ECS, EC2, Lambda, SageMaker), model serving, auto-scaling
- Computer Vision: object detection & tracking (YOLO, ByteTrack), segmentation (SAM), pose estimation, video analytics
- Voice AI: real-time voice agents with Whisper (STT), custom TTS, VAPI, Twilio/Telnyx integration
- Backend: FastAPI async services, Redis caching, request batching, <200ms p95 latency at scale
No-Code Multi-Agent Builder Platform: Built a platform where users create and deploy custom AI agents without writing code. Deployed on AWS ECS with Docker, auto-scaling, and load balancing, serving concurrent users with sub-second latency.
High-Throughput LLM Inference Service: Designed FastAPI async endpoints with request batching and Redis caching, achieving a 10x latency reduction and <200ms p95 latency under load.
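A minimal sketch of how request batching and response caching can fit together in an async service, assuming a hypothetical `batch_fn` that runs a list of prompts through the model in one call and returns outputs in the same order. A plain dict stands in for Redis, and the `MicroBatcher` class and its `max_batch_size` / `max_wait_s` parameters are illustrative names, not the service's actual code.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


class MicroBatcher:
    """Collects concurrent requests and runs them through one batched model call.

    Assumption: `batch_fn` maps a list of prompts to a list of outputs in the
    same order. A plain dict stands in for Redis as the response cache.
    """

    def __init__(
        self,
        batch_fn: Callable[[list[str]], Awaitable[list[str]]],
        max_batch_size: int = 8,
        max_wait_s: float = 0.01,
    ) -> None:
        self.batch_fn = batch_fn
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.cache: dict[str, str] = {}
        self.queue: asyncio.Queue[tuple[str, asyncio.Future[str]]] = asyncio.Queue()
        self._worker: asyncio.Task[None] | None = None

    async def submit(self, prompt: str) -> str:
        # Repeated prompts are answered from the cache without touching the model.
        if prompt in self.cache:
            return self.cache[prompt]
        # Start the background batching loop lazily, inside the running event loop.
        if self._worker is None:
            self._worker = asyncio.create_task(self._run())
        fut: asyncio.Future[str] = asyncio.get_running_loop().create_future()
        await self.queue.put((prompt, fut))
        return await fut

    async def _run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Block until one request arrives, then keep collecting until the
            # batch is full or the wait window closes.
            items = [await self.queue.get()]
            deadline = loop.time() + self.max_wait_s
            while len(items) < self.max_batch_size:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    items.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            prompts = [prompt for prompt, _ in items]
            try:
                outputs = await self.batch_fn(prompts)
            except Exception as exc:
                # Propagate a failed batch to every waiting caller.
                for _, fut in items:
                    if not fut.done():
                        fut.set_exception(exc)
                continue
            # Fill the cache and wake each caller with its own result.
            for (prompt, fut), output in zip(items, outputs):
                self.cache[prompt] = output
                if not fut.done():
                    fut.set_result(output)
```

In a FastAPI app, each request handler would `await batcher.submit(prompt)`. The short wait window trades a few milliseconds of queueing for fewer, larger model calls, which is where batching pays off on a GPU.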
LegalMind: Modular RAG system for legal document Q&A featuring hybrid search (vector + BM25), Cohere cross-encoder reranking, mandatory citation validation, and AI evaluation agents (adversarial test generation, hallucination detection, citation verification), backed by a CI/CD pipeline.
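One common way to merge vector and BM25 results before a cross-encoder reranking step is Reciprocal Rank Fusion (RRF). The sketch below assumes that technique (LegalMind's actual fusion method isn't stated here), that each retriever returns document IDs ordered best-first, and the conventional constant `k = 60`; `reciprocal_rank_fusion` is a hypothetical helper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several best-first ranked lists of document IDs with RRF.

    A document at 1-based rank r in a list contributes 1 / (k + r) to its
    fused score; scores are summed across all lists it appears in.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["d3", "d1", "d2"]  # hypothetical dense-retrieval results
bm25_hits = ["d1", "d4", "d3"]    # hypothetical keyword-retrieval results
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Here `d1` comes out on top because it ranks well in both lists, even though the vector retriever alone preferred `d3`. RRF uses only ranks, so it sidesteps the problem of BM25 and cosine scores living on different scales; the fused top-N would then go to the reranker.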
Real-Time Sports Video Analytics: Custom YOLO training for player and puck tracking in ice hockey. Raised throughput from 8 FPS to 25+ FPS via TensorRT quantization and GPU optimization.
Real-Time Voice Agent: Built a calling agent on Twilio/Telnyx with Whisper for STT and custom TTS. Sub-1s end-to-end latency with VAD-based turn-taking and interruption handling.
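A minimal sketch of VAD-driven turn-taking, assuming an RMS energy threshold as a stand-in for a real voice activity detector and fixed-size audio frames (e.g. 20 ms). The `TurnDetector` class, its thresholds, and the event names are hypothetical; a production agent would typically use a trained VAD model and tune the end-of-turn silence window.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


def frame_rms(samples: list[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1, 1])."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


@dataclass
class TurnDetector:
    """Energy-threshold stand-in for a VAD that emits turn-taking events.

    Assumptions: fixed-size frames, a hand-picked RMS threshold, and
    `silence_frames_to_end` consecutive quiet frames marking end of turn.
    """

    threshold: float = 0.02
    silence_frames_to_end: int = 25  # about 500 ms of silence at 20 ms frames
    in_speech: bool = False
    silent_run: int = 0

    def process(self, samples: list[float], agent_speaking: bool) -> str | None:
        is_speech = frame_rms(samples) >= self.threshold
        if is_speech:
            self.silent_run = 0
            if not self.in_speech:
                self.in_speech = True
                # Caller starts talking while the agent's TTS is playing: barge-in.
                return "interrupt" if agent_speaking else "speech_start"
            return None
        if self.in_speech:
            # Count consecutive quiet frames; a long enough run ends the turn.
            self.silent_run += 1
            if self.silent_run >= self.silence_frames_to_end:
                self.in_speech = False
                self.silent_run = 0
                return "end_of_turn"
        return None
```

When `process` returns `"interrupt"`, the agent would stop TTS playback; on `"end_of_turn"`, it would hand the buffered caller audio to Whisper. The silence window is the main latency knob: shorter windows respond faster but risk cutting callers off mid-pause.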
Stable Diffusion Production APIs: Fine-tuned models on custom datasets and built production serving with request queuing and horizontal scaling for face restoration, AI aging, and image editing.
| Project | What It Does | Stack | Status |
|---|---|---|---|
| DocMind | Multi-tenant RAG platform with hybrid search, reranking & evaluation | LangChain, Pinecone, FastAPI, Docker | ✅ Live |
| LegalMind | Legal document Q&A with hybrid search, citation validation & eval agents | LangChain, Cohere, FastAPI, CI/CD | ✅ Live |
| AgentForge | Multi-agent task orchestration with LangGraph state machines | LangGraph, CrewAI, MCP, Python | 🔨 Coming Soon |
| CallBot AI | AI voice agent that answers calls, books appointments & syncs with CRM | VAPI, Twilio, OpenAI, FastAPI | 🔨 Coming Soon |
| FlowPilot | AI-powered workflow automation suite | n8n, Python, OpenAI API | 🔨 Coming Soon |
| MCP Toolkit | Production-ready MCP servers for LLM tool integration | FastMCP, Python, Docker | 🔨 Coming Soon |
| SmartServe | AI customer support system with RAG + agent handoff | LangGraph, FastAPI, React | 🔨 Coming Soon |
LLMs & GenAI: LangChain, LangGraph, CrewAI, MCP, LlamaIndex, OpenAI, Claude, Gemini
Fine-tuning: LoRA, QLoRA, PEFT | Models: Llama, Mistral, Gemma
RAG: Pinecone, ChromaDB, Qdrant, Weaviate, FAISS, Cohere Reranking
Voice AI: VAPI, Twilio, Telnyx, Whisper, Deepgram, ElevenLabs, Coqui TTS
Computer Vision: YOLO, ByteTrack, SAM, TensorRT, OpenCV, Stable Diffusion
MLOps: Docker, GitHub Actions CI/CD, AWS ECS, Model Serving, Auto-scaling, vLLM, ONNX, TensorRT, Prefect
Cloud: AWS (ECS, EC2, S3, Lambda, SageMaker, CloudWatch), GCP
Backend: Python, FastAPI, Node.js, REST APIs, WebSockets, Async Programming
Data: PostgreSQL, MongoDB, Redis, Vector DBs, Pandas, NumPy
ML/DL: PyTorch, TensorFlow, scikit-learn, Hugging Face Transformers
- 🌐 Website: muaazdev.com
- 💬 LinkedIn: muaazdev
- 📧 Email: [email protected]



