muaazdev/README.md

Hi, I'm Muaaz Ahmad 👋

Senior AI/ML Engineer with 4+ years of production experience building end-to-end AI systems, from RAG pipelines and multi-agent architectures to real-time voice agents and GPU-optimized inference services.

Currently based in Dammam, Saudi Arabia.


💼 What I Do

I build AI systems that ship to production, not just prototypes. My work spans the full lifecycle: architecture, training, optimization, deployment, and monitoring.

  • LLMs & Generative AI: RAG pipelines, multi-agent systems (LangGraph, CrewAI), LLM fine-tuning (LoRA, QLoRA), prompt engineering
  • MLOps & Infrastructure: CI/CD for ML (GitHub Actions), Docker, AWS (ECS, EC2, Lambda, SageMaker), model serving, auto-scaling
  • Computer Vision: Object detection & tracking (YOLO, ByteTrack), segmentation (SAM), pose estimation, video analytics
  • Voice AI: Real-time voice agents with Whisper (STT), custom TTS, VAPI, Twilio/Telnyx integration
  • Backend: FastAPI async services, Redis caching, request batching, <200ms p95 latency at scale

🏗️ Production Experience Highlights

No-Code Multi-Agent Builder Platform: Built a platform where users create and deploy custom AI agents without coding. Deployed on AWS ECS with Docker, auto-scaling, and load balancing to serve concurrent users with sub-second latency.

High-Throughput LLM Inference Service: Designed FastAPI async endpoints with request batching and Redis caching, achieving a 10x latency reduction and <200ms p95 latency under load.
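As a rough illustration of the request-batching idea, here is a minimal sketch using only `asyncio`. It is not the service's actual code: the `MicroBatcher` class, its parameters, and the `batch_fn` callback are hypothetical names chosen for this example, and the batch size and wait window are placeholder values.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

# Hypothetical signature for a batched model call: many prompts in, many outputs out.
BatchFn = Callable[[List[str]], Awaitable[List[str]]]


class MicroBatcher:
    """Groups concurrent requests into one batched call (illustrative sketch)."""

    def __init__(self, batch_fn: BatchFn, max_batch_size: int = 8,
                 max_wait_s: float = 0.01) -> None:
        self._batch_fn = batch_fn
        self._max_batch_size = max_batch_size
        self._max_wait_s = max_wait_s
        self._queue: Optional[asyncio.Queue] = None
        self._worker: Optional[asyncio.Task] = None

    async def submit(self, prompt: str) -> str:
        """Enqueue one request and wait for its result."""
        if self._worker is None:
            # Created lazily so the queue and task belong to the running loop.
            self._queue = asyncio.Queue()
            self._worker = asyncio.create_task(self._run())
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((prompt, future))
        return await future

    async def _run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request arrives, then fill the batch
            # until it is full or the wait window expires.
            batch = [await self._queue.get()]
            deadline = loop.time() + self._max_wait_s
            while len(batch) < self._max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            try:
                results = await self._batch_fn([prompt for prompt, _ in batch])
                for (_, future), result in zip(batch, results):
                    if not future.done():
                        future.set_result(result)
            except Exception as exc:
                # Propagate a failed batch to every caller waiting on it.
                for _, future in batch:
                    if not future.done():
                        future.set_exception(exc)
```

An async endpoint handler would then simply `await batcher.submit(prompt)`. The wait window trades a small amount of added latency per request for fewer, larger model calls.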

LegalMind: Modular RAG system for legal document Q&A featuring hybrid search (vector + BM25), Cohere cross-encoder reranking, mandatory citation validation, and AI evaluation agents (adversarial test generation, hallucination detection, citation verification), with a CI/CD pipeline.
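The description above does not say how the vector and BM25 result lists are merged. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as LegalMind's actual method. The function name, the document IDs, and the constant `k = 60` are illustrative choices.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector index and a BM25 index.
vector_hits = ["d1", "d2", "d3"]
bm25_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)  # ['d1', 'd3', 'd2', 'd4']
```

RRF uses only ranks, so it needs no score normalization between the two retrievers. A reranker would then reorder the top of the fused list.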

Real-Time Sports Video Analytics: Custom YOLO training for player/puck tracking in ice hockey. Optimized from 8 FPS to 25+ FPS via TensorRT quantization and GPU optimization.

Real-Time Voice Agent: Built a calling agent using Twilio/Telnyx with Whisper for STT and custom TTS. Sub-1s end-to-end latency with VAD-based turn-taking and interruption handling.
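To make the turn-taking idea concrete, here is a minimal sketch that treats a run of quiet frames after speech as the end of a user turn. It uses a simple RMS energy threshold as a stand-in for whatever VAD the agent actually uses, which the description does not name. The function names, threshold, and frame counts are assumptions for illustration.

```python
import math
from typing import Iterable, List, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(sample * sample for sample in frame) / len(frame))


def detect_turn_ends(frames: Iterable[Sequence[float]],
                     energy_threshold: float = 0.02,
                     min_silence_frames: int = 3) -> List[int]:
    """Return indices of frames at which a user turn is considered finished.

    A turn ends once speech has been heard and is followed by
    `min_silence_frames` consecutive frames below the energy threshold.
    """
    turn_ends: List[int] = []
    in_speech = False
    silence_run = 0
    for index, frame in enumerate(frames):
        if frame_rms(frame) >= energy_threshold:
            in_speech = True
            silence_run = 0
        elif in_speech:
            silence_run += 1
            if silence_run >= min_silence_frames:
                turn_ends.append(index)
                in_speech = False
                silence_run = 0
    return turn_ends


# Synthetic frames: constant-amplitude "speech" and digital silence.
loud = [0.5] * 160
quiet = [0.0] * 160
frames = [quiet, loud, loud, quiet, quiet, quiet, quiet, loud, quiet, quiet, quiet]
print(detect_turn_ends(frames))  # [5, 10]
```

The same speech signal can drive interruption handling: if a frame crosses the threshold while the agent is speaking, playback can be stopped so the caller can take the turn.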

Stable Diffusion Production APIs: Fine-tuned on custom datasets. Built production serving with request queuing and horizontal scaling for face restoration, AI aging, and image editing.


🚀 Open Source Projects

| Project | What It Does | Stack | Status |
|---|---|---|---|
| DocMind | Multi-tenant RAG platform with hybrid search, reranking & evaluation | LangChain, Pinecone, FastAPI, Docker | ✅ Live |
| LegalMind | Legal document Q&A with hybrid search, citation validation & eval agents | LangChain, Cohere, FastAPI, CI/CD | ✅ Live |
| AgentForge | Multi-agent task orchestration with LangGraph state machines | LangGraph, CrewAI, MCP, Python | 🔨 Coming Soon |
| CallBot AI | AI voice agent: answers calls, books appointments, syncs CRM | VAPI, Twilio, OpenAI, FastAPI | 🔨 Coming Soon |
| FlowPilot | AI-powered workflow automation suite | n8n, Python, OpenAI API | 🔨 Coming Soon |
| MCP Toolkit | Production-ready MCP servers for LLM tool integration | FastMCP, Python, Docker | 🔨 Coming Soon |
| SmartServe | AI customer support system with RAG + agent handoff | LangGraph, FastAPI, React | 🔨 Coming Soon |

🛠️ Tech Stack

LLMs & GenAI:     LangChain, LangGraph, CrewAI, MCP, LlamaIndex, OpenAI, Claude, Gemini
                  Fine-tuning: LoRA, QLoRA, PEFT | Models: Llama, Mistral, Gemma
RAG:              Pinecone, ChromaDB, Qdrant, Weaviate, FAISS, Cohere Reranking
Voice AI:         VAPI, Twilio, Telnyx, Whisper, Deepgram, ElevenLabs, Coqui TTS
Computer Vision:  YOLO, ByteTrack, SAM, TensorRT, OpenCV, Stable Diffusion
MLOps:            Docker, GitHub Actions CI/CD, AWS ECS, Model Serving, Auto-scaling
                  vLLM, ONNX, TensorRT, Prefect
Cloud:            AWS (ECS, EC2, S3, Lambda, SageMaker, CloudWatch), GCP
Backend:          Python, FastAPI, Node.js, REST APIs, WebSockets, Async Programming
Data:             PostgreSQL, MongoDB, Redis, Vector DBs, Pandas, NumPy
ML/DL:            PyTorch, TensorFlow, scikit-learn, Hugging Face Transformers

📫 Let's Connect


📊 GitHub Stats:



Pinned

  1. legalmind (Public, Python)
  2. DocMind (Public, Python)
  3. RAG-That-Works (Public, Python)