Rehan Malik

Senior AI/ML Engineer · Cloud Solution Architect (AWS) · Open to Opportunities

5+ years building production AI systems at enterprise scale — GenAI, LLMs, RAG, RLHF, Computer Vision, Voice AI, Cloud Architecture.

About Me

I’m a Senior AI/ML Engineer with 5+ years of hands-on experience shipping production AI systems across healthcare, finance, retail, media, and enterprise operations. I’ve worked with companies ranging from 10-person startups to 10,000+ employee enterprises like MARS, solving problems that move business metrics.

What I do best: Take AI from research to production. I’ve fine-tuned LLMs (LLaMA, Mistral) with LoRA/QLoRA, built RLHF pipelines with PPO, architected RAG systems over 2TB+ corpora, deployed real-time voice infrastructure handling 500+ concurrent calls, and shipped fraud detection models that process applications in real time, all on AWS/GCP at scale.

What I’m looking for: Senior AI/ML Engineer, Staff ML Engineer, or Lead AI Engineer roles where I can build and ship production AI systems.

B.S. Computer Science (COMSATS University Islamabad, 2016–2020)

What I’ve Built

The work I’m most proud of — production systems processing real data, serving real users, driving real business impact.

Voice AI Infrastructure

Real-time concurrent voice processing with low-latency ingestion engines

Built voice-to-data pipelines handling 500+ simultaneous calls using WebSockets, Apache Kafka, and streaming architectures
Developed gRPC microservices with C++ modules (CUDA, Eigen), reducing inference latency by 25%
Designed speech-to-text, sentiment analysis, and sales insights extraction from live audio streams

Fraud Detection AI Co-Pilot

Ensemble ML + GenAI explainability for financial services

Engineered 650+ predictive features from raw application data — behavioral anomalies, timing patterns, identity verification signals
Built ensemble model (XGBoost + Isolation Forest) achieving 50% fraud detection on holdout test sets
Discovered 3 applicant personas via unsupervised clustering (UMAP + HDBSCAN) — “Digital Ghost” persona has 70% fraud concentration
Implemented GenAI-powered explainable PDF reports via Amazon Bedrock translating SHAP values into plain English
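
As a rough illustration of the SHAP-to-plain-English step, here is a minimal sketch of how per-feature contributions might be ranked and turned into short statements before being handed to an LLM for report writing. The function name, feature names, and numbers are hypothetical and are not taken from the production system; SHAP values are also expressed in the model's output units, so "fraud score" is used loosely here.

```python
from typing import Dict, List


def explain_top_features(shap_values: Dict[str, float], top_k: int = 3) -> List[str]:
    """Turn per-feature SHAP contributions into short plain-English statements."""
    # Rank features by absolute contribution, largest first.
    ranked = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)
    sentences = []
    for name, value in ranked[:top_k]:
        # Positive contributions push the score up, the rest push it down.
        direction = "raised" if value > 0 else "lowered"
        readable = name.replace("_", " ")
        sentences.append(f"{readable} {direction} the fraud score by {abs(value):.2f}")
    return sentences


# Hypothetical SHAP values for one application (illustrative numbers only).
example = {
    "device_age_days": -0.05,
    "application_hour": 0.31,
    "email_domain_risk": 0.12,
    "address_mismatch": 0.44,
}
lines = explain_top_features(example, top_k=2)
for line in lines:
    print(line)
```

Statements like these can then be placed in a prompt so the model only rephrases facts it was given rather than inventing reasons.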

Enterprise RAG Pipelines

Knowledge retrieval across 2TB+ structured and unstructured data

Architected multi-index retrieval (FAISS + ChromaDB + pgvector) with cross-encoder re-ranking
Built hallucination detection and citation tracking for grounded LLM responses
Deployed on AWS SageMaker with auto-scaling — 40% cost reduction vs hosted API models
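
One common way to merge candidates from several indexes before re-ranking is reciprocal rank fusion (RRF). The sketch below assumes RRF as the fusion step, which the description above does not specify, and uses made-up document IDs in place of real FAISS, ChromaDB, and pgvector results.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one list using RRF."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); k dampens the top ranks.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical top-3 results from three separate indexes.
faiss_hits = ["doc_a", "doc_b", "doc_c"]
chroma_hits = ["doc_b", "doc_d", "doc_a"]
pgvector_hits = ["doc_b", "doc_a", "doc_e"]

fused = reciprocal_rank_fusion([faiss_hits, chroma_hits, pgvector_hits])
print(fused)
```

The fused list would typically be truncated and passed to the cross-encoder, which scores each query and document pair directly.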

LLM Fine-Tuning & RLHF

Parameter-efficient fine-tuning and human alignment for production LLMs

Fine-tuned LLaMA-2 and Mistral with LoRA and QLoRA (PEFT) — served via vLLM with CUDA optimization
Built full RLHF pipeline: SFT → Reward Modeling → PPO optimization with KL divergence constraints
Achieved 68% win rate vs SFT baseline and 96% safety compliance
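
The KL constraint in the PPO stage is often applied as a penalty on the reward-model score. The sketch below shows that common formulation under stated assumptions: the function name, the `beta` coefficient, and the example log-probabilities are illustrative and do not describe the actual pipeline.

```python
from typing import List


def kl_penalized_reward(
    reward: float,
    policy_logprobs: List[float],
    ref_logprobs: List[float],
    beta: float = 0.1,
) -> float:
    """Subtract a sampled KL estimate from the reward-model score for one response."""
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    # Sum of per-token log-ratio between the policy and the frozen reference
    # (SFT) model, evaluated on the tokens the policy actually generated.
    kl_estimate = sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))
    return reward - beta * kl_estimate


# Hypothetical values for a two-token response.
shaped = kl_penalized_reward(
    reward=1.0,
    policy_logprobs=[-0.5, -1.0],
    ref_logprobs=[-0.7, -1.2],
    beta=0.1,
)
print(round(shaped, 4))
```

A larger `beta` keeps the policy closer to the SFT baseline at the cost of slower reward improvement.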

Autonomous AI Agents

Multi-agent systems executing complex workflows without human intervention

Built 8+ specialized agents for insurance underwriting, multilingual caregiving (100+ languages), content generation, and admissions automation
Orchestrated agents with LangChain, connecting LLMs to databases, APIs, and messaging platforms
Reduced processing time by 50% for student admissions workflows

Computer Vision at Scale

Object detection and digital avatar generation

BiiView: Real-time object detection using Meta AI’s Segment Anything Model (SAM) — 90% accuracy across 11M+ images and 1.1B+ masks
Digital People Platform: Hyper-realistic talking avatars with SadTalker + SpeechT5 TTS — 70% realism improvement, 30% user satisfaction increase
KYC Platform: Identity verification with OpenCV + AI — 99.9% accuracy, 50% faster document processing

Professional Experience

Role	Company	Period	Highlights
Senior ML/AI Engineer	Verticiti	Mar 2024 – Present	RAG pipelines (2TB+), LLM fine-tuning (LoRA/QLoRA), agentic workflows, C++ inference optimization, SAM object detection at scale.
Senior Generative AI Engineer	MARS (10K+ employees)	Oct 2024 – Jan 2026	Led $1M+ GenAI enterprise transformation. RAG architectures, LLM orchestration, multi-agent frameworks for regulated industries.
Cloud Solution Architect	Cloud Kinetics USA	Aug 2024 – Jan 2026	Designed cloud-native AI solutions on AWS, Azure, GCP for enterprise clients. ETL/ELT, data migration, real-time pipelines.
Senior AI Engineer	Reallytics.ai	Oct 2022 – Jan 2026	Voice AI infra (500+ calls), fraud detection, autonomous agents, RLHF frameworks, cloud architecture on AWS/GCP.
Senior ML Engineer	Afiniti	Oct 2022 – Nov 2023	Production ETL at scale for $1M+ accounts, churn modeling, call routing optimization.
AI Product Engineer	Afiniti	Apr 2021 – Oct 2022	ML pipelines for call-routing, feature engineering on millions of daily records, production monitoring.
Python Engineer	MeryCure	May 2020 – Apr 2021	IoT data pipelines (1000+ devices), anomaly detection, predictive maintenance, Power BI dashboards.

Featured Projects

Sentinel AI — Fraud Detection: Ensemble XGBoost + Isolation Forest with 650+ features and GenAI explainability via Amazon Bedrock.

Python · XGBoost · AWS Bedrock · SageMaker

Voice-AI-Platform: Real-time voice processing — 500+ concurrent calls, WebSockets, Kafka, gRPC/C++.

Python · Kafka · gRPC · C++ · AWS

BiiView — Object Detection: Meta AI SAM for video object detection — 90% accuracy across 11M+ images.

Python · SAM · OpenCV · PyTorch

RAG-Enterprise-Search: Enterprise RAG with multi-index fusion, re-ranking, and hallucination detection.

Python · LangChain · FAISS · ChromaDB

LLM-Fine-Tuning-LoRA: Fine-tuning LLaMA/Mistral with LoRA and QLoRA (PEFT) — 40% cost reduction vs hosted APIs.

Python · HuggingFace · vLLM · CUDA

RLHF-LLM-Optimization: Full RLHF pipeline — SFT, reward modeling, PPO with KL constraints.

Python · PyTorch · HuggingFace · TRL

Digital People Platform: Talking avatars with SadTalker + SpeechT5 TTS — 70% realism improvement.

Python · SadTalker · OpenAI · PyTorch

Agentic-AI-Workflows: Autonomous AI agents for enterprise automation with LangChain orchestration.

Python · LangChain · OpenAI · FastAPI

Sunshine Care — Daycare Management SaaS: Production-grade multi-center childcare SaaS — 13 modules, 28 REST APIs, real-time multi-site switching, notifications. Competes with Brightwheel, Tadpoles, and Lillio.

Next.js · TypeScript · Prisma · SQLite · Tailwind CSS

IPM-Website-V2: Professional web platform with modern UI/UX, responsive design, and production deployment.

Next.js · TypeScript · Tailwind CSS

View all repositories →

Kaggle — Research & Technical Notebooks

Hands-on explorations, architecture deep-dives, and production-tested techniques — published on Kaggle.

Notebook	What’s Inside
🤖 Agentic AI: Multi-Agent Orchestration from Scratch	Building a multi-agent system with tool registries, planning loops, and guardrails — framework-agnostic patterns from production.
🔌 LLM Function Calling and Tool Use: Complete Guide	End-to-end function calling — schema design, validation, chaining, error recovery, and production deployment patterns.
🔍 Advanced RAG: Production Retrieval Guide	Multi-query RAG, hybrid search, cross-encoder re-ranking, hallucination detection — beyond basic retrieve-and-generate.
🎯 Prompt Engineering That Actually Works (2026)	Chain-of-thought, few-shot, self-consistency, structured output — real techniques with measured results.
👁️ Multimodal AI: Vision-Language Pipeline	Vision encoders, cross-attention fusion, image captioning, visual QA — building multimodal systems from components.
💳 Fraud Detection: XGBoost + Isolation Forest Ensemble	Ensemble anomaly detection with SHAP explainability, t-SNE visualization, and DBSCAN clustering on imbalanced data.
💬 Sentiment Analysis: NLP Pipeline Comparison	TF-IDF vs BERT vs DistilBERT — benchmarking classical and transformer approaches on real text data.
📚 RAG Pipeline: LangChain + FAISS for Document QA	End-to-end retrieval-augmented generation with chunk strategies, embedding models, and answer grounding.
🧬 LLM Fine-Tuning: LoRA and QLoRA Guide	Parameter-efficient fine-tuning walkthrough — LoRA, QLoRA, PEFT with memory profiling and serving benchmarks.
📈 Time Series: XGBoost Forecasting	Feature engineering for temporal data — lag features, rolling stats, calendar effects, walk-forward validation.
🚢 Titanic: Stacking Ensemble Pipeline	Advanced stacking with cross-validated base learners, meta-learner optimization, and feature engineering.

👉 View all notebooks on Kaggle →

Featured Writeups & Datasets

Technical writeups published as Kaggle Datasets — production insights, benchmarks, and reference architectures.

Writeup	What’s Inside
Agentic AI Tool Schemas: Production Patterns	50+ tool/function schemas, 8 agent configs, benchmark data from 500 agent executions
RAG Evaluation Benchmark 2026	1,000 QA pairs with human-annotated relevance scores across 50 retrieval configs
LLM Prompt Engineering Templates	100+ prompt templates with A/B test results from 200 production experiments
Fraud Detection: Feature Engineering Guide	650+ feature catalog, interaction analysis, and 3 fraud persona profiles
ML System Design Patterns: Production	40+ patterns, 25+ anti-patterns, decision frameworks for production ML

Tech Stack

Languages & Frameworks

Generative AI & LLMs

Cloud & Infrastructure

Data Engineering

Education & Certifications


B.S. Computer Science	COMSATS University Islamabad, 2016–2020
Foundations: Data, Data, Everywhere	Google
PostgreSQL: Advanced Queries	LinkedIn Learning
SQL Essential Training	LinkedIn Learning

📰 Latest AI Research Articles

Auto-generated articles with AI-crafted images — published daily to AI-Engineering-Notes

Model Context Protocol and Tool Use (2026-04-15)
LLM Fine-Tuning at Scale with LoRA (2026-04-14)
Production RAG Pipelines with Re-ranking (2026-04-13)
Real-Time Multimodal LLM Integration (2026-04-12)

📚 View all articles →

⚡ Recent Activity

📝 Opened issue [Feature] Built-in collection versioning for zero-downtime i in chroma-core/chroma (2026-04-15)

💬 Commented on Qwen3.5 Image Lora post-training in axolotl-ai-cloud/axolotl (2026-04-15)

💬 Commented on [Bug] Responses API with code_interpreter file_ids does not in BerriAI/litellm (2026-04-15)

⭐ Starred mage-ai/mage-ai (2026-04-15)

⭐ Starred Avaiga/taipy (2026-04-15)

💬 Commented on Liger-Kernel is now supported on LLaMA-Factory + NPU in hiyouga/LlamaFactory (2026-04-14)

💬 Commented on [BUG] convertSegmentMetadataToModel debug logs nil instead o in chroma-core/chroma (2026-04-14)

⭐ Starred umbertogriffo/rag-chatbot (2026-04-14)

🔬 Currently Researching

Topics discovered daily by a multi-model AI research engine (GPT-4.1, Grok-3, DeepSeek R1, Llama-4)

🔬 Model Context Protocol and Tool Use

🔬 Agentic Coding Assistants Architecture

🔬 LLM Fine-Tuning at Scale with LoRA

🔬 Production RAG Pipelines with Re-ranking

🔬 Real-Time Multimodal LLM Integration

🔬 Real-Time Data Quality Monitoring for ML

📌 Latest Code Snippets

📌 Token Budget Manager — LLM Context Window Optimization (Python) (2026-04-15)

📌 Retry with Exponential Backoff & Jitter — Production HTTP Client (Python) (2026-04-14)

📌 Async LLM Gateway with Circuit Breaker & Retry — Production Pattern (Python) (2026-04-13)

🤖 Profile auto-updated on 2026-04-15 09:16 UTC

GitHub Stats

Currently open to Senior AI/ML Engineer, Staff ML Engineer, or Lead AI Engineer roles.
If you’re building production AI systems and need someone who ships — let’s talk.
