Principal Data Scientist | LLM / RAG / Conversational AI
Bangalore, India
[email protected]
LinkedIn
YouTube
Principal Data Scientist with 16+ years of experience (9+ in ML/AI) specializing in:
- LLM-powered conversational systems
- Retrieval-Augmented Generation (RAG)
- Multi-turn reasoning & ambiguity resolution
Built production AI systems serving:
- ~148K users
- ~4K queries/day
- ~5K internal knowledge documents
Improved system deflection from 26% → ~50%
Enterprise-scale conversational AI system designed to handle ambiguous user queries using LLMs, RAG, multi-turn reasoning, and retrieval optimization.
This system powers an AI Service Desk (AISD) used by ~148K employees, handling:
- ~4,000 queries/day
- ~5,000 internal knowledge documents
Improved automation/deflection from 26% → ~50%
Users frequently submit vague and ambiguous queries, such as:
- “I need help”
- “Unable to login”
- “Reset password”
Key challenges:
- Missing context in user queries
- Over-specific refined queries causing LLM failures
- Poor UX (long, unstructured responses)
- Scaling ambiguity resolution across topics
User Query → Query Understanding (Rephrase + Context) → Semantic Retrieval (OpenSearch, Top 100) → Reranking (Cohere, Top K) → LLM Reasoning Layer → Answer Generation → Structured Response (UX Layer)
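The pipeline above can be sketched end to end in a few functions. This is a minimal, self-contained stand-in: the word-overlap scorer replaces OpenSearch embeddings and Cohere rerank, and every function name here is illustrative, not the production code.

```python
# Illustrative sketch of the pipeline stages; scoring is a word-overlap
# stand-in for the real OpenSearch retrieval and Cohere reranking.

def overlap(a: str, b: str) -> int:
    """Crude relevance score: count of shared lowercase tokens."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def rephrase_query(query: str, history: list[str]) -> str:
    """Query understanding: fold recent conversation context into the query.
    (In production an LLM does the rewriting; here we just join context.)"""
    return " ".join(history[-2:] + [query]).strip()

def semantic_retrieve(query: str, corpus: dict[str, str], top_n: int = 100) -> list[str]:
    """Semantic retrieval stand-in (OpenSearch in the real system): top-N doc ids."""
    ranked = sorted(corpus, key=lambda doc_id: overlap(query, corpus[doc_id]), reverse=True)
    return ranked[:top_n]

def rerank(query: str, doc_ids: list[str], corpus: dict[str, str], top_k: int = 5) -> list[str]:
    """Reranking stand-in (Cohere rerank in the real system): narrow to top-K."""
    return sorted(doc_ids, key=lambda d: overlap(query, corpus[d]), reverse=True)[:top_k]

def answer(query: str, history: list[str], corpus: dict[str, str]) -> list[str]:
    """Run the full retrieve-then-rerank chain; returns doc ids fed to the LLM."""
    q = rephrase_query(query, history)
    candidates = semantic_retrieve(q, corpus)
    return rerank(q, candidates, corpus)
```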
- Transitioned from single-turn → context-aware system
- Maintains conversation history for better reasoning
- Decision-tree-based disambiguation
- Handles structured flows for known topics
- Example flow:
  - Reset password → Ask OS (Windows/Mac) → Ask login status → Provide resolution
Improved deflection: 26% → 40%
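The "Reset password" flow above can be expressed as a small decision tree that either asks the next disambiguating question or returns a resolution. The tree contents below are illustrative stand-ins, not the production flows:

```python
# Illustrative decision tree for one known topic; questions and resolutions
# are placeholders, not the production knowledge content.
DECISION_TREE = {
    "reset password": {
        "question": "Which OS are you on?",
        "branches": {
            "windows": {
                "question": "Can you currently log in?",
                "branches": {
                    "yes": {"resolution": "Use Ctrl+Alt+Del > Change a password."},
                    "no": {"resolution": "Use the self-service reset portal."},
                },
            },
            "mac": {"resolution": "Reset via System Settings > Users & Groups."},
        },
    },
}

def next_step(topic: str, answers: list[str]) -> str:
    """Walk the tree with the user's answers so far; return either the
    next clarifying question or the final resolution."""
    node = DECISION_TREE[topic]
    for ans in answers:
        node = node["branches"][ans]
    return node.get("resolution") or node["question"]
```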
- Handles unconfigured / scalable scenarios
- Uses top-k retrieved documents to generate follow-ups
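For unconfigured topics, the retrieved top-k documents themselves can seed a clarifying follow-up. A minimal sketch, assuming we have the titles of the top-k docs (the real system drafts the follow-up with the LLM; this string template only shows the data flow):

```python
def follow_up_question(query: str, top_doc_titles: list[str]) -> str:
    """Build a clarifying follow-up from top-k retrieved doc titles.
    Template-based stand-in for the LLM-generated follow-up."""
    options = ", ".join(f"'{t}'" for t in top_doc_titles[:3])
    return (f"Your question '{query}' could relate to several topics: "
            f"{options}. Which one applies?")
```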
Problem:
Refined queries became too specific, causing the LLM to fail on ~10% of queries
Solution:
- Generate multiple simplified queries
- Remove unnecessary constraints
- Retry retrieval + generation
Results:
- +10% answer coverage
- +5–6% deflection
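The simplify-and-retry loop can be sketched as follows. This is a deliberately crude stand-in: the production system uses an LLM to drop unnecessary constraints, whereas here variants are generated by trimming trailing tokens, and `try_answer` is a hypothetical callback that returns `None` on failure:

```python
def simplify_queries(query: str) -> list[str]:
    """Generate progressively simpler query variants.
    Token-trimming stand-in for LLM-based constraint removal."""
    tokens = query.split()
    # Drop trailing tokens one at a time, never below two tokens.
    return [" ".join(tokens[:n]) for n in range(len(tokens) - 1, 1, -1)]

def answer_with_retry(query: str, try_answer):
    """Try the original query, then each simplified variant, until
    retrieval + generation succeeds (try_answer returns non-None)."""
    for q in [query] + simplify_queries(query):
        result = try_answer(q)
        if result is not None:
            return result
    return None
```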
Problem:
- Long LLM responses reduce readability in Slack
Solution:
- Structured output:
  - Quick Solution
  - Step-by-step instructions
  - Notes / context
Impact:
- +2–3% deflection
- Improved user engagement
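The structured layout above can be rendered with a small formatter. A minimal sketch using Slack-style mrkdwn (`*bold*`, `_italic_`); the section labels mirror the list above, but the exact markup is an assumption, not the production template:

```python
def format_response(quick: str, steps: list[str], notes: str = "") -> str:
    """Render the Quick Solution / steps / notes layout as a Slack-style
    mrkdwn string (illustrative markup, not the production template)."""
    lines = [f"*Quick Solution:* {quick}", "", "*Steps:*"]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, 1)]
    if notes:
        lines += ["", f"_Note:_ {notes}"]
    return "\n".join(lines)
```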
| Component | Deflection impact |
|---|---|
| Multi-turn baseline | 26% |
| Dialogue Model | +11% |
| Mutual Understanding | +10% |
| Query Simplification | +5–6% |
| UX Optimization | +2–3% |
| Final Deflection | ~50% |
- LLMs: LLaMA (llama4.scout)
- Architecture: RAG, Multi-turn reasoning
- Search: OpenSearch (semantic retrieval)
- Ranking: Cohere rerank
- Cloud: Oracle Cloud Infrastructure (OCI)
- Data: ATP DB
- Languages: Python, SQL
- Ambiguity handling is critical in real-world AI systems
- Retrieval quality directly impacts LLM performance
- Over-specific queries can degrade answerability
- UX design is as important as model performance
- Adaptive retrieval strategies (dynamic top-k)
- Better query intent classification
- Reinforcement learning from user feedback
- Latency optimization for real-time interaction
- Ericsson ML Challenge – 2nd / 4120
- USAID Forecasting Challenge – 3rd
- MakeMyTrip – 3rd / 3556
- BrainWaves (SocGen) – Finalist (4926)
- Building production-grade LLM systems
- Solving ambiguity in real-world AI systems
- Improving retrieval + reasoning + UX together
If you're working on:
- LLM systems
- RAG architectures
- Conversational AI
Feel free to connect or collaborate!

