Enterprise-Grade Intelligent Document Question Answering System
PolicyBot is an AI agent that intelligently answers questions about company policies using Retrieval-Augmented Generation (RAG). Features automatic decision-making between direct LLM responses and document retrieval, dual environment support (local development + Azure production), and comprehensive tooling.
β¨ Intelligent Decision Engine - Automatically classifies queries and routes to optimal response strategy
π Dual Environment - Local (Gemini or Azure OpenAI) + Production (Azure OpenAI only)
π RAG Pipeline - Semantic search with FAISS (local) or Azure AI Search (production)
π¬ Session Memory - Maintains conversation context across interactions
π οΈ Tool Calling - Extensible tool system (document search, calculations)
π³ Docker Ready - Multi-stage builds with security best practices
βοΈ Azure Deployment - Complete CI/CD with GitHub Actions
π Monitoring - Application Insights integration
π§ͺ Comprehensive Tests - Unit, integration, and API tests
π Production Docs - Architecture diagrams, API reference, deployment guides
Component Layers:
- API Layer: FastAPI with OpenAPI docs
- Agent Layer: Decision engine, memory, tools
- LLM Layer: Google Gemini / Azure OpenAI
- RAG Layer: Document processing, embeddings, vector search
- Data Layer: Company policy documents
β See docs/architecture.md for detailed architecture
- Python 3.11+
- For local development (default): Google Gemini API key (Get free key)
- For local development (optional): Azure OpenAI access (endpoint, API key, deployment names)
# 1. Clone and setup
cd policybot-ai-agent
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# For testing/dev:
pip install -r requirements-dev.txt
# 2. Configure (choose your LLM provider)
cp .env.example .env
# Edit .env: For Gemini (default): GOOGLE_GEMINI_API_KEY=your_key_here
# Or for Azure OpenAI: LLM_PROVIDER=azure, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, etc.
# 3. Initialize vector store
python scripts/setup_vectorstore.py
# 4. Run application
python -m app.main
# 5. Test
python scripts/test_agent.py# 1. Clone and setup
cd policybot-ai-agent
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
# For testing/dev:
pip install -r requirements-dev.txt
# 2. Configure (choose your LLM provider)
copy .env.example .env
# Edit .env: For Gemini (default): GOOGLE_GEMINI_API_KEY=your_key_here
# Or for Azure OpenAI: LLM_PROVIDER=azure, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, etc.
# 3. Initialize vector store
python scripts\setup_vectorstore.py
# 4. Run application
python -m app.main
# 5. Test
python scripts\test_agent.pyAPI Docs: http://localhost:8000/docs
PolicyBot is designed to solve real-world enterprise challenges. Here are key scenarios demonstrating its value:
Scenario: A new hire, Sarah, needs to understand her benefits and initial setup requirements but doesn't want to overwhelm her manager.
- User Query: "When does my health insurance coverage start?"
- Agent Action: Retrieves the
benefits_guide.txtdocument. - Response: "According to the benefits guide, health insurance coverage begins on the first day of the month following your start date."
- Benefit: Reduces "HR fatigue" and empowers new employees to self-serve information directly.
Scenario: The HR team is drowning in repetitive questions during open enrollment season.
- User Query: "What is the difference between the standard and premium dental plan?"
- Agent Action: Searches
benefits_guide.txtand synthesizes a comparison. - Response: Explains the coverage limits and deductibles for both plans side-by-side.
- Benefit: Frees up HR professionals to focus on complex, sensitive employee relations issues rather than FAQ answering.
Scenario: An employee is unsure about the rules for using personal devices for work.
- User Query: "Can I check my work email on my personal phone?"
- Agent Action: Consults
it_policy.txtandmobile_device_policy.txt. - Response: "Yes, but you must install the MDM (Mobile Device Management) profile and enforce a 6-digit passcode as per the IT Security Policy."
- Benefit: Ensures standardized, policy-compliant answers are given every time, reducing security risks.
Scenario: A remote worker in a different time zone needs immediate clarification on a leave policy for an emergency.
- User Query: "What is the bereavement leave policy?"
- Agent Action: Instant retrieval from
leave_policy.txt. - Response: "You are entitled to up to 5 days of paid bereavement leave for immediate family members..."
- Benefit: Provides instant support regardless of time zone or HR team availability.
| Component | Technology | Purpose |
|---|---|---|
| API | FastAPI 0.109 | High-performance async web framework |
| Validation | Pydantic 2.5 | Data validation and settings |
| Language | Python 3.11 | Modern async/await support |
Local Development (Configurable via LLM_PROVIDER env var):
- LLM: Google Gemini 1.5-flash (default,
LLM_PROVIDER=gemini) OR Azure OpenAI GPT-4 (LLM_PROVIDER=azure) - Embeddings: Gemini embedding-001 (768-dim) OR Azure text-embedding-ada-002 (1536-dim)
- Vector Store: FAISS (local file storage, separate indices per provider:
faiss_index_gemini/faiss_index_azure)
Production (Azure Only - Not Configurable):
- LLM: Azure OpenAI GPT-4 (always)
- Embeddings: text-embedding-ada-002 (1536-dim)
- Vector Store: Azure AI Search (managed, scalable, cloud-based)
- Containerization: Docker with multi-stage builds
- Orchestration: Docker Compose
- Cloud: Azure App Service + Container Registry
- CI/CD: GitHub Actions
- Monitoring: Azure Application Insights
- Testing: pytest with async support
policybot-ai-agent/
βββ app/ # Application source
β βββ agents/ # AI agent core
β β βββ agent.py # Decision engine β
β β βββ memory.py # Session management
β β βββ tools.py # Tool calling
β βββ api/ # FastAPI routes
β β βββ routes.py
β βββ llm/ # LLM integrations
β β βββ llm_client.py # Unified client (Gemini/Azure) β
β β βββ prompts.py # Prompt engineering
β βββ rag/ # RAG pipeline
β β βββ document_processor.py
β β βββ vector_store.py # FAISS + Azure Search β
β β βββ retriever.py
β βββ models/ # Pydantic schemas
β βββ config.py # Configuration β
β βββ main.py # FastAPI app entry point
βββ data/
β βββ documents/ # Policy documents
β βββ vector_stores/ # FAISS indices
βββ deployment/
β βββ azure/ # Azure manual deployment scripts
β βββ terraform/ # Infrastructure as Code (recommended)
β βββ Dockerfile # Multi-stage production build
β βββ docker-compose.yml
βββ docs/ # Project documentation
β βββ architecture.md # System design
β βββ api.md # API reference
β βββ deployment.md # Deployment guide
β βββ testing.md # Testing guide
βββ scripts/
β βββ setup_vectorstore.py # Initialize embeddings
β βββ test_agent.py # Interactive testing
βββ tests/ # Test suite (unit, integration, performance)
βββ .env.example # Environment template
βββ DEPLOYMENT_GUIDE.md # Comprehensive deployment guide
βββ QUICKSTART.md # Quick start guide
βββ requirements.txt # Python dependencies
βββ README.md # This file
β = Core components implementing assignment requirements
- Quick Start Guide - Get running in 5 minutes
- API Documentation - Complete API reference
- Deployment Guide - Local, Docker, and Azure deployment
- Architecture Overview - System design and data flow
- Design Decisions - Why we made key choices
- Limitations & Roadmap - Current constraints and future plans
Decision: Use Google Gemini as the default LLM for local development (with Azure OpenAI as an alternative option)
Rationale:
- β Fast inference (optimized for speed)
- β Generous free tier (no credit card required)
- β Simple setup (just API key needed)
- β Good quality for development/testing
- β Easy switch to Azure for production
- β
Developers can still use Azure OpenAI locally if preferred (set
LLM_PROVIDER=azure)
Decision: Separate local (Gemini OR Azure OpenAI + FAISS) and production (Azure OpenAI only + AI Search)
Rationale:
- Local: Fast iteration, zero cloud costs with Gemini (default), or use Azure for testing production setup
- Production: Enterprise SLAs, compliance, scalability, managed services with Azure OpenAI (required)
- Flexibility: Local environment supports both providers via
LLM_PROVIDERconfig variable - Cost-Effective: Developers don't need Azure subscriptions (Gemini free tier), but can opt-in if needed
| Feature | FAISS (Local) | Azure AI Search (Production) |
|---|---|---|
| Setup | Instant, no dependencies | Requires Azure resource |
| Cost | Free | ~$250/month |
| Performance | Fast (local) | Fast + distributed |
| Scalability | Limited to local resources | Auto-scaling |
| Features | Vector similarity only | Hybrid search (keyword + vector) |
| Backup | Manual file copy | Automatic Azure backup |
Best of both worlds: Simple for development, enterprise-ready for production
Decision: In-memory dictionary for conversation history
Rationale:
- Simple: No external dependencies or setup
- Sufficient: Handles typical use cases well
- Fast: No network latency
- Upgradeable: Easy migration to Redis/CosmosDB later
Production Note: For multi-instance deployments, migrate to Redis or Azure CosmosDB for distributed sessions.
Decision: FastAPI over Flask, Django, or other frameworks
Rationale:
- Async Native: Built-in async/await support for AI operations
- Performance: One of the fastest Python frameworks
- Auto Docs: OpenAPI/Swagger generated automatically
- Type Safety: Full Pydantic integration
- Modern: Designed for Python 3.7+ features
How it works:
- Query Classification: LLM classifies intent (low temperature for consistency)
- Routing Decision:
GENERALquestions β Direct LLM response (faster, cheaper)POLICYquestions β RAG pipeline (accurate, source-backed)CLARIFICATIONβ Request more information
- Fallback Strategy: Defaults to RAG on uncertainty (safer)
Why this approach?
- Optimizes for both speed and accuracy
- Reduces unnecessary vector searches
- Provides transparent source attribution
- Handles edge cases gracefully
The project includes 5 production-quality company policy documents:
-
company_handbook.txt (3,500 lines)
- Company culture, values, employment policies
- Work hours, dress code, performance reviews
-
it_policy.txt (2,500 lines)
- Security requirements, acceptable use
- Password management, data protection, email policies
-
leave_policy.txt (1,500 lines)
- PTO, sick leave, parental leave
- FMLA, bereavement, military leave, sabbaticals
-
code_of_conduct.txt (1,500 lines)
- Ethics, conflicts of interest, confidentiality
- Compliance, reporting, non-retaliation
-
benefits_guide.txt (2,000 lines)
- Health insurance, dental, vision
- 401k, HSA/FSA, EAP, additional benefits
Total: 11,000+ lines of realistic, comprehensive policy content
# Install test dependencies
pip install -r requirements-dev.txt
# Run all tests
pytest tests/ -v
# Run specific test file
pytest tests/test_agent.py -v
# Run with coverage
pytest tests/ --cov=app# Start interactive agent
python scripts\test_agent.py
# Try these queries:
# "What is the parental leave policy?"
# "How many vacation days do I get?"
# "What are the password requirements?"# Health check
curl http://localhost:8000/health
# Ask a question
curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-d '{"query":"What is the 401k match?"}'cd deployment
docker-compose up --buildAccess at http://localhost:8000
docker build -t policybot-ai-agent -f deployment/Dockerfile .
docker run -p 8000:8000 --env-file .env policybot-ai-agent# Infrastructure as Code - Production Ready
cd deployment/terraform
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your configuration
terraform init
terraform plan
terraform applyβ See deployment/terraform/README.md for complete Terraform guide
# Set Azure credentials as environment variables
export AZURE_OPENAI_API_KEY="your_key"
export AZURE_SEARCH_API_KEY="your_key"
# Run deployment
cd deployment/azure
chmod +x deploy.sh
./deploy.sh- Add secrets to GitHub repository
- Push to
mainbranch - Automatic deployment via GitHub Actions
β See docs/deployment.md for complete guide
-
Session Storage π΄
- Limitation: In-memory, not distributed
- Impact: Single-instance only, sessions lost on restart
- Mitigation: Migrate to Redis or Azure CosmosDB for distributed sessions
- Effort: Medium (2-3 days)
-
No Authentication π΄
- Limitation: API publicly accessible without authentication
- Impact: Security risk, anyone can query, potential abuse
- Mitigation: Implement JWT authentication or API key system
- Effort: Medium (2-3 days)
-
No Rate Limiting π΄
- Limitation: No request throttling per user/IP
- Impact: Vulnerable to abuse, uncontrolled costs, DDoS risk
- Mitigation: Add rate limiting middleware (e.g., SlowAPI)
- Effort: Low (1 day)
-
Chunking Strategy π‘
- Limitation: Simple word-based splitting
- Impact: May split mid-sentence, reduced RAG accuracy
- Mitigation: Implement semantic chunking with sentence boundaries
- Effort: Medium (2-3 days)
-
Limited Test Coverage π‘
- Limitation: ~30% estimated coverage, no integration tests
- Impact: Harder to maintain, refactor, and ensure quality
- Mitigation: Increase to 80%+ coverage with mocked LLM calls
- Effort: High (1-2 weeks)
-
No Caching Layer π‘
- Limitation: No query result caching
- Impact: Repeated queries hit LLM every time, higher costs
- Mitigation: Add Redis caching with TTL-based expiration
- Effort: Medium (2-3 days)
-
No Authorization/RBAC π‘
- Limitation: No role-based access control
- Impact: All authenticated users have same permissions
- Mitigation: Implement RBAC for admin vs user endpoints
- Effort: Medium (3-4 days)
-
Basic Monitoring π’
- Limitation: Application Insights only, limited custom metrics
- Impact: Harder to debug and optimize performance
- Mitigation: Add Prometheus metrics and Grafana dashboards
- Effort: Medium (3-4 days)
-
Single Language π’
- Limitation: English only
- Impact: Can't handle multilingual queries
- Mitigation: Add language detection and translation
- Effort: High (1-2 weeks)
-
No Input Sanitization π’
- Limitation: Beyond Pydantic validation
- Impact: Potential for prompt injection attacks
- Mitigation: Add input sanitization and validation rules
- Effort: Low (1-2 days)
-
No Secrets Management π’
- Limitation: Secrets in environment variables only
- Impact: Less secure, harder to rotate secrets
- Mitigation: Integrate Azure Key Vault
- Effort: Low (1-2 days)
-
No Advanced RAG Features π’
- Limitation: Basic vector search only
- Impact: Could improve accuracy with hybrid search, re-ranking
- Mitigation: Add BM25 hybrid search, cross-encoder re-ranking
- Effort: High (1-2 weeks)
Focus: Security & Stability
- Add JWT authentication with token validation
- Implement rate limiting (60 requests/minute per user)
- Add input sanitization for prompt injection prevention
- Improve error handling with structured error responses
- Add API key support as alternative to JWT
Focus: Scalability & Quality
- Migrate to Redis for distributed sessions
- Increase test coverage to 80%+ with mocked LLM calls
- Add query result caching with Redis
- Implement semantic chunking with sentence boundaries
- Add integration tests for end-to-end workflows
- Integrate Azure Key Vault for secrets management
Focus: Features & Observability
- Hybrid search (BM25 + vector similarity)
- Cross-encoder re-ranking for better relevance
- Prometheus metrics + Grafana dashboards
- Admin dashboard for management and analytics
- Role-based access control (RBAC)
- Query expansion and reformulation
- Conversation summarization
- Export chat history feature
Focus: Advanced Features
- Multi-document reasoning and synthesis
- Fine-tuned embeddings on company data
- Real-time document updates and indexing
- Multi-agent collaboration system
- Advanced analytics and reporting
- A/B testing framework for prompts
- Multi-language support with translation
- Voice interface integration
Focus: Enterprise & Scale
- Mobile application (iOS/Android)
- Multi-tenancy support for SaaS
- Advanced security (SSO, SAML, OAuth)
- Compliance features (audit logs, data retention)
- Custom model fine-tuning pipeline
- Federated learning for privacy
- Edge deployment support
- GraphQL API alternative
| Priority | Limitation | Impact | Effort | ROI |
|---|---|---|---|---|
| P0 | Authentication | High | Medium | High |
| P0 | Rate Limiting | High | Low | Very High |
| P0 | Session Storage | High | Medium | High |
| P1 | Test Coverage | Medium | High | High |
| P1 | Caching Layer | Medium | Medium | High |
| P1 | Semantic Chunking | Medium | Medium | Medium |
| P2 | RBAC | Medium | Medium | Medium |
| P2 | Monitoring | Medium | Medium | Medium |
| P2 | Hybrid Search | Medium | High | Medium |
| P3 | Multi-language | Low | High | Low |
| P3 | Secrets Vault | Low | Low | Medium |
These improvements provide maximum value with minimal effort:
- Rate Limiting (1 day) - Prevents abuse, controls costs
- Input Sanitization (1-2 days) - Improves security
- Azure Key Vault (1-2 days) - Better secrets management
- Structured Errors (1 day) - Better debugging and UX
- Health Check Enhancement (1 day) - Verify LLM connectivity
When implementing improvements, be aware of potential breaking changes:
- Authentication: Existing API clients will need to authenticate
- Rate Limiting: High-volume users may need quota increases
- Session Storage: Migration to Redis requires data migration strategy
- RBAC: Existing users may need role assignments
We welcome contributions in these areas:
- π§ͺ Test coverage improvements
- π Documentation enhancements
- π§ Bug fixes and optimizations
- β¨ New features from the roadmap
- π Multi-language support
- π¨ UI/UX improvements
- β Multi-stage Docker builds
- β Non-root container user
- β Environment variable secrets (never committed)
- β Input validation (Pydantic)
- β CORS configuration
- β HTTPS enforcement (Azure)
- β Secrets management (Azure Key Vault ready)
- Direct LLM: 0.5-1.5s
- RAG Pipeline: 1.5-3s
- Document Search: 0.1-0.3s
- Local: Single instance, suitable for development
- Production: Azure App Service auto-scaling (1-10 instances)
- Database: Vector stores handle millions of documents
This is a portfolio/assignment project. For questions or suggestions:
- Open an issue
- Submit a pull request
- Contact: [email protected]
This project is for portfolio and interview demonstration purposes.
- Google Gemini - Fast and accessible AI for development
- Azure OpenAI - Enterprise-grade LLM services
- FastAPI - Excellent Python web framework
- FAISS - Efficient similarity search
- OpenAI - Pioneering work in LLMs
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
-
GitHub: https://github.com/devopsexpertlearning/policybot-ai-agent
-
Author: Ravi Kumar Kushwaha
Built with β€οΈ for the AI Agent Assignment
Production-ready β’ Enterprise-grade β’ Fully documented
