Inspiration

Organizations today struggle with information overload. Knowledge is scattered across documentation, incident tickets, policies, and runbooks. Teams waste hours searching for the right information during critical incidents, and context-switching between multiple systems slows down response times.

We built EverydayElastic to solve this problem: a conversational AI platform that brings together semantic search and generative AI to provide instant, grounded answers with actionable follow-ups.

What it does

EverydayElastic is an AI-powered knowledge search platform that combines Elasticsearch's semantic search with Google Vertex AI's Gemini models. Users can:

  1. Ask natural language questions about their knowledge base (incidents, policies, documentation)
  2. Get AI-generated responses grounded in retrieved documents with source citations
  3. Take immediate action with suggested follow-ups like posting incident alerts to Slack
  4. Search across 7,800+ documents with sub-second latency using hybrid search

Key Features:

  • Semantic Search: Hybrid retrieval combining BM25 keyword search and dense vector embeddings
  • Intelligent Reranking: Google Vertex AI reranking ensures the most relevant results surface first
  • Context-Aware Responses: Gemini 2.5 Flash Lite generates grounded answers with citations to prevent hallucination
  • Slack Integration: Automatically detects critical incidents and suggests posting to team channels with rich formatting
  • Conversation Management: Persistent chat history with session management
  • Real-time Interface: Modern React-based UI with markdown rendering and expandable sources
  • Observability Built-in: Prometheus metrics from backend/app/core/metrics.py and OpenTelemetry traces exported via backend/app/dependencies.py into Elastic APM dashboards

How we built it

Technology Stack

Backend (Python)

  • FastAPI: High-performance async web framework for REST API
  • Elasticsearch 8.15: Semantic search with Open Inference API integration
  • Google Vertex AI: Text generation (Gemini 2.5 Flash Lite), embeddings, and reranking
  • Slack Web API: Rich message formatting with Block Kit
  • OpenTelemetry & Prometheus: Observability and metrics

Frontend (TypeScript/React)

  • Next.js 15: Server-side rendering with App Router
  • Tailwind CSS: Utility-first styling
  • React Markdown: Renders AI responses with syntax highlighting
  • Lucide Icons: Modern iconography

Infrastructure

  • Google Cloud Run: Serverless container deployment
  • Elasticsearch Cloud: Managed cluster with semantic_text mapping
  • Secret Manager: Secure credential management
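The semantic_text mapping on that cluster looks roughly like this; the field names and the inference endpoint ID ("vertex-embeddings") are placeholders, not the production values:

```python
# Sketch of an index mapping using Elasticsearch's semantic_text field type.
# "vertex-embeddings" stands in for the actual Open Inference API endpoint ID.
knowledge_base_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "content": {
                "type": "semantic_text",
                "inference_id": "vertex-embeddings",  # external embedding model
            },
        }
    }
}
```

Documents indexed into a semantic_text field are embedded automatically at ingest time, which is what lets the query side combine BM25 and vector scoring without a separate embedding step.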

Architecture

The system follows a three-tier architecture:

  1. User Layer: Web browser and Slack workspace for notifications
  2. Application Layer:
    • Frontend (Next.js) handles UI, conversation state, and user interactions
    • Backend (FastAPI) orchestrates search, AI generation, and integrations
  3. External Services: Elasticsearch, Vertex AI, and Slack APIs

Search Pipeline:

  1. User query → Backend extracts keywords and context
  2. Elasticsearch performs hybrid search (BM25 + semantic vectors)
  3. Vertex AI reranks results by relevance
  4. Top 4 documents form context for LLM
  5. Gemini generates grounded response with citations
  6. Workflows engine suggests follow-up actions (e.g., Slack alerts)
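The steps above can be sketched as one orchestration function. The injected callables (search_docs, rerank, generate_answer, suggest_actions) are illustrative stand-ins for the real Elasticsearch, Vertex AI, and workflow clients, not our actual backend API:

```python
from typing import Callable

# Illustrative sketch of the search pipeline; each callable stands in
# for a real external-service client.
def answer_query(
    query: str,
    search_docs: Callable[[str], list[dict]],          # step 2: hybrid retrieval
    rerank: Callable[[str, list[dict]], list[dict]],   # step 3: Vertex AI reranking
    generate_answer: Callable[[str, list[dict]], str], # step 5: grounded generation
    suggest_actions: Callable[[str], list[str]],       # step 6: workflow engine
    top_k: int = 4,
) -> dict:
    hits = search_docs(query)
    ranked = rerank(query, hits)
    context = ranked[:top_k]                 # step 4: top 4 docs form the context
    answer = generate_answer(query, context)
    return {
        "answer": answer,
        "sources": [doc["title"] for doc in context],
        "actions": suggest_actions(answer),
    }
```

Because each stage is injected, the pipeline can be unit-tested with stubs and each external dependency swapped independently.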

Data Flow:

  • Frontend sends POST to /chat/completions with user message
  • Backend queries Elasticsearch using Open Inference API for embeddings
  • Results reranked via Vertex AI Reranking API
  • Context built from top documents and passed to Gemini
  • Response returned with sources and suggested actions
  • User can trigger actions (e.g., post to Slack) via /chat/actions endpoint

Implementation Highlights

Hybrid Search

# Combines keyword (BM25) and semantic (vector) search
search_body = {
    "query": {
        "bool": {
            "should": [
                {"match": {"content": query}},  # BM25
                {"semantic": {"field": "content", "query": query}},  # vectors
            ]
        }
    }
}

Grounded Generation

  • Context from retrieved documents included in system prompt
  • Citations enforced in response format
  • Temperature kept low (0.3) for factual accuracy
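A minimal sketch of how retrieved documents can be folded into the prompt with numbered citation markers; the prompt wording and field names here are illustrative, not our production prompt:

```python
def build_grounded_prompt(question: str, docs: list[dict], max_chars: int = 2000) -> str:
    """Number each source and instruct the model to cite only those numbers."""
    context_lines = []
    for i, doc in enumerate(docs, start=1):
        snippet = doc["content"][:max_chars]  # truncate very long documents
        context_lines.append(f"[{i}] {doc['title']}: {snippet}")
    context = "\n".join(context_lines)
    return (
        "Answer using ONLY the sources below. Cite sources inline as [n]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )
```

Numbering the sources is what makes citations checkable afterwards: any [n] the model emits can be validated against the list it was actually given.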

Slack Integration

  • OAuth-based Web API for rich formatting
  • Block Kit for structured message display
  • Automatic fallback to webhooks if API fails
  • Incident detection based on severity tags
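The Block Kit payload can be built as a pure function, mirroring the ticket_info fields (severity, status, owner) used in the webhook example further down; this is a sketch, not the exact production formatting:

```python
def build_incident_blocks(message: str, ticket_info: dict) -> list[dict]:
    """Slack Block Kit blocks for a structured incident alert."""
    fields = [
        {"type": "mrkdwn", "text": f"*{key.title()}:* {value}"}
        for key, value in ticket_info.items()
    ]
    return [
        {"type": "header", "text": {"type": "plain_text", "text": "Incident Alert"}},
        {"type": "section", "text": {"type": "mrkdwn", "text": message}},
        {"type": "section", "fields": fields},
    ]
```

The real integration posts these blocks via the Web API (chat.postMessage) and, on failure, falls back to a plain-text webhook that drops the rich formatting.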

Observability Instrumentation

  • Prometheus /metrics endpoint backed by REQUEST_LATENCY and SEARCH_SOURCE_COUNTER in backend/app/core/metrics.py
  • configure_tracing() in backend/app/dependencies.py wires OpenTelemetry OTLP exports (Gemini + Elasticsearch spans) into Elastic APM with environment-driven configuration

Challenges we ran into

  1. Elasticsearch Open Inference API: We initially struggled to configure the inference endpoint correctly, and had to learn how the semantic_text field type integrates with external embedding models.

  2. Reranking Integration: Getting Vertex AI reranking to work with Elasticsearch results required proper document formatting and handling of the rerank response structure.

  3. Token Context Management: Balancing between providing enough context to the LLM (for accurate responses) while staying within token limits. Solved by limiting to top 4 documents and truncating very long documents.

  4. Slack OAuth Tokens: Managing OAuth token refresh and handling expired credentials gracefully. Implemented a fallback mechanism to webhooks when API calls fail.

  5. Response Latency: Initial implementation took 5-6 seconds per query. Optimized by:

    • Reducing search result size from 20 to 8 documents
    • Async/await throughout the backend
    • Using Gemini 2.5 Flash Lite instead of Pro
    • Parallel execution where possible
  6. Citation Accuracy: Early versions sometimes hallucinated sources. Fixed by:

    • Strict system prompts with citation requirements
    • Lower temperature (0.3)
    • Validating returned citations against actual sources
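The citation check from item 6 can be sketched as a post-processing step that strips any [n] marker the model invented; the regex and marker format are assumptions for illustration:

```python
import re

def validate_citations(answer: str, num_sources: int) -> str:
    """Drop [n] citation markers that don't map to a retrieved source."""
    def check(match: re.Match) -> str:
        n = int(match.group(1))
        return match.group(0) if 1 <= n <= num_sources else ""
    return re.sub(r"\[(\d+)\]", check, answer)
```

Combined with the strict prompt and low temperature, this guarantees that every citation shown to the user points at a document that was actually retrieved.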

Accomplishments that we're proud of

  1. End-to-end Integration: Successfully integrated three major platforms (Elasticsearch, Vertex AI, Slack) into a cohesive system that feels seamless to the end user.

  2. Production-Ready Architecture: Built with observability, metrics, structured logging, and proper error handling from day one.

  3. Sub-3-Second Response Time: Achieved fast end-to-end latency (2-3 seconds) despite multiple API calls and AI processing.

  4. Grounded AI: We saw no hallucinated answers during testing, because every response is grounded in retrieved documents and cited sources.

  5. Rich Slack Integration: Not just basic text notifications—implemented proper Block Kit formatting with structured incident data.

  6. Clean Separation of Concerns: Modular architecture with clear boundaries between search, generation, and actions makes the codebase maintainable and extensible.

What we learned

  1. Elasticsearch Semantic Search: Deep dive into hybrid search, semantic_text field types, and Open Inference API for embeddings and reranking.

  2. Vertex AI APIs: Learned about different Gemini models (Flash vs Pro), function calling, and the importance of prompt engineering for grounded responses.

  3. RAG Architecture: Practical experience with Retrieval-Augmented Generation patterns, context window management, and citation handling.

  4. Async Python: Leveraged FastAPI's async capabilities for concurrent I/O operations (Elasticsearch + Vertex AI in parallel).

  5. Next.js 15: Worked with the new App Router, Server Components, and modern React patterns.

  6. Production Observability: Implemented structured logging, Prometheus metrics, and health check endpoints for monitoring.
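The parallel-I/O pattern from item 4 boils down to asyncio.gather; the coroutine names below are illustrative stand-ins for the real Elasticsearch and Vertex AI calls:

```python
import asyncio

async def fetch_embeddings(query: str) -> list[float]:
    await asyncio.sleep(0)  # stands in for a Vertex AI embedding call
    return [0.1, 0.2]

async def fetch_keywords(query: str) -> list[str]:
    await asyncio.sleep(0)  # stands in for keyword extraction
    return query.split()

async def prepare_search(query: str):
    # Both awaits overlap, so total latency is close to the slower
    # call rather than the sum of both.
    return await asyncio.gather(fetch_embeddings(query), fetch_keywords(query))
```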

Live Deployment

Quick checks:

# Health
curl -s https://everydayelastic-backend-1064261519338.us-central1.run.app/health | jq

# Slack webhook action (posts to webhook-bound channel)
curl -s -X POST "https://everydayelastic-backend-1064261519338.us-central1.run.app/chat/actions" \
  -H "Content-Type: application/json" \
  -d '{
    "action": "slack_webhook",
    "payload": {
      "channel": "sev-1-war-room",
      "message": "Cloud Run webhook test from EverydayElastic",
      "ticket_info": {"severity":"Sev-1","status":"Investigating","owner":"On-call"}
    }
  }'

What's next for EverydayElastic

Short‑term (2–4 weeks)

  • Streaming responses (SSE) for token‑by‑token replies
  • Query expansion (synonyms, reformulation) to boost recall
  • Conversation memory with session resume

Medium‑term (2–3 months)

  • Bi‑directional Slack (slash commands, channel ID lookups)
  • Auto‑escalation rules with PagerDuty handoff
  • Multilingual retrieval + generation

Long‑term (6+ months)

  • Function calling to trigger actions (create tickets, update status)
  • Multimodal search (PDF/images with OCR)
  • Analytics + RBAC and audit logs

Built With
