EverydayElastic

When incidents hit, every second counts. Stop searching. Start fixing.

The Problem

IT teams waste critical time during incidents searching through scattered documentation, past tickets, runbooks, and policies. Context-switching between systems slows down response times, and finding the right information becomes a bottleneck when you need it most.

The Solution

EverydayElastic is an AI copilot for IT operations that delivers instant, grounded answers from thousands of documents. Ask in plain English ("What's the runbook for payment gateway timeouts?" or "Show me Sev-1 incidents from the last 24 hours") and get cited answers in under 3 seconds. When critical incidents are detected, the system automatically suggests posting alerts to your Slack war room with all the context your team needs.

Built for On-Call Teams, SREs, and IT Ops

No hallucinations. All responses are grounded in your actual documents with citations.
No context-switching. Search, answer, and act, all in one interface.
No wasted time. Sub-3-second response time, even across 7,800+ documents.

🎯 Key Capabilities

  • Conversational Search: Ask questions in natural language across incident tickets, runbooks, policies, and documentation
  • Grounded AI Responses: Gemini-powered answers with citations from the retrieved documents and zero hallucinations
  • Hybrid Search: Combines BM25 keyword search with semantic vector retrieval for best relevance
  • Intelligent Reranking: Vertex AI reranks results to surface the most relevant information first
  • Actionable Intelligence: Automatically detects critical incidents and suggests follow-up actions
  • Slack Integration: Post incident alerts directly to team channels with rich formatting (severity, owner, status)
  • Multi-Domain Knowledge: Search across structured and unstructured data simultaneously

🚀 Quick Demo

Example Query:
"What's the runbook for payment gateway timeouts?"

Response (2.8 seconds):

  • ✅ Retrieves relevant runbooks from the knowledge base
  • ✅ Provides step-by-step resolution with citations
  • ✅ Shows related past incidents
  • ✅ Suggests posting to #sev-1-war-room if severity is high

Tech Under the Hood:

  • Elasticsearch hybrid search (BM25 + vectors) across 7,800+ documents
  • Vertex AI reranking for precision
  • Gemini 2.5 Flash Lite for grounded generation
  • Slack Block Kit for rich notifications

πŸ—οΈ Architecture Overview

Technology Stack

Backend

  • FastAPI 0.115.0 - High-performance async Python web framework
  • Elasticsearch 8.15.1 - Semantic search with Open Inference API
  • Google Cloud Vertex AI - Text generation (Gemini 2.5 Flash Lite) and embeddings
  • Slack Web API - Rich message formatting with Block Kit
  • OpenTelemetry - Observability and metrics collection
  • Prometheus - Metrics exposition

Frontend

  • Next.js 15.5.5 - React framework with App Router and Server Components
  • Tailwind CSS 4 - Utility-first styling
  • Lucide Icons - Modern icon library
  • React Markdown - Rendered AI responses with syntax highlighting

Infrastructure

  • Google Cloud Run - Serverless container deployment
  • Elasticsearch Cloud - Managed Elasticsearch cluster
  • Google Cloud Storage - Document storage and processing

See ARCHITECTURE.md for a detailed architecture diagram and data flow.

Prerequisites

  • Node.js 20+
  • Python 3.11+
  • npm (for frontend) & pip/uv (for backend)
  • gcloud CLI (>= 471.0.0) with an authenticated account and active project
  • Elastic Cloud deployment with semantic search enabled
  • Google Cloud service account JSON with Vertex AI permissions (set via GOOGLE_APPLICATION_CREDENTIALS)

Environment Variables

Create a .env file (or export the variables in your shell) for the backend and frontend as needed.

# .env (root directory)
ELASTIC_ENDPOINT="https://<your-deployment>.es.us-central1.gcp.cloud.es.io"
ELASTIC_USERNAME="elastic"
ELASTIC_PASSWORD="your-password"
VERTEX_PROJECT_ID="your-gcp-project-id"
EMBEDDING_INFERENCE_ID="google_vertex_ai_embeddings"
RERANKER_INFERENCE_ID="google_vertex_ai_rerank"
VERTEX_MODEL="gemini-2.5-flash-lite"
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Observability (optional but recommended)
ENABLE_TRACING=true
OTEL_EXPORTER_ENDPOINT="https://<your-deployment>.apm.us-central1.gcp.cloud.es.io:443/v1/traces"
OTEL_EXPORTER_HEADERS="Authorization=Bearer <elastic-apm-token>"
OTEL_EXPORTER_INSECURE=false

# Slack Integration (optional)
SLACK_ACCESS_TOKEN=xoxe.xoxp-1-...
SLACK_REFRESH_TOKEN=xoxe-1-...
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/...
DEFAULT_SLACK_CHANNEL="sev-1-war-room"
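A minimal, stdlib-only sketch of how the backend might validate these variables at startup, failing fast with a clear message instead of erroring mid-request (the project's real configuration handling lives in backend/app/core/config.py; this helper is illustrative):

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Subset of the settings above; defaults mirror the sample .env."""
    elastic_endpoint: str
    vertex_project_id: str
    vertex_model: str = "gemini-2.5-flash-lite"
    default_slack_channel: str = "sev-1-war-room"


def load_settings() -> Settings:
    """Read required variables from the environment, failing fast if any are missing."""
    required = ("ELASTIC_ENDPOINT", "VERTEX_PROJECT_ID")
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return Settings(
        elastic_endpoint=os.environ["ELASTIC_ENDPOINT"],
        vertex_project_id=os.environ["VERTEX_PROJECT_ID"],
        vertex_model=os.environ.get("VERTEX_MODEL", "gemini-2.5-flash-lite"),
        default_slack_channel=os.environ.get("DEFAULT_SLACK_CHANNEL", "sev-1-war-room"),
    )
```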

Frontend expects the API base URL via frontend/.env.local:

NEXT_PUBLIC_API_BASE_URL="http://localhost:8000"

🚀 Get Started in 5 Minutes

What You Need

  • Python 3.11+ and Node.js 20+
  • Google Cloud account with Vertex AI API enabled (free trial available)
  • Elasticsearch Cloud deployment (14-day free trial)
  • Slack workspace (optional, for incident notifications)

Backend Setup (2 minutes)

# 1. Clone and navigate to project
cd backend

# 2. Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Copy environment variables from root .env
cp ../.env .env

# 5. Start development server
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

βœ… Backend Running! Verify at:

  • http://localhost:8000/health - Health check
  • http://localhost:8000/integrations/status - Integration status
  • http://localhost:8000/metrics - Prometheus metrics

Frontend Setup (2 minutes)

# 1. Navigate to frontend
cd frontend

# 2. Install dependencies
npm install

# 3. Create environment file
echo "NEXT_PUBLIC_API_BASE_URL=http://localhost:8000" > .env.local

# 4. Start development server
npm run dev

🎉 You're Live!

  • Landing page: http://localhost:3000
  • Chat with AI: http://localhost:3000/copilot

Try asking: "Show me Sev-1 incidents from the last 24 hours" or "What's the runbook for database connection failures?"

Cloud Run Deployment

Build and push the backend and frontend containers

Ensure backend/Dockerfile exists (see notes below). Then run:

export PROJECT_ID="<gcp-project>"
export REGION="us-central1"
export SERVICE="everydayelastic-backend"

gcloud builds submit ./backend \
  --tag "gcr.io/${PROJECT_ID}/${SERVICE}:$(git rev-parse --short HEAD)"

Deploy to Cloud Run with required environment variables and secrets:

gcloud run deploy ${SERVICE} \
  --image "gcr.io/${PROJECT_ID}/${SERVICE}:$(git rev-parse --short HEAD)" \
  --region ${REGION} \
  --platform managed \
  --allow-unauthenticated \
  --set-env-vars "VERTEX_PROJECT_ID=${PROJECT_ID},VERTEX_LOCATION=us-central1,VERTEX_MODEL=gemini-2.5-flash-lite,EMBEDDING_INFERENCE_ID=google_vertex_ai_embeddings,RERANKER_INFERENCE_ID=google_vertex_ai_rerank" \
  --set-secrets "ELASTIC_ENDPOINT=elastic-endpoint:latest,ELASTIC_USERNAME=elastic-username:latest,ELASTIC_PASSWORD=elastic-password:latest,SLACK_ACCESS_TOKEN=slack-access-token:latest,SLACK_REFRESH_TOKEN=slack-refresh-token:latest,SLACK_CLIENT_ID=slack-client-id:latest,SLACK_CLIENT_SECRET=slack-client-secret:latest,SLACK_WEBHOOK_URL=slack-webhook-url:latest,DEFAULT_SLACK_CHANNEL=default-slack-channel:latest,ENABLE_TRACING=enable-tracing:latest,OTEL_EXPORTER_ENDPOINT=otel-exporter-endpoint:latest,OTEL_EXPORTER_HEADERS=otel-exporter-headers:latest,OTEL_EXPORTER_INSECURE=otel-exporter-insecure:latest" \
  --service-account "vertex-runner@${PROJECT_ID}.iam.gserviceaccount.com"

Ensure frontend/Dockerfile exists (see notes below). Then run:

export FRONTEND_SERVICE="everydayelastic-frontend"
export TAG="$(git rev-parse --short HEAD)"
export API_BASE_URL=$(gcloud secrets versions access latest --secret=next-public-api-base-url)

gcloud builds submit ./frontend \
  --config ./frontend/cloudbuild.yaml \
  --substitutions=_SERVICE=${FRONTEND_SERVICE},_TAG=${TAG},_NEXT_PUBLIC_API_BASE_URL="${API_BASE_URL}"

Deploy to Cloud Run with required environment variables and secrets:

gcloud run deploy ${FRONTEND_SERVICE} \
  --image "gcr.io/${PROJECT_ID}/${FRONTEND_SERVICE}:${TAG}" \
  --region ${REGION} \
  --platform managed \
  --allow-unauthenticated \
  --service-account "elastic-vertex@${PROJECT_ID}.iam.gserviceaccount.com" \
  --set-secrets "NEXT_PUBLIC_API_BASE_URL=next-public-api-base-url:latest"

Mount the service-account JSON via Secret Manager or use Workload Identity Federation (recommended) instead of shipping raw keys.

Live Deployment

Quick checks:

# Health
curl -s https://everydayelastic-backend-1064261519338.us-central1.run.app/health | jq

# Slack webhook action (posts to webhook-bound channel)
curl -s -X POST "https://everydayelastic-backend-1064261519338.us-central1.run.app/chat/actions" \
  -H "Content-Type: application/json" \
  -d '{
    "action": "slack_webhook",
    "payload": {
      "channel": "sev-1-war-room",
      "message": "Cloud Run webhook test from EverydayElastic",
      "ticket_info": {"severity":"Sev-1","status":"Investigating","owner":"On-call"}
    }
  }'

Post-deployment

  • Hit /integrations/status on the backend URL to validate Elastic and Vertex connectivity.
  • Load the frontend Cloud Run URL and confirm /copilot can generate responses with cited sources.
  • Configure a custom domain or Cloud Load Balancer if a single entrypoint is required.

Observability & Logging

  • Backend logging includes logger.exception("Vertex AI generation failed") in backend/app/api/routes.py for debugging.
  • Enable Cloud Logging and Cloud Trace on both services for deeper insights.
  • Elastic Observability can ingest backend structured logs by forwarding from Cloud Logging.

📊 How It Works

Smart Search Pipeline: Fast & Accurate

Your query goes through multiple layers of intelligence to deliver the best answer:

  1. Query Understanding: Automatically detects context (incident vs. policy vs. runbook) and applies smart filters
  2. Hybrid Retrieval: Combines keyword matching (BM25) with semantic understanding (vector search) using Elasticsearch
  3. Relevance Reranking: Vertex AI reranks results to ensure the most relevant documents surface first
  4. Context Assembly: Top 4 most relevant documents form the knowledge base for AI generation
  5. Grounded Response: Gemini 2.5 Flash Lite generates answers with citations: no hallucinations, just facts

Result: Sub-3-second end-to-end latency from query to cited answer.
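The hybrid retrieval step (2) can be illustrated with the request body such a pipeline sends to Elasticsearch. This is a sketch using the RRF retriever syntax available in Elasticsearch 8.14+; the `content` and `content_semantic` field names are assumptions, not the repository's actual mapping (see backend/app/services/elastic.py for the real query):

```python
def build_hybrid_query(question: str, size: int = 4) -> dict:
    """Request body fusing BM25 and semantic retrieval with reciprocal rank fusion.

    Field names are illustrative. The top `size` hits (4, matching the
    context-assembly step above) feed the generation step.
    """
    return {
        "size": size,
        "retriever": {
            "rrf": {
                "retrievers": [
                    # Lexical leg: classic BM25 keyword matching
                    {"standard": {"query": {"match": {"content": question}}}},
                    # Semantic leg: vector retrieval over a semantic_text field
                    {"standard": {"query": {
                        "semantic": {"field": "content_semantic", "query": question}
                    }}},
                ]
            }
        },
    }

# Usage (assuming an `elasticsearch` client `es` and an index named "it-docs"):
#   resp = es.search(index="it-docs", body=build_hybrid_query("payment gateway timeouts"))
```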

Incident Management Made Easy

When the AI detects critical incidents (Sev-1, Sev-2), it suggests posting to Slack automatically:

  • Rich Alerts: Uses Slack Block Kit for structured, scannable incident data
  • All Context Included: Severity, status, owner, and affected service, so the alert carries everything your team needs
  • No Manual Work: One click from detection to war room notification
  • Reliable Delivery: Primary OAuth-based Web API with automatic webhook fallback
  • Team Coordination: Posts to configured channels (e.g., #sev-1-war-room)
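A sketch of what such a Block Kit alert payload might look like, using a hypothetical helper (the project's actual formatting lives in backend/app/services/slack_client.py):

```python
def incident_blocks(message: str, severity: str, status: str, owner: str) -> list[dict]:
    """Build Slack Block Kit blocks for a structured, scannable incident alert."""
    return [
        # Header block: large plain-text title
        {"type": "header",
         "text": {"type": "plain_text", "text": f"{severity} Incident", "emoji": True}},
        # Body: the alert message itself
        {"type": "section", "text": {"type": "mrkdwn", "text": message}},
        # Two-column field grid with the key incident metadata
        {"type": "section", "fields": [
            {"type": "mrkdwn", "text": f"*Severity:*\n{severity}"},
            {"type": "mrkdwn", "text": f"*Status:*\n{status}"},
            {"type": "mrkdwn", "text": f"*Owner:*\n{owner}"},
        ]},
    ]
```

Blocks in this shape can be sent via the Web API's `chat.postMessage` (as the `blocks` argument) or in a webhook POST body, matching the OAuth-first, webhook-fallback delivery described above.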

🧪 Testing

Backend Tests

cd backend
pytest

Frontend Build Verification

cd frontend
npm run lint
npm run build

Integration Status Check

curl http://localhost:8000/integrations/status | jq

Expected output:

{
  "elastic": {"status": "green", "cluster_name": "..."},
  "vertex_ai": {"status": "enabled", "model": "gemini-2.5-flash-lite"},
  "slack": {"status": "enabled", "method": "web_api", "channel": "sev-1-war-room"}
}

πŸ“ Project Structure

everydayelastic/
├── backend/
│   ├── app/
│   │   ├── api/
│   │   │   └── routes.py          # Chat and action endpoints
│   │   ├── core/
│   │   │   ├── config.py          # Configuration management
│   │   │   ├── logging_config.py  # Structured logging
│   │   │   └── metrics.py         # Prometheus metrics
│   │   ├── services/
│   │   │   ├── elastic.py         # Elasticsearch client
│   │   │   ├── vertex.py          # Vertex AI client
│   │   │   ├── slack_client.py    # Slack integration
│   │   │   └── workflows.py       # Follow-up suggestions
│   │   ├── schemas/
│   │   │   └── chat.py            # Pydantic models
│   │   └── main.py                # FastAPI application
│   ├── Dockerfile
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── app/
│   │   │   ├── page.tsx           # Landing page
│   │   │   ├── copilot/
│   │   │   │   └── page.tsx       # Chat interface
│   │   │   └── globals.css        # Global styles
│   │   └── components/
│   ├── Dockerfile
│   └── package.json
├── .env                           # Environment variables
├── ARCHITECTURE.md                # Architecture documentation
└── README.md                      # This file

🚢 Production Deployment

Ready to deploy? See the Cloud Run Deployment section above for step-by-step instructions to get your backend and frontend live on Google Cloud, with secrets managed in Secret Manager, auto-scaling from Cloud Run, and monitoring via OpenTelemetry and Prometheus.

πŸ“ License

MIT License - see LICENSE file for details.


Built with ❀️ for IT teams who deserve better incident response tools.

Stop searching. Start fixing.
