Guide-first reference implementation for a production-shaped SOC stack on GKE, using Cloudflare Logpush as the example HTTP producer and pairing it with Kafka-backed ingest, Elasticsearch hot/warm storage, Grafana alerting, and investigation workflows.
This repo is useful in two ways:
- as a guide for engineers designing a real ingest and investigation platform on Kubernetes
- as a reproducible deployment blueprint for the same platform shape
- accepts batched HTTP log pushes through a stateless ingest API
- durably buffers ingestion through Kafka before indexing
- writes to Elasticsearch through versioned ILM, index-template, and alias contracts
- exposes Grafana dashboards, alerts, contact points, and notification policies
- runs investigation APIs and workers that turn alerts into deterministic triage jobs
- keeps runtime objects under version control so deployments behave consistently
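To make "batched HTTP log pushes" concrete, here is a minimal sketch of decoding one pushed batch. It assumes the body is newline-delimited JSON, optionally gzip-compressed; the function name `decode_batch` and the sample field names are illustrative, not taken from this repo.

```python
import gzip
import json


def decode_batch(body: bytes, content_encoding: str = "") -> list[dict]:
    """Decode one pushed batch into a list of JSON records.

    Assumption: newline-delimited JSON, gzip-compressed when the
    Content-Encoding header says so.
    """
    if content_encoding.lower() == "gzip":
        body = gzip.decompress(body)
    records = []
    for line in body.decode("utf-8").splitlines():
        line = line.strip()
        if line:  # skip blank lines between records
            records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # Sample records with illustrative field names.
    raw = (
        b'{"ClientIP": "203.0.113.7", "EdgeResponseStatus": 200}\n'
        b'{"ClientIP": "203.0.113.9", "EdgeResponseStatus": 403}\n'
    )
    batch = decode_batch(gzip.compress(raw), content_encoding="gzip")
    print(len(batch), batch[1]["EdgeResponseStatus"])
```

Keeping the decode step free of any shared state is what lets the ingest API stay stateless: any replica can handle any batch.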
- Cloudflare example, reusable pattern: the producer example is Cloudflare Logpush, but the same design works for any HTTP log producer
- Durable ACK boundary: the ingest API can acknowledge after Kafka durability instead of waiting on Elasticsearch indexing
- Separation of concerns: API and worker roles scale independently and fail differently
- Operational realism: hot/warm storage, Kafka lag, write bottlenecks, alert routing, and investigation playbooks are treated as first-class concerns
- Versioned runtime behavior: Kafka topics, Elasticsearch objects, and Grafana assets are part of the repo, not tribal knowledge
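The durable ACK boundary can be sketched as follows. This is an illustration under stated assumptions, not the repo's implementation: `InMemoryLog` is a hypothetical stand-in for a Kafka producer that confirms each write, and `handle_push` is a hypothetical handler name.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryLog:
    """Hypothetical stand-in for a Kafka producer with delivery confirmation."""

    topic: str
    records: list = field(default_factory=list)
    fail: bool = False  # set to True to simulate a broker outage

    def send_and_confirm(self, value: bytes) -> None:
        # A real producer would wait here until the broker confirms the
        # write at the configured durability level.
        if self.fail:
            raise ConnectionError("broker unavailable")
        self.records.append(value)


def handle_push(lines: list[bytes], log: InMemoryLog) -> int:
    """Return an HTTP status: 200 only after every record is confirmed."""
    try:
        for line in lines:
            log.send_and_confirm(line)
    except ConnectionError:
        # No durable write, no ACK: the producer is expected to retry.
        return 503
    return 200


if __name__ == "__main__":
    log = InMemoryLog(topic="edge-logpush")
    print(handle_push([b'{"a": 1}', b'{"a": 2}'], log), len(log.records))
```

Note that Elasticsearch never appears in the request path. A retried batch may re-send records that were already confirmed before a failure, so a design like this generally implies at-least-once delivery and a downstream that tolerates duplicates.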
flowchart LR
CF["Cloudflare Logpush (example)\nor any HTTP log producer"] --> API["Edge ingest API"]
API --> KAFKA["Kafka topics\nedge-logpush\nedge-logpush-dlq"]
KAFKA --> WORKER["Edge ingest workers"]
WORKER --> ES["Elasticsearch\nwrite alias: edge-logs"]
ES --> GRAF["Grafana dashboards and alerts"]
GRAF --> INVAPI["Investigation API"]
INVAPI --> INVTOPIC["Kafka topic\ninvestigations"]
INVTOPIC --> INVWORKER["Investigation worker"]
INVWORKER --> ES
- Read `docs/architecture.md` for the platform model and design tradeoffs.
- Read `docs/workflows.md` for the ingest and investigation flows.
- Read `docs/lessons.md` for the operational lessons embedded in the stack.
- Use `docs/getting-started.md` to deploy the stack end to end.
- `infra/gcp`: Terraform for the GKE and network foundation
- `k8s`: namespace-scoped Kubernetes manifests and platform values
- `runtime`: Kafka, Elasticsearch, and Grafana runtime objects
- `services/edge-ingest`: source for the ingest API and worker image
- `services/investigation-ops`: source for the investigation API and worker image
- `docs`: guide material and deployment docs
- `scripts`: bootstrap, render, and smoke-test tooling
The primary supported path is the production-shaped GKE deployment documented in docs/getting-started.md.
High-level flow:
- Copy `.env.example` to `.env` and fill the required values.
- Provision infra with Terraform.
- Install the platform layer.
- Render secrets and apply apps.
- Bootstrap runtime objects.
- Run smoke validation.
If you are using this as a reference rather than reproducing the same shape:
- keep the split ingest pattern
- keep Kafka topic contracts and DLQ separation
- keep Elasticsearch runtime objects versioned
- adapt node sizes, retention windows, and public exposure to your environment
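The DLQ separation mentioned above can be sketched like this: a worker loop that indexes what it can and routes anything it cannot parse to a dead-letter sink instead of blocking the main topic. The function name and the `index_doc` / `send_to_dlq` callables are hypothetical; a real worker would wire them to an Elasticsearch client and a Kafka producer.

```python
import json
from typing import Callable


def process_batch(
    messages: list[bytes],
    index_doc: Callable[[dict], None],
    send_to_dlq: Callable[[bytes, str], None],
) -> tuple[int, int]:
    """Index each message; route failures to the DLQ.

    Returns (indexed_count, dead_lettered_count).
    """
    indexed = dead = 0
    for raw in messages:
        try:
            doc = json.loads(raw)
            if not isinstance(doc, dict):
                raise ValueError("expected a JSON object")
            index_doc(doc)
            indexed += 1
        except ValueError as exc:
            # Malformed input goes to the DLQ with the reason attached,
            # so one bad record never stalls the partition.
            send_to_dlq(raw, str(exc))
            dead += 1
    return indexed, dead


if __name__ == "__main__":
    docs, dlq = [], []
    result = process_batch(
        [b'{"a": 1}', b"not json"],
        docs.append,
        lambda raw, reason: dlq.append((raw, reason)),
    )
    print(result, len(docs), len(dlq))
```

This sketch only dead-letters records that can never succeed. Transient failures such as an unavailable index are usually better handled by retrying, which is a separate policy decision.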
- `docs/architecture.md`
- `docs/workflows.md`
- `docs/lessons.md`
- `docs/getting-started.md`
- `docs/scope.md`
- HTTP ingest endpoint:
  - `POST /`
  - `GET /healthz`
- Kafka topics:
  - `edge-logpush`
  - `edge-logpush-dlq`
  - `investigations`
- Elasticsearch aliases and indices:
  - `edge-logs`
  - `investigation-results-v1`
- Grafana assets:
  - dashboards
  - alert rules
  - contact points
  - notification policies
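Because these names form a contract, a smoke check can compare what a cluster actually exposes against what the repo declares. The sketch below is hypothetical: `EXPECTED` copies the names listed above, while `missing_objects` and the shape of `observed` are assumptions for illustration. How the observed names are collected (Kafka admin client, Elasticsearch API) is left out.

```python
# Names copied from the runtime contract above.
EXPECTED: dict[str, set[str]] = {
    "kafka_topics": {"edge-logpush", "edge-logpush-dlq", "investigations"},
    "es_names": {"edge-logs", "investigation-results-v1"},
}


def missing_objects(observed: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return the expected names that are absent, grouped by kind."""
    gaps = {}
    for kind, names in EXPECTED.items():
        absent = names - observed.get(kind, set())
        if absent:
            gaps[kind] = absent
    return gaps


if __name__ == "__main__":
    observed = {
        "kafka_topics": {"edge-logpush", "investigations"},
        "es_names": {"edge-logs", "investigation-results-v1"},
    }
    print(missing_objects(observed))
```

An empty result means every declared object exists; anything else names exactly what bootstrap failed to create.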
- study a production-shaped ingest and investigation platform on GKE
- deploy the stack as a working reference implementation
- adapt a Cloudflare Logpush-style ingest path to your own producers
- reuse Kafka, Elasticsearch, and Grafana runtime contracts in your own environment
- reuse the playbook-driven investigation flow with your own alerts and identifiers
- The active ingest path is the split deployment under `k8s/namespaces/observability/`:
  - `edge-ingest-api`
  - `edge-ingest-worker`
  - `edge-ingest` LoadBalancer service targeting the API