didiberman/sovereign-mesh-k8s

🔐 Sovereign Mesh

A multi-tenant AI inference platform for high-security environments.
Isolated departmental enclaves · Zero-trust networking · GitOps-driven · Production-ready RAG



Architecture

Internet
    │
    ▼  DNS: Cloudflare → Worker public IP
┌──────────────────────────────────────────────────┐
│  Hetzner Cloud (nbg1)                            │
│                                                  │
│  ┌─────────────────┐   ┌──────────────────────┐  │
│  │  Control Plane  │   │  Worker (cx53)       │  │
│  │  cx23 (4GB RAM) │   │  8 vCPU / 32GB RAM   │  │
│  │  K3s server     │   │  + 30GB Volume (PVCs)│  │
│  │  ArgoCD         │   │                      │  │
│  └────────┬────────┘   └──────────┬───────────┘  │
│           └──── 10.0.1.0/24 ──────┘              │
└──────────────────────────────────────────────────┘

Traffic path for a tenant request:
  User → hr.sovereign.didibe.dev
    → nginx-gateway (hostPort 443, TLS termination)
    → HTTPRoute → hr-app:8080
    → embed(query) → TEI
    → search(vector, filter:{tenant_id:"hr"}) → Qdrant
    → generate(context + query) → vLLM
    → answer
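
The request path above can be sketched in a few lines of Python. This is a minimal illustration, not the code in apps/tenant-app/main.py: the function name answer_query and the three injected callables (embed, search, generate) are hypothetical stand-ins for the TEI, Qdrant, and vLLM clients.

```python
from typing import Callable, List, Sequence


def answer_query(
    tenant_id: str,
    query: str,
    embed: Callable[[str], Sequence[float]],
    search: Callable[[Sequence[float], dict], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Hypothetical RAG flow: embed the query, retrieve tenant-scoped context, generate."""
    vector = embed(query)  # TEI in the real platform
    # The tenant filter is passed on every search so results stay inside the enclave.
    passages = search(vector, {"tenant_id": tenant_id})  # Qdrant
    context = "\n".join(passages)
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)  # vLLM


if __name__ == "__main__":
    # Stub backends so the sketch runs without any services.
    reply = answer_query(
        "hr",
        "How many vacation days do I get?",
        embed=lambda text: [0.0, 1.0],
        search=lambda vec, flt: [f"sample passage for {flt['tenant_id']}"],
        generate=lambda prompt: prompt,
    )
    print(reply)
```

Passing the backends in as callables keeps the flow testable offline, which matches how the test suite is described later in this README.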

Namespaces

| Namespace | Purpose | Pod Security |
| --- | --- | --- |
| platform | vLLM, Qdrant, TEI | Baseline |
| hr | HR tenant RAG app | Restricted |
| legal | Legal tenant RAG app | Restricted |
| nginx-gateway | Ingress controller | |
| cert-manager | TLS certificate automation | |
| monitoring | Prometheus, Grafana, Loki, Promtail | |
| argocd | GitOps controller | |

Platform Services

| Service | Image | Purpose |
| --- | --- | --- |
| vLLM | vllm/vllm-openai:v0.7.2 | Inference: Qwen2.5-7B-Instruct, CPU mode, OpenAI-compatible API |
| TEI | ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 | Embeddings: BAAI/bge-small-en-v1.5 |
| Qdrant | qdrant/qdrant:v1.13.0 | Vector database: multi-tenant via metadata filtering |

Tenant Isolation Model

  • Network: default-deny-all on every tenant namespace. Explicit egress allowlists for platform services (ports 80, 6333, 6334) and DNS (port 53).
  • Data: Qdrant queries always include filter: {tenant_id: "<tenant>"}, so each tenant app's searches return only points tagged with its own tenant_id. This isolation is enforced at the query level by the application.
  • Identity: Dedicated ServiceAccount per workload, automountServiceAccountToken: false.
  • Resources: ResourceQuota + LimitRange per namespace, HPA (min 2 replicas) + PodDisruptionBudget for tenant apps.
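
To make the data-isolation rule concrete, here is a hedged sketch of how a tenant filter could be built in Qdrant's REST filter format (a must clause holding a match condition). The helper name tenant_filter is an assumption for illustration; the actual app may build the filter through LangChain or the Qdrant client instead.

```python
from typing import Optional


def tenant_filter(tenant_id: str, extra_must: Optional[list] = None) -> dict:
    """Build a Qdrant filter that always pins results to one tenant (hypothetical helper)."""
    if not tenant_id:
        raise ValueError("tenant_id is required for every query")
    # Drop any caller-supplied tenant_id condition so it cannot redirect the scope.
    conditions = [c for c in (extra_must or []) if c.get("key") != "tenant_id"]
    conditions.append({"key": "tenant_id", "match": {"value": tenant_id}})
    return {"must": conditions}


if __name__ == "__main__":
    print(tenant_filter("hr"))
    # "doc_type" is a made-up payload field used only for this example.
    print(tenant_filter("legal", [{"key": "doc_type", "match": {"value": "policy"}}]))
```

Building the filter in one place, and refusing an empty tenant_id, means no code path can issue an unscoped search by accident.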

Tech Stack

| Layer | Technology |
| --- | --- |
| Infrastructure | Hetzner Cloud (Terraform) |
| Orchestration | K3s + Flannel CNI |
| GitOps | ArgoCD (App-of-Apps) |
| Inference | vLLM (Qwen2.5-7B-Instruct, CPU) |
| Embeddings | HuggingFace TEI (BAAI/bge-small-en-v1.5) |
| Vector DB | Qdrant |
| Application | FastAPI + LangChain (RAG) |
| Ingress | Nginx Gateway Fabric (Gateway API v1) |
| TLS | cert-manager + Let's Encrypt (DNS-01 via Cloudflare) |
| Monitoring | Prometheus + Grafana + AlertManager |
| Logging | Loki + Promtail |
| Security scanning | Trivy + Checkov + kube-linter |

Prerequisites

Before deploying, ensure you have:

  • Terraform >= 1.6
  • A Hetzner Cloud account and API token
  • A Cloudflare account managing didibe.dev with an API token
  • The Cloudflare token stored in GCP Secret Manager under project didiberman, secret name cloudflare_key_mlops_project
  • An SSH key pair (~/.ssh/id_ed25519 by default)
  • gcloud CLI authenticated (gcloud auth application-default login)

Deployment

1. Configure Terraform variables

cd terraform
cp terraform.tfvars.example terraform.tfvars   # if the example file exists

Otherwise, create terraform/terraform.tfvars with:

hcloud_token     = "your-hetzner-api-token"
ssh_public_key   = "ssh-ed25519 AAAA..."

2. Provision infrastructure

terraform init
terraform plan    # review: 2 servers, 1 volume, firewall rules, DNS records
terraform apply

terraform apply does the following:

  • Creates the control plane (cx23) and worker (cx53) servers in Hetzner nbg1
  • Attaches a 30GB volume to the worker, mounted at the K3s storage path (all PVCs land here)
  • Creates Cloudflare DNS records for all platform and tenant subdomains
  • Installs K3s and joins the worker node to the cluster
  • Installs ArgoCD via Helm and bootstraps the App-of-Apps

3. GitOps takes over

Once Terraform completes, ArgoCD syncs everything from k8s/argocd/ automatically in wave order:

Wave -5  security       namespaces, NetworkPolicies, RBAC, quotas
Wave -1  cert-manager   CRDs + controller; nginx-gateway-fabric
Wave  0  gateway        Gateway + GatewayClass; monitoring stack
Wave  1  issuers        ClusterIssuer + wildcard TLS cert; Qdrant; vLLM + TEI
Wave  2  monitoring     AlertManager rules + Grafana dashboards
Wave  5  tenants        HR app, Legal app, ArgoCD HTTPRoute
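
ArgoCD orders resources by the argocd.argoproj.io/sync-wave annotation, lowest wave first, and treats a missing annotation as wave 0. The sketch below sorts a few sample manifests to show the resulting order. The names are borrowed from the wave list above, but the dictionaries are illustrative and are not read from k8s/argocd/.

```python
SYNC_WAVE = "argocd.argoproj.io/sync-wave"


def wave_of(manifest: dict) -> int:
    """Read the sync wave from a manifest; a missing annotation counts as wave 0."""
    annotations = manifest.get("metadata", {}).get("annotations") or {}
    return int(annotations.get(SYNC_WAVE, "0"))


def sync_order(manifests: list) -> list:
    """Return manifest names sorted by wave (ascending), keeping input order within a wave."""
    return [m["metadata"]["name"] for m in sorted(manifests, key=wave_of)]


if __name__ == "__main__":
    apps = [
        {"metadata": {"name": "tenants", "annotations": {SYNC_WAVE: "5"}}},
        {"metadata": {"name": "gateway"}},  # no annotation, so wave 0
        {"metadata": {"name": "security", "annotations": {SYNC_WAVE: "-5"}}},
        {"metadata": {"name": "cert-manager", "annotations": {SYNC_WAVE: "-1"}}},
    ]
    print(sync_order(apps))
```

Negative waves are how the security baseline (namespaces, NetworkPolicies, quotas) lands before any workload is scheduled.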

4. Access the platform

| Service | URL |
| --- | --- |
| HR enclave | https://hr.sovereign.didibe.dev/chat |
| Legal enclave | https://legal.sovereign.didibe.dev/chat |
| ArgoCD | https://argocd.didibe.dev |
| Grafana | http://<worker-ip>:30082 |

Retrieve the ArgoCD initial admin password:

kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d
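
Kubernetes stores Secret values base64-encoded, which is why the command pipes through base64 -d. The same decoding step in Python might look like this, assuming the Secret was exported with kubectl's -o json output; the helper name secret_value is illustrative.

```python
import base64
import json


def secret_value(secret_json: str, key: str = "password") -> str:
    """Decode one field from the JSON form of a Kubernetes Secret."""
    secret = json.loads(secret_json)
    return base64.b64decode(secret["data"][key]).decode("utf-8")


if __name__ == "__main__":
    # Fabricated sample document, not a real credential.
    sample = json.dumps({"data": {"password": base64.b64encode(b"example-only").decode()}})
    print(secret_value(sample))
```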

Development

Running tests

cd apps/tenant-app
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt -r requirements-dev.txt
pytest tests/ -v

All 19 tests run fully offline — external dependencies (vLLM, Qdrant, TEI) are mocked at the sys.modules level.
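
Mocking at the sys.modules level means placing a stand-in object in sys.modules before the code under test runs its import, so the real client library is never loaded. A minimal sketch of the technique follows. It uses a made-up module name (fake_vector_client) rather than the project's actual dependencies or test code.

```python
import sys
from unittest.mock import MagicMock

# Register a stand-in before anything imports the real package.
# "fake_vector_client" is a hypothetical module name used only for this sketch.
stub = MagicMock()
stub.Client.return_value.search.return_value = ["mocked passage"]
sys.modules["fake_vector_client"] = stub


def retrieve(query: str) -> list:
    """Code under test: imports its dependency and queries it."""
    import fake_vector_client  # resolved from sys.modules, so the stub is returned

    return fake_vector_client.Client().search(query)


def test_retrieve_uses_stub():
    assert retrieve("vacation policy") == ["mocked passage"]
    stub.Client.return_value.search.assert_called_with("vacation policy")


if __name__ == "__main__":
    test_retrieve_uses_stub()
    print("ok")
```

Because the import system checks sys.modules first, the stub is returned even though no package with that name is installed, which is what lets a suite run with no network access.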

Project structure

sovereign-mesh-k8s/
├── apps/
│   └── tenant-app/          # FastAPI RAG application (shared by HR + Legal)
│       ├── main.py
│       ├── Dockerfile
│       ├── requirements.txt
│       ├── requirements-dev.txt
│       ├── pytest.ini
│       ├── promptfoo.yaml   # LLM accuracy benchmarks
│       └── tests/
├── k8s/
│   ├── argocd/              # ArgoCD Application manifests (App-of-Apps)
│   ├── argocd-routes/       # HTTPRoute for the ArgoCD UI itself
│   ├── cert-manager/        # ClusterIssuer + wildcard Certificate
│   ├── platform/
│   │   ├── gateway/         # Gateway + GatewayClass
│   │   ├── inference/       # vLLM + TEI Deployments
│   │   ├── monitoring/      # PrometheusRules + Grafana dashboards
│   │   ├── nginx-gateway/   # Nginx Gateway Fabric CRDs
│   │   ├── security/        # NetworkPolicies, RBAC, quotas, PDBs
│   │   └── vector-db/       # Qdrant + backup CronJob
│   └── tenants/
│       ├── hr/              # HR Deployment, Service, HTTPRoute, HPA
│       └── legal/           # Legal Deployment, Service, HTTPRoute, HPA
└── terraform/               # Hetzner Cloud + Cloudflare + K3s provisioning

CI/CD Pipeline

The pipeline (.github/workflows/tenant-ci.yml) runs on every push to main or PR touching apps/tenant-app/, k8s/tenants/, or the workflow file itself.

| Job | Trigger | What it does |
| --- | --- | --- |
| test | PR + push | Installs deps, runs pytest (19 tests) |
| lint-and-scan | PR + push | kube-linter on manifests, Checkov on all K8s YAML (hard fail) |
| model-evaluation | PR (if ENABLE_MODEL_EVAL=true) | Runs promptfoo against live vLLM endpoint |
| build-and-push | Push to main only | Docker build → push to GHCR → Trivy scan → SBOM → Cosign sign → GitOps commit |

Images are signed with Cosign keyless signing via GitHub OIDC — no long-lived credentials.

Observability

Metrics (Prometheus + Grafana)

  • Pre-built dashboard: Sovereign Mesh Platform — service availability, request rates, error rates, container memory/CPU, pod restarts, replica status
  • AlertManager enabled with rules for: service down, crash-looping pods, memory pressure (>90%), disk pressure (>85%), replica mismatch, high error rate (>5%)
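
The numeric thresholds above can be read as simple comparisons. The sketch below restates them in Python for illustration only. The real rules are PrometheusRule manifests under k8s/platform/monitoring/, and the metric names used as dictionary keys here are hypothetical.

```python
# Thresholds copied from the alert list above, expressed as fractions.
THRESHOLDS = {
    "memory_usage": 0.90,
    "disk_usage": 0.85,
    "error_rate": 0.05,
}


def firing_alerts(metrics: dict) -> list:
    """Return the names of metrics that exceed their threshold, sorted alphabetically."""
    return sorted(
        name for name, limit in THRESHOLDS.items()
        if metrics.get(name, 0.0) > limit
    )


if __name__ == "__main__":
    # Sample values are invented for the demo.
    print(firing_alerts({"memory_usage": 0.93, "disk_usage": 0.40, "error_rate": 0.07}))
```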

Logs (Loki + Promtail)

  • Promtail DaemonSet collects all container logs and ships to Loki
  • Logs are labelled with namespace, pod, container for filtering in Grafana

Backup

  • Qdrant snapshot CronJob runs nightly at 02:00 UTC, snapshots all collections to the persistent volume
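
A snapshot job of this kind typically lists the collections and asks Qdrant to snapshot each one through its REST API (GET /collections, then POST /collections/{name}/snapshots). The sketch below shows that loop under stated assumptions: the service URL is a placeholder, the function names are illustrative, and the CronJob in k8s/platform/vector-db/ may well be implemented differently.

```python
import json
import urllib.request
from typing import Callable

QDRANT_URL = "http://qdrant.platform.svc:6333"  # assumed in-cluster address


def http_json(method: str, url: str) -> dict:
    """Send a request with no body and parse the JSON response."""
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def snapshot_all(base_url: str, call: Callable[[str, str], dict] = http_json) -> dict:
    """Create one snapshot per collection; returns {collection: snapshot name}."""
    listing = call("GET", f"{base_url}/collections")
    created = {}
    for item in listing["result"]["collections"]:
        name = item["name"]
        reply = call("POST", f"{base_url}/collections/{name}/snapshots")
        created[name] = reply["result"]["name"]
    return created


if __name__ == "__main__":
    # Offline demo with a fake transport and invented names;
    # drop the call= argument to talk to a real Qdrant instance.
    def fake_call(method: str, url: str) -> dict:
        if method == "GET":
            return {"result": {"collections": [{"name": "docs"}]}}
        return {"result": {"name": "docs-example.snapshot"}}

    print(snapshot_all(QDRANT_URL, call=fake_call))
```

Snapshots created this way stay on the same persistent volume as the data, which is why the roadmap lists off-cluster upload as a follow-up.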

Roadmap

  • Data Ingestion Worker — automated pipeline for loading tenant documents into Qdrant
  • External Secrets Operator — replace plain K8s secrets with GCP Secret Manager sync
  • Terraform remote state — move tfstate to Hetzner Object Storage or Terraform Cloud
  • ArgoCD OIDC — replace initial admin password with SSO
  • Argo Rollouts — canary deployments for tenant app upgrades
  • Off-cluster Qdrant backup — upload nightly snapshots to S3-compatible object storage
