A multi-tenant AI inference platform for high-security environments.
Isolated departmental enclaves · Zero-trust networking · GitOps-driven · Production-ready RAG
Internet
    │
    ▼  DNS: Cloudflare → Worker public IP
┌─────────────────────────────────────────────────┐
│  Hetzner Cloud (nbg1)                           │
│                                                 │
│  ┌─────────────────┐   ┌─────────────────────┐  │
│  │ Control Plane   │   │ Worker (cx53)       │  │
│  │ cx23 (4GB RAM)  │   │ 8 vCPU / 32GB RAM   │  │
│  │ K3s server      │   │ + 30GB Volume (PVCs)│  │
│  │ ArgoCD          │   │                     │  │
│  └────────┬────────┘   └──────────┬──────────┘  │
│           └──── 10.0.1.0/24 ──────┘             │
└─────────────────────────────────────────────────┘
Traffic path for a tenant request:
User → hr.sovereign.didibe.dev
→ nginx-gateway (hostPort 443, TLS termination)
→ HTTPRoute → hr-app:8080
→ embed(query) → TEI
→ search(vector, filter:{tenant_id:"hr"}) → Qdrant
→ generate(context + query) → vLLM
→ answer
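
The request path above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `apps/tenant-app/main.py`: the function names `tenant_filter` and `answer_query`, the prompt layout, and the injected `embed`/`search`/`generate` callables are all assumptions made for the example. The filter uses Qdrant's `must`/`match` payload-filter shape.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def tenant_filter(tenant_id: str) -> dict:
    """Build a Qdrant-style payload filter that restricts results to one tenant."""
    return {"must": [{"key": "tenant_id", "match": {"value": tenant_id}}]}


def answer_query(
    tenant_id: str,
    query: str,
    embed: Callable[[str], Vector],               # stands in for the TEI call
    search: Callable[[Vector, dict], list[str]],  # stands in for the Qdrant call
    generate: Callable[[str], str],               # stands in for the vLLM call
) -> str:
    """Embed the query, retrieve tenant-scoped passages, then generate an answer."""
    vector = embed(query)
    passages = search(vector, tenant_filter(tenant_id))
    context = "\n".join(passages)
    # Hypothetical prompt layout; the real app may format this differently.
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)
```

Passing the three backends in as callables keeps the tenant filter in one place and lets the flow be exercised without any running services.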
| Namespace | Purpose | Pod Security |
|---|---|---|
| `platform` | vLLM, Qdrant, TEI | Baseline |
| `hr` | HR tenant RAG app | Restricted |
| `legal` | Legal tenant RAG app | Restricted |
| `nginx-gateway` | Ingress controller | - |
| `cert-manager` | TLS certificate automation | - |
| `monitoring` | Prometheus, Grafana, Loki, Promtail | - |
| `argocd` | GitOps controller | - |
| Service | Image | Purpose |
|---|---|---|
| vLLM | `vllm/vllm-openai:v0.7.2` | Inference: Qwen2.5-7B-Instruct, CPU mode, OpenAI-compatible API |
| TEI | `ghcr.io/huggingface/text-embeddings-inference:cpu-1.5` | Embeddings: BAAI/bge-small-en-v1.5 |
| Qdrant | `qdrant/qdrant:v1.13.0` | Vector database: multi-tenant via metadata filtering |
- Network: `default-deny-all` on every tenant namespace. Explicit egress allowlists for platform services (ports 80, 6333, 6334) and DNS (port 53).
- Data: Qdrant queries always include `filter: {tenant_id: "<tenant>"}`; cross-tenant data access is impossible at the query level.
- Identity: Dedicated `ServiceAccount` per workload, `automountServiceAccountToken: false`.
- Resources: `ResourceQuota` + `LimitRange` per namespace, `HPA` (min 2 replicas) + `PodDisruptionBudget` for tenant apps.
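
To make the network rule concrete, the sketch below builds the two kinds of `NetworkPolicy` described above as plain Python dicts. It is an illustration under assumptions, not a copy of the manifests in `k8s/platform/security/`: the policy name `allow-platform-egress`, the empty pod selectors, and the use of the `kubernetes.io/metadata.name` namespace label to target `platform` are guesses at one reasonable layout.

```python
import json

# Ports taken from the list above; 6333 and 6334 are Qdrant's REST and gRPC ports.
PLATFORM_PORTS = (80, 6333, 6334)


def default_deny(namespace: str) -> dict:
    """Deny all ingress and egress for every pod in the namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def allow_platform_egress(namespace: str) -> dict:
    """Allow egress only to the platform namespace and to DNS (assumed layout)."""
    platform = {
        "namespaceSelector": {
            "matchLabels": {"kubernetes.io/metadata.name": "platform"}
        }
    }
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # Hypothetical policy name, chosen for this example only.
        "metadata": {"name": "allow-platform-egress", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [
                {
                    "to": [platform],
                    "ports": [{"protocol": "TCP", "port": p} for p in PLATFORM_PORTS],
                },
                {
                    # DNS over both UDP and TCP, to any destination.
                    "ports": [
                        {"protocol": "UDP", "port": 53},
                        {"protocol": "TCP", "port": 53},
                    ]
                },
            ],
        },
    }


if __name__ == "__main__":
    for policy in (default_deny("hr"), allow_platform_egress("hr")):
        print(json.dumps(policy, indent=2))
```

Because NetworkPolicies are additive, the deny-all policy sets the baseline and each allow policy opens only the listed ports.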
| Layer | Technology |
|---|---|
| Infrastructure | Hetzner Cloud (Terraform) |
| Orchestration | K3s + Flannel CNI |
| GitOps | ArgoCD (App-of-Apps) |
| Inference | vLLM (Qwen2.5-7B-Instruct, CPU) |
| Embeddings | HuggingFace TEI (BAAI/bge-small-en-v1.5) |
| Vector DB | Qdrant |
| Application | FastAPI + LangChain (RAG) |
| Ingress | Nginx Gateway Fabric (Gateway API v1) |
| TLS | cert-manager + Let's Encrypt (DNS-01 via Cloudflare) |
| Monitoring | Prometheus + Grafana + AlertManager |
| Logging | Loki + Promtail |
| Security scanning | Trivy + Checkov + kube-linter |
Before deploying, ensure you have:
- Terraform >= 1.6
- A Hetzner Cloud account and API token
- A Cloudflare account managing `didibe.dev` with an API token
- The Cloudflare token stored in GCP Secret Manager under project `didiberman`, secret name `cloudflare_key_mlops_project`
- An SSH key pair (`~/.ssh/id_ed25519` by default)
- `gcloud` CLI authenticated (`gcloud auth application-default login`)
cd terraform
cp terraform.tfvars.example terraform.tfvars # if it exists, otherwise create:

# terraform/terraform.tfvars
hcloud_token = "your-hetzner-api-token"
ssh_public_key = "ssh-ed25519 AAAA..."

terraform init
terraform plan # review: 2 servers, 1 volume, firewall rules, DNS records
terraform apply

This provisions:
- Control plane (`cx23`) + Worker (`cx53`) in Hetzner `nbg1`
- 30GB volume attached to worker, mounted at the K3s storage path (all PVCs land here)
- Cloudflare DNS records for all platform and tenant subdomains
- Installs K3s, joins the worker node
- Installs ArgoCD via Helm and bootstraps the App-of-Apps
Once Terraform completes, ArgoCD syncs everything from k8s/argocd/ automatically in wave order:
Wave -5 security namespaces, NetworkPolicies, RBAC, quotas
Wave -1 cert-manager CRDs + controller; nginx-gateway-fabric
Wave 0 gateway Gateway + GatewayClass; monitoring stack
Wave 1 issuers ClusterIssuer + wildcard TLS cert; Qdrant; vLLM + TEI
Wave 2 monitoring AlertManager rules + Grafana dashboards
Wave 5 tenants HR app, Legal app, ArgoCD HTTPRoute
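
The ordering follows from ArgoCD's `argocd.argoproj.io/sync-wave` annotation: lower waves sync first, and resources without the annotation default to wave 0. The short sketch below reproduces the order listed above; the application names mirror that list, while the dict layout is a simplification invented for the example rather than the real `Application` manifests.

```python
from itertools import groupby

WAVE_ANNOTATION = "argocd.argoproj.io/sync-wave"

# Simplified stand-ins for the Application manifests; annotation values are strings.
APPS = [
    {"name": "tenants", "annotations": {WAVE_ANNOTATION: "5"}},
    {"name": "gateway", "annotations": {}},  # no annotation, so wave 0
    {"name": "security", "annotations": {WAVE_ANNOTATION: "-5"}},
    {"name": "monitoring", "annotations": {WAVE_ANNOTATION: "2"}},
    {"name": "cert-manager", "annotations": {WAVE_ANNOTATION: "-1"}},
    {"name": "issuers", "annotations": {WAVE_ANNOTATION: "1"}},
]


def wave_of(app: dict) -> int:
    """Read the sync wave, defaulting to 0 when the annotation is absent."""
    return int(app.get("annotations", {}).get(WAVE_ANNOTATION, "0"))


def sync_order(apps: list[dict]) -> list[tuple[int, list[str]]]:
    """Group application names by wave, lowest wave first."""
    ordered = sorted(apps, key=wave_of)
    return [(wave, [a["name"] for a in group]) for wave, group in groupby(ordered, key=wave_of)]


if __name__ == "__main__":
    for wave, names in sync_order(APPS):
        print(f"Wave {wave:>2}: {', '.join(names)}")
```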
| Service | URL |
|---|---|
| HR enclave | https://hr.sovereign.didibe.dev/chat |
| Legal enclave | https://legal.sovereign.didibe.dev/chat |
| ArgoCD | https://argocd.didibe.dev |
| Grafana | http://<worker-ip>:30082 |
Retrieve the ArgoCD initial admin password:
kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath="{.data.password}" | base64 -d

cd apps/tenant-app
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt -r requirements-dev.txt
pytest tests/ -v

All 19 tests run fully offline: external dependencies (vLLM, Qdrant, TEI) are mocked at the `sys.modules` level.
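
For readers unfamiliar with the technique, mocking at the `sys.modules` level means registering a stub under a package's name before anything imports it, so the `import` statement returns the stub instead of loading the real package. The sketch below shows the general pattern only. The package name `vector_store_sdk` and its `Client.search` method are hypothetical, and the real test suite may stub different modules in a different way.

```python
import sys
from unittest.mock import MagicMock

# "vector_store_sdk" is a hypothetical package name used only for illustration.
fake_sdk = MagicMock(name="vector_store_sdk")
fake_sdk.Client.return_value.search.return_value = [{"text": "Leave policy"}]

# Register the stub before any application code imports the package.
sys.modules["vector_store_sdk"] = fake_sdk

import vector_store_sdk  # resolved from sys.modules; nothing is installed or contacted

client = vector_store_sdk.Client(url="http://unused.invalid")
hits = client.search(query="leave policy")
```

Because the import system checks `sys.modules` first, the stub must be in place before the module under test is imported. This is why such setup usually lives at the top of a test file or in `conftest.py`.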
sovereign-mesh-k8s/
├── apps/
│ └── tenant-app/ # FastAPI RAG application (shared by HR + Legal)
│ ├── main.py
│ ├── Dockerfile
│ ├── requirements.txt
│ ├── requirements-dev.txt
│ ├── pytest.ini
│ ├── promptfoo.yaml # LLM accuracy benchmarks
│ └── tests/
├── k8s/
│ ├── argocd/ # ArgoCD Application manifests (App-of-Apps)
│ ├── argocd-routes/ # HTTPRoute for the ArgoCD UI itself
│ ├── cert-manager/ # ClusterIssuer + wildcard Certificate
│ ├── platform/
│ │ ├── gateway/ # Gateway + GatewayClass
│ │ ├── inference/ # vLLM + TEI Deployments
│ │ ├── monitoring/ # PrometheusRules + Grafana dashboards
│ │ ├── nginx-gateway/ # Nginx Gateway Fabric CRDs
│ │ ├── security/ # NetworkPolicies, RBAC, quotas, PDBs
│ │ └── vector-db/ # Qdrant + backup CronJob
│ └── tenants/
│ ├── hr/ # HR Deployment, Service, HTTPRoute, HPA
│ └── legal/ # Legal Deployment, Service, HTTPRoute, HPA
└── terraform/ # Hetzner Cloud + Cloudflare + K3s provisioning
The pipeline (.github/workflows/tenant-ci.yml) runs on every push to main or PR touching apps/tenant-app/, k8s/tenants/, or the workflow file itself.
| Job | Trigger | What it does |
|---|---|---|
| `test` | PR + push | Installs deps, runs pytest (19 tests) |
| `lint-and-scan` | PR + push | kube-linter on manifests, Checkov on all K8s YAML (hard fail) |
| `model-evaluation` | PR (if `ENABLE_MODEL_EVAL=true`) | Runs promptfoo against live vLLM endpoint |
| `build-and-push` | Push to main only | Docker build → push to GHCR → Trivy scan → SBOM → Cosign sign → GitOps commit |
Images are signed with Cosign keyless signing via GitHub OIDC — no long-lived credentials.
- Pre-built dashboard: Sovereign Mesh Platform — service availability, request rates, error rates, container memory/CPU, pod restarts, replica status
- AlertManager enabled with rules for: service down, crash-looping pods, memory pressure (>90%), disk pressure (>85%), replica mismatch, high error rate (>5%)
- Promtail DaemonSet collects all container logs and ships to Loki
- Logs are labelled with `namespace`, `pod`, `container` for filtering in Grafana
- Qdrant snapshot CronJob runs nightly at 02:00 UTC, snapshots all collections to the persistent volume
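
As an illustration of what such a job might do, the sketch below walks Qdrant's REST API: it lists collections with `GET /collections` and requests a snapshot for each with `POST /collections/{name}/snapshots`. It is not the script the CronJob actually runs. The function names, the injectable `http` parameter, and any base URL passed in are assumptions for the example.

```python
import json
import urllib.request
from typing import Callable


def http_json(method: str, url: str) -> dict:
    """Send a bodiless HTTP request and decode the JSON response."""
    request = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def snapshot_all(
    base_url: str,
    http: Callable[[str, str], dict] = http_json,
) -> dict[str, str]:
    """Create one snapshot per collection; return {collection: snapshot name}."""
    collections = http("GET", f"{base_url}/collections")["result"]["collections"]
    created = {}
    for collection in collections:
        name = collection["name"]
        result = http("POST", f"{base_url}/collections/{name}/snapshots")["result"]
        created[name] = result["name"]
    return created
```

Qdrant writes snapshots to its own storage directory, so they remain on the same volume as the live data until they are copied elsewhere.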
- Data Ingestion Worker: automated pipeline for loading tenant documents into Qdrant
- External Secrets Operator: replace plain K8s secrets with GCP Secret Manager sync
- Terraform remote state: move `tfstate` to Hetzner Object Storage or Terraform Cloud
- ArgoCD OIDC: replace initial admin password with SSO
- Argo Rollouts: canary deployments for tenant app upgrades
- Off-cluster Qdrant backup: upload nightly snapshots to S3-compatible object storage