A production-grade, GitOps-driven platform for dynamically scaling Kusama/Polkadot validators on Hetzner Cloud. Built with Terraform, K3s, ArgoCD, and Helm.
┌─────────────────────────────────────────────────────────────┐
│ Git Repository │
│ validators/ │
│ ├── validator-001.yaml ← Add file = new validator │
│ ├── validator-002.yaml │
│ └── validator-003.yaml ← Delete file = remove validator │
├─────────────────────────────────────────────────────────────┤
│ ArgoCD ApplicationSet │
│ (Git File Generator - Auto-deploys) │
├─────────────────────────────────────────────────────────────┤
│ K3s Cluster (Hetzner Cloud) │
│ fsn1 | nbg1 | hel1 (multi-geo HA) │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Validator Pods (StatefulSets) │ │
│ │ ├─ Polkadot Node (warp sync) │ │
│ │ ├─ HAProxy Sidecar (P2P rate limiting) │ │
│ │ └─ Persistent Volume (Hetzner CSI) │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
- GitOps Workflow: Add/remove validators by pushing YAML files to Git
- Multi-Geo HA: Automatic distribution across 3 datacenters (fsn1, nbg1, hel1)
- Auto Key Generation: Session keys generated via PostSync ArgoCD hooks
- Fast Sync: Warp sync gets validators ready in ~10 minutes
- Anti-Affinity: No two validators share the same physical node, limiting correlated downtime and slashing exposure (see the spread check below)
- Rate Limiting: HAProxy sidecar protects P2P from DDoS attacks
- Auto-Scaling: Cluster Autoscaler provisions new nodes when needed (optional)
- Monitoring: Prometheus + Grafana with pre-configured validator dashboards
- Reproducible: Entire infrastructure defined as code (Terraform + Helm)
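Once validators are running, the anti-affinity spread is easy to verify. The `validators` namespace is assumed from the examples later in this README:

```bash
# Each validator pod should be scheduled on a distinct node across fsn1/nbg1/hel1
kubectl get pods -n validators -o wide
```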
cd terraform
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars:
# - Add hcloud_token
# - Set allowed_ips (Recommended)
# - Set initial_workers_per_location (Optional)
terraform init
terraform apply

Instead of storing secrets in Git, inject them directly into the cluster:
# Usage: ./scripts/bootstrap-secrets.sh <hcloud-token> <grafana-password>
./scripts/bootstrap-secrets.sh "YOUR_HETZNER_TOKEN" "strong-password-123"

export KUBECONFIG=$(pwd)/terraform/kubeconfig
kubectl get nodes

# Get ArgoCD admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath="{.data.password}" | base64 -d
# Port forward to access UI
kubectl port-forward svc/argocd-server -n argocd 8080:443
# Update the Git repo URL in applicationset.yaml
# Then apply:
kubectl apply -f argocd/applicationset.yaml
# Enable Autoscaling (Optional but Recommended)
kubectl apply -f argocd/hetzner-autoscaler.yaml

Single validator:
./scripts/generate-validator.sh validator-001
# Edit validators/validator-001.yaml with your accounts
git add validators/validator-001.yaml
git commit -m "Add validator-001"
git push

Batch (e.g., 20 validators):
./scripts/batch-generate-validators.sh 20
# Edit accounts via CSV:
# validator-001,STASH_ADDR,CONTROLLER_ADDR
./scripts/update-accounts.sh accounts.csv
git add validators/
git commit -m "Add 20 validators"
git push

After ArgoCD deploys the validator, check the keygen job logs:
kubectl logs -n validators job/validator-001-keygen

Then submit session.setKeys(keys, 0x) from your controller account on polkadot.js.
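If the keygen job isn't visible yet, confirm that ArgoCD picked up the new file and that the pod is running:

```bash
# One ArgoCD Application is generated per file under validators/
kubectl get applications -n argocd

# Watch the validator pod come up (StatefulSet pods are suffixed -0)
kubectl get pods -n validators -w
```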
├── terraform/ # K3s cluster on Hetzner
│ ├── main.tf
│ ├── variables.tf
│ ├── terraform.tfvars.example
│ └── templates/ # Cloud-init bootstrapping
├── charts/ # Helm charts
│ └── kusama-validator/ # Validator StatefulSet + Service
├── argocd/ # GitOps Manifests
│ ├── applicationset.yaml # Dynamic Validator Generator
│ ├── monitoring.yaml # Prometheus + Grafana Stack
│ ├── alerts.yaml # AlertManager Rules
│ ├── dashboard-configmap.yaml # Custom Grafana Dashboard
│ └── hetzner-autoscaler.yaml # Cluster Autoscaler
├── validators/ # Validator Configurations (The "State")
│ └── example.yaml
└── scripts/ # Operations Scripts
├── bootstrap-secrets.sh # Inject secrets to cluster (TOKEN, PASSWORD)
├── generate-validator.sh # Create single validator config
├── batch-generate-validators.sh # Bulk create validators
├── update-accounts.sh # Mass-update keys from CSV
├── enable-snapshot.sh # Enable snapshot restore for validator
└── rotate-keys.sh # Helper for key rotation
| Action | Command |
|---|---|
| Add 1 validator | ./scripts/generate-validator.sh validator-XXX |
| Add N validators | ./scripts/batch-generate-validators.sh N |
| Remove validator | git rm validators/validator-XXX.yaml && git push |
| Enable snapshot | ./scripts/enable-snapshot.sh validator-XXX kusama |
| Rotate keys | ./scripts/rotate-keys.sh validator-XXX |
Validators use warp sync by default, which syncs in ~10-15 minutes instead of days:
# In validator YAML
chain: kusama
# Warp sync is enabled by default

How it works:
- Downloads GRANDPA finality proofs (not all blocks)
- Fetches latest state directly
- Validator ready in 10-15 minutes
Sync times by mode:
- warp: ~10-15 minutes (default)
- fast: ~1-2 hours
- full: 2-7 days (not recommended)
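The chart passes the sync mode through to the node binary. The manual equivalent, using the upstream polkadot CLI flags (the base path here is illustrative), would be:

```bash
# Run a Kusama validator with warp sync by hand (illustrative)
polkadot \
  --chain kusama \
  --validator \
  --sync warp \
  --base-path /data
```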
For production deployments or when you need validators online immediately, use database snapshots:
Enable for a single validator:
./scripts/enable-snapshot.sh validator-001 kusama
git add validators/validator-001.yaml
git commit -m "Enable snapshot for validator-001"
git push

Or manually edit the validator YAML:
# validators/validator-001.yaml
name: validator-001
chain: kusama
storageSize: 500Gi
# Enable snapshot restore
snapshotEnabled: true
snapshotUrl: "https://ksm-rocksdb.polkashots.io/snapshot"
snapshotCompression: lz4

Snapshot providers:
- Polkashots (Global CDN, recommended):
  - Kusama: https://ksm-rocksdb.polkashots.io/snapshot
  - Polkadot: https://dot-rocksdb.polkashots.io/snapshot
  - Westend: https://wnd-rocksdb.polkashots.io/snapshot
- Stakeworld (EU mirrors):
  - Kusama: https://snapshots.stakeworld.io/kusama/kusama-latest.tar.lz4
How it works:
- InitContainer downloads snapshot before validator starts
- Extracts database to persistent volume (~10-30 minutes)
- Validator starts with pre-synced database
- Skips download on subsequent restarts (checks if DB exists)
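The restore step amounts to streaming the archive straight into the database directory. A minimal sketch, assuming lz4 compression and Kusama's ksmcc3 chain directory (the actual target path depends on your base path and chain):

```bash
# Stream, decompress, and unpack the snapshot into the chain database
# (URL from the example above; path is illustrative)
curl -sL "https://ksm-rocksdb.polkashots.io/snapshot" \
  | lz4 -d \
  | tar -x -C /data/chains/ksmcc3
```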
Compression formats:
- lz4: fastest decompression (recommended)
- zstd: good balance of speed and size
- gz: slowest but most compatible
Benefits:
- Validator ready in ~15-45 minutes total (download + warp sync)
- Reduces initial sync load on network
- Predictable startup times for production
- Database verified before validator starts
Keys are automatically generated when a validator is deployed:
┌─────────────────────────────────────────────────────────────┐
│ 1. ArgoCD deploys validator StatefulSet │
│ 2. Validator syncs (warp sync ~10 min) │
│ 3. PostSync Job waits for RPC ready │
│ 4. Job calls author_rotateKeys() │
│ 5. Keys printed to logs │
│ 6. YOU submit session.setKeys() on-chain │
└─────────────────────────────────────────────────────────────┘
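Step 4 uses the standard Substrate RPC, so you can also trigger a rotation manually. A sketch, assuming the node's default RPC port 9944 and the pod naming from the examples above (the RPC is internal-only, hence the port-forward):

```bash
# Rotate session keys by hand via the node's internal RPC
kubectl port-forward -n validators validator-001-0 9944:9944 &
curl -s -H "Content-Type: application/json" \
  -d '{"id":1,"jsonrpc":"2.0","method":"author_rotateKeys","params":[]}' \
  http://localhost:9944
```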
View generated keys:
kubectl logs -n validators job/validator-001-keygen

Submit keys on-chain:
- Go to polkadot.js
- Connect to Kusama/Westend
- Developer → Extrinsics
- Select your controller account
- Submit session.setKeys(keys, 0x)
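If you prefer the terminal to the UI, the same extrinsic can be submitted with @polkadot/api-cli. This tooling is an assumption, not part of this repo (install via npm i -g @polkadot/api-cli):

```bash
# Submit session.setKeys(keys, proof) from the controller account
# 0x<keys-from-job-logs> is the hex blob printed by the keygen job
polkadot-js-api --ws wss://kusama-rpc.polkadot.io \
  --seed "<controller mnemonic>" \
  tx.session.setKeys "0x<keys-from-job-logs>" "0x"
```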
Update the image tag in charts/kusama-validator/values.yaml:
image:
  repository: parity/polkadot
  tag: v1.7.0  # New version

Then push to Git:
git add charts/kusama-validator/values.yaml
git commit -m "Upgrade polkadot to v1.7.0"
git push
# ArgoCD will rolling-update all validators

# View releases
curl -s https://api.github.com/repos/paritytech/polkadot-sdk/releases | jq '.[].tag_name' | head -10

ArgoCD performs rolling updates by default:
- Each validator's pod is terminated and recreated with the new version (StatefulSets replace pods in place, one at a time)
- Waits for the new pod to be Running and Ready
- Moves on to the next validator until all run the new version
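To follow an upgrade as it rolls through (validator name assumed from the earlier examples):

```bash
# Watch pods cycle onto the new image
kubectl get pods -n validators -w

# Confirm which image a given validator ended up running
kubectl get statefulset validator-001 -n validators \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```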
⚠️ Important: For critical runtime upgrades, update validators before the on-chain upgrade deadline!
kubectl apply -f argocd/monitoring.yaml
kubectl apply -f argocd/alerts.yaml

This deploys:
- Prometheus - Metrics collection
- Grafana - Dashboards (pre-loaded with Polkadot dashboard)
- Alertmanager - Alert routing
# Port forward
kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80
# Open http://localhost:3000
# Default: admin / admin

| Metric | Description |
|---|---|
| substrate_block_height | Current block height |
| substrate_sub_libp2p_peers_count | Connected peers |
| substrate_sub_libp2p_is_major_syncing | Sync status |
| substrate_proposer_block_constructed_count | Blocks produced |
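These can be spot-checked straight from a node: Substrate exposes Prometheus metrics on port 9615 by default (pod name assumed from the earlier examples):

```bash
# Pull raw metrics from a validator and filter the interesting series
kubectl port-forward -n validators validator-001-0 9615:9615 &
curl -s http://localhost:9615/metrics \
  | grep -E 'substrate_block_height|substrate_sub_libp2p_peers_count'
```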
Pre-configured alert rules (from argocd/alerts.yaml):

| Alert | Severity | Condition |
|---|---|---|
| ValidatorDown | Critical | No metrics for 5 min |
| ValidatorNotSynced | Warning | Syncing > 15 min |
| LowPeerCount | Warning | < 10 peers |
| DiskSpaceLow | Critical | < 10% disk free |
Edit argocd/monitoring.yaml to add Slack/PagerDuty:
alertmanager:
  config:
    receivers:
      - name: 'slack'
        slack_configs:
          - api_url: 'https://hooks.slack.com/...'
            channel: '#validator-alerts'

| Feature | Details | Action Required |
|---|---|---|
| Firewall | SSH/API restricted to whitelisted IPs | Set allowed_ips in terraform.tfvars |
| Secrets | Grafana/ArgoCD credentials encrypted | Use Sealed Secrets or K8s Secrets |
| Keys | Validator keys stored in persistent PVC | Automatic (managed by StatefulSet) |
| RPC | Unsafe RPC blocked from internet | Internal access only (ClusterIP) |
In terraform.tfvars:
allowed_ips = ["YOUR_OFFICE_IP/32", "YOUR_HOME_IP/32"]

Define initial capacity in terraform.tfvars:
# Start with 2 workers per location (Total: 6 workers + 1 CP)
initial_workers_per_location = 2

The cluster automatically provisions new nodes when you add more validators than current capacity allows.
- Enable the autoscaler:
  kubectl apply -f argocd/hetzner-autoscaler.yaml
- Add validators:
  ./scripts/batch-generate-validators.sh 10
  git push
- Watch it scale (see the commands below):
  - Pods go Pending
  - Autoscaler detects the pending pods
  - Provisions new Hetzner servers
  - Pods start automatically
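Useful commands while it scales (standard kubectl, nothing repo-specific):

```bash
# Pending validator pods are what trigger the autoscaler
kubectl get pods -n validators --field-selector=status.phase=Pending

# New Hetzner servers show up here as they join the cluster
kubectl get nodes -w
```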
- Terraform >= 1.0
- Hetzner Cloud API token
- kubectl
- Git