
Kusama Validator Platform

A production-grade, GitOps-driven platform for dynamically scaling Kusama/Polkadot validators on Hetzner Cloud. Built with Terraform, K3s, ArgoCD, and Helm.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Git Repository                            │
│  validators/                                                 │
│  ├── validator-001.yaml  ← Add file = new validator         │
│  ├── validator-002.yaml                                      │
│  └── validator-003.yaml  ← Delete file = remove validator   │
├─────────────────────────────────────────────────────────────┤
│              ArgoCD ApplicationSet                          │
│         (Git File Generator - Auto-deploys)                  │
├─────────────────────────────────────────────────────────────┤
│              K3s Cluster (Hetzner Cloud)                     │
│         fsn1 | nbg1 | hel1 (multi-geo HA)                   │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Validator Pods (StatefulSets)                       │   │
│  │  ├─ Polkadot Node (warp sync)                        │   │
│  │  ├─ HAProxy Sidecar (P2P rate limiting)              │   │
│  │  └─ Persistent Volume (Hetzner CSI)                  │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

Features

  • GitOps Workflow: Add/remove validators by pushing YAML files to Git
  • Multi-Geo HA: Automatic distribution across 3 datacenters (fsn1, nbg1, hel1)
  • Auto Key Generation: Session keys generated via PostSync ArgoCD hooks
  • Fast Sync: Warp sync gets validators ready in ~10-15 minutes
  • Anti-Affinity: No two validators share the same physical node, reducing correlated slashing risk (see the sketch after this list)
  • Rate Limiting: HAProxy sidecar protects P2P from DDoS attacks
  • Auto-Scaling: Cluster Autoscaler provisions new nodes when needed (optional)
  • Monitoring: Prometheus + Grafana with pre-configured validator dashboards
  • Reproducible: Entire infrastructure defined as code (Terraform + Helm)
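
The anti-affinity rule above is presumably expressed with standard Kubernetes pod anti-affinity in the chart's values; a minimal sketch (the label key is an assumption, not copied from charts/kusama-validator):

# Sketch only: the exact label key used by the chart is an assumption
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: kusama-validator
        topologyKey: kubernetes.io/hostname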

Quick Start

1. Provision Infrastructure

cd terraform
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars:
# - Add hcloud_token
# - Set allowed_ips (Recommended)
# - Set initial_workers_per_location (Optional)

terraform init
terraform apply
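
A minimal terraform.tfvars might look like the sketch below; the values are placeholders and the variable names are the ones referenced in this README (terraform.tfvars.example is authoritative):

# terraform/terraform.tfvars (placeholder values)
hcloud_token                 = "YOUR_HETZNER_TOKEN"
allowed_ips                  = ["YOUR_OFFICE_IP/32", "YOUR_HOME_IP/32"]
initial_workers_per_location = 2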

2. Bootstrap Secrets (Securely)

Instead of storing secrets in Git, inject them directly into the cluster:

# Usage: ./scripts/bootstrap-secrets.sh <hcloud-token> <grafana-password>
./scripts/bootstrap-secrets.sh "YOUR_HETZNER_TOKEN" "strong-password-123"
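
If you want to see what the script does or create the secrets by hand, it boils down to kubectl create secret calls roughly like these (the namespaces and secret names are assumptions, not necessarily what bootstrap-secrets.sh uses):

# Sketch only: namespace and secret names are assumptions
kubectl create secret generic hcloud \
  --namespace kube-system \
  --from-literal=token="YOUR_HETZNER_TOKEN"
kubectl create secret generic grafana-admin \
  --namespace monitoring \
  --from-literal=admin-password="strong-password-123"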

3. Access the Cluster

export KUBECONFIG=$(pwd)/terraform/kubeconfig
kubectl get nodes

4. Configure ArgoCD

# Get ArgoCD admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d

# Port forward to access UI
kubectl port-forward svc/argocd-server -n argocd 8080:443

# Update the Git repo URL in applicationset.yaml
# Then apply:
kubectl apply -f argocd/applicationset.yaml

# Enable Autoscaling (Optional but Recommended)
kubectl apply -f argocd/hetzner-autoscaler.yaml
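
For orientation, the Git file generator in applicationset.yaml follows the pattern sketched below: every file matching validators/*.yaml becomes one Argo CD Application rendered from the kusama-validator chart. The repo URL here is a placeholder and the manifest in this repo is authoritative:

# Sketch of the Git file generator pattern (see argocd/applicationset.yaml for the real manifest)
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: validators
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/YOUR_ORG/YOUR_REPO.git
        revision: main
        files:
          - path: "validators/*.yaml"   # each file becomes one Application
  template:
    metadata:
      name: "{{name}}"                  # 'name:' key from the validator file
    spec:
      project: default
      source:
        repoURL: https://github.com/YOUR_ORG/YOUR_REPO.git
        targetRevision: main
        path: charts/kusama-validator
      destination:
        server: https://kubernetes.default.svc
        namespace: validators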

5. Add Validators

Single validator:

./scripts/generate-validator.sh validator-001
# Edit validators/validator-001.yaml with your accounts
git add validators/validator-001.yaml
git commit -m "Add validator-001"
git push
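
The generated file is a small values file for the chart; a hedged sketch of its contents (the account field names are illustrative assumptions — the template written by generate-validator.sh is authoritative):

# validators/validator-001.yaml (sketch)
name: validator-001
chain: kusama
storageSize: 500Gi
stashAccount: "YOUR_STASH_ADDR"            # field name is an assumption
controllerAccount: "YOUR_CONTROLLER_ADDR"  # field name is an assumption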

Batch (e.g., 20 validators):

./scripts/batch-generate-validators.sh 20
# Edit accounts via CSV:
# validator-001,STASH_ADDR,CONTROLLER_ADDR
./scripts/update-accounts.sh accounts.csv
git add validators/
git commit -m "Add 20 validators"
git push

6. Get Session Keys

After ArgoCD deploys the validator, check the keygen job logs:

kubectl logs -n validators job/validator-001-keygen

Then submit session.setKeys(keys, 0x) from your controller account on polkadot.js.

Project Structure

├── terraform/                # K3s cluster on Hetzner
│   ├── main.tf
│   ├── variables.tf
│   ├── terraform.tfvars.example
│   └── templates/            # Cloud-init bootstrapping
├── charts/                   # Helm charts
│   └── kusama-validator/     # Validator StatefulSet + Service
├── argocd/                   # GitOps Manifests
│   ├── applicationset.yaml   # Dynamic Validator Generator
│   ├── monitoring.yaml       # Prometheus + Grafana Stack
│   ├── alerts.yaml           # AlertManager Rules
│   ├── dashboard-configmap.yaml # Custom Grafana Dashboard
│   └── hetzner-autoscaler.yaml # Cluster Autoscaler
├── validators/               # Validator Configurations (The "State")
│   └── example.yaml
└── scripts/                  # Operations Scripts
    ├── bootstrap-secrets.sh   # Inject secrets to cluster (TOKEN, PASSWORD)
    ├── generate-validator.sh  # Create single validator config
    ├── batch-generate-validators.sh # Bulk create validators
    ├── update-accounts.sh     # Mass-update keys from CSV
    ├── enable-snapshot.sh     # Enable snapshot restore for validator
    └── rotate-keys.sh         # Helper for key rotation

Scaling

Action             Command
Add 1 validator    ./scripts/generate-validator.sh validator-XXX
Add N validators   ./scripts/batch-generate-validators.sh N
Remove validator   git rm validators/validator-XXX.yaml && git push
Enable snapshot    ./scripts/enable-snapshot.sh validator-XXX kusama
Rotate keys        ./scripts/rotate-keys.sh validator-XXX

Fast Sync Options

Warp Sync (Default - Recommended)

Validators use warp sync by default, which syncs in ~10-15 minutes instead of days:

# In validator YAML
chain: kusama
# Warp sync is enabled by default

How it works:

  1. Downloads GRANDPA finality proofs (not all blocks)
  2. Fetches latest state directly
  3. Validator ready in 10-15 minutes

Sync times by mode:

  • warp: ~10-15 minutes (default)
  • fast: ~1-2 hours
  • full: 2-7 days (not recommended)
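
These modes map to the Substrate node's --sync flag; for reference, the equivalent flags on a bare polkadot binary would be (how the chart wires this is an assumption):

# Equivalent bare-binary invocations (the chart sets the flag for you)
polkadot --chain kusama --validator --sync warp   # default
polkadot --chain kusama --validator --sync fast
polkadot --chain kusama --validator --sync full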

Snapshot Restore (Optional - Even Faster!)

For production deployments or when you need validators online immediately, use database snapshots:

Enable for a single validator:

./scripts/enable-snapshot.sh validator-001 kusama
git add validators/validator-001.yaml
git commit -m "Enable snapshot for validator-001"
git push

Or manually edit the validator YAML:

# validators/validator-001.yaml
name: validator-001
chain: kusama
storageSize: 500Gi

# Enable snapshot restore
snapshotEnabled: true
snapshotUrl: "https://ksm-rocksdb.polkashots.io/snapshot"
snapshotCompression: lz4

Snapshot providers:

  • Polkashots (Global CDN, recommended):
    • Kusama: https://ksm-rocksdb.polkashots.io/snapshot
    • Polkadot: https://dot-rocksdb.polkashots.io/snapshot
    • Westend: https://wnd-rocksdb.polkashots.io/snapshot
  • Stakeworld (EU mirrors):
    • Kusama: https://snapshots.stakeworld.io/kusama/kusama-latest.tar.lz4

How it works:

  1. InitContainer downloads snapshot before validator starts
  2. Extracts database to persistent volume (~10-30 minutes)
  3. Validator starts with pre-synced database
  4. Skips download on subsequent restarts (checks if DB exists)
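
A rough sketch of that init logic (paths and tooling are assumptions; the chart's initContainer is the source of truth):

# Sketch only: not the chart's exact script
DB_DIR=/data/chains/ksmcc3/db
if [ -d "$DB_DIR" ]; then
  echo "Database already present, skipping snapshot restore"
else
  mkdir -p "$DB_DIR"
  curl -sSL "$SNAPSHOT_URL" | lz4 -d | tar -x -C "$DB_DIR"
fi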

Compression formats:

  • lz4: Fastest decompression (recommended)
  • zstd: Good balance of speed and size
  • gz: Slowest but most compatible

Benefits:

  • Validator ready in ~15-45 minutes total (download + warp sync)
  • Reduces initial sync load on network
  • Predictable startup times for production
  • Database verified before validator starts

Session Key Generation

Keys are automatically generated when a validator is deployed:

┌─────────────────────────────────────────────────────────────┐
│  1. ArgoCD deploys validator StatefulSet                    │
│  2. Validator syncs (warp sync ~10 min)                     │
│  3. PostSync Job waits for RPC ready                        │
│  4. Job calls author_rotateKeys()                           │
│  5. Keys printed to logs                                    │
│  6. YOU submit session.setKeys() on-chain                   │
└─────────────────────────────────────────────────────────────┘
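
The call in step 4 is the standard Substrate author_rotateKeys RPC; run by hand it looks roughly like this (the pod name and the 9944 RPC port are assumptions about the chart's defaults):

kubectl port-forward -n validators validator-001-0 9944:9944 &
curl -s -H "Content-Type: application/json" \
  -d '{"id":1, "jsonrpc":"2.0", "method":"author_rotateKeys", "params":[]}' \
  http://localhost:9944
# The "result" field is the hex-encoded session keys blob to pass to session.setKeys()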

View generated keys:

kubectl logs -n validators job/validator-001-keygen

Submit keys on-chain:

  1. Go to polkadot.js
  2. Connect to Kusama/Westend
  3. Developer → Extrinsics
  4. Select your controller account
  5. Submit: session.setKeys(keys, 0x)

Version Upgrades

Upgrade All Validators

Update the image tag in charts/kusama-validator/values.yaml:

image:
  repository: parity/polkadot
  tag: v1.7.0  # New version

Then push to Git:

git add charts/kusama-validator/values.yaml
git commit -m "Upgrade polkadot to v1.7.0"
git push
# ArgoCD will rolling-update all validators
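
You can watch an individual validator roll over with standard kubectl (assuming the StatefulSet is named after the validator):

kubectl -n validators rollout status statefulset/validator-001
kubectl -n validators get pods -w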

Check Available Versions

# View releases
curl -s https://api.github.com/repos/paritytech/polkadot-sdk/releases | jq '.[].tag_name' | head -10

Rolling Update Strategy

ArgoCD syncs the updated chart and each validator StatefulSet rolls out the change:

  1. The old validator pod is terminated
  2. A replacement pod starts with the new version
  3. Kubernetes waits for it to become healthy
  4. The process repeats for each validator

⚠️ Important: For critical runtime upgrades, update validators before the on-chain upgrade deadline!

Observability

Deploy Monitoring Stack

kubectl apply -f argocd/monitoring.yaml
kubectl apply -f argocd/alerts.yaml

This deploys:

  • Prometheus - Metrics collection
  • Grafana - Dashboards (pre-loaded with Polkadot dashboard)
  • Alertmanager - Alert routing

Access Grafana

# Port forward
kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80

# Open http://localhost:3000
# Default login: admin / admin (or the Grafana password you passed to bootstrap-secrets.sh)

Key Metrics

Metric                                      Description
substrate_block_height                      Current block height
substrate_sub_libp2p_peers_count            Connected peers
substrate_sub_libp2p_is_major_syncing       Sync status
substrate_proposer_block_constructed_count  Blocks produced

Alerts Configured

Alert               Severity  Condition
ValidatorDown       Critical  No metrics for 5 min
ValidatorNotSynced  Warning   Syncing > 15 min
LowPeerCount        Warning   < 10 peers
DiskSpaceLow        Critical  < 10% disk free
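
For reference, a row like LowPeerCount corresponds to a Prometheus alerting rule roughly like the sketch below (argocd/alerts.yaml is authoritative; the group and label names are assumptions):

# Sketch of one alert rule
groups:
  - name: validator.rules
    rules:
      - alert: LowPeerCount
        expr: substrate_sub_libp2p_peers_count < 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Validator {{ $labels.pod }} has fewer than 10 peers"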

Configure Notifications

Edit argocd/monitoring.yaml to add Slack/PagerDuty:

alertmanager:
  config:
    receivers:
      - name: 'slack'
        slack_configs:
          - api_url: 'https://hooks.slack.com/...'
            channel: '#validator-alerts'

Security

Feature   Details                                   Action Required
Firewall  SSH/API restricted to whitelisted IPs     Set allowed_ips in terraform.tfvars
Secrets   Grafana/ArgoCD credentials encrypted      Use Sealed Secrets or K8s Secrets
Keys      Validator keys stored in persistent PVC   Automatic (managed by StatefulSet)
RPC       Unsafe RPC blocked from internet          Internal access only (ClusterIP)

Restrict Access (Recommended)

In terraform.tfvars:

allowed_ips = ["YOUR_OFFICE_IP/32", "YOUR_HOME_IP/32"]

Scaling Infrastructure

Initial Scale (Day 1)

Define initial capacity in terraform.tfvars:

# Start with 2 workers per location (Total: 6 workers + 1 CP)
initial_workers_per_location = 2

Auto-Scaling (Day 2+)

The cluster automatically provisions new nodes when you add more validators than current capacity allows.

  1. Enable Autoscaler:
    kubectl apply -f argocd/hetzner-autoscaler.yaml
  2. Add Validators:
    ./scripts/batch-generate-validators.sh 10
    git push
  3. Watch it scale:
    • Pods go Pending
    • Autoscaler detects pending pods
    • Provisions new Hetzner servers
    • Pods start automatically
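
To follow along (the autoscaler Deployment name is an assumption; adjust to whatever hetzner-autoscaler.yaml actually creates):

kubectl get pods -n validators -w                              # new validators sit in Pending
kubectl -n kube-system logs -f deployment/cluster-autoscaler   # scale-up decisions
kubectl get nodes -w                                           # new Hetzner nodes joining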

Requirements

  • Terraform >= 1.0
  • Hetzner Cloud API token
  • kubectl
  • Git
