HA k3s cluster on NixOS, bootstrapped with Ansible and reconciled with ArgoCD.
- Currently all nodes are master+worker
- Longhorn config relies on having at least 2 replicas (>=2 nodes)
- Logs Drilldown plugin is downloaded straight from GitHub Releases, bypassing Grafana Cloud entirely
- `nixos-init` uses the whole disk, formatting it and installing onto it
- Fix loki-canary drop rules
- Terraform for Cloudflare
- Better support for custom dashboards
- Better way of declaring GitHub release links for Grafana plugins
- Home Assistant + IoT network bridge
- NixOS - declarative OS configuration
- k3s - lightweight Kubernetes
- ArgoCD - GitOps reconciliation for cluster apps
- Cloudflare Tunnels - zero-trust SSH and ingress access
- Longhorn - distributed block storage
- kube-prometheus-stack - Prometheus, Grafana, Alertmanager, node-exporter
- Loki + Promtail - log aggregation
```
cd ansible/inventory
cp hosts.yml.example hosts.yml
cp group_vars/all.yml.example group_vars/all.yml
```

Edit `hosts.yml` with node IPs and Cloudflare SSH tunnel tokens.
Edit `group_vars/all.yml` with `k3s_token`, `cloudflare_ingress_tunnel_token`, and any optional ArgoCD repo overrides.
For each control-plane node in hosts.yml:
- `ansible_host` is only for operator access and SSH. It can be a Cloudflare/public hostname.
- `k3s_address` is the node's control-plane/peer address used by k3s. Use a LAN IP or internal DNS name, not a public Cloudflare hostname.
- `k3s_join_address` is optional. When set, non-seed nodes join that LAN/internal k3s endpoint instead of the seed node's `k3s_address`.
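Putting those rules together, a control-plane entry in `hosts.yml` might look roughly like this sketch (node names, hostnames, and IPs are illustrative; `hosts.yml.example` is authoritative for the real layout):

```yaml
k8s_control_plane:
  hosts:
    node1:                               # seed: first listed node
      ansible_host: node1.example.com    # operator/SSH access; a Cloudflare hostname is fine here
      k3s_address: 10.0.0.11             # LAN address used for k3s peering
    node2:
      ansible_host: node2.example.com
      k3s_address: 10.0.0.12
      k3s_join_address: 10.0.0.11        # optional: join the seed's LAN endpoint directly
```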
Per node:
- Boot NixOS minimal ISO
- Set password:
  ```
  passwd nixos
  ```
- Run:
  ```
  ansible-playbook playbooks/nixos-init.yml -i inventory/hosts.yml --limit <node>
  ```
- Remove USB and reboot
- Change the default "changeme" password set by the config
The first node in `k8s_control_plane` is the cluster seed.
By default, other control-plane nodes join that seed via its `k3s_address`.
```
cd ansible
ansible-playbook playbooks/install-argocd.yml -i inventory/hosts.yml
```

This installs ArgoCD, applies the upstream AppProject and root Application, and bootstraps the rest of the stack from Git. After this point, update Kubernetes apps by changing manifests or values in Git and letting ArgoCD sync them.
If you want ArgoCD to track a private repo instead of the upstream defaults, set `argocd_repo_url`, `argocd_target_revision`, and optional repo credentials in `group_vars/all.yml` before running the playbook.
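For example, in `group_vars/all.yml` (the repo URL and branch below are placeholders, not real values):

```yaml
argocd_repo_url: https://github.com/example/homelab-private.git   # placeholder fork URL
argocd_target_revision: main
```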
- Edit values under `k8s/helm/<app>/values.yaml` or manifests under `k8s/`.
- Commit and push those changes to the repo ArgoCD is tracking.
- Let ArgoCD reconcile the cluster; no Ansible run is needed for app updates.
Upstream ships fully usable ArgoCD applications pointing at this repo by default. A private repo can layer on top by patching `repoURL` and `targetRevision` to follow itself instead.
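One way a private fork could follow itself is a patch over the root Application. A sketch, assuming the root Application is named `root` in the `argocd` namespace (both names are assumptions; match them to the actual manifests):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root            # assumed root Application name
  namespace: argocd
spec:
  source:
    repoURL: https://github.com/example/homelab-private.git   # the fork itself
    targetRevision: main
```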
For declarative Longhorn backups:
```
# 1. Configure r2_* variables in ansible/inventory/group_vars/all.yml
# 2. Create the backup secret in your tracked repo or bootstrap it separately
# 3. Edit k8s/helm/longhorn/values.yaml to configure defaultBackupStore
# 4. Commit and push; ArgoCD will apply the Longhorn change
```

Keep backup secrets out of Git unless your private repo already has a sealed/external secret flow.
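The `defaultBackupStore` block in `k8s/helm/longhorn/values.yaml` could then look like this sketch for an S3-compatible R2 bucket (the bucket name, region placeholder, and secret name are assumptions):

```yaml
defaultBackupStore:
  backupTarget: s3://longhorn-backups@auto/               # assumed bucket; R2 commonly uses "auto" as region
  backupTargetCredentialSecret: longhorn-r2-credentials   # assumed secret name
  pollInterval: 300
```

The referenced secret holds `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_ENDPOINTS` pointing at the R2 endpoint.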
Bridges homelab to home network (192.168.0.0/24) via OpenWRT's 5GHz WiFi radio. Enables Home Assistant to reach IoT devices on the home network.
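The bridge playbook drives OpenWRT's `uci` interface. A rough sketch of the kind of state it ends up with (the radio/interface names and option values are illustrative; the playbook is authoritative):

```
uci set wireless.wwan=wifi-iface            # STA client on the 5GHz radio
uci set wireless.wwan.device='radio1'       # illustrative radio name
uci set wireless.wwan.mode='sta'
uci set wireless.wwan.network='home'
uci set wireless.wwan.ssid='<home_wifi_ssid>'
uci set wireless.wwan.key='<home_wifi_password>'
uci set network.home=interface              # DHCP client on the home LAN
uci set network.home.proto='dhcp'
uci commit wireless; uci commit network
```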
```
# 1. Add openwrt host to hosts.yml (see hosts.yml.example)
# 2. Configure home_wifi_ssid and home_wifi_password in group_vars/all.yml
# 3. Run bridge playbook
ansible-playbook playbooks/openwrt-home-lan.yml -i inventory/hosts.yml
```

Traffic is NATed; no changes are needed on the home network. To revert:
```
# SSH to OpenWRT, then:
uci revert wireless; uci revert network; uci revert firewall; /etc/init.d/network restart
```

Update node config:
```
ansible-playbook playbooks/nixos-update.yml -i inventory/hosts.yml
```

ArgoCD-managed apps update from Git. Ansible is only for node lifecycle, bootstrap, and non-GitOps machine configuration.
For control-plane recovery, make sure `k3s_address` and any `k3s_join_address` values stay on the LAN or internal DNS. Do not point k3s peer/bootstrap traffic at public Cloudflare hostnames from `ansible_host`.
Reset k3s on a node (rejoin cluster):
```
ansible-playbook playbooks/nixos-update.yml -i inventory/hosts.yml -e reset_k3s=true --limit <node>
```

Set these per control-plane node in `ansible/inventory/hosts.yml`:
- `cloudflare_ssh_ca_pubkey`
- `cloudflare_ssh_allowed_principals` (must include your Cloudflare cert principal)
Generate per-node local SSH config blocks (localhost only):
```
cd ansible
ansible-playbook playbooks/configure-local-cloudflare-ssh.yml -i inventory/hosts.yml
```

This writes explicit entries for each `k8s_control_plane` host in `~/.ssh/config` and keeps cert generation per host/app.
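A resulting `~/.ssh/config` entry might look roughly like this (the hostname and key paths are illustrative; the playbook's template is authoritative):

```
Host node1.example.com
    User nixos
    ProxyCommand cloudflared access ssh --hostname %h
    IdentityFile ~/.cloudflared/node1.example.com-cf_key
    CertificateFile ~/.cloudflared/node1.example.com-cf_key-cert.pub
```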
Then connect directly using inventory hostnames:
```
ssh nixos@<control-plane-ansible_host>
```

To see your principal from a generated cert:

```
ssh-keygen -Lf ~/.cloudflared/<host>-cf_key-cert.pub
```

Password auth remains enabled by default for rollback (`ssh_password_auth_enabled: true`) and can be disabled later by setting it to `false` and applying `nixos-update.yml`.