Releases: cozystack/cozystack

v1.3.0

23 Apr 11:56
5c89a2c

Cozystack v1.3.0

Cozystack v1.3.0 brings storage-aware pod scheduling via a LINSTOR scheduler extender, a managed LINSTOR GUI web console with Keycloak SSO, a curated VM Default Images catalog for out-of-the-box virtual-machine provisioning, a new WorkloadsReady / Events observability surface with S3 bucket metering, and cross-namespace VMInstance backup restore with a full RestoreJob dashboard flow. The release also ships stricter tenant-name validation, VMInstance network-selector improvements, Keycloak theme injection and SMTP configuration, a host-runtime preflight check, and rolls up every fix from the v1.2.1 → v1.2.4 patch line.

Note: Items marked (backported to v1.2.x) were also shipped in v1.2.1, v1.2.2, v1.2.3, or v1.2.4 patch releases.

Feature Highlights

Storage-Aware Scheduling via the LINSTOR Extender

The cozystack-scheduler now calls a LINSTOR scheduler extender for storage-locality-aware pod placement. When a pod declares both a SchedulingClass and LINSTOR-backed PVCs, the scheduler consults LINSTOR to prefer nodes where volume replicas already exist — reducing cross-node replication traffic and improving I/O latency for storage-heavy workloads such as databases, object stores, and VMs.

The integration builds on the existing SchedulingClass tenant workload placement system introduced in v1.2.0 and requires no tenant-side configuration — workloads simply benefit once a SchedulingClass is assigned. Administrators can mix storage locality with the existing data-center / hardware-generation constraints defined on SchedulingClass CRs (@lllamnyp in #2330).
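
As a rough illustration of the locality preference described above, the sketch below ranks candidate nodes by how many of a pod's volumes already have a replica on them. This is not the extender's actual code: `rank_nodes`, the node names, and the replica map are hypothetical and exist only for this example.

```python
from typing import Dict, List, Set


def rank_nodes(candidates: List[str], replicas: Dict[str, Set[str]]) -> List[str]:
    """Order candidate nodes by how many of the pod's volumes they already host.

    `replicas` maps a volume name to the set of nodes holding a replica of it.
    Both arguments are hypothetical inputs for illustration only.
    """
    def local_volumes(node: str) -> int:
        return sum(1 for nodes in replicas.values() if node in nodes)

    # Highest local-replica count first; ties fall back to node name for determinism.
    return sorted(candidates, key=lambda n: (-local_volumes(n), n))


if __name__ == "__main__":
    replicas = {"pvc-data": {"node-a", "node-b"}, "pvc-wal": {"node-b", "node-c"}}
    print(rank_nodes(["node-a", "node-b", "node-c", "node-d"], replicas))
```

A real extender would combine such a score with the other SchedulingClass constraints rather than use it alone.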

LINSTOR GUI: Managed Web Console for Storage Administration

A new opt-in linstor-gui system package deploys LINBIT's linstor-gui web UI alongside the LINSTOR controller with mTLS client authentication, non-root security context, and a ClusterIP-only service by default. When OIDC is configured on the platform, an optional Keycloak-protected Ingress (via oauth2-proxy) exposes the UI for browser access. Access is restricted to members of the cozystack-cluster-admin Keycloak group, consistent with host-cluster admin RBAC, and the gatekeeper blocks in-app LINSTOR authentication setup at the nginx proxy layer so the managed configuration cannot be subverted through the UI.

Operators who prefer CLI access keep the existing linstor command; the GUI is strictly additive and stays disabled by default (@myasnikovdaniil in #2382, #2390, #2415, #2419).

VM Default Images: Out-of-the-Box VM Provisioning

The new vm-default-images package provides a curated set of cluster-wide virtual-machine images (Ubuntu, Debian, CentOS Stream, and others) as pre-populated DataVolumes, so tenants can provision VMs against well-known base images without first having to upload them. The package is opt-in via the iaas bundle and defaults to replicated storage for high availability. Migration 38 renames legacy vm-image-* DataVolumes to the new vm-default-images-* naming scheme, and the vm-disk chart gains a new "disk" source type for cloning from existing vm-disks in the same namespace (@myasnikovdaniil in #2258).

Application Observability: WorkloadsReady, Events, and S3 Bucket Metering

Applications now expose a WorkloadsReady condition on their status by querying associated WorkloadMonitor resources, giving operators a single place to check whether all underlying workloads (Deployments, StatefulSets, DaemonSets, PVCs) are healthy. The dashboard gains a new Events tab showing namespace-scoped Kubernetes events per application, with fallback to .firstTimestamp when .eventTime is absent. A long-standing bug where WorkloadMonitor's Operational status was never persisted is fixed in the same change (@lexfrei in #2356).
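
The two behaviors in this paragraph can be sketched in a few lines. The example below assumes events and monitors are plain dictionaries; the `operational` key is a hypothetical stand-in for the WorkloadMonitor status field, and neither function is taken from the Cozystack code base.

```python
from typing import Any, Dict, List, Optional


def event_time(event: Dict[str, Any]) -> Optional[str]:
    """Return .eventTime when set, otherwise fall back to .firstTimestamp."""
    return event.get("eventTime") or event.get("firstTimestamp")


def workloads_ready(monitors: List[Dict[str, Any]]) -> bool:
    """Report ready only if every monitor is operational.

    An empty list counts as ready here, which is a choice made for this sketch.
    """
    return all(m.get("operational", False) for m in monitors)
```

A missing `operational` key is treated as not operational, so a monitor whose status was never written keeps the aggregate condition false.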

The WorkloadMonitor reconciler is extended to track COSI BucketClaim objects as first-class Workloads, and the bucket controller now queries SeaweedFS logical and physical bucket-size metrics from VictoriaMetrics via a namespace-scoped monitoring endpoint, enabling S3 billing integration on par with Pods and PVCs (@kitsunoff in #2391). Workloads are also enriched with workloads.cozystack.io/resource-preset and source-object labels so downstream billing pipelines can correlate monitors with the tenant preset that produced them (@androndo in #2416).
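
The label enrichment can be pictured as a prefix filter plus one reserved key. The sketch below is an assumption-laden illustration, not the reconciler's code: `workload_labels` is a hypothetical helper, and using the monitor's name as the value of the `workloads.cozystack.io/monitor` label is a guess made for the example.

```python
from typing import Dict

PREFIX = "workloads.cozystack.io/"
MONITOR_LABEL = PREFIX + "monitor"


def workload_labels(monitor_name: str, monitor_labels: Dict[str, str]) -> Dict[str, str]:
    """Build labels for a Workload created from a WorkloadMonitor."""
    # Copy only labels under the workloads.cozystack.io/ prefix.
    labels = {k: v for k, v in monitor_labels.items() if k.startswith(PREFIX)}
    # The reserved monitor label is always set last so it cannot be overridden;
    # using the monitor's name as its value is an assumption of this sketch.
    labels[MONITOR_LABEL] = monitor_name
    return labels
```

A billing pipeline could then group Workloads by `workloads.cozystack.io/resource-preset` without looking up the originating monitor.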

Cross-Namespace VM Backup Restore and RestoreJob Dashboard

The backup system now supports restoring VMInstance backups into a different namespace (cross-namespace copy restores) with IP/MAC preservation and safe rename semantics. In-place backup and restore flows for VMDisk and VMInstance are improved: HelmReleases and DataVolumes are properly handled, and Velero failure messages are propagated to the Application status. The backup status structure has been refactored to store underlying resources as a generic opaque JSON object, enabling arbitrary application-specific metadata without status-schema churn (@androndo in #2251, #2319, #2329).

The dashboard now ships a complete RestoreJob experience: list view, details page, create form, and sidebar entry, with a "Same as backup" fallback rendering when spec.targetApplicationRef is omitted. Non-CRD-backed sidebar factories (kube-*, plan, backupjob, backup, restorejob) are marked static so they pick up consistent managed-by labels across reconciles (@myasnikovdaniil in #2437).

Major Features and Improvements

  • [api] Reject tenant names with dashes at Create time: Enforces alphanumeric-only naming for Tenants at the API level, preventing names with hyphens that would silently fail during Helm reconciliation. A corresponding regex tightening and regression test suite hardens the validation (@lexfrei in #2380).

  • [platform] Validate computed tenant namespace length: Rejects Tenant creation when the computed ancestor-chain namespace would exceed the 63-character Kubernetes namespace limit, preventing opaque HelmRelease reconcile errors downstream (@lexfrei in #2376).

  • [vm-instance] Rename subnets to networks and add dropdown selector: Renames the misleading subnets field to networks in VMInstance for clarity, adds a dropdown selector for available networks in the dashboard form, and includes migration 36 to copy existing subnets values. The old field remains supported for backward compatibility (@sircthulhu in #2263).

  • [keycloak] Enable injecting themes: Cozystack administrators can now inject custom Keycloak themes via initContainers for UI white-labeling and customization (@lllamnyp in #2142).

  • [keycloak-configure] Add email verification and SMTP configuration: Adds configurable Keycloak settings for user self-registration, email verification, and SMTP server configuration, enabling automated user onboarding flows (@BROngineer in #2318).

  • [postgres] Pin system PostgreSQL to 17.7-standard-trixie: Pins the PostgreSQL image for system databases (Grafana, Alerta, Harbor, Keycloak, SeaweedFS) to 17.7-standard-trixie across chart templates and values.yaml, and ships migration 37 to patch existing CNPG Cluster imageName fields to the same variant (handling unset, any PG 17 tag, and bare-version tags). This prevents CNPG from defaulting to PostgreSQL 18 and locks system databases to the trixie variant consistent with the monitoring stack requirements (related backports shipped in v1.2.1 via #2309 and v1.2.2 via #2364) (@myasnikovdaniil in #2369).

  • [platform] Prevent installed packages deletion: Adds the helm.sh/resource-policy: keep annotation to platform packages so disabling a package no longer triggers automatic Helm deletion, restoring the documented behavior where operators must explicitly delete a package (backported to v1.2.1) (@kvaps in #2273).

  • [mariadb] Always enable replication for consistent service naming: MariaDB now always enables replication, creating -primary/-secondary services even for single-replica instances. This fixes dashboard visibility and backup functionality for single-replica setups (@sircthulhu in #2279).

  • [hack] Add host runtime preflight check: New check-host-runtime.sh script and make preflight target that warns operators when a standalone containerd or docker runtime is running alongside the embedded k3s runtime, helping diagnose container-runtime conflicts early in an installation (@lexfrei in #2371).

  • [hack] Add check-readiness.sh diagnostic script: A new diagnostic script for tracking platform reconciliation by checking readiness of Packages, ArtifactGenerators, ExternalArtifacts, and HelmReleases, with support for watch mode and continuous monitoring (@myasnikovdaniil in #2294).

  • [platform] Add resourcePreset labels to WorkloadMonitor labels: WorkloadMonitor labels with the workloads.cozystack.io/ prefix are now propagated onto created Workloads; created Workloads always include the reserved workloads.cozystack.io/monitor label, and Helm app charts add workloads.cozystack.io/resource-preset metadata to WorkloadMonitor manifests, enabling downstream billing pipelines to correlate monitors with the tenant preset that produced them (@androndo in #2416)....

v1.2.3

21 Apr 11:46
d59a691

v1.2.3 (2026-04-20)

A patch release with bug fixes and documentation updates.

Features and Improvements

No notable features in this patch release.

Fixes

  • fix(kubernetes): set explicit ephemeral-storage on virt-launcher pods: Prevents VM crashes caused by ephemeral-storage eviction by setting explicit domain.resources ephemeral-storage on the VirtualMachine spec. Uses sanitized limits and requests so virt-launcher pods do not inherit too-small namespace defaults (@kvaps in #2317, backport #2423).

Full Changelog: v1.2.2...v1.2.3

v1.3.0-rc.1

16 Apr 10:31
d9657bc

v1.3.0-rc.1 Pre-release

Cozystack v1.3.0-rc.1

Cozystack v1.3.0-rc.1 is the first release candidate for v1.3.0, bringing storage-aware scheduling via the LINSTOR scheduler extender, a managed LINSTOR GUI web UI with Keycloak SSO, a VM Default Images catalog for out-of-the-box virtual machine provisioning, WorkloadsReady conditions with a real-time Events tab in the dashboard, and cross-namespace VM backup restore capabilities. Additional highlights include stricter tenant name validation, VM network selector improvements, Keycloak theme injection and SMTP configuration, and a comprehensive host runtime preflight check.

Note: Fixes marked with (backported to v1.2.x) were also included in v1.2.1 or v1.2.2 patch releases.

Feature Highlights

Storage-Aware Scheduling via LINSTOR Extender

The cozystack-scheduler now calls the LINSTOR scheduler extender for storage-locality-aware pod placement. When a pod declares both a SchedulingClass and LINSTOR-backed PVCs, the scheduler consults LINSTOR to prefer nodes where volume replicas already exist — reducing cross-node replication traffic and improving I/O latency for storage-heavy workloads (@lllamnyp in #2330).

LINSTOR GUI: Managed Web UI for Storage Administration

A new opt-in linstor-gui system package deploys LINBIT's linstor-gui web UI alongside the LINSTOR controller with mTLS client authentication, non-root security context, and ClusterIP-only service. An optional Keycloak-protected Ingress (via oauth2-proxy) can be enabled for SSO-authenticated browser access when OIDC is configured on the platform (@myasnikovdaniil in #2382, #2390).

VM Default Images: Out-of-the-Box VM Provisioning

The new vm-default-images package provides a curated set of cluster-wide virtual machine images (Ubuntu, Debian, CentOS Stream, and others) as pre-populated DataVolumes. The package is opt-in via the iaas bundle and defaults to replicated storage for high availability. A companion migration (migration 38) renames legacy vm-image-* DataVolumes to the new vm-default-images-* naming scheme. The vm-disk chart also gains a new "disk" source type for cloning from existing vm-disks in the same namespace (@myasnikovdaniil in #2258).

WorkloadsReady Condition and Events Tab

Applications now expose a WorkloadsReady condition on their status by querying associated WorkloadMonitor resources, giving operators a single place to check whether all underlying workloads (Deployments, StatefulSets, DaemonSets) are healthy. The dashboard gains a new Events tab showing namespace-scoped Kubernetes events for each application, with fallback to .firstTimestamp when .eventTime is absent. A bug where WorkloadMonitor's Operational status was never persisted is also fixed (@lexfrei in #2356).

Cross-Namespace VM Backup Restore

The backup system now supports restoring VMInstance backups into a different namespace (cross-namespace copy restores), with IP/MAC preservation and safe rename semantics. In-place backup and restore flows for VMDisk and VMInstance are improved: HelmReleases and DataVolumes are properly handled, and Velero failure messages are propagated to the Application status. The backup status structure has been refactored to store underlying resources as a generic opaque JSON object, enabling arbitrary application-specific metadata (@androndo in #2251, #2329, #2319).

Major Features and Improvements

  • [api] Reject tenant names with dashes at Create time: Enforces alphanumeric-only naming for Tenants at the API level, preventing names with hyphens that would silently fail during Helm reconciliation. A corresponding regex tightening and regression test suite hardens the validation (@lexfrei in #2380).

  • [platform] Validate computed tenant namespace length: Rejects Tenant creation when the computed ancestor-chain namespace would exceed the 63-character Kubernetes namespace limit, preventing opaque HelmRelease reconcile errors downstream (@lexfrei in #2376).

  • [vm-instance] Rename subnets to networks and add dropdown selector: Renames the misleading subnets field to networks in VMInstance for clarity, adds a dropdown selector for available networks in the dashboard form, and includes a migration to copy existing subnets values. The old field remains supported for backward compatibility (@sircthulhu in #2263).

  • [keycloak] Enable injecting themes: Cozystack administrators can now inject custom Keycloak themes via initContainers for UI white-labeling and customization (@lllamnyp in #2142).

  • [keycloak-configure] Add email verification and SMTP configuration: Adds configurable Keycloak settings for user self-registration, email verification, and SMTP server configuration, enabling automated user onboarding flows (@BROngineer in #2318).

  • [postgres] Hardcode PostgreSQL 17 for monitoring databases: Pins PostgreSQL 17.7 images for system databases (Grafana, Alerta, Harbor, Keycloak, SeaweedFS) and adds migration 37 to backfill spec.version=v17 for existing PostgreSQL resources, preventing CNPG from defaulting to PostgreSQL 18 (backported to v1.2.1) (@IvanHunters in #2304).

  • [hack] Add host runtime preflight check: New check-host-runtime.sh script and make preflight target that warns operators when a standalone containerd or docker runtime is running alongside the embedded k3s runtime, helping diagnose container runtime conflicts (@lexfrei in #2371).

  • [hack] Add check-readiness.sh diagnostic script: A new diagnostic script for tracking platform reconciliation by checking readiness of Packages, ArtifactGenerators, ExternalArtifacts, and HelmReleases, with support for watch mode and continuous monitoring (@myasnikovdaniil in #2294).

  • [mariadb] Always enable replication for consistent service naming: MariaDB now always enables replication, creating -primary/-secondary services even for single-replica instances. This fixes dashboard visibility and backup functionality for single-replica setups (@sircthulhu in #2279).

  • [platform] Prevent installed packages deletion: Adds helm.sh/resource-policy: keep annotation to packages, preventing automatic deletion when packages are disabled and restoring documented behavior (backported to v1.2.1) (@kvaps in #2273).
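
The two tenant checks at the top of this list (alphanumeric-only names and the 63-character namespace limit) can be sketched as a single validation function. The regular expression and the way the namespace is computed are assumptions made for illustration; in particular, joining the parent namespace and the tenant name with a dash is a simplification, not necessarily the real ancestor-chain rule.

```python
import re

MAX_NAMESPACE_LEN = 63  # Kubernetes namespace names are limited to 63 characters
NAME_RE = re.compile(r"[a-z0-9]+")  # hypothetical pattern: lowercase alphanumerics only


def validate_tenant(parent_namespace: str, name: str) -> str:
    """Return the computed namespace, or raise ValueError if a check fails."""
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"tenant name {name!r} must be alphanumeric (no dashes)")
    # Simplified namespace computation; see the note above.
    namespace = f"{parent_namespace}-{name}"
    if len(namespace) > MAX_NAMESPACE_LEN:
        raise ValueError(
            f"namespace {namespace!r} is {len(namespace)} characters, "
            f"limit is {MAX_NAMESPACE_LEN}"
        )
    return namespace
```

Rejecting both cases at Create time means the error reaches the user immediately instead of surfacing later as a failed HelmRelease.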

Bug Fixes

  • [cilium] Opt-out of cri-containerd.apparmor.d for nsenter init containers: Opts cilium-agent init containers out of the cri-containerd.apparmor.d AppArmor profile on non-Talos variants, fixing Init:CrashLoopBackOff on Ubuntu 22.04+ and Debian (backported to v1.2.2) (@lexfrei in #2370).

  • [virtual-machine] Exclude external VM services from Cilium BPF LB: Adds service-proxy-name: cozy-proxy label to VM LoadBalancer services, telling Cilium to skip BPF processing. Fixes inter-tenant connectivity via public LB IPs and WholeIP functionality on Cilium 1.19+ (backported to v1.2.2) (@mattia-eleuteri in #2357).

  • [monitoring] Fix infra dashboards missing in default variant: Includes cozy-monitoring namespace in the dashboard rendering condition, fixing infrastructure Grafana dashboards not rendering in the default platform variant (backported to v1.2.2) (@mattia-eleuteri in #2365).

  • [postgres] Fix system PostgreSQL images to 17.7-standard-trixie: Normalizes system PostgreSQL image tags to use 17.7-standard-trixie variant with migration logic for existing CNPG clusters (backported to v1.2.2) (@myasnikovdaniil in #2364).

  • [build] Filter git describe to match only v* tags: Adds --match 'v*' to git describe calls, preventing API subtags from being picked up instead of release tags and producing invalid Docker image tags (backported to v1.2.2) (@kvaps in #2386).

  • [platform] Fix resource allocation ratios not propagated to packages: Restores propagation of CPU, memory, and ephemeral-storage allocation ratios to managed applications and KubeVirt, which were silently ignored since the bundle restructure (backported to v1.2.1) (@sircthulhu in #2296).

  • [kubernetes] Set explicit ephemeral-storage on virt-launcher pods: Sets explicit domain.resources with ephemeral-storage on VirtualMachine spec to prevent virt-launcher pods from being evicted due to LimitRange defaults being too low for actual emptyDisk capacity (@kvaps in #2317).

  • [multus] Pin master CNI to 05-cilium.conflist: Prevents a boot-time race condition where multus could auto-detect kube-ovn's conflist instead of Cilium's (backported to v1.2.1) (@kvaps in #2315).

  • [multus] Build custom image with DEL cache fix: Fixes sandbox cleanup deadlock when CNI ADD never completes, preventing stale sandbox name reservations from permanently blocking pod creation (backported to v1.2.1) (@kvaps in #2313).

  • [linstor] Set verify-alg to crc32c: Prevents DRBD connection failures on kernels where crct10dif is unavailable (e.g., Talos v1.12.6 with kernel 6.18.18) (backported to v1.2.1) (@kvaps in #2303).

  • **[lin...

v1.2.2

14 Apr 07:53
93ad936

Features and Improvements

  • [linstor] Update piraeus-server to v1.33.2 with selected backports: Bumps LINSTOR server from v1.33.1 to v1.33.2 and adds backported patches for improved storage reliability: a stale bitmap adjust retry mechanism for automatic recovery after bitmap attach errors, LUKS2 header sizing and optimal I/O size detection improvements for more reliable disk formatting, and the maintainer implementation backport. All patches were verified against upstream v1.33.2 with git apply --check and gradlew compileJava (@kvaps in #2331, #2377).

Fixes

  • [postgres] Fix system PostgreSQL images to 17.7-standard-trixie: Hardcodes PostgreSQL 17.7-standard-trixie images for system PostgreSQL instances. This ensures system databases use the correct image variant consistent with the monitoring stack requirements introduced in v1.2.1 (@myasnikovdaniil in #2364, #2369).

  • [cilium] Opt-out of cri-containerd.apparmor.d for nsenter init containers: On Ubuntu 22.04+, Debian, and other distributions that load the cri-containerd.apparmor.d AppArmor profile by default for containerd workloads, the kernel denied nsenter namespace entry in cilium-agent init containers (mount-cgroup, apply-sysctl-overwrites, clean-cilium-state), causing the agent to land in Init:CrashLoopBackOff and cascading platform failures. Per-container container.apparmor.security.beta.kubernetes.io annotations now opt the affected containers out of this profile, applied only on non-Talos cilium variants (cilium-generic, kubeovn-cilium-generic). The vendored daemonset template is also patched to strip the upstream semverCompare "<1.30.0" AppArmor block, preventing duplicate annotation keys. Talos variants are untouched as Talos does not load the AppArmor LSM (@lexfrei in #2370, #2378).

  • [virtual-machine] Exclude external VM services from Cilium BPF LB: Adds the service.kubernetes.io/service-proxy-name: "cozy-proxy" label to VM LoadBalancer services when external: true, telling Cilium to skip BPF processing entirely for these services. This fixes two issues: inter-tenant connectivity via public LB IPs (Cilium's DNAT caused cross-tenant pod-to-pod flow classification, triggering CiliumClusterwideNetworkPolicy blocks) and WholeIP broken on Cilium 1.19+ (wildcard service drop entries blocked traffic to LB IPs on undeclared ports before it reached netfilter/cozy-proxy). MetalLB L2 advertisement and kube-ovn routing remain unaffected (@mattia-eleuteri in #2357, #2361).

  • [monitoring] Fix infra dashboards missing in default variant: The default platform variant deploys the monitoring chart to the cozy-monitoring namespace, but the dashboard rendering condition introduced in #2197 only checked for tenant-root. Infrastructure dashboards were not rendered in the default variant. The cozy-monitoring namespace is now included in the rendering condition, consistent with the existing pattern in vmagent.yaml (@mattia-eleuteri in #2365, #2367).

  • [build] Filter git describe to match only v* tags: Adds --match 'v*' to all git describe calls in hack/common-envs.mk. The api/apps/v1alpha1/* subtags share the same commit as release tags, causing git describe --exact-match to pick api/apps/v1alpha1/vX.Y.Z instead of vX.Y.Z, producing invalid Docker image tags (@kvaps in #2386, #2389).
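
To see why the `v*` glob in the last fix excludes the API subtags, the sketch below filters tag names with Python's `fnmatch`. This only approximates git's own glob matching, and `release_tags` and the sample tag names are made up for the example.

```python
from fnmatch import fnmatchcase
from typing import List


def release_tags(tags: List[str], pattern: str = "v*") -> List[str]:
    """Keep only tags matching the glob, approximating `git describe --match`."""
    return [t for t in tags if fnmatchcase(t, pattern)]


if __name__ == "__main__":
    # Two tags pointing at the same commit, as described in the fix.
    print(release_tags(["api/apps/v1alpha1/v1.2.2", "v1.2.2"]))
```

The subtag starts with `api/`, so it never matches a pattern anchored at a leading `v`, and only the release tag remains as a candidate.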

Development, Testing, and CI/CD

  • [ci] Replace cozystack-bot PAT with cozystack-ci GitHub App: Replaces the long-lived cozystack-bot personal access token with short-lived, scoped tokens from the cozystack-ci GitHub App across all release workflows (tags.yaml, auto-release.yaml, pull-requests-release.yaml). Improves security and auditability of CI operations (@tym83 in #2351).

  • [ci] Use cozystack org noreply email for bot commits: Updates CI workflows to use the cozystack organization noreply email for bot commits (@kvaps in #2392, #2393).

  • [ci] Replace GH_PAT with cozystack-ci GitHub App token in pull-requests workflow: Switches the pull-requests release workflow to use the cozystack-ci GitHub App token instead of the personal access token (@kvaps in #2383, #2384).

Documentation

  • [website] Add ApplicationDefinition naming convention reference: Added reference documentation on ApplicationDefinition naming conventions and how cozystack-api resolves kinds to their backing definitions (@lexfrei in cozystack/website#478).

  • [website] Document Talos / talosctl / Cozystack version pairing: Added documentation covering the Talos, talosctl, and Cozystack version compatibility matrix for installation (@lexfrei in cozystack/website#484).

  • [website] Fix KubeOVN MASTER_NODES example path and key in troubleshooting: Corrected the MASTER_NODES example path and key in the KubeOVN troubleshooting guide (@lexfrei in cozystack/website#483).

  • [website] Prefix bundle package names with cozystack. in v1 examples: Updated documentation examples to use the correct cozystack. prefix for bundle package names in enabled/disabledPackages (@lexfrei in cozystack/website#482).

  • [website] Finish isolated-field removal and document opt-in policy labels: Removed the obsolete isolated field from tenant documentation and documented the new opt-in policy labels approach (@lexfrei in cozystack/website#481).

  • [website] Add --take-ownership flag and describe networking.* fields: Added documentation for the --take-ownership flag and described the networking.* fields in the installation guide (@lexfrei in cozystack/website#480).

  • [website] Add bonding (LACP) configuration how-to guide: Added a guide for configuring network bonding with LACP on Cozystack installations (@sircthulhu in cozystack/website#459).

  • [website] Improve registry mirrors for tenant Kubernetes in air-gapped guide: Improved documentation for configuring registry mirrors in tenant Kubernetes clusters for air-gapped environments (@sircthulhu in cozystack/website#461).

  • [website] Update backup/restore documentation for VMI/VMDisk: Updated backup documentation with information related to VM instance and VM disk restore improvements (@androndo in cozystack/website#466).

  • [website] Add updated OpenAPI spec: Updated the OpenAPI specification for managed applications reference (@myasnikovdaniil in cozystack/website#469).

  • [website] Add OSS Health pages and OpenSSF badge: Added OSS Health section with OpenSSF Scorecard and Best Practices badge to the website footer (@tym83 in cozystack/website#470).

  • [website] Add CozySummit Virtual 2026 program announcement: Published the CozySummit Virtual 2026 program announcement blog post (@tym83 in cozystack/website#472).

  • [website] Add missing release announcements for v0.1–v0.41: Backfilled missing release announcement blog posts for Cozystack versions v0.1 through v0.41 (@tym83 in cozystack/website#468).

  • [talm] Render templates online in apply to resolve lookups: Fixed talm apply command to render templates online, resolving template lookup failures when using modeline templates (@myasnikovdaniil in cozystack/talm#119).

  • [talm] Update default Talos image to v1.12.6: Updated the default Talos image version to v1.12.6 in talm (@kvaps in cozystack/talm@03e9b6e).


Full Changelog: v1.2.1...v1.2.2

v1.1.6

13 Apr 17:04
3f3470a

Fixes

  • [build] Filter git describe to match only v* tags: Adds --match 'v*' to all git describe calls in hack/common-envs.mk. The api/apps/v1alpha1/* subtags share the same commit as release tags, causing git describe --exact-match to pick api/apps/v1alpha1/vX.Y.Z instead of vX.Y.Z, producing invalid Docker image tags (@kvaps in #2386, #2388).

Development, Testing, and CI/CD

  • [ci] Replace cozystack-bot PAT with cozystack-ci GitHub App: Replaces the long-lived cozystack-bot personal access token with short-lived, scoped tokens from the cozystack-ci GitHub App across all release workflows. Improves security and auditability of CI operations (@tym83 in #2351).

  • [ci] Replace GH_PAT with cozystack-ci GitHub App token in pull-requests workflow: Switches the pull-requests release workflow to use the cozystack-ci GitHub App token instead of the personal access token (@kvaps in #2383).

  • [ci] Use cozystack org noreply email for bot commits: Updates CI workflows to use the cozystack organization noreply email for bot commits (@kvaps in #2392).


Full Changelog: v1.1.5...v1.1.6

v1.2.1

31 Mar 12:59
1b32142

Features and Improvements

  • [postgres] Hardcode PostgreSQL 17 for monitoring databases and add migration: CloudNativePG operator defaults to PostgreSQL 18.3 when no explicit image is specified, but monitoring queries in Grafana and Alerta rely on PostgreSQL 17 features such as pg_stat_checkpointer and the updated pg_stat_bgwriter. This mismatch could break monitoring after fresh installs or database recreation. PostgreSQL 17.7 images are now hardcoded for monitoring databases, and migration 37 is added to set version v17 for any existing PostgreSQL resources (@IvanHunters in #2304, #2309).

Fixes

  • [platform] Prevent installed packages deletion: Added the helm.sh/resource-policy: keep annotation to all platform packages. Previously, moving a package to disabledPackages or removing it from enabledPackages caused Helm to automatically delete the corresponding resource, contradicting the documented behavior that requires the platform administrator to manually delete packages when needed (@myasnikovdaniil in #2273, #2297).

  • [linstor] Preserve TCP ports during toggle-disk operations: During toggle-disk operations, removeLayerData() freed TCP ports from the number pool and ensureStackDataExists() could then allocate different ports. If a satellite missed the resulting update (e.g. due to a controller restart), it retained the old ports while peers received the new ones, causing DRBD connections to fail with StandAlone state. The fix adds copyDrbdTcpPortsIfExists() which saves existing TCP ports into the LayerPayload before removeLayerData() deletes them (@kvaps in #2292, #2299).

  • [platform] Fix resource allocation ratios not propagated to managed packages: A regression introduced in the bundle restructure caused cpuAllocationRatio, memoryAllocationRatio, and ephemeralStorageAllocationRatio set in platform/values.yaml to become no-ops — they were never written to the cozystack-values Secret that cozy-lib reads in child packages. This meant all managed applications silently used the hardcoded defaults (10, 1, 40) regardless of operator-configured values. The fix restores propagation by writing the ratios into the _cluster section of the cozystack-values Secret and passing cpuAllocationRatio to the KubeVirt Package component (@sircthulhu in #2296, #2301).

  • [linstor] Fix DRBD connectivity failures on kernels without crct10dif by setting verify-alg to crc32c: LINSTOR's auto-verify algorithm selection defaults to crct10dif, but this kernel crypto module is no longer available in newer kernels (e.g. Talos v1.12.6, kernel 6.18.18). When crct10dif is unavailable, DRBD peer connections fail with VERIFYAlgNotAvail: failed to allocate crct10dif for verify, causing all DRBD resources to enter Diskless state and lose quorum. DrbdOptions/Net/verify-alg is now set to crc32c at the controller level (@kvaps in #2303, #2312).

  • [multus] Fix stale sandbox reservations permanently blocking pod creation after CNI ADD failure: After a node disruption (e.g. DRBD or kube-ovn issues during upgrade), containerd accumulated stale sandbox name reservations. Cleanup failed because multus called delegate plugins for DEL without cached state and they rejected the incomplete config, causing DEL to fail instead of succeeding. Stale entries were never released, permanently blocking new pod creation on the affected node. A custom multus-cni image is now built with a patch that returns success from DEL when CNI ADD never completed (@kvaps in #2313, #2314).

  • [multus] Pin master CNI to 05-cilium.conflist to prevent race condition at boot: During node boot or Talos upgrade, multus auto-detects the master CNI conflist by scanning the CNI config directory. If kube-ovn writes 10-kube-ovn.conflist before Cilium writes 05-cilium.conflist, multus selects the wrong file and pods bypass the Cilium chain entirely, have no Cilium endpoint, and their traffic is blocked by cluster-wide network policies. multusMasterCNI is now pinned to 05-cilium.conflist (@kvaps in #2315, #2316).
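
The boot-time race in the last fix can be modeled with a small function. The sketch assumes that auto-detection picks the lexically first conflist present at scan time; `pick_master_conflist` is a hypothetical helper, not multus code.

```python
from typing import List, Optional


def pick_master_conflist(present: List[str], pinned: Optional[str] = None) -> Optional[str]:
    """Choose the master CNI config file.

    Without a pin, take the lexically first file that exists right now, which is
    how this sketch models auto-detection. With a pin, always return the pinned name.
    """
    if pinned is not None:
        return pinned
    return min(present) if present else None


if __name__ == "__main__":
    # kube-ovn has written its file, Cilium has not yet: auto-detection picks the wrong one.
    print(pick_master_conflist(["10-kube-ovn.conflist"]))
    # With the pin, the scan order no longer matters.
    print(pick_master_conflist(["10-kube-ovn.conflist"], pinned="05-cilium.conflist"))
```

Pinning removes the dependency on which plugin writes its file first, at the cost of hardcoding the expected file name.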

Documentation

  • [website] Add custom Keycloak themes documentation: Added documentation for custom Keycloak theme injection to the White Labeling guide, covering the theme image contract (/themes/ directory structure), configuration via the cozystack.keycloak Package resource, imagePullSecrets for private registries, and theme activation in the Keycloak admin console (@lexfrei in cozystack/website#463).

  • [website] Add documentation for Go types usage: Added a guide for using the generated Go types for Cozystack managed applications as a Go module, including installation instructions, programmatic resource management examples, and deployment approaches (@myasnikovdaniil in cozystack/website#465).


Full Changelog: v1.2.0...v1.2.1


v1.1.5

31 Mar 08:15
601b605


Fixes

  • [platform] Prevent deletion of installed packages: Added the helm.sh/resource-policy: keep annotation to all platform packages. Previously, moving a package to disabledPackages or removing it from enabledPackages caused Helm to delete it automatically, contradicting the documented behavior, under which the platform administrator deletes packages manually when needed (@myasnikovdaniil in #2273, #2298).

  • [linstor] Fix TCP port mismatches after toggle-disk operations causing DRBD resources to enter StandAlone state: During toggle-disk operations, removeLayerData() freed TCP ports from the number pool and ensureStackDataExists() could then allocate different ports. If a satellite missed the resulting update (e.g. due to a controller restart), it retained the old ports while peers received the new ones, causing DRBD connections to fail with StandAlone state. The fix introduces copyDrbdTcpPortsIfExists(), which preserves existing TCP ports in the LayerPayload before removeLayerData() releases them (@kvaps in #2292, #2300).
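The port drift behind the toggle-disk fix can be shown with a simplified model. This is a sketch, not LINSTOR's implementation: `PortPool`, `toggle_disk`, and `make_pool` are hypothetical names, and the pool is assumed to hand out the lowest free port unless a specific free port is requested.

```python
class PortPool:
    """Toy number pool: hands out the lowest free port unless one is preferred."""

    def __init__(self, start, end):
        self.free = set(range(start, end + 1))

    def allocate(self, preferred=None):
        # Reuse the preferred port when it is free, otherwise take the lowest one.
        port = preferred if preferred in self.free else min(self.free)
        self.free.remove(port)
        return port

    def release(self, port):
        self.free.add(port)


def toggle_disk(pool, current_port, preserve):
    """Release and re-allocate a resource's port, optionally preserving it."""
    # Copy the port into the payload *before* it is released (the fixed behavior).
    payload_port = current_port if preserve else None
    pool.release(current_port)
    return pool.allocate(preferred=payload_port)


def make_pool():
    """Pool where port 7001 is in use and the lower port 7000 has been freed."""
    pool = PortPool(7000, 7003)
    first = pool.allocate()   # 7000, taken by some other resource
    pool.allocate()           # 7001, the resource that will be toggled
    pool.release(first)       # the other resource goes away, 7000 is free again
    return pool


print(toggle_disk(make_pool(), 7001, preserve=False))  # port changes
print(toggle_disk(make_pool(), 7001, preserve=True))   # port is kept
```

Without preservation the resource comes back on a different port, so a satellite that missed the update would disagree with its peers. With preservation the port survives the release and re-allocation.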


Full Changelog: v1.1.4...v1.1.5


v1.1.4

30 Mar 11:33
576e9ac


Features and Improvements

  • [boot-to-talos] Add support for ISO, RAW, and HTTP image sources: The boot-to-talos tool can now use ISO files, raw disk images, and HTTP URLs as Talos image sources in addition to container registry images. This allows bootstrapping nodes in air-gapped environments or from locally stored images without requiring a container registry (@lexfrei in cozystack/boot-to-talos#13).

  • [boot-to-talos] Use permanent MAC address for predictable network interface names: Interface name detection now reads the permanent MAC address directly from sysfs instead of relying on udev data, providing a stable hardware MAC that is unaffected by user modifications to the active MAC address. This makes network interface naming more reliable across reboots and hardware changes (@IvanHunters in cozystack/boot-to-talos#14).

Fixes

  • [dashboard] Fix broken backup menu links missing cluster context: Backup resources (plans, backupjobs, backups) are not ApplicationDefinitions, so ensureNavigation() never created their baseFactoriesMapping entries. Without these entries the OpenUI frontend could not resolve the {cluster} context for backup pages, producing broken sidebar links with an empty cluster segment (e.g. /openapi-ui//tenant-root/...). The missing baseFactoriesMapping entries for all backup resource types are now added to the static Navigation resource (@sircthulhu in #2232, #2269).

  • [platform] Fix tenant admins unable to create FoundationDB, Harbor, MongoDB, OpenBAO, OpenSearch, Qdrant, and VPN applications: The cozy:tenant:admin:base ClusterRole was missing seven application resources from apps.cozystack.io (foundationdbs, harbors, mongodbs, openbaos, opensearches, qdrants, vpns). Without these permissions, tenant admins could not create these applications — the "Add" button was inactive in the dashboard. The missing resources have been added to the ClusterRole (@sircthulhu in #2268, #2272).

  • [dashboard] Fix StorageClass dropdown showing "Error" in application forms: The dashboard UI fetches StorageClass resources to populate dropdowns (e.g. in the Postgres form), but the cozystack-dashboard-readonly ClusterRole did not include storage.k8s.io/storageclasses. This caused authenticated users to see "Error" instead of the StorageClass name. get/list/watch permissions for storageclasses have been added to the dashboard readonly role (@sircthulhu in #2267, #2274).

  • [system] Fix 403 error on Service details page by granting tenants read access to EndpointSlices: The dashboard requested EndpointSlices from the discovery.k8s.io API group to display the "Pod serving" section on the Service details page, but cozy:tenant:base and cozy:tenant:view:base ClusterRoles lacked permissions for this resource. Tenant users received a 403 error when opening the Service details page. get/list/watch permissions for endpointslices have been added to both tenant ClusterRoles (@sircthulhu in #2257, #2285).

  • [dashboard] Fix "Pod serving" table displaying "Raw:" and "Invalid Date" on Service details page: The Service details page EndpointSlice table showed "Raw:" prefixes and "Invalid Date" values because the EnrichedTable referenced customizationId factory-kube-service-details-endpointslice which had no corresponding CustomColumnsOverride. Column definitions for Pod (.targetRef.name), Addresses (.addresses), Ready (.conditions.ready), and Node (.nodeName) have been added (@sircthulhu in #2266, #2283).

  • [piraeus-operator] Fix LINSTOR satellite alert labels, reduce scrape-flap false positives, and improve controller alerting: Three alerting issues in cozy-piraeus-operator have been addressed: (1) linstorSatelliteErrorRate used a non-existent name label in annotations, resulting in Satellite "" in alert notifications, and now uses {{ $labels.hostname }}; (2) linstorSatelliteErrorRate could produce false positives when the linstor-controller scrape flapped and historical linstor_error_reports_count counters reappeared inside the alert window, which is fixed by adding a minimum scrape-count guard; (3) the LinstorControllerOffline alert has been split into separate availability and metrics-availability alerts with configurable hold time to reduce noise during brief connectivity interruptions (@sasha-sup in #2265, #2286).

  • [linstor] Fix swapped VMPodScrape job labels causing incorrect controller offline alerts: The cozy-linstor VictoriaMetrics VMPodScrape templates had the job relabeling rules swapped: linstor-satellite metrics were labeled as job=linstor-controller and vice versa. This caused linstorControllerOffline alerts to fire for satellite endpoints (:9942) while reporting that the controller was unreachable. The job labels are now correctly assigned to their respective targets (@sasha-sup in #2264, #2289).

  • [boot-to-talos] Fix triple-fault on hosts with 5-level paging (LA57) enabled: On hosts with CONFIG_X86_5LEVEL=y in the kernel, kexec into Talos caused a triple-fault because the Talos kernel does not support 5-level page tables. boot-to-talos now detects LA57 before kexec and automatically patches GRUB with no5lvl, runs update-grub, and reboots. After reboot with 5-level paging disabled, boot-to-talos proceeds normally (@IvanHunters in cozystack/boot-to-talos#15).

  • [boot-to-talos] Fix EFI boot entry creation when using loop device images: The Talos installer skips EFI variable creation when running on loop devices. Instead of relying on the installer, boot-to-talos now reads the GPT partition table from the target disk after the image copy and creates a proper UEFI boot entry with an HD() device path pointing to the ESP on the real target disk (@kvaps in cozystack/boot-to-talos#16).

  • [talm] Fix silent empty output when no template files are specified: Running talm template without --file or --template flags previously produced minimal or empty output without any error. Validation has been added to engine.Render to return a clear error message when no template files are specified, making misconfigured invocations immediately apparent (@kvaps in cozystack/talm#112).
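The LA57 check from the boot-to-talos triple-fault fix above can be sketched as follows. The note does not say how boot-to-talos detects 5-level paging, so this sketch assumes the `la57` flag in `/proc/cpuinfo` is the signal. The helper names are hypothetical, and `add_kernel_arg` only edits a command-line string; it does not touch GRUB files or run `update-grub`.

```python
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def la57_enabled(cpuinfo_text: str) -> bool:
    """Assumption: the 'la57' flag marks 5-level paging as active."""
    return "la57" in cpu_flags(cpuinfo_text)


def add_kernel_arg(cmdline: str, arg: str = "no5lvl") -> str:
    """Append a kernel argument to a command line unless it is already there."""
    args = cmdline.split()
    if arg not in args:
        args.append(arg)
    return " ".join(args)


sample = "processor\t: 0\nflags\t\t: fpu vme la57 sse\n"
if la57_enabled(sample):
    print(add_kernel_arg("quiet splash"))
```

On a real Linux host the text would come from reading `/proc/cpuinfo`; a literal string is used here so the sketch runs anywhere.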

Documentation

  • [website] Add documentation for VMInstance and VMDisk backups: Added a new virtualization-focused Backup and Recovery guide covering one-off and scheduled backups for VMInstance and VMDisk resources, restore procedures, status verification commands, and troubleshooting notes including Velero-related issues (@myasnikovdaniil in cozystack/website#456).

  • [website] Update developer guide with operator-driven architecture and OCIRepository migration flow: Rewrote the development guide to describe the operator-driven in-cluster architecture, bootstrap flow, operator responsibilities, and the platform install/update sequence. Added an "OCIRepositories and Migration Flow" section with migration hook examples and sequencing rules for pre-upgrade hooks (@myasnikovdaniil in cozystack/website#458).


Full Changelog: v1.1.3...v1.1.4


v1.0.7

30 Mar 11:33
63af323


Fixes

  • [platform] Fix tenant admins unable to create FoundationDB, Harbor, MongoDB, OpenBAO, OpenSearch, Qdrant, and VPN applications: The cozy:tenant:admin:base ClusterRole was missing RBAC entries for foundationdbs, harbors, mongodbs, openbaos, opensearches, qdrants, and vpns resources from apps.cozystack.io. Without these permissions, tenant admins could not create these applications: the "Add" button was inactive in the dashboard. The fix adds all seven missing resources to the ClusterRole (@sircthulhu in #2268, #2271).

  • [system] Fix 403 error on Service details page for tenant users: The cozy:tenant:base and cozy:tenant:view:base ClusterRoles were missing read permissions for discovery.k8s.io/endpointslices. The dashboard requests EndpointSlices to display the "Pod serving" section on the Service details page, and without this permission tenant users received a 403 error. The fix adds get, list, and watch verbs for endpointslices to both tenant roles (@sircthulhu in #2257, #2284).

  • [dashboard] Fix "Pod serving" table showing "Raw:" prefixes and "Invalid Date" on Service details page: The EndpointSlice table on the service details page displayed raw data and broken timestamps because the EnrichedTable component referenced the factory-kube-service-details-endpointslice customization ID which had no corresponding CustomColumnsOverride. The fix adds column definitions for Pod (.targetRef.name), Addresses (.addresses), Ready (.conditions.ready), and Node (.nodeName) (@sircthulhu in #2266, #2282).

  • [dashboard] Fix broken backup menu links missing cluster context: Backup resources (plans, backupjobs, backups) are not ApplicationDefinitions, so ensureNavigation() never created their baseFactoriesMapping entries. Without these mappings, the OpenUI frontend could not resolve the {cluster} context for backup pages, producing broken sidebar links with an empty cluster segment (e.g. /openapi-ui//tenant-root/... instead of /openapi-ui/default/tenant-root/...). The fix adds the three missing static entries to the Navigation resource (@sircthulhu in #2232, #2270).

  • [linstor] Fix swapped VMPodScrape job labels causing incorrect alerts: The job labels in the cozy-linstor VictoriaMetrics VMPodScrape templates were swapped: linstor-satellite metrics were relabeled as job=linstor-controller and vice versa. This caused linstorControllerOffline alerts to fire against satellite endpoints (:9942) while reporting the controller as unreachable. The fix ensures linstor-satellite metrics keep job=linstor-satellite and linstor-controller metrics keep job=linstor-controller, restoring consistent alerting and dashboard semantics (@sasha-sup in #2264, #2288).

  • [piraeus-operator] Fix LINSTOR satellite alert annotations and reduce false-positive alerts: Two issues in the LINSTOR alerts shipped by cozy-piraeus-operator were fixed. First, linstorSatelliteErrorRate used a non-existent name label in annotations, resulting in Satellite "" in alert notifications — corrected to use {{ $labels.hostname }}. Second, linstorSatelliteErrorRate produced false positives when the linstor-controller scrape flapped and historical linstor_error_reports_count counters reappeared inside the alert window — fixed by requiring stable up{job="linstor-controller"} for the full 15-minute window. Additionally, the controller availability alert was split to add a dedicated warning for metrics scrape failures with a 10-minute hold time to reduce transient noise (@sasha-sup in #2265, #2287).
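The scrape-stability guard in the last item can be modeled in a few lines. This is a sketch of the idea only, not the shipped alert rule: `should_fire` is a hypothetical function, and the 30-second scrape interval used to turn the 15-minute window into 30 samples is an assumption.

```python
from typing import Sequence


def should_fire(up_samples: Sequence[int],
                error_counts: Sequence[int],
                min_samples: int) -> bool:
    """Fire only if errors grew AND the scrape was up for the whole window."""
    # Guard: enough samples in the window, and every one of them reports up == 1.
    stable = len(up_samples) >= min_samples and all(s == 1 for s in up_samples)
    # Signal: the error counter is higher at the end of the window than at the start.
    increased = len(error_counts) >= 2 and error_counts[-1] > error_counts[0]
    return stable and increased


WINDOW_SAMPLES = 30  # 15 minutes at an assumed 30-second scrape interval

steady_up = [1] * WINDOW_SAMPLES
flapping_up = [1] * 10 + [0] * 5 + [1] * 15

print(should_fire(steady_up, [5, 5, 7], WINDOW_SAMPLES))    # real increase
print(should_fire(flapping_up, [0, 0, 7], WINDOW_SAMPLES))  # counters reappeared
```

When the scrape flaps, old counter values reappearing inside the window look like a jump, and the guard suppresses the alert in that case.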

Documentation

  • [website] Add Backup and Recovery guide for VMInstance and VMDisk: Replaced the generic Kubernetes Backup and Recovery guide with a virtualization-focused Backup and Recovery doc covering VMInstance and VMDisk one-off and scheduled backups, restores, status checks, and troubleshooting (including Velero-related notes) (@myasnikovdaniil in cozystack/website#456).

  • [website] Update developer guide with operator-driven architecture and OCIRepository/migration flow: Rewrote the development guide to describe the operator-driven in-cluster architecture, bootstrap flow, operator responsibilities, and platform install/update sequence. Added documentation for OCIRepositories and the migration flow with migration hook examples and sequencing rules for pre-upgrade/install migrations. Also updated the concepts guide with the two-repository update model, dependency ordering rules, namespace creation behavior, and cluster-wide values injection (@myasnikovdaniil in cozystack/website#458).


Full Changelog: v1.0.6...v1.0.7


v1.2.0

27 Mar 15:02
b45df97


Cozystack v1.2.0

⚠️ WARNING: Do not use this release. This version includes the CloudNativePG operator, which updates the default PostgreSQL image to version 18. CNPG cannot perform the migration from the previous major version automatically, so PostgreSQL clusters will fail to start after the upgrade. Please use v1.2.1 instead.

Cozystack v1.2.0 delivers significant platform enhancements: a fully managed OpenSearch service joining the application catalog, VPC peering for secure inter-tenant networking, tenant workload placement control via the new SchedulingClass system, a highly-available VictoriaLogs cluster replacing the single-node setup, and LINSTOR volume relocation for optimized clone and snapshot restore placement. Additional highlights include external-dns as a standalone extra package, multi-node RWX volume fixes, and a wave of dashboard and monitoring improvements.

Feature Highlights

OpenSearch: Managed Search and Analytics Service

Cozystack now ships OpenSearch as a fully managed PaaS application — supporting OpenSearch v1, v2, and v3 in a multi-role topology with dedicated master, data, ingest, coordinating, and ML nodes. TLS is enabled by default, HTTP Basic auth is provided out of the box, and custom user definitions allow per-application credentials. The optional OpenSearch Dashboards UI can be enabled alongside the engine. External access, topology spread policies, and a comprehensive JSON schema are all included.

A companion opensearch-operator system package wraps the upstream Opster OpenSearch Operator v2.8.0 and adds a sysctl DaemonSet to configure the required vm.max_map_count kernel parameter on every node automatically. An ApplicationDefinition package ties everything into the Cozystack platform dashboard with schema validation and resource management.

SchedulingClass: Tenant Workload Placement

Cozystack now supports a SchedulingClass CRD that allows platform operators to define cluster-wide scheduling constraints — pinning tenant workloads to specific data centers, hardware generations, or node groups without requiring tenants to manage scheduler configuration themselves. Tenants declare a schedulingClass in their Tenant spec; the platform injects the appropriate schedulerName into all workloads in that namespace.

The lineage-controller-webhook has been extended to verify the referenced SchedulingClass CR before injection, and child tenants inherit their parent's scheduling constraints (children cannot override). A SchedulingClass dropdown in the Tenant creation form in the dashboard makes the feature fully self-service. The underlying cozystack-scheduler — a custom kube-scheduler extension with SchedulingClass-aware affinity plugins — is now installed and enabled by default as part of the platform.

VPC Peering for Multi-Tenant Environments

The vpc application gains bilateral VPC peering using Kube-OVN's native vpcPeerings mechanism, allowing tenants to securely interconnect their private networks without routing traffic through public endpoints. Peering link-local IPs (169.254.0.0/16) are allocated deterministically from a hash of the sorted VPC pair names, ensuring stable addresses across reconciliations. Static route support (staticRoutes) enables fine-grained inter-VPC routing policies. A cozy-lib helper (hexToInt) performs the deterministic IP allocation, and a JSON Schema validation enforces the ^tenant- namespace pattern for peered VPCs.
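The deterministic allocation can be sketched in Python. The release note does not specify the hash function or how the hash maps into 169.254.0.0/16, so SHA-256, the 4-address block layout, and the name `peering_ips` are all assumptions made for illustration; the real logic lives in the chart's cozy-lib helper.

```python
import hashlib
import ipaddress

LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")
BLOCK_SIZE = 4  # assumed: one 4-address block per peering


def peering_ips(vpc_a: str, vpc_b: str) -> dict:
    """Derive a stable pair of link-local IPs from the sorted VPC names."""
    if vpc_a == vpc_b:
        raise ValueError("a VPC cannot peer with itself")
    # Sorting makes the result independent of argument order.
    first, second = sorted([vpc_a, vpc_b])
    digest = hashlib.sha256(f"{first}/{second}".encode()).hexdigest()
    # Hex-to-int step: reduce the hash to a block index inside the /16.
    block = int(digest[:8], 16) % (LINK_LOCAL.num_addresses // BLOCK_SIZE)
    base = int(LINK_LOCAL.network_address) + block * BLOCK_SIZE
    return {
        first: str(ipaddress.ip_address(base + 1)),
        second: str(ipaddress.ip_address(base + 2)),
    }


pair = peering_ips("tenant-a/vpc-one", "tenant-b/vpc-two")
same = peering_ips("tenant-b/vpc-two", "tenant-a/vpc-one")
print(pair == same)  # the same pair yields the same addresses in either order
```

Because the addresses depend only on the two names, repeated reconciliations produce the same result. A hash-based scheme like this one can collide for different pairs; the sketch does not handle that case.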

VictoriaLogs: Clustered Mode for High Availability

The platform's log storage has been upgraded from the deprecated single-node VLogs CR to a VLCluster deployment with separate vlinsert, vlselect, and vlstorage components, each running with 2 replicas by default — consistent with the existing VMCluster setup. This brings horizontal scalability and resilience to the logging tier. VPA autoscaling is enabled for all VLCluster components, and the victoria-metrics-operator has been upgraded from v0.55.0 to v0.68.1 to add VLCluster CRD support.

LINSTOR CSI: Volume Relocation After Clone and Restore

The LINSTOR CSI driver now carries upstream patches enabling automatic replica relocation after PVC clone and snapshot restore operations. Two new parameters control the behavior: linstor.csi.linbit.com/relocateAfterClone on StorageClasses moves replicas to optimal nodes after a clone, and snap.linstor.csi.linbit.com/relocate-after-restore on VolumeSnapshotClasses does the same after a restore. VolumeSnapshotClasses for Velero and Kasten use cases are pre-configured. This enables full PVC-level backup and restore workflows with automatic data rebalancing, a key prerequisite for production Velero/Kasten integrations.
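As a rough illustration of where the two parameters go, the sketch below builds the manifests as Python dictionaries and prints them as JSON, which `kubectl apply -f -` accepts. The parameter keys are taken from the text above. The object names, the `"true"` values, and the `deletionPolicy` are assumptions, and the functions are hypothetical helpers rather than part of Cozystack.

```python
import json

LINSTOR_DRIVER = "linstor.csi.linbit.com"


def storage_class(name: str) -> dict:
    """StorageClass that asks for replica relocation after a PVC clone."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": LINSTOR_DRIVER,
        # Value "true" is an assumption; the note only names the key.
        "parameters": {"linstor.csi.linbit.com/relocateAfterClone": "true"},
    }


def volume_snapshot_class(name: str) -> dict:
    """VolumeSnapshotClass that asks for relocation after a snapshot restore."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshotClass",
        "metadata": {"name": name},
        "driver": LINSTOR_DRIVER,
        "deletionPolicy": "Delete",  # assumed; choose per backup policy
        # Value "true" is an assumption; the note only names the key.
        "parameters": {"snap.linstor.csi.linbit.com/relocate-after-restore": "true"},
    }


for manifest in (storage_class("example-relocate"),
                 volume_snapshot_class("example-relocate")):
    print(json.dumps(manifest, indent=2))
```

The platform already ships pre-configured VolumeSnapshotClasses, so hand-written objects like these would only matter for custom setups.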

Major Features and Improvements

  • [apps] Add managed OpenSearch service: Deployed as a PaaS application supporting OpenSearch v1/v2/v3 with multi-role node topology, TLS, HTTP Basic auth, custom users, optional OpenSearch Dashboards UI, external access, and topology spread policies; backed by the Opster OpenSearch Operator v2.8.0 and a sysctl DaemonSet for vm.max_map_count (@matthieu-robin in #1953).

  • [vpc] Add VPC peering support for multi-tenant environments: Bilateral VPC peering via Kube-OVN's vpcPeerings, deterministic link-local IP allocation from sorted VPC pair hash, static routes support, ConfigMap peer discovery enrichment, and JSON Schema validation enforcing ^tenant- namespace pattern (@mattia-eleuteri in #2152).

  • [monitoring] Migrate VictoriaLogs from VLogs to VLCluster: Replaced deprecated single-node VLogs CR with clustered VLCluster (vlinsert/vlselect/vlstorage, 2 replicas each), added VPA for all components, upgraded victoria-metrics-operator to v0.68.1 (@sircthulhu in #2153).

  • [scheduler] Integrate SchedulingClass support for tenant workloads: Added schedulingClass Tenant parameter with inheritance enforcement, scheduling.cozystack.io/class namespace label, lineage-webhook extension to verify and inject schedulerName, SchedulingClass dropdown in Tenant dashboard form (@sircthulhu in #2223).

  • [cozystack-scheduler] Add custom scheduler as an optional system package: Vendored cozystack-scheduler from github.com/cozystack/cozystack-scheduler — a kube-scheduler extension with SchedulingClass-aware affinity plugins, including Helm chart with RBAC, ConfigMap, Deployment, and CRD (@lllamnyp in #2205).

  • [platform] Enable cozystack-scheduler by default: The cozystack-scheduler and SchedulingClass CRD are now installed as default system packages; the backup tool has been moved to optional packages (@lllamnyp in #2253).

  • [extra] Add external-dns as a standalone extra package: Packaged external-dns as an installable extra (tenant-level) component for automatic DNS record management from Kubernetes Service and Ingress resources (@mattia-eleuteri in #1988).

  • [linstor] Add linstor-csi patches for clone/snapshot relocation: New patch enabling relocateAfterClone StorageClass parameter and relocate-after-restore VolumeSnapshotClass parameter; pre-configured VolumeSnapshotClasses for Velero and relocation workflows; CDI switched to csi-clone strategy (@kvaps in #2133).

  • [monitoring] Add inlineScrapeConfig support to tenant vmagent: Tenants can now define inline scrape configurations directly in their VMAgent spec, enabling custom metrics collection from services that are not discoverable via standard Kubernetes service discovery (@mattia-eleuteri in #2200).

  • [monitoring] Add Slack dashboard URL, vmagent environment label, and dynamictext Grafana plugin: Added SLACK_DASHBOARD_URL and SLACK_SUMMARY_FMT environment variables for richer alert notifications, per-vmagent environment label for metric source identification, and the dynamictext-panel plugin for Grafana dashboards (@vnyakas in #2210).

  • [monitoring] Scope infrastructure dashboards to tenant-root only: Infrastructure-level Grafana dashboards are now scoped to the tenant-root namespace only, preventing them from appearing in tenant sub-namespaces and reducing dashboard noise (@mattia-eleuteri in #2197).

  • [tenant] Allow egress to virt-handler for VM metrics scraping: Extended tenant NetworkPolicy to permit egress to virt-handler pods, enabling Prometheus to scrape VM-level metrics from KubeVirt without additional policy exceptions (@mattia-eleuteri in #2199).

  • [dashboard] Add keycloakInternalUrl for backend-to-backend OIDC requests: Added a keycloakInternalUrl platform value for the dashboard backend to perform OIDC token introspection via an internal cluster URL, avoiding external round-trips and improving reliability in air-gapped environments (@sircthulhu in #2224).

  • [dashboard] Add secret-hash annotation to KeycloakClient for secret sync: Added a secret-hash annotation to the KeycloakClient resource so that changes to the client secret trigger automatic reconciliation and propagation to dependent components (@sircthulhu in #2231).

  • [docs] Add OpenAPI and Go types code generation for apps: Added tooling to generate OpenAPI schemas and Go types from Helm chart values, enabling type-safe programmatic access to managed application configurations and automatic API reference ge...
