serviceAccount and roleBinding objects can accidentally get deleted #675

@soenkeliebau

Observed Behavior
When deploying multiple clusters of the same product (Airflow, NiFi, ...) into one namespace and then deleting one of them, it can happen that the roleBinding and serviceAccount objects that are shared by all these clusters accidentally get deleted as well.

Root Cause
The reason for this is shown in the following diagram:

Image

All cluster objects share the same roleBinding and serviceAccount objects. In principle this works fine, because the content of these objects is identical for every cluster.

Stackable operators track the resources they deploy via labels so that they can delete "orphaned" objects that are no longer needed. This is done via the labels app.kubernetes.io/managed-by and app.kubernetes.io/instance, as can be seen in the snippet below.

apiVersion: v1
kind: ServiceAccount
metadata:
  creationTimestamp: "2024-11-22T13:23:34Z"
  labels:
    app.kubernetes.io/instance: simple-nifi2
    app.kubernetes.io/managed-by: nifi.stackable.tech_nificluster
    app.kubernetes.io/name: nifi
  name: simple-nifi2-serviceaccount

Whenever the object whose type matches the managed-by label and whose name matches the instance label is deleted, this serviceAccount is deleted as well, because it is now considered "orphaned".

In the scenario shown in the diagram above, the value of the instance label is repeatedly overwritten with either "simple-nifi" or "simple-nifi2", depending on which cluster was last changed. If that cluster is then deleted, the roleBinding and serviceAccount are cleaned up as well, and they stay missing until the other cluster is next reconciled and recreates them.
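The sequence can be reproduced with a small in-memory model. This is an illustrative sketch only, not operator code: the functions `reconcile` and `delete_cluster` are hypothetical, and the shared object name `nifi-serviceaccount` is an assumption made for the example.

```python
# Toy model of label-based orphan cleanup; not the operator's actual code.
MANAGED_BY = "app.kubernetes.io/managed-by"
INSTANCE = "app.kubernetes.io/instance"

# Hypothetical name for the object shared by all clusters in the namespace.
SHARED_SA = "nifi-serviceaccount"


def reconcile(objects: dict, cluster: str) -> None:
    """Create or update the shared ServiceAccount with this cluster's labels."""
    objects[SHARED_SA] = {
        MANAGED_BY: "nifi.stackable.tech_nificluster",
        INSTANCE: cluster,  # overwritten by whichever cluster reconciled last
    }


def delete_cluster(objects: dict, cluster: str) -> None:
    """Remove every object whose instance label points at the deleted cluster."""
    for name in [n for n, labels in objects.items() if labels[INSTANCE] == cluster]:
        del objects[name]


objects = {}
reconcile(objects, "simple-nifi")
reconcile(objects, "simple-nifi2")       # label now says simple-nifi2
delete_cluster(objects, "simple-nifi2")
print(SHARED_SA in objects)              # False: simple-nifi lost its ServiceAccount
reconcile(objects, "simple-nifi")        # the next reconciliation recreates it
print(SHARED_SA in objects)              # True
```

The surviving cluster is broken between the deletion and its next reconciliation, which is the window described above.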

Workaround
The only currently known workaround is to trigger a manual reconciliation of the now broken cluster object by making some change, for example adding an annotation:

kubectl patch zookeepercluster/simple-zk --type=merge --patch=({metadata:{annotations:{touch: (date now)}}} | to json)
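The command above appears to be written in Nushell syntax (`(date now)` and `| to json`), so it will not run unchanged in bash or zsh. As a rough alternative, the same merge patch can be built with a few lines of Python. The helper name `touch_patch` is hypothetical; the `touch` annotation key and the resource name are taken from the command above. The script only prints the command and does not run it.

```python
import json
import shlex
from datetime import datetime, timezone


def touch_patch(now: datetime) -> str:
    """Return a JSON merge patch that sets a `touch` annotation to the given time."""
    return json.dumps({"metadata": {"annotations": {"touch": now.isoformat()}}})


patch = touch_patch(datetime.now(timezone.utc))

# Print a shell-quoted, copy-pasteable command rather than executing it.
print(shlex.join([
    "kubectl", "patch", "zookeepercluster/simple-zk",
    "--type=merge", f"--patch={patch}",
]))
```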

Fix
The best fix for this seems to be to move to individual roleBinding and serviceAccount objects for every cluster, instead of sharing them.
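A minimal model of why per-cluster objects avoid the problem is sketched below. The naming scheme `<cluster>-serviceaccount` / `<cluster>-rolebinding` is an assumption for illustration, and `reconcile` and `delete_cluster` are hypothetical helpers rather than operator code.

```python
INSTANCE = "app.kubernetes.io/instance"


def reconcile(objects: dict, cluster: str) -> None:
    """Create one ServiceAccount and one RoleBinding per cluster (assumed naming)."""
    for suffix in ("serviceaccount", "rolebinding"):
        objects[f"{cluster}-{suffix}"] = {INSTANCE: cluster}


def delete_cluster(objects: dict, cluster: str) -> None:
    """Orphan cleanup: remove objects labelled with the deleted cluster."""
    for name in [n for n, labels in objects.items() if labels[INSTANCE] == cluster]:
        del objects[name]


objects = {}
reconcile(objects, "simple-nifi")
reconcile(objects, "simple-nifi2")
delete_cluster(objects, "simple-nifi2")
print(sorted(objects))  # ['simple-nifi-rolebinding', 'simple-nifi-serviceaccount']
```

Because no object is shared, the instance label is never overwritten by another cluster, and cleanup only touches objects that belong to the deleted cluster.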

Todos

Integration tests Kind & Managed
