Support running on non-default cluster DNS settings #436

@sbernauer

Description

Reported in https://github.com/orgs/stackabletech/discussions/35

Currently we hard-code svc.cluster.local in many places.
This breaks on clusters that use a non-default cluster domain (see https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/), for example when the kubelet is started with --cluster-domain=<default-local-domain>.
We need users to be able to configure the Service DNS suffix.
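
To illustrate what "configurable suffix" means, here is a minimal sketch of building a Service FQDN from a cluster domain instead of a hard-coded string. The function name `service_fqdn`, the constant name, and the example service and namespace are made up for illustration and are not taken from any operator code; the `<service>.<namespace>.svc.<cluster-domain>` layout is the standard Kubernetes Service DNS form.

```python
# Hypothetical helper, not taken from the operators: builds a Service FQDN
# from a configurable cluster domain instead of hard-coding svc.cluster.local.
DEFAULT_CLUSTER_DOMAIN = "cluster.local"


def service_fqdn(service: str, namespace: str,
                 cluster_domain: str = DEFAULT_CLUSTER_DOMAIN) -> str:
    """Return <service>.<namespace>.svc.<cluster_domain>."""
    # Strip stray leading/trailing dots so the parts join cleanly.
    return f"{service}.{namespace}.svc.{cluster_domain.strip('.')}"


print(service_fqdn("zookeeper", "default"))
print(service_fqdn("zookeeper", "default", "my-cluster.local"))
```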

Possible solutions:

  1. Add a CLI flag (or better: an env var) to the product operators that overrides the svc.cluster.local default.
  2. Better still: let the operators detect the DNS suffix of the Kubernetes cluster and use that.
  3. Maybe listener-operator can help us here
  4. [...]

Research Tasks

  • Leave out the suffix (e.g. cluster.local) and let the DNS lookup do its thing? What about secret-operator certificates? (timeboxed 2h) -> does not work with secret-operator
  • How do other operators deal with cluster DNS settings? (timeboxed 2h) -> Strimzi, for example, also uses an env var, KUBERNETES_SERVICE_DNS_DOMAIN

Refinement

Option 1: Use ENV var only

  • The operator reads an ENV var (e.g. CLUSTER_DNS_SUFFIX) that is set via Helm (the OpenShift deployment will differ)
  • Will default to cluster.local if not set
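
Option 1 can be sketched in a few lines. This is only an illustration under assumptions: the variable name CLUSTER_DNS_SUFFIX is the example name from above, the function name is made up, and treating an empty value like an unset one is a choice of this sketch that the issue does not specify.

```python
import os

DEFAULT_SUFFIX = "cluster.local"


def cluster_dns_suffix(env=None) -> str:
    """Return the suffix from CLUSTER_DNS_SUFFIX, or cluster.local if not set."""
    # `env` is injectable so the function can be exercised without touching
    # the real process environment.
    env = os.environ if env is None else env
    # Assumption of this sketch: an empty or whitespace-only value counts as unset.
    value = env.get("CLUSTER_DNS_SUFFIX", "").strip()
    return value or DEFAULT_SUFFIX
```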

Pro

  • Easy to implement
  • Straightforward, no need to parse resolv.conf
  • Can later be extended into Option 2
  • Implementation does not differ between Kubernetes and OpenShift

Con

  • OpenShift/OLM: We cannot set the CLUSTER_DNS_SUFFIX var for the secret and listener operators (due to their special DaemonSet deployment).
    We can, however, edit the DaemonSet afterwards and add the env var (cumbersome but possible).

Option 2: Use ENV var + Kubernetes detection + DNS suffix auto-detection

  1. Operator reads an env var e.g. CLUSTER_DNS_SUFFIX (containing e.g. my-cluster.local)

    • If it is set, use the suffix it provides and return
  2. If CLUSTER_DNS_SUFFIX is not set, determine whether we run in Kubernetes by

    • Checking e.g. KUBERNETES_SERVICE_HOST variable
    • Checking e.g. KUBERNETES_SERVICE_PORT variable
  3. If we run in Kubernetes, read and parse the resolv.conf

    cat /etc/resolv.conf 
    search sble-operators.svc.cluster.local svc.cluster.local cluster.local
    nameserver 10.243.21.53
    options ndots:5
    

    We need to pick the shortest domain from the last "search" line (here: cluster.local).

    If we do not run in Kubernetes, we default to cluster.local and return.

  4. If this did not yield a proper DNS suffix, we do not fall back to a default but error out, since no deployment would work anyway.
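
The steps above can be sketched as follows. This is a minimal illustration, not the operators' implementation: all function and class names are hypothetical, the sketch treats the process as running in Kubernetes only when both KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT are present (the issue only says these variables are checked), and an empty CLUSTER_DNS_SUFFIX is treated as unset.

```python
import os

DEFAULT_SUFFIX = "cluster.local"


class DnsSuffixError(RuntimeError):
    """Raised when no usable DNS suffix could be determined (step 4)."""


def parse_resolv_conf(text: str) -> str:
    """Return the shortest domain of the last `search` line."""
    # Collect the domain list of every `search` line, in file order.
    search_lines = [line.split()[1:] for line in text.splitlines()
                    if line.split()[:1] == ["search"]]
    if not search_lines or not search_lines[-1]:
        raise DnsSuffixError("no usable 'search' line in resolv.conf")
    suffix = min(search_lines[-1], key=len).rstrip(".")
    if not suffix:
        raise DnsSuffixError("shortest 'search' entry is empty")
    return suffix


def detect_dns_suffix(env=None, resolv_conf_path="/etc/resolv.conf") -> str:
    env = os.environ if env is None else env
    # Step 1: an explicitly configured suffix always wins.
    explicit = env.get("CLUSTER_DNS_SUFFIX", "").strip()
    if explicit:
        return explicit
    # Step 2: assumption of this sketch: both variables must be present.
    in_kubernetes = ("KUBERNETES_SERVICE_HOST" in env
                     and "KUBERNETES_SERVICE_PORT" in env)
    if not in_kubernetes:
        return DEFAULT_SUFFIX
    # Step 3: read and parse resolv.conf; step 4: error out on failure.
    try:
        with open(resolv_conf_path, encoding="utf-8") as f:
            text = f.read()
    except OSError as exc:
        raise DnsSuffixError(f"cannot read {resolv_conf_path}: {exc}") from exc
    return parse_resolv_conf(text)


SAMPLE = """search sble-operators.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.243.21.53
options ndots:5
"""
print(parse_resolv_conf(SAMPLE))  # prints "cluster.local"
```

Keeping the parser separate from the environment checks makes it easy to test the resolv.conf handling against captured files from unusual clusters.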

Pro

  • Non-breaking (unless we act upon positive research results from research task 1)
  • Definite improvement for Kubernetes and OpenShift (the resolv.conf parsing)
  • Implementation does not differ between Kubernetes and OpenShift

Con

  • OpenShift/OLM: We cannot set CLUSTER_DNS_SUFFIX for the secret and listener operators (due to their special DaemonSet deployment).
    We can, however, edit the DaemonSet afterwards and add the env var (cumbersome but possible).
  • The auto-detection is only a 95% solution, since there may be edge cases we have not considered yet. Users can still set their DNS suffix explicitly via the env var.

Option 3: DNS operator (OpenShift) / custom object (Kubernetes) containing the DNS suffix, read by all operators

Pro

  • Generic solution utilizing the OpenShift DNS operator or kube-dns / CoreDNS

Con

  • Operators will differ between Kubernetes and OpenShift
  • Currently operators do not "know" whether they run on Kubernetes or OpenShift (it's just our templating around that)
  • More implementation effort

Option 4: Provide a ConfigMap in the namespace of the operators containing common shared settings

  • We deploy an additional ConfigMap in the operator namespace containing shared settings such as the cluster domain

Pro

  • Implementation does not differ between Kubernetes and OpenShift
  • Simple solution
  • Possible preparation for other shared settings

Con

  • Which namespace? Possible name collisions if installed in default?
  • Not sure if that is possible / acceptable in OpenShift
  • Deployed by us or by user?

Outcome

  • Option 2 was chosen

Status

Done
