MIXEDARCH-80: Guide on using node affinity to safely schedule pods in heterogeneous clusters. by jaypoulz · Pull Request #5983 · operator-framework/operator-sdk

jaypoulz · 2022-08-10T21:59:49Z

Description of the change:
Expanded guide for operator support for multiple architectures that covers setting nodeAffinity for pod-defining objects to ensure safe scheduling.

Motivation for the change:
There is already a guide for creating an operator with support for multiple architectures, but this is not sufficient to cover the nuances of scheduling pods on a cluster with multi-architecture compute nodes. This guide leverages the existing sample applications to deep dive into the changes needed for operators of each type (ansible, go, helm).

website/content/en/docs/advanced-topics/heterogeneous.md

bparees · 2022-09-09T12:52:58Z

I'd expect to see some discussion of the subscription.spec.config.nodeSelector and subscription.spec.nodeAffinity(new 4.12) and how that can also set nodeselector/affinity values on the operator deployment.

jaypoulz · 2022-09-09T16:03:44Z

@bparees I completely agree that the subscription is an important topic. I need to check with @perdasilva in terms of where those docs should live in full, since I feel like it would probably be closer to the OLM functionality than OperatorSDK. I can absolutely cross-link the docs for that from this guide, though.

bparees · 2022-09-09T16:06:24Z

I feel like it would probably be closer to the OLM functionality than OperatorSDK. I can absolutely cross-link the docs for that from this guide, though.

yup, my original draft of my comment said something like "this may not belong in SDK docs, but....." and then i got lazy and deleted it.

So yeah, this may not be the spot to document it fully, but these docs should reference the functionality since it has implications for how someone's operator may get configured/installed by the admin.

jmrodri

Overall looks really good. Thanks for the contribution.

website/content/en/docs/advanced-topics/heterogeneous.md

jmrodri · 2022-09-09T21:25:09Z

So yeah, this may not be the spot to document it fully, but these docs should reference the functionality since it has implications for how someone's operator may get configured/installed by the admin.

I agree, mentioning them here is a good idea.

camilamacedo86 · 2022-09-10T22:03:02Z

website/content/en/docs/advanced-topics/heterogeneous.md

+ Go operators may also create Pod-defining objects dynamically. An example of this kind of object would be in `testdata/go/v3/memcached-operator/controllers/memcached_controller.go`.
+```go
+Spec: corev1.PodSpec{
+    SecurityContext: &corev1.PodSecurityContext{
+      ...


Why do we need this part? Why not just add "Following an example using Golang for you know how can you configure the nodeAffinity for the workloads in the reconciliation"

So my thinking was:

Provide the general idea and the common ground between the operator types

Provide a before and after walkthrough to explain how those changes might look different for each of the operator types when those objects are defined in the different languages.

I could absolutely simplify the whole back-half of the doc down to - we leave the specifics of the transformation to include node-affinity in your Pod definitions as an exercise for the user.

However, I guess I was writing the document more for people like me. I don't know the specific syntactical differences between languages (and would struggle to even know where to look for such info), so I find it helpful to see an example where I can see exactly what changed so that I can replicate it each place where I can identify the pattern.

camilamacedo86 · 2022-09-10T22:10:36Z

website/content/en/docs/advanced-topics/heterogeneous.md

+In the `testdata/go/v3` samples app, you'll find several configuration objects that need a nodeAffinity specification in the `config` and `bundle/manifests` directories. The full list is as follows:
+- `testdata/go/v3/memcached-operator/bundle/manifests/memcached-operator.clusterserviceversion.yaml`
+- `testdata/go/v3/memcached-operator/config/default/manager_config_patch.yaml`
+- `testdata/go/v3/memcached-operator/config/default/manager_webhook_patch.yaml`
+- `testdata/go/v3/memcached-operator/config/default/manager_auth_proxy_patch.yaml`
+- `testdata/go/v3/memcached-operator/config/manager/manager.yaml`


All languages (with the exception of JAVA) scaffold the config directory.
I do not think that we should say in the testdata/ ... we can describe what authors should to do in the default scaffold of their project. We can link the testdata as an example.

Also, not all these files should be updated to receive the nodeAffinity.
Only config/default/manager_config_patch.yaml and config/manager/manager.yaml
The other ones are only patches.

I think would be nice if you check the PR for update Kubebuilder: kubernetes-sigs/kubebuilder#2906. Those changes will endup here as well, also look at the doc proposed there. I think we can shape this part here.

c/c @jmrodri

camilamacedo86 · 2022-09-10T22:14:24Z

website/content/en/docs/advanced-topics/heterogeneous.md

+- `testdata/ansible/memcached-operator/config/testing/manager_image.yaml`
+- `testdata/ansible/memcached-operator/config/testing/pull_policy/Always.yaml`
+- `testdata/ansible/memcached-operator/config/testing/pull_policy/IfNotPresent.yaml`
+- `testdata/ansible/memcached-operator/config/testing/pull_policy/Never.yaml`


The only Deployment define here is in the manager_image (but that is for test the project only)
Note that the other files are patches to add envars and etcs.

Ansible/Helm also uses the same structure for Golang and the projects by default scaffolds one 2 workloads which are the containers for the manager and the proxy.

Check the PR for the docs in KB: https://github.com/kubernetes-sigs/kubebuilder/pull/2906/files#diff-b79471388fbaeb59d4a9793846e4b1c0e533f34c4768b5fad4907338b90c74a8

Btw, I think we can link the KB doc and just add here what is SDK specific.

c/c @jmrodri

camilamacedo86 · 2022-09-10T22:15:12Z

website/content/en/docs/advanced-topics/heterogeneous.md

+## Overview
+A heterogeneous cluster is a cluster in which the nodes differ from each other in a way that makes their containers incompatible with each other. The most common incompatibility that applies is CPU architecture. The rest of this document assumes that the user is developing an application that needs to be able to deploy cleanly in a cluster with a mix of node CPU architectures. Because an operator deployed to a heterogeneous cluster may not support all of the architectures in the cluster, it is important to set up safeguards that prevent scheduling pods on nodes of incompatible architectures. These scheduling implications affect both the operator as well as its operands.
+
+### Determining the Supported Architectures for an Operator/Operand Image


Suggested change

### Determining the Supported Architectures for an Operator/Operand Image

### Determining the Supported Architectures for an workloads images

camilamacedo86 · 2022-09-10T22:16:55Z

website/content/en/docs/advanced-topics/heterogeneous.md

+```bash
+docker manifest inspect <image>
+```
+If the above command fails, it may be because the referenced name points to a single image rather than a manifest list. In this case, you can find the architecture by inspecting the image.


Not only the arch matters right? So, I think this explanation can lead to a mistake.
If you follow this one you might have an image that supports darwin/arch64 and you will configure it to work in a node that is linux/arm64.

c/c @jmrodri

So while OS matters, its much less relevant when we talk about heterogeneous clusters. It would be a very strange set-up that called for some nodes on Windows and some on Linux for example.

You mention darwin as a possibility, but that would generally have to involve clusters that span across a workstation and some external source of nodes. It's not that these are impossible, but they're very unlikely.

--

That said, I will rework the working so that it's a little less "architecture" and a little more "platform".

Hi @jaypoulz ,

My point is that the image can have support for example to SO/arm64 and NOT linux/arm64
Then, following this guide I will be configuring it to work in nodes with Linux/arm54 which could fail.

The guidance does not cover so here (which makes it harder for someone that does not know how it works) and in the nodeAffiinity example. So, I think that is missing and ihmo would be better to recommend the command without the grep and just put a part of the result to illustrate how that works and be meaninful.

jaypoulz · 2023-05-30T15:29:58Z

/retest

openshift-ci · 2023-05-30T15:30:15Z

@jaypoulz: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

everettraven

Thanks for this contribution @jaypoulz ! Overall this looks great, I just have some nits

website/content/en/docs/advanced-topics/multi-arch.md

everettraven

/lgtm

jaypoulz · 2023-06-10T02:09:15Z

/retest

openshift-ci · 2023-06-10T02:09:31Z

@jaypoulz: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jaypoulz · 2023-06-12T18:24:25Z

I believe this draft is ready for final review. :)
@everettraven the only differences between this draft and the one your reviewed is the formatting for the internal links according to the hugo.io standard.

…d scheduling in multi-architecture compute clusters. Signed-off-by: Jeremy Poulin <[email protected]>

jaypoulz · 2023-06-12T19:29:43Z

Discovered a last inconsistency with the latest rebase, but should be all clear now as tests are passing locally.

everettraven

Super tiny nit, but it shouldn't block this PR. Looks good to me otherwise - thanks for all the work on this @jaypoulz !

/lgtm

everettraven · 2023-06-13T19:48:45Z

website/content/en/docs/advanced-topics/multi-arch.md

+The basic principle of supporting multiple architectures is to ensure that each of your operator images is built for each of the architectures to be supported. From there, the images should be hosted in image registries as manifest lists. Finally, you'll need to update your distribution configuration to set which architectures are supported. This section explains each of these concepts in turn.
+
+### Building an Operator for Multiple Architectures
+[Kubebuilder][kubebuilder] explains how you can use `docker buildx` to build images for multiple architectures. Operator SDK leverages KubeBuilder to ensure that builds can be cross-platform from the start.


Nit:

Suggested change

[Kubebuilder][kubebuilder] explains how you can use `docker buildx` to build images for multiple architectures. Operator SDK leverages KubeBuilder to ensure that builds can be cross-platform from the start.

[Kubebuilder][kubebuilder] explains how you can use `docker buildx` to build images for multiple architectures. Operator SDK leverages Kubebuilder to ensure that builds can be cross-platform from the start.

Appears to have gone stale.

jaypoulz · 2023-06-13T23:37:52Z

Thanks guys! :)

…d scheduling in multi-architecture compute clusters. (operator-framework#5983) Signed-off-by: Jeremy Poulin <[email protected]> Signed-off-by: uguendu <[email protected]>

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 10, 2022

openshift-ci bot requested review from jmrodri and marc-obrien August 10, 2022 22:00

camilamacedo86 reviewed Aug 11, 2022

View reviewed changes

website/content/en/docs/advanced-topics/heterogeneous.md Outdated Show resolved Hide resolved