
MIXEDARCH-80: Guide on using node affinity to safely schedule pods in heterogeneous clusters.#5983

Merged
everettraven merged 1 commit into operator-framework:master from multi-arch:heterogeneous-support
Jun 13, 2023
Conversation

@jaypoulz
Contributor

@jaypoulz jaypoulz commented Aug 10, 2022

Description of the change:
Expanded guide for operator support for multiple architectures that covers setting nodeAffinity for pod-defining objects to ensure safe scheduling.

Motivation for the change:
There is already a guide for creating an operator with support for multiple architectures, but this is not sufficient to cover the nuances of scheduling pods on a cluster with multi-architecture compute nodes. This guide leverages the existing sample applications to deep dive into the changes needed for operators of each type (ansible, go, helm).

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 10, 2022
@openshift-ci openshift-ci bot requested review from jmrodri and marc-obrien August 10, 2022 22:00
@jaypoulz jaypoulz force-pushed the heterogeneous-support branch from 8325f07 to fa0a439 Compare August 26, 2022 21:45
@jaypoulz jaypoulz force-pushed the heterogeneous-support branch from fa0a439 to b93e1da Compare September 9, 2022 01:38
@jaypoulz jaypoulz changed the title WIP - Using node affinity to protect scheduling in heterogeneous clusters. Guide on using node affinity to safely schedule operator/operand pods in heterogeneous clusters. Sep 9, 2022
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 9, 2022
@jaypoulz jaypoulz changed the title Guide on using node affinity to safely schedule operator/operand pods in heterogeneous clusters. Guide on using node affinity to safely schedule pods in heterogeneous clusters. Sep 9, 2022
@jaypoulz jaypoulz force-pushed the heterogeneous-support branch 3 times, most recently from a21f23d to 2f84b12 Compare September 9, 2022 02:16
@bparees
Contributor

bparees commented Sep 9, 2022

I'd expect to see some discussion of `subscription.spec.config.nodeSelector` and `subscription.spec.nodeAffinity` (new in 4.12) and how that can also set nodeSelector/affinity values on the operator deployment.
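For context, OLM's Subscription allows the cluster admin to inject scheduling constraints into the operator's Deployment via `spec.config`. A minimal sketch (the package, catalog, and namespace names here are illustrative, not from this PR):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: my-operator            # illustrative name
  namespace: operators
spec:
  channel: stable
  name: my-operator
  source: operatorhubio-catalog
  sourceNamespace: olm
  config:
    # OLM applies this to the operator Deployment's pod template,
    # restricting the operator pod to amd64 nodes
    nodeSelector:
      kubernetes.io/arch: amd64
```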

@jaypoulz
Contributor Author

jaypoulz commented Sep 9, 2022

@bparees I completely agree that the subscription is an important topic. I need to check with @perdasilva in terms of where those docs should live in full, since I feel like it would probably be closer to the OLM functionality than OperatorSDK. I can absolutely cross-link the docs for that from this guide, though.

@bparees
Contributor

bparees commented Sep 9, 2022

I feel like it would probably be closer to the OLM functionality than OperatorSDK. I can absolutely cross-link the docs for that from this guide, though.

Yup, my original draft of my comment said something like "this may not belong in SDK docs, but....." and then I got lazy and deleted it.

So yeah, this may not be the spot to document it fully, but these docs should reference the functionality since it has implications for how someone's operator may get configured/installed by the admin.

jmrodri
jmrodri previously requested changes Sep 9, 2022
Member

@jmrodri jmrodri left a comment

Overall looks really good. Thanks for the contribution.

@jmrodri
Member

jmrodri commented Sep 9, 2022

So yeah, this may not be the spot to document it fully, but these docs should reference the functionality since it has implications for how someone's operator may get configured/installed by the admin.

I agree, mentioning them here is a good idea.

Comment on lines +157 to +184
Go operators may also create Pod-defining objects dynamically. An example of this kind of object would be in `testdata/go/v3/memcached-operator/controllers/memcached_controller.go`.
```go
Spec: corev1.PodSpec{
SecurityContext: &corev1.PodSecurityContext{
...
Contributor

Why do we need this part? Why not just add something like: "Here is an example using Golang so that you know how you can configure the nodeAffinity for the workloads in the reconciliation"?

Contributor Author

So my thinking was:

  1. Provide the general idea and the common ground between the operator types
  2. Provide a before and after walkthrough to explain how those changes might look different for each of the operator types when those objects are defined in the different languages.

I could absolutely simplify the whole back half of the doc down to: we leave the specifics of the transformation to include node affinity in your Pod definitions as an exercise for the user.

However, I guess I was writing the document more for people like me. I don't know the specific syntactical differences between languages (and would struggle to even know where to look for such info), so I find it helpful to see an example where I can see exactly what changed so that I can replicate it each place where I can identify the pattern.
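To make that concrete, the change for the dynamic Go case boils down to adding an `Affinity` stanza to the `corev1.PodSpec` the controller builds. A sketch, not the exact diff from this PR; the label keys are the standard well-known node labels, and the architecture list is illustrative:

```go
Spec: corev1.PodSpec{
	Affinity: &corev1.Affinity{
		NodeAffinity: &corev1.NodeAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
				NodeSelectorTerms: []corev1.NodeSelectorTerm{{
					MatchExpressions: []corev1.NodeSelectorRequirement{
						{
							// only schedule on nodes whose architecture we ship images for
							Key:      "kubernetes.io/arch",
							Operator: corev1.NodeSelectorOpIn,
							Values:   []string{"amd64", "arm64"},
						},
						{
							Key:      "kubernetes.io/os",
							Operator: corev1.NodeSelectorOpIn,
							Values:   []string{"linux"},
						},
					},
				}},
			},
		},
	},
	...
```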

Comment on lines +150 to +178
In the `testdata/go/v3` samples app, you'll find several configuration objects that need a nodeAffinity specification in the `config` and `bundle/manifests` directories. The full list is as follows:
- `testdata/go/v3/memcached-operator/bundle/manifests/memcached-operator.clusterserviceversion.yaml`
- `testdata/go/v3/memcached-operator/config/default/manager_config_patch.yaml`
- `testdata/go/v3/memcached-operator/config/default/manager_webhook_patch.yaml`
- `testdata/go/v3/memcached-operator/config/default/manager_auth_proxy_patch.yaml`
- `testdata/go/v3/memcached-operator/config/manager/manager.yaml`
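In each of these YAML files the addition is the same affinity stanza under the pod template spec; a sketch (the architecture values are illustrative, and the well-known `kubernetes.io/arch` and `kubernetes.io/os` node labels are standard Kubernetes):

```yaml
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/arch
                    operator: In
                    values:
                      - amd64
                      - arm64
                  - key: kubernetes.io/os
                    operator: In
                    values:
                      - linux
```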
Contributor

All languages (with the exception of Java) scaffold the config directory.
I do not think that we should point at testdata/...; we can describe what authors should do in the default scaffold of their project, and link testdata as an example.

Also, not all of these files should be updated to receive the nodeAffinity.
Only config/default/manager_config_patch.yaml and config/manager/manager.yaml; the other ones are only patches.

I think it would be nice if you checked the PR updating Kubebuilder: kubernetes-sigs/kubebuilder#2906. Those changes will end up here as well; also look at the doc proposed there. I think we can shape this part accordingly.

c/c @jmrodri

- `testdata/ansible/memcached-operator/config/testing/manager_image.yaml`
- `testdata/ansible/memcached-operator/config/testing/pull_policy/Always.yaml`
- `testdata/ansible/memcached-operator/config/testing/pull_policy/IfNotPresent.yaml`
- `testdata/ansible/memcached-operator/config/testing/pull_policy/Never.yaml`
Contributor

The only Deployment defined here is in manager_image.yaml (and that one is only for testing the project).
Note that the other files are patches to add env vars and so on.

Ansible/Helm use the same structure as Golang, and the projects by default scaffold 2 workloads: the containers for the manager and the proxy.

Check the PR for the docs in KB: https://github.com/kubernetes-sigs/kubebuilder/pull/2906/files#diff-b79471388fbaeb59d4a9793846e4b1c0e533f34c4768b5fad4907338b90c74a8

Btw, I think we can link the KB doc and just add here what is SDK specific.

c/c @jmrodri

## Overview
A heterogeneous cluster is a cluster in which the nodes differ from each other in a way that makes their containers incompatible with each other. The most common such incompatibility is CPU architecture. The rest of this document assumes that the user is developing an application that needs to deploy cleanly on a cluster with a mix of node CPU architectures. Because an operator deployed to a heterogeneous cluster may not support all of the architectures in the cluster, it is important to set up safeguards that prevent scheduling pods on nodes of incompatible architectures. These scheduling implications affect both the operator and its operands.

### Determining the Supported Architectures for an Operator/Operand Image
Contributor

Suggested change
### Determining the Supported Architectures for an Operator/Operand Image
### Determining the Supported Architectures for Workload Images

```bash
docker manifest inspect <image>
```
If the above command fails, it may be because the referenced name points to a single image rather than a manifest list. In this case, you can find the architecture by inspecting the image.
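For a manifest list, the command returns one entry per supported platform, and each entry reports both `os` and `architecture`. An abridged, illustrative output (digests elided):

```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
  "manifests": [
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "digest": "sha256:…",
      "platform": { "architecture": "amd64", "os": "linux" }
    },
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "digest": "sha256:…",
      "platform": { "architecture": "arm64", "os": "linux" }
    }
  ]
}
```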
Contributor

Not only the arch matters, right? So I think this explanation can lead to a mistake.
If you follow this one you might have an image that supports darwin/arm64, and you will configure it to work on a node that is linux/arm64.

c/c @jmrodri

Contributor Author

@jaypoulz jaypoulz Sep 12, 2022

So while OS matters, it's much less relevant when we talk about heterogeneous clusters. It would be a very strange setup that called for some nodes on Windows and some on Linux, for example.

You mention darwin as a possibility, but that would generally have to involve clusters that span across a workstation and some external source of nodes. It's not that these are impossible, but they're very unlikely.

--

That said, I will rework the wording so that it's a little less "architecture" and a little more "platform".

Contributor

@camilamacedo86 camilamacedo86 Sep 12, 2022

Hi @jaypoulz ,

My point is that the image can have support for, for example, darwin/arm64 and NOT linux/arm64.
Then, following this guide, I would be configuring it to work on nodes with linux/arm64, which could fail.

The guidance does not cover the OS here (which makes it harder for someone who does not know how this works) or in the nodeAffinity example. So I think that is missing, and IMHO it would be better to recommend the command without the grep and just include part of the result, to illustrate how it works and be meaningful.

@jaypoulz
Contributor Author

/retest

@openshift-ci

openshift-ci bot commented May 30, 2023

@jaypoulz: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor

@everettraven everettraven left a comment


Thanks for this contribution @jaypoulz ! Overall this looks great, I just have some nits

@jaypoulz jaypoulz force-pushed the heterogeneous-support branch from cbec207 to cb9098d Compare May 31, 2023 14:27
Contributor

@everettraven everettraven left a comment


/lgtm

@jaypoulz jaypoulz temporarily deployed to deploy May 31, 2023 18:24 — with GitHub Actions Inactive
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 31, 2023
@jaypoulz jaypoulz force-pushed the heterogeneous-support branch from cb9098d to e922784 Compare June 9, 2023 21:56
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 9, 2023
@jaypoulz
Contributor Author

/retest

@openshift-ci

openshift-ci bot commented Jun 10, 2023

@jaypoulz: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jaypoulz jaypoulz force-pushed the heterogeneous-support branch from e922784 to 7735151 Compare June 12, 2023 18:06
@jaypoulz
Contributor Author

I believe this draft is ready for final review. :)
@everettraven the only differences between this draft and the one you reviewed are in the formatting of the internal links, per the hugo.io standard.

…d scheduling in multi-architecture compute clusters.

Signed-off-by: Jeremy Poulin <[email protected]>
@jaypoulz jaypoulz force-pushed the heterogeneous-support branch from 7735151 to a27d401 Compare June 12, 2023 19:28
@jaypoulz
Contributor Author

Discovered one last inconsistency with the latest rebase, but it should be all clear now, as tests are passing locally.

Contributor

@everettraven everettraven left a comment


Super tiny nit, but it shouldn't block this PR. Looks good to me otherwise - thanks for all the work on this @jaypoulz !

/lgtm

The basic principle of supporting multiple architectures is to ensure that each of your operator images is built for each of the architectures to be supported. From there, the images should be hosted in image registries as manifest lists. Finally, you'll need to update your distribution configuration to set which architectures are supported. This section explains each of these concepts in turn.

### Building an Operator for Multiple Architectures
[Kubebuilder][kubebuilder] explains how you can use `docker buildx` to build images for multiple architectures. Operator SDK leverages KubeBuilder to ensure that builds can be cross-platform from the start.
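As a sketch of that flow (the image tag and platform list are illustrative, and this assumes a Docker installation with `buildx` available):

```bash
# One-time setup: create a builder capable of targeting multiple platforms
docker buildx create --name multiarch --use

# Build images for several platforms at once and push them
# to the registry as a single manifest list
docker buildx build \
  --platform linux/amd64,linux/arm64,linux/ppc64le,linux/s390x \
  -t quay.io/example/memcached-operator:v0.0.1 \
  --push .
```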
Contributor

Nit:

Suggested change
[Kubebuilder][kubebuilder] explains how you can use `docker buildx` to build images for multiple architectures. Operator SDK leverages KubeBuilder to ensure that builds can be cross-platform from the start.
[Kubebuilder][kubebuilder] explains how you can use `docker buildx` to build images for multiple architectures. Operator SDK leverages Kubebuilder to ensure that builds can be cross-platform from the start.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 13, 2023
@grokspawn grokspawn dismissed jmrodri’s stale review June 13, 2023 20:02

Appears to have gone stale.

@everettraven everettraven merged commit 025f247 into operator-framework:master Jun 13, 2023
@jaypoulz
Contributor Author

Thanks guys! :)

UgurTheG pushed a commit to UgurTheG/operator-sdk that referenced this pull request Jun 15, 2023
…d scheduling in multi-architecture compute clusters. (operator-framework#5983)

Signed-off-by: Jeremy Poulin <[email protected]>
Signed-off-by: uguendu <[email protected]>