MIXEDARCH-80: Guide on using node affinity to safely schedule pods in heterogeneous clusters. #5983
Conversation
|
I'd expect to see some discussion of `subscription.spec.config.nodeSelector` and `subscription.spec.nodeAffinity` (new in 4.12), and how those can also set nodeSelector/affinity values on the operator deployment. |
|
@bparees I completely agree that the subscription is an important topic. I need to check with @perdasilva in terms of where those docs should live in full, since I feel like it would probably be closer to the OLM functionality than OperatorSDK. I can absolutely cross-link the docs for that from this guide, though. |
Yup, my original draft of my comment said something like "this may not belong in SDK docs, but..." and then I got lazy and deleted it. So yeah, this may not be the spot to document it fully, but these docs should reference the functionality, since it has implications for how someone's operator may get configured/installed by the admin. |
jmrodri
left a comment
Overall looks really good. Thanks for the contribution.
I agree, mentioning them here is a good idea. |
| Go operators may also create Pod-defining objects dynamically. An example of this kind of object would be in `testdata/go/v3/memcached-operator/controllers/memcached_controller.go`. | ||
| ```go | ||
| Spec: corev1.PodSpec{ | ||
| SecurityContext: &corev1.PodSecurityContext{ | ||
| ... |
Why do we need this part? Why not just add something like: "Following is an example using Golang so you know how you can configure the nodeAffinity for the workloads in the reconciliation"?
So my thinking was:
- Provide the general idea and the common ground between the operator types
- Provide a before and after walkthrough to explain how those changes might look different for each of the operator types when those objects are defined in the different languages.
I could absolutely simplify the whole back half of the doc down to: we leave the specifics of the transformation to include node affinity in your Pod definitions as an exercise for the user.
However, I guess I was writing the document more for people like me. I don't know the specific syntactical differences between languages (and would struggle to even know where to look for such info), so I find it helpful to see an example where I can see exactly what changed so that I can replicate it each place where I can identify the pattern.
| In the `testdata/go/v3` samples app, you'll find several configuration objects that need a nodeAffinity specification in the `config` and `bundle/manifests` directories. The full list is as follows: | ||
| - `testdata/go/v3/memcached-operator/bundle/manifests/memcached-operator.clusterserviceversion.yaml` | ||
| - `testdata/go/v3/memcached-operator/config/default/manager_config_patch.yaml` | ||
| - `testdata/go/v3/memcached-operator/config/default/manager_webhook_patch.yaml` | ||
| - `testdata/go/v3/memcached-operator/config/default/manager_auth_proxy_patch.yaml` | ||
| - `testdata/go/v3/memcached-operator/config/manager/manager.yaml` |
All languages (with the exception of Java) scaffold the `config` directory.
I do not think we should point to `testdata/` here; we can describe what authors should do in the default scaffold of their project and link `testdata` as an example.
Also, not all of these files should be updated to receive the nodeAffinity.
Only `config/default/manager_config_patch.yaml` and `config/manager/manager.yaml`.
The other ones are only patches.
It would be nice if you checked the PR updating Kubebuilder: kubernetes-sigs/kubebuilder#2906. Those changes will end up here as well; also look at the doc proposed there. I think we can shape this part accordingly.
c/c @jmrodri
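To make that concrete, here is a sketch (values illustrative, not prescriptive) of the kind of nodeAffinity stanza an author could add to the manager Deployment in `config/manager/manager.yaml`, using the standard `kubernetes.io/arch` and `kubernetes.io/os` node labels:

```yaml
# Illustrative fragment of the manager Deployment's pod template.
# List only the platforms your manager image is actually built for.
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64
                - arm64
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
```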
| - `testdata/ansible/memcached-operator/config/testing/manager_image.yaml` | ||
| - `testdata/ansible/memcached-operator/config/testing/pull_policy/Always.yaml` | ||
| - `testdata/ansible/memcached-operator/config/testing/pull_policy/IfNotPresent.yaml` | ||
| - `testdata/ansible/memcached-operator/config/testing/pull_policy/Never.yaml` |
The only Deployment defined here is in `manager_image.yaml` (and that one is only for testing the project).
Note that the other files are patches to add env vars and the like.
Ansible/Helm also use the same structure as Golang, and by default the projects scaffold two workloads, which are the containers for the manager and the proxy.
Check the PR for the docs in KB: https://github.com/kubernetes-sigs/kubebuilder/pull/2906/files#diff-b79471388fbaeb59d4a9793846e4b1c0e533f34c4768b5fad4907338b90c74a8
Btw, I think we can link the KB doc and just add here what is SDK specific.
c/c @jmrodri
| ## Overview | ||
| A heterogeneous cluster is a cluster in which the nodes differ from each other in a way that makes their containers incompatible with each other. The most common incompatibility that applies is CPU architecture. The rest of this document assumes that the user is developing an application that needs to be able to deploy cleanly in a cluster with a mix of node CPU architectures. Because an operator deployed to a heterogeneous cluster may not support all of the architectures in the cluster, it is important to set up safeguards that prevent scheduling pods on nodes of incompatible architectures. These scheduling implications affect both the operator as well as its operands. | ||
|
|
||
| ### Determining the Supported Architectures for an Operator/Operand Image |
| ### Determining the Supported Architectures for an Operator/Operand Image | |
| ### Determining the Supported Architectures for the Workload Images |
| ```bash | ||
| docker manifest inspect <image> | ||
| ``` | ||
| If the above command fails, it may be because the referenced name points to a single image rather than a manifest list. In this case, you can find the architecture by inspecting the image. |
The arch is not the only thing that matters, right? So I think this explanation can lead to a mistake.
If you follow it, you might have an image that supports darwin/arm64, and you will configure it to run on a node that is linux/arm64.
c/c @jmrodri
So while OS matters, it's much less relevant when we talk about heterogeneous clusters. It would be a very strange setup that called for some nodes on Windows and some on Linux, for example.
You mention darwin as a possibility, but that would generally have to involve clusters that span a workstation and some external source of nodes. It's not that these setups are impossible, but they're very unlikely.
--
That said, I will rework the wording so that it's a little less "architecture" and a little more "platform".
Hi @jaypoulz ,
My point is that the image can, for example, support darwin/arm64 and NOT linux/arm64.
Then, following this guide, I would be configuring it to run on nodes with linux/arm64, which could fail.
The guidance does not cover the OS here (which makes it harder for someone who does not know how it works) or in the nodeAffinity example. So I think that is missing, and IMHO it would be better to recommend the command without the grep and just show part of the result to illustrate how it works and be meaningful.
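To illustrate why the OS and architecture need to be read together, here is a small self-contained Go sketch that parses an abbreviated manifest list (the shape `docker manifest inspect` returns, trimmed down; the sample JSON and entries are invented for illustration) and prints each supported platform as `os/arch`:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// manifestList models just the platform fields of a `docker manifest
// inspect` result: each entry carries BOTH an os and an architecture.
type manifestList struct {
	Manifests []struct {
		Platform struct {
			Architecture string `json:"architecture"`
			OS           string `json:"os"`
		} `json:"platform"`
	} `json:"manifests"`
}

// sample is an abbreviated, invented manifest list used for illustration.
const sample = `{
  "manifests": [
    {"platform": {"architecture": "amd64", "os": "linux"}},
    {"platform": {"architecture": "arm64", "os": "linux"}},
    {"platform": {"architecture": "arm64", "os": "darwin"}}
  ]
}`

// supportedPlatforms returns the os/arch pairs declared in the manifest
// list. Keeping os and arch together matters: darwin/arm64 in the list
// does not imply the image also supports linux/arm64.
func supportedPlatforms(raw []byte) ([]string, error) {
	var ml manifestList
	if err := json.Unmarshal(raw, &ml); err != nil {
		return nil, err
	}
	platforms := make([]string, 0, len(ml.Manifests))
	for _, m := range ml.Manifests {
		platforms = append(platforms, m.Platform.OS+"/"+m.Platform.Architecture)
	}
	return platforms, nil
}

func main() {
	platforms, err := supportedPlatforms([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, p := range platforms {
		fmt.Println(p) // e.g. linux/amd64
	}
}
```

For the sample above this prints `linux/amd64`, `linux/arm64`, and `darwin/arm64`: only the first two platforms would make arm64 nodes in a Linux cluster safe scheduling targets.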
|
/retest |
|
@jaypoulz: Cannot trigger testing until a trusted user reviews the PR and leaves an `/ok-to-test` message. In response to this: /retest
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
everettraven
left a comment
Thanks for this contribution @jaypoulz ! Overall this looks great, I just have some nits
|
/retest |
|
@jaypoulz: Cannot trigger testing until a trusted user reviews the PR and leaves an `/ok-to-test` message. In response to this: /retest
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
I believe this draft is ready for final review. :) |
…d scheduling in multi-architecture compute clusters. Signed-off-by: Jeremy Poulin <[email protected]>
|
Discovered a last inconsistency with the latest rebase, but should be all clear now as tests are passing locally. |
everettraven
left a comment
Super tiny nit, but it shouldn't block this PR. Looks good to me otherwise - thanks for all the work on this @jaypoulz !
/lgtm
| The basic principle of supporting multiple architectures is to ensure that each of your operator images is built for each of the architectures to be supported. From there, the images should be hosted in image registries as manifest lists. Finally, you'll need to update your distribution configuration to set which architectures are supported. This section explains each of these concepts in turn. | ||
|
|
||
| ### Building an Operator for Multiple Architectures | ||
| [Kubebuilder][kubebuilder] explains how you can use `docker buildx` to build images for multiple architectures. Operator SDK leverages KubeBuilder to ensure that builds can be cross-platform from the start. |
Nit:
| [Kubebuilder][kubebuilder] explains how you can use `docker buildx` to build images for multiple architectures. Operator SDK leverages KubeBuilder to ensure that builds can be cross-platform from the start. | |
| [Kubebuilder][kubebuilder] explains how you can use `docker buildx` to build images for multiple architectures. Operator SDK leverages Kubebuilder to ensure that builds can be cross-platform from the start. |
|
Thanks guys! :) |
…d scheduling in multi-architecture compute clusters. (operator-framework#5983) Signed-off-by: Jeremy Poulin <[email protected]> Signed-off-by: uguendu <[email protected]>
Description of the change:
Expanded guide for operator support for multiple architectures that covers setting nodeAffinity for pod-defining objects to ensure safe scheduling.
Motivation for the change:
There is already a guide for creating an operator with support for multiple architectures, but this is not sufficient to cover the nuances of scheduling pods on a cluster with multi-architecture compute nodes. This guide leverages the existing sample applications to deep dive into the changes needed for operators of each type (ansible, go, helm).