<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.9.2">Jekyll</generator><link href="https://dulek.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://dulek.github.io/" rel="alternate" type="text/html" /><updated>2022-10-24T18:19:47+00:00</updated><id>https://dulek.github.io/feed.xml</id><title type="html">Dulek’s Blog</title><subtitle>My name is Michał Dulko. I work as an OpenShift and OpenStack developer at Red Hat. This blog will gather articles about the stuff I'm coding.</subtitle><author><name>Michał Dulko</name><email>michal.dulko@gmail.com</email></author><entry><title type="html">Kuryr - when to and when not to consider it</title><link href="https://dulek.github.io/2022/10/24/why-kuryr.html" rel="alternate" type="text/html" title="Kuryr - when to and when not to consider it" /><published>2022-10-24T18:00:00+00:00</published><updated>2022-10-24T18:00:00+00:00</updated><id>https://dulek.github.io/2022/10/24/why-kuryr</id><content type="html" xml:base="https://dulek.github.io/2022/10/24/why-kuryr.html">&lt;p&gt;Kuryr-Kubernetes is a CNI plugin for Kubernetes, that uses OpenStack native
networking layer, Neutron and Octavia, to provide networking for Pods and
Services in K8s clusters running on OpenStack. While unifying the networking
layers of OpenStack and Kubernetes is a tempting concept with clear
performance benefits, such setups also have a number of disadvantages that
I’ll try to explain in this article.&lt;/p&gt;

&lt;h2 id=&quot;why-would-i-even-bother-with-kuryr&quot;&gt;Why would I even bother with Kuryr?&lt;/h2&gt;

&lt;p&gt;So what are the advantages of Kuryr over, say, ovn-kubernetes? The problem with
virtually any more complex K8s CNI plugin is that it introduces double
encapsulation of packets: a Pod’s traffic gets wrapped in ovn-kubernetes
tunneling at the CNI level and then in Neutron tunneling at the VM level,
which means increased latency and lower throughput. To verify this we ran a
bunch of tests a while ago and &lt;a href=&quot;https://cloud.redhat.com/blog/accelerate-your-openshift-network-performance-on-openstack-with-kuryr&quot;&gt;Kuryr turned out to be better in terms of
performance in most
cases&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/kuryr-tunneling.png&quot; alt=&quot;Kuryr encapsulation diagram&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You can technically work around this by using a provider network in Neutron,
but provider networks are much less flexible than virtualized tenant networks
created on the fly, so it’s not a magic bullet.&lt;/p&gt;
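&lt;p&gt;As an illustration, such a provider network could be created like the sketch below. The network type, physnet name and segmentation ID are made-up examples - the right values depend entirely on your cloud’s configuration:&lt;/p&gt;

```shell
# Create a Neutron provider network mapped onto an existing datacenter
# VLAN, so that traffic skips the Neutron tunneling overlay entirely.
# physnet1 and segment 100 are placeholders - ask your cloud operator.
openstack network create pods-net \
  --provider-network-type vlan \
  --provider-physical-network physnet1 \
  --provider-segment 100

# A subnet is still needed so that IPs can be allocated from it.
openstack subnet create pods-subnet \
  --network pods-net \
  --subnet-range 192.168.100.0/24
```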

&lt;p&gt;Another advantage is operational simplicity. It’s simply easier to learn how
to analyze and debug a single SDN instead of a couple of them, with two totally
different sets of concepts and the CNI being virtually a black box at the VM
level. There is a caveat here: you’ll probably need to learn to debug Neutron
features you rarely used before. Moreover, debugging Kuryr itself is a thing
too, but most issues boil down to problems in Neutron or Octavia. I’ll talk
more about this later in the article.&lt;/p&gt;
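&lt;p&gt;To give a taste of what that debugging looks like, these are the kinds of commands you’d typically start from when chasing a misbehaving pod port (the resource names here are hypothetical):&lt;/p&gt;

```shell
# With Kuryr each node's Neutron port is a trunk port - list them.
openstack network trunk list

# Show a trunk's subports and their VLAN IDs; each subport
# corresponds to a single pod running on that node.
openstack network trunk show node-0-trunk

# Inspect an individual pod's port - its IP, status and security groups.
openstack port show pod-subport-uuid
```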

&lt;h2 id=&quot;issues-with-kuryr-environments&quot;&gt;Issues with Kuryr environments&lt;/h2&gt;

&lt;p&gt;But obviously not everything is sunshine and rainbows. Kuryr environments
suffer from limitations and issues related to the general design, the
underlying OpenStack infrastructure and plain bugs. First let’s talk about
problems introduced by Kuryr’s design assumptions.&lt;/p&gt;

&lt;p&gt;First of all, Kuryr is tied to OpenStack. This means that if your application
depends on any behavior that is Kuryr-specific (manipulating Neutron ports or
SGs, Octavia LBs, expecting each K8s namespace to have its own subnet in
OpenStack, etc.), then you won’t be able to easily run it on a non-OpenStack
public cloud.&lt;/p&gt;

&lt;p&gt;As with Kuryr &lt;em&gt;all&lt;/em&gt; Services are represented as Octavia load balancers,
you’ll face the usual limitations of Octavia. E.g. if you’re using the Amphora
provider, each LB will be a VM that consumes some cloud resources. This is
solved by the Octavia OVN provider, but it’s only available for deployments
using OVN as the Neutron driver.&lt;/p&gt;

&lt;p&gt;Some features are currently impossible to implement with Octavia, e.g.
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sessionAffinity&lt;/code&gt; or OpenShift’s automatic unidling. There are also some
bugs in Octavia that have haunted Kuryr deployments. In particular, with high
churn LBs often get stuck in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PENDING_UPDATE&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PENDING_DELETE&lt;/code&gt; states, which
make them immutable, so Kuryr cannot do anything about them, leading to issues
with the Services represented by those LBs.&lt;/p&gt;
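&lt;p&gt;If you suspect this is happening, a quick way to find the affected load balancers is to filter them by provisioning status (this sketch assumes you have OpenStack CLI access to the cluster’s project; the LB ID is a placeholder):&lt;/p&gt;

```shell
# LBs stuck in PENDING_* states are immutable - Kuryr cannot touch them.
openstack loadbalancer list --provisioning-status PENDING_UPDATE
openstack loadbalancer list --provisioning-status PENDING_DELETE

# Once a cloud admin has moved a broken LB out of the PENDING state,
# it can be removed together with its listeners and pools in one call.
openstack loadbalancer delete --cascade stuck-lb-uuid
```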

&lt;p&gt;Another set of problems comes from Neutron, which is used to provide networking
for all the Pods. In general the scale at which Kuryr exercises Neutron is the
main issue. With Kuryr there will be hundreds of subports attached to trunk
ports, and I don’t think there’s any other application that exercises Neutron
at such scale. An obvious difference from other SDNs is that in high-churn
Kuryr environments pod creation times will be significantly higher, as Neutron
has to be contacted to create a port every time. This is slightly mitigated by
Kuryr maintaining a pool of ports ready to be attached to pods, but it’s still
a problem when a lot of pods get created at once, e.g. when applying Helm
charts.&lt;/p&gt;

&lt;p&gt;Neutron has its own bugs too. For example we’ve observed very high port
creation times when there are many concurrent bulk port create requests. The
Neutron team tracked this down to IPAM conflict-and-retry races between the
threads serving these requests and the situation improved, but it can still
take quite some time to create multiple ports. In critical cases this led to
Kuryr orphaning ports in OpenStack, and the only strategy we had to fix this
was to periodically look for such ports and delete them.&lt;/p&gt;
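&lt;p&gt;That periodic cleanup boils down to something like the commands below. Note the assumption: pod ports are identified by the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;compute:kuryr&lt;/code&gt; device owner - verify that this matches your deployment before deleting anything:&lt;/p&gt;

```shell
# List the ports Kuryr created for pods (device_owner compute:kuryr
# is an assumption - check it against your deployment first).
openstack port list --device-owner compute:kuryr --long

# After manually confirming that a port no longer backs a live pod,
# delete it to free its IP address.
openstack port delete orphaned-port-uuid
```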

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;So when to consider Kuryr? It’s probably easier to answer the opposite
question. If you expect a lot of churn in your environment - pods and Services
getting created often and in high numbers - then you need to check whether
your Neutron will be able to handle that and whether the longer Pod and
Service time-to-wire is acceptable for you. Besides that, you probably want to
make sure your underlying OpenStack cloud runs a current version, so that all
the fixes the Kuryr team requested from the Neutron and Octavia teams are
included.&lt;/p&gt;

&lt;p&gt;If the above issues are not a concern for you, then you should really
consider using Kuryr for the sake of the improved performance and simplicity
of the networking layer of your combined K8s and OpenStack clouds.&lt;/p&gt;</content><author><name>Michał Dulko</name><email>michal.dulko@gmail.com</email></author><category term="openshift" /><category term="openstack" /><category term="kubernetes" /><category term="kuryr" /><summary type="html">Kuryr-Kubernetes is a CNI plugin for Kubernetes, that uses OpenStack native networking layer - Neutron and Octavia, to provide networking for Pods and Services in K8s clusters running on OpenStack. While it’s a tempting concept to unify the networking layers of both OpenStack and Kubernetes and there are clear performance benefits, such setups also have a number of disadvantages that I’ll try to explain in this article.</summary></entry><entry><title type="html">type=LoadBalancer Services in Kuryr and cloud provider</title><link href="https://dulek.github.io/2022/07/26/loadbalancer-services.html" rel="alternate" type="text/html" title="type=LoadBalancer Services in Kuryr and cloud provider" /><published>2022-07-26T13:40:52+00:00</published><updated>2022-07-26T13:40:52+00:00</updated><id>https://dulek.github.io/2022/07/26/loadbalancer-services</id><content type="html" xml:base="https://dulek.github.io/2022/07/26/loadbalancer-services.html">&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type=LoadBalancer&lt;/code&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Services&lt;/code&gt; are the backbone of many applications based on
Kubernetes. They allow using the cloud’s load balancer implementation to expose a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Service&lt;/code&gt; to the outside world. An alternative is using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Ingress&lt;/code&gt;, but that
might feel more complicated, requires running an Ingress Controller and only works
with HTTP, so oftentimes you’re forced to use the hero of this post. Let’s then
dive into how it works with OpenStack as the Kubernetes or OpenShift platform.&lt;/p&gt;

&lt;h2 id=&quot;octavia&quot;&gt;Octavia&lt;/h2&gt;

&lt;p&gt;First we need to understand how load balancers are done in OpenStack. The
LBaaS project here is called Octavia. By default Octavia uses the Amphora
provider (providers are like backends for Octavia). Amphora implements load
balancers by spawning small VMs (called… Amphoras) with HAProxy and a tiny
agent on them. Octavia will then call the agent each time the HAProxy
configuration needs to be adjusted, and HAProxy will obviously serve as the LB
itself. The idea is simple, but the downside is obvious - VMs consume vCPUs
and memory.&lt;/p&gt;

&lt;p&gt;A modern way to tackle the problem comes with OVN. If your cloud uses it as
the Neutron backend, you can also use the Octavia OVN provider. Instead of
spawning VMs it will set up load balancers in OVN, greatly reducing the
overhead they create. The downside here is that OVN LBs lack any L7
capabilities and in general have only recently started to catch up with
Amphora’s features.&lt;/p&gt;
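&lt;p&gt;The provider backing a given load balancer is chosen at creation time. A hedged sketch - the LB names and the subnet are placeholders:&lt;/p&gt;

```shell
# Check which providers your cloud actually offers.
openstack loadbalancer provider list

# Amphora (usually the default): a VM will be booted for this LB.
openstack loadbalancer create --name lb-amphora \
  --vip-subnet-id my-subnet

# OVN provider: the LB is programmed into OVN, no VM is created.
# Requires the cloud to run OVN as the Neutron backend.
openstack loadbalancer create --name lb-ovn \
  --vip-subnet-id my-subnet --provider ovn
```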

&lt;h2 id=&quot;typeloadbalancer-with-cloud-provider-openstack&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type=LoadBalancer&lt;/code&gt; with cloud-provider-openstack&lt;/h2&gt;

&lt;p&gt;Kubernetes needs to do some trickery to actually use the cloud’s load
balancers to expose &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Services&lt;/code&gt;. First of all there’s the concept of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NodePort&lt;/code&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Services&lt;/code&gt;,
which expose a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Service&lt;/code&gt;’s pods on a random port of every node (from a specified
range). This means that you can call any node on that port and the traffic
will be redirected to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Service&lt;/code&gt;’s pods. This allows the cloud provider to create
an LB and add all the K8s nodes as its members. External traffic will reach the
LB, get redirected to one of the nodes and the node will direct it into a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Service&lt;/code&gt; pod. But what if that particular node doesn’t have any matching pod?&lt;/p&gt;

&lt;h3 id=&quot;externaltrafficpolicy&quot;&gt;ExternalTrafficPolicy&lt;/h3&gt;

&lt;p&gt;By default in the case described above, the traffic gets redirected to a node
that has a matching pod. This creates a problem - the traffic loses its source
IP, which may be important for some applications. K8s solves this by introducing
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ExternalTrafficPolicy&lt;/code&gt; setting on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Services&lt;/code&gt;. If you set it to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Local&lt;/code&gt;,
you’re guaranteed that the traffic will never get redirected. But you cannot
expect all the nodes to have a pod of your &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Service&lt;/code&gt;, so how to deal with
that? Kubernetes requires the LB to healthcheck the members of the LB. That way
even with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ETP=Local&lt;/code&gt; you’re guaranteed to hit the nodes that actually hold the
correct pods.&lt;/p&gt;
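&lt;p&gt;Switching the policy is a one-liner on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Service&lt;/code&gt; itself; the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;my-service&lt;/code&gt; name below is made up:&lt;/p&gt;

```shell
# Make the Service keep client source IPs by never redirecting
# traffic to another node.
kubectl patch service my-service \
  -p '{"spec": {"externalTrafficPolicy": "Local"}}'

# Verify the policy and the healthCheckNodePort Kubernetes allocates,
# which the cloud's health monitors are expected to probe.
kubectl get service my-service \
  -o jsonpath='{.spec.externalTrafficPolicy} {.spec.healthCheckNodePort}'
```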

&lt;p&gt;In Octavia healthchecks are called health monitors and you can easily
configure your cloud-provider-openstack to create them:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-ini&quot; data-lang=&quot;ini&quot;&gt;&lt;span class=&quot;nn&quot;&gt;[LoadBalancer]&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;use-octavia&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;True&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;create-monitor&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;True&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;monitor-delay&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;3&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;monitor-max-retries&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;3&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;monitor-timeout&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;1&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;So what’s the problem here? The OVN Octavia provider didn’t support health
monitors &lt;a href=&quot;https://docs.openstack.org/releasenotes/ovn-octavia-provider/wallaby.html#new-features&quot;&gt;until OpenStack
Wallaby&lt;/a&gt;.
In case of RH OSP that means until the release of version 17.0. This means that
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ETP=Local&lt;/code&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Services&lt;/code&gt; will be unreliable there.&lt;/p&gt;
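&lt;p&gt;On clouds where the provider does support them, an equivalent health monitor can also be created by hand, mirroring the settings above; the pool name is a placeholder:&lt;/p&gt;

```shell
# Find the pool backing your Service's load balancer.
openstack loadbalancer pool list

# Create a TCP health monitor with the same delay/timeout/retries
# as in the cloud provider configuration above.
openstack loadbalancer healthmonitor create my-service-pool \
  --type TCP --delay 3 --timeout 1 --max-retries 3
```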

&lt;h3 id=&quot;other-issues&quot;&gt;Other issues&lt;/h3&gt;

&lt;p&gt;There are other problems related to the integration between Octavia and
Kubernetes.
&lt;a href=&quot;/2022/07/14/capo-mapo-cloud-provider.html#cloud-provider-openstack-in-openshift&quot;&gt;Legacy in-tree cloud provider&lt;/a&gt;
doesn’t support protocols other than TCP. This is because the legacy provider
was developed with Neutron LBaaS in mind, which didn’t support other
protocols&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. Another deficiency here is that the in-tree cloud provider won’t use
the Octavia bulk APIs, and anyone who has worked with Octavia and Amphora knows
that it’s slow to apply changes. This means that if you have a high number of
nodes, creation of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type=LoadBalancer&lt;/code&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Services&lt;/code&gt; will take long, because members are
added one by one, each time waiting for the LB to become ACTIVE again. We’ve
got reports about it taking around an hour for 80 nodes to get added. And note
that the floating IP is attached as the last operation of LB creation, so until
that finishes you cannot call your &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Service&lt;/code&gt; from the outside. Using the OVN
Octavia provider helps quite a bit because it’s significantly faster in terms
of time to configure an LB.&lt;/p&gt;

&lt;p&gt;Using the modern out-of-tree cloud-provider-openstack helps quite a bit here.
It’ll allow you to create UDP and even SCTP LBs and will use the bulk APIs to
speed up LB creation. But we’ve still found
&lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=2042976&quot;&gt;some&lt;/a&gt;
&lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=2100135&quot;&gt;issues&lt;/a&gt; with it. It
turns out it wasn’t tested much with the OVN Octavia provider, and the bulk
APIs it uses weren’t functioning correctly in that provider. This is fixed now,
but you need to make sure the OpenStack cloud you’re using is upgraded.&lt;/p&gt;

&lt;h2 id=&quot;kuryr-kubernetes-and-lbs&quot;&gt;Kuryr-Kubernetes and LBs&lt;/h2&gt;

&lt;p&gt;You might know that Kuryr-Kubernetes is a CNI plugin option that ties
Kubernetes networking very closely to Neutron. In general it provides
networking by creating a subport in Neutron for every pod and connecting it to
the node’s port, which is a trunk port. This way isolation is achieved using
VLAN tags. It also means that the Octavia load balancers for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Services&lt;/code&gt; can have
pods added as members directly, without relying on NodePorts. This alone
solves the issue of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ETP=Local&lt;/code&gt;, as Kuryr will make sure only the correct
addresses are added to the LB, but it also greatly speeds up LB creation,
because there’s no need to add all the nodes to the LB as members. All that’s
needed for a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Service&lt;/code&gt; to become &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type=LoadBalancer&lt;/code&gt; is to attach a floating IP to
it.&lt;/p&gt;
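&lt;p&gt;That last step is plain Neutron. A sketch of exposing such an LB manually - the external network name, port ID and IP are placeholders:&lt;/p&gt;

```shell
# Allocate a floating IP from the external network.
openstack floating ip create public

# Associate it with the load balancer's VIP port; from this point
# the Service is reachable from outside the cloud.
openstack floating ip set --port lb-vip-port-uuid 203.0.113.10
```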

&lt;p&gt;Kuryr is also actively maintained, meaning that it implements newer APIs,
enabling UDP and SCTP load balancing even for older OpenShift versions.&lt;/p&gt;

&lt;p&gt;It’s worth saying that using Kuryr with OpenShift will greatly increase the
number of LBs created in Octavia, as they’re created not only for
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type=LoadBalancer&lt;/code&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Services&lt;/code&gt;, but also for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type=ClusterIP&lt;/code&gt; ones. In a bare
OpenShift installation that’s around 50 LBs, so Amphora is not recommended as
the provider due to its resource consumption. Please also note that there are
scalability issues when a high number of pods is created at once, as that puts
a lot of stress on the Neutron API.&lt;/p&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;I hope this helps you decide which load balancing model is better for your
use case. To overcome the listed Octavia deficiencies you need a version of
OpenShift allowing you to use the external cloud provider and a recent
OpenStack (even more recent if you’re using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ETP=Local&lt;/code&gt;). On the other hand, using Kuryr will
put a lot of stress onto your OpenStack cluster, as Neutron will be used to
network not only the VMs but also all the pods and load balancers.&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;In fact we’re pretty lucky here: the legacy cloud provider uses a
  gophercloud (the OpenStack client for golang) version that doesn’t have a
  notion of Octavia. Only a clever hack and the fact that the Octavia v1 API is
  identical to the Neutron LBaaS one allow it to work with Octavia. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name>Michał Dulko</name><email>michal.dulko@gmail.com</email></author><category term="openshift" /><category term="openstack" /><category term="kubernetes" /><category term="kuryr" /><category term="loadbalancer" /><category term="octavia" /><category term="ovn" /><summary type="html">type=LoadBalancer Services are the backbone of many applications based on Kubernetes. They allow using cloud’s load balancer implementation to expose a Service to the outside world. An alternative is using Ingress, but that might feel more complicated, requires running Ingress Controller and only works with HTTP, so often times you’re forced to use the hero of this post. Let’s then dive in into how it works with OpenStack as Kubernetes or OpenShift platform.</summary></entry><entry><title type="html">CCM, cloud provider, CAPO, MAPO - OpenStack in OpenShift explained</title><link href="https://dulek.github.io/2022/07/14/capo-mapo-cloud-provider.html" rel="alternate" type="text/html" title="CCM, cloud provider, CAPO, MAPO - OpenStack in OpenShift explained" /><published>2022-07-14T13:40:52+00:00</published><updated>2022-07-14T13:40:52+00:00</updated><id>https://dulek.github.io/2022/07/14/capo-mapo-cloud-provider</id><content type="html" xml:base="https://dulek.github.io/2022/07/14/capo-mapo-cloud-provider.html">&lt;p&gt;Integrating an app with a cloud is a difficult thing in general, but when the app is called OpenShift and implements a whole container platform abstracting the underlying clouds, the task becomes a serious challenge. That’s why when researching the topic you might feel lost hearing acronyms like &lt;em&gt;CAPO&lt;/em&gt;, &lt;em&gt;MAPO&lt;/em&gt; or &lt;em&gt;OCCM&lt;/em&gt;. This blog post’s goal is to explain roles of these integration points between OpenShift and OpenStack, the components implementing that integration and why this ended up so complicated. 
For a more detailed discussion of the integration you can check out &lt;a href=&quot;https://www.youtube.com/watch?v=ue0JE4SewCY&quot;&gt;the talk by my colleague Matt Booth presented at OpenInfra Summit in Berlin.&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;integration-components&quot;&gt;Integration components&lt;/h2&gt;

&lt;p&gt;First of all it’s important to understand the relationship between vanilla Kubernetes and OpenShift. In general OpenShift uses upstream Kubernetes code, but may carry additional patches or even completely diverge from upstream. Later in this article we’ll see that this has an undesirable cost, but obviously it happens sometimes.&lt;/p&gt;

&lt;p&gt;In general Kubernetes has two integration components with the underlying cloud - the &lt;em&gt;cloud provider&lt;/em&gt; and the &lt;em&gt;cluster API provider&lt;/em&gt;. As a rule of thumb you can assume that the cloud provider serves the K8s cluster’s users, while the cluster API provider is what the cluster admin interacts with.&lt;/p&gt;

&lt;h3 id=&quot;cloud-provider&quot;&gt;Cloud provider&lt;/h3&gt;

&lt;p&gt;The cloud provider is responsible for serving the workloads running on the Kubernetes platform. For example, when you’re creating a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Service&lt;/code&gt; of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type=LoadBalancer&lt;/code&gt;, the cloud provider will make sure to create a load balancer on the cloud that will serve traffic for that Service. In the case of OpenStack, it also manages Node IP addresses, provides an Ingress controller based on Octavia and hosts the Manila and Cinder CSI plugins.&lt;/p&gt;

&lt;p&gt;The cloud provider implementation for OpenStack clouds is called &lt;a href=&quot;https://github.com/kubernetes/cloud-provider-openstack&quot;&gt;cloud-provider-openstack&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;cluster-api-provider&quot;&gt;Cluster API provider&lt;/h3&gt;

&lt;p&gt;The &lt;em&gt;&lt;a href=&quot;https://cluster-api.sigs.k8s.io/&quot;&gt;cluster API&lt;/a&gt;&lt;/em&gt; is a Kubernetes extension that allows Kubernetes to manage its own cluster lifecycle. The &lt;em&gt;cluster API providers&lt;/em&gt; are what implement that API for a particular cloud. This means a provider deploys, monitors and removes the VMs on which Kubernetes nodes are running. The cluster API provider acts, for example, when you scale down your Kubernetes cluster. The cluster API provider implementation for OpenStack clouds is called &lt;a href=&quot;https://github.com/kubernetes-sigs/cluster-api-provider-openstack&quot;&gt;cluster-api-provider-openstack&lt;/a&gt;, or CAPO.&lt;/p&gt;
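&lt;p&gt;In practice, the cluster API (or, in OpenShift, the machine API) provider is what reacts when you resize a worker pool, e.g. with commands like the following sketch (the MachineSet name is hypothetical):&lt;/p&gt;

```shell
# List the MachineSets describing groups of worker nodes.
oc get machineset -n openshift-machine-api

# Scaling one changes the desired number of Machines; the provider
# then creates or deletes the matching OpenStack VMs.
oc scale machineset my-cluster-worker-0 \
  -n openshift-machine-api --replicas=2
```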

&lt;h2 id=&quot;cloud-provider-openstack-in-openshift&quot;&gt;cloud-provider-openstack in OpenShift&lt;/h2&gt;

&lt;p&gt;So far, so good, right? Well, it’s a bit more complicated when it comes to OpenShift. Cloud providers weren’t always in neat, separate repos. In the past Kubernetes kept them in the main repo and today these are called &lt;em&gt;in-tree cloud providers&lt;/em&gt;. Up to 4.11 OpenShift only supported the &lt;em&gt;&lt;a href=&quot;https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/legacy-cloud-providers/openstack&quot;&gt;in-tree OpenStack cloud provider&lt;/a&gt;&lt;/em&gt;. While these are still supported, the Kubernetes community is no longer  allowing new changes to these legacy &lt;em&gt;in-tree&lt;/em&gt; cloud providers. Instead, all new feature development is taking place in the &lt;em&gt;new&lt;/em&gt; cloud providers, which can be called either &lt;em&gt;external&lt;/em&gt;, &lt;em&gt;out-of-tree&lt;/em&gt; or &lt;em&gt;CCM&lt;/em&gt;, for &lt;em&gt;Cloud Controller Manager.&lt;/em&gt; The OpenStack external cloud provider is only supported in tech preview in OpenShift 4.11 and is planned to GA in a future release of OpenShift.&lt;/p&gt;

&lt;p&gt;In practice this means that OpenShift’s cloud-provider-openstack has several limitations compared to Kubernetes running with the external cloud provider, e.g. doesn’t support UDP when it comes to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Services&lt;/code&gt; of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type=LoadBalancer&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Why have we ended up here, using the legacy provider for so long? Well, making the switch is not trivial. The external OpenStack cloud provider has slightly different configuration options (a problem when considering upgrades), made some breaking changes (e.g. regarding node IP management) and uses different Octavia calls that aren’t fully tested with the OVN Octavia provider. But hey, at least OpenShift’s version of the legacy OpenStack cloud provider had not diverged too much from upstream, so making the switch is possible in general. The same cannot be said of the cluster API provider.&lt;/p&gt;

&lt;h2 id=&quot;capo-mapo-and-how-we-got-there&quot;&gt;CAPO, MAPO and how we got there&lt;/h2&gt;

&lt;p&gt;In the case of CAPO, the OpenShift version evolved independently of the upstream version. While this allowed OpenShift to move forward more quickly, it also means that API design of the Machine and MachineSet objects diverged from what’s in the upstream Kubernetes. Such a situation is hard to maintain as any bugfix made upstream most likely needs to be adapted to a different API on the OpenShift side. The team decided to act to solve the issue.&lt;/p&gt;

&lt;p&gt;What we needed was basically a translation layer between OpenShift CAPO and the upstream Kubernetes version of it. We’ve called it MAPO - &lt;em&gt;machine-api-provider-openstack&lt;/em&gt;. MAPO is built in a pretty clever way - it watches for the OpenShift version of Machine objects, translates them into Kubernetes Machine objects and uses them to execute upstream CAPO functions that it is vendoring. An alternative approach could just be to run upstream CAPO normally and let MAPO create the K8s version of Machine objects from the OpenShift ones, but that could lead to users being confused by seeing two versions of Machine objects.&lt;/p&gt;

&lt;p&gt;It’s important to say that MAPO is deployed by default starting from OpenShift 4.11 and currently it’s the only supported option. For users nothing serious should change, as it supports all the APIs that the diverged OpenShift CAPO did.&lt;/p&gt;

&lt;h2 id=&quot;whats-next-in-openshift-on-openstack&quot;&gt;What’s next in OpenShift on OpenStack?&lt;/h2&gt;

&lt;p&gt;So what’s going to happen in these topics in the near future? In the case of MAPO the answer is pretty simple - we’ll just make sure to keep the vendored CAPO up to date and implement any new upstream API in our translation layer.&lt;/p&gt;

&lt;p&gt;The situation with switching out of legacy cloud provider is more complicated. The OCCM support was initially planned to be completed in 4.11, but we’ve hit unforeseen upgrade issues. The upgrade is complicated as we cannot have two controllers running concurrently and processing events as we would end up with duplicated Octavia load-balancers or see &lt;a href=&quot;https://github.com/kubernetes/kubernetes/issues/109793&quot;&gt;weird behaviors of Node objects&lt;/a&gt;. At the moment the migration path is planned to be completed and GA in one of the upcoming OpenShift releases.&lt;/p&gt;</content><author><name>Michał Dulko</name><email>michal.dulko@gmail.com</email></author><category term="openshift" /><category term="openstack" /><category term="kubernetes" /><category term="capo" /><category term="mapo" /><summary type="html">Integrating an app with a cloud is a difficult thing in general, but when the app is called OpenShift and implements a whole container platform abstracting the underlying clouds, the task becomes a serious challenge. That’s why when researching the topic you might feel lost hearing acronyms like CAPO, MAPO or OCCM. This blog post’s goal is to explain roles of these integration points between OpenShift and OpenStack, the components implementing that integration and why this ended up so complicated. 
For a more detailed discussion of the integration you can check out the talk by my colleague Matt Booth presented at OpenInfra Summit in Berlin.</summary></entry><entry><title type="html">First post</title><link href="https://dulek.github.io/blog/2022/07/14/welcome-to-jekyll.html" rel="alternate" type="text/html" title="First post" /><published>2022-07-14T09:48:52+00:00</published><updated>2022-07-14T09:48:52+00:00</updated><id>https://dulek.github.io/blog/2022/07/14/welcome-to-jekyll</id><content type="html" xml:base="https://dulek.github.io/blog/2022/07/14/welcome-to-jekyll.html">&lt;p&gt;Alright, so this is the first, test entry on the blog. Welcome, more content to
follow soon. Meanwhile I’ll just put a random code snippet here to have a cheat
sheet for later, and also to see what it looks like rendered.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-golang&quot; data-lang=&quot;golang&quot;&gt;&lt;span class=&quot;c&quot;&gt;/// [Actuator]&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;// Actuator controls machines on a specific infrastructure. All&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;// methods should be idempotent unless otherwise specified.&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Actuator&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;interface&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;c&quot;&gt;// Create the machine.&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;Create&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;machinev1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Machine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;error&lt;/span&gt;
	&lt;span class=&quot;c&quot;&gt;// Delete the machine. If no error is returned, it is assumed that all dependent resources have been cleaned up.&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;Delete&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;machinev1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Machine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;error&lt;/span&gt;
	&lt;span class=&quot;c&quot;&gt;// Update the machine to the provided definition.&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;Update&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;machinev1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Machine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;error&lt;/span&gt;
	&lt;span class=&quot;c&quot;&gt;// Checks if the machine currently exists.&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;Exists&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;machinev1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Machine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;bool&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;error&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;</content><author><name>Michał Dulko</name><email>michal.dulko@gmail.com</email></author><category term="blog" /><summary type="html">Alright, so this is the first, test entry on the blog. Welcome, more content to follow soon. Meanwhile I’ll just put a random code snippet here to have a cheat sheet for later. And also to see how it looks like rendered.</summary></entry></feed>