<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.9.2">Jekyll</generator><link href="https://dulek.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://dulek.github.io/" rel="alternate" type="text/html" /><updated>2022-10-24T18:19:47+00:00</updated><id>https://dulek.github.io/feed.xml</id><title type="html">Dulek’s Blog</title><subtitle>My name is Michał Dulko. I work as an OpenShift and OpenStack developer at Red Hat. This blog will gather articles about the stuff I'm coding.</subtitle><author><name>Michał Dulko</name><email>michal.dulko@gmail.com</email></author><entry><title type="html">Kuryr - when to and when not to consider it</title><link href="https://dulek.github.io/2022/10/24/why-kuryr.html" rel="alternate" type="text/html" title="Kuryr - when to and when not to consider it" /><published>2022-10-24T18:00:00+00:00</published><updated>2022-10-24T18:00:00+00:00</updated><id>https://dulek.github.io/2022/10/24/why-kuryr</id><content type="html" xml:base="https://dulek.github.io/2022/10/24/why-kuryr.html">&lt;p&gt;Kuryr-Kubernetes is a CNI plugin for Kubernetes, that uses OpenStack native
networking layer, Neutron and Octavia, to provide networking for Pods and
Services in K8s clusters running on OpenStack. While unifying the networking
layers of OpenStack and Kubernetes is a tempting concept with clear
performance benefits, such setups also have a number of disadvantages that
I’ll try to explain in this article.&lt;/p&gt;

&lt;h2 id=&quot;why-would-i-even-bother-with-kuryr&quot;&gt;Why would I even bother with Kuryr?&lt;/h2&gt;

&lt;p&gt;So what are the advantages of Kuryr over, say, ovn-kubernetes? The problem with
virtually any more complex K8s CNI plugin is that it introduces double
encapsulation of packets: a Pod’s traffic gets wrapped in ovn-kubernetes
tunneling at the CNI level and then in Neutron tunneling at the VM level,
which means increased latency and lower throughput. To verify this we ran a
bunch of tests a while ago and &lt;a href=&quot;https://cloud.redhat.com/blog/accelerate-your-openshift-network-performance-on-openstack-with-kuryr&quot;&gt;Kuryr turned out to be better in terms of
performance in most
cases&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/kuryr-tunneling.png&quot; alt=&quot;Kuryr encapsulation diagram&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You can technically work around this by using a provider network in Neutron,
but provider networks are much less flexible than virtualized tenant networks
created on the fly, so it’s not a magic bullet.&lt;/p&gt;
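&lt;p&gt;As an illustration, such a provider network could be created like the sketch below. The network type, physnet name and segmentation ID are made-up examples - the right values depend entirely on your cloud’s configuration:&lt;/p&gt;

```shell
# Create a Neutron provider network mapped onto an existing datacenter
# VLAN, so that traffic skips the Neutron tunneling overlay entirely.
# physnet1 and segment 100 are placeholders - ask your cloud operator.
openstack network create pods-net \
  --provider-network-type vlan \
  --provider-physical-network physnet1 \
  --provider-segment 100

# A subnet is still needed so that IPs can be allocated from it.
openstack subnet create pods-subnet \
  --network pods-net \
  --subnet-range 192.168.100.0/24
```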

&lt;p&gt;Another advantage is operational simplicity. It’s simply easier to learn how
to analyze and debug a single SDN instead of a couple of them, with two totally
different sets of concepts and the CNI being virtually a black box at the VM
level. There is a caveat here: you’ll probably need to learn to debug Neutron
features you rarely used before. Moreover, debugging Kuryr itself is a thing
too, but most issues boil down to problems in Neutron or Octavia. I’ll talk
more about this later in the article.&lt;/p&gt;
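&lt;p&gt;To give a taste of what that debugging looks like, these are the kinds of commands you’d typically start from when chasing a misbehaving pod port (the resource names here are hypothetical):&lt;/p&gt;

```shell
# With Kuryr each node's Neutron port is a trunk port - list them.
openstack network trunk list

# Show a trunk's subports and their VLAN IDs; each subport
# corresponds to a single pod running on that node.
openstack network trunk show node-0-trunk

# Inspect an individual pod's port - its IP, status and security groups.
openstack port show pod-subport-uuid
```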

&lt;h2 id=&quot;issues-with-kuryr-environments&quot;&gt;Issues with Kuryr environments&lt;/h2&gt;

&lt;p&gt;But obviously not everything is sunshine and rainbows. Kuryr environments
suffer from limitations and issues related to the general design, the
underlying OpenStack infrastructure and plain bugs. First let’s talk about
problems introduced by Kuryr’s design assumptions.&lt;/p&gt;

&lt;p&gt;First of all, Kuryr is tied to OpenStack. This means that if your application
depends on any behavior that is Kuryr-specific (manipulating Neutron ports or
SGs, Octavia LBs, expecting each K8s namespace to have its own subnet in
OpenStack, etc.), then you won’t be able to easily run it on a non-OpenStack
public cloud.&lt;/p&gt;

&lt;p&gt;As with Kuryr &lt;em&gt;all&lt;/em&gt; Services are represented as Octavia load balancers,
you’ll face the usual limitations of Octavia. E.g. if you’re using the Amphora
provider, each LB will be a VM that consumes some cloud resources. This is
solved by the Octavia OVN provider, but it’s only available for deployments
using OVN as the Neutron driver.&lt;/p&gt;

&lt;p&gt;Some features are currently impossible to implement with Octavia, e.g.
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sessionAffinity&lt;/code&gt; or OpenShift’s automatic unidling. There are also some
bugs in Octavia that have haunted Kuryr deployments. In particular, with high
churn LBs often get stuck in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PENDING_UPDATE&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;PENDING_DELETE&lt;/code&gt; states, which
make them immutable, so Kuryr cannot do anything about them, leading to issues
with the Services represented by those LBs.&lt;/p&gt;
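&lt;p&gt;If you suspect this is happening, a quick way to find the affected load balancers is to filter them by provisioning status (this sketch assumes you have OpenStack CLI access to the cluster’s project; the LB ID is a placeholder):&lt;/p&gt;

```shell
# LBs stuck in PENDING_* states are immutable - Kuryr cannot touch them.
openstack loadbalancer list --provisioning-status PENDING_UPDATE
openstack loadbalancer list --provisioning-status PENDING_DELETE

# Once a cloud admin has moved a broken LB out of the PENDING state,
# it can be removed together with its listeners and pools in one call.
openstack loadbalancer delete --cascade stuck-lb-uuid
```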

&lt;p&gt;Another set of problems comes from Neutron, which is used to provide networking
for all the Pods. In general the scale at which Kuryr exercises Neutron is the
main issue. With Kuryr there will be hundreds of subports attached to trunk
ports, and I don’t think there’s any other application that exercises Neutron
at such scale. An obvious difference from other SDNs is that in high-churn
Kuryr environments pod creation times will be significantly higher, as Neutron
has to be contacted to create a port every time. This is slightly mitigated by
Kuryr maintaining a pool of ports ready to be attached to pods, but it’s still
a problem when a lot of pods get created at once, e.g. when applying Helm
charts.&lt;/p&gt;

&lt;p&gt;Neutron has its own bugs too. For example we’ve observed very high port
creation times when there are many concurrent bulk port create requests. The
Neutron team tracked this down to IPAM conflict-and-retry races between the
threads serving these requests and the situation improved, but it can still
take quite some time to create multiple ports. In critical cases this led to
Kuryr orphaning ports in OpenStack, and the only strategy we had to fix this
was to periodically look for such ports and delete them.&lt;/p&gt;
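&lt;p&gt;That periodic cleanup boils down to something like the commands below. Note the assumption: pod ports are identified by the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;compute:kuryr&lt;/code&gt; device owner - verify that this matches your deployment before deleting anything:&lt;/p&gt;

```shell
# List the ports Kuryr created for pods (device_owner compute:kuryr
# is an assumption - check it against your deployment first).
openstack port list --device-owner compute:kuryr --long

# After manually confirming that a port no longer backs a live pod,
# delete it to free its IP address.
openstack port delete orphaned-port-uuid
```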

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;So when to consider Kuryr? It’s probably easier to answer the opposite
question. If you expect a lot of churn in your environment - pods and Services
getting created often and in high numbers - then you need to check whether
your Neutron will be able to handle that and whether the longer Pod and
Service time-to-wire is acceptable for you. Besides that, you probably want to
make sure your underlying OpenStack cloud runs a current version, so that all
the fixes the Kuryr team requested from the Neutron and Octavia teams are
included.&lt;/p&gt;

&lt;p&gt;If the above issues are not a concern for you, then you should really
consider using Kuryr for the sake of the improved performance and simplicity
of the networking layer of your combined K8s and OpenStack clouds.&lt;/p&gt;</content><author><name>Michał Dulko</name><email>michal.dulko@gmail.com</email></author><category term="openshift" /><category term="openstack" /><category term="kubernetes" /><category term="kuryr" /><summary type="html">Kuryr-Kubernetes is a CNI plugin for Kubernetes, that uses OpenStack native networking layer - Neutron and Octavia, to provide networking for Pods and Services in K8s clusters running on OpenStack. While it’s a tempting concept to unify the networking layers of both OpenStack and Kubernetes and there are clear performance benefits, such setups also have a number of disadvantages that I’ll try to explain in this article.</summary></entry><entry><title type="html">type=LoadBalancer Services in Kuryr and cloud provider</title><link href="https://dulek.github.io/2022/07/26/loadbalancer-services.html" rel="alternate" type="text/html" title="type=LoadBalancer Services in Kuryr and cloud provider" /><published>2022-07-26T13:40:52+00:00</published><updated>2022-07-26T13:40:52+00:00</updated><id>https://dulek.github.io/2022/07/26/loadbalancer-services</id><content type="html" xml:base="https://dulek.github.io/2022/07/26/loadbalancer-services.html">&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type=LoadBalancer&lt;/code&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Services&lt;/code&gt; are the backbone of many applications based on
Kubernetes. They allow using the cloud’s load balancer implementation to expose a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Service&lt;/code&gt; to the outside world. An alternative is using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Ingress&lt;/code&gt;, but that
might feel more complicated, requires running an Ingress Controller and only works
with HTTP, so oftentimes you’re forced to use the hero of this post. Let’s then
dive into how it works with OpenStack as the Kubernetes or OpenShift platform.&lt;/p&gt;

&lt;h2 id=&quot;octavia&quot;&gt;Octavia&lt;/h2&gt;

&lt;p&gt;First we need to understand how load balancers are done in OpenStack. The
LBaaS project here is called Octavia. By default Octavia uses the Amphora
provider (providers are like backends for Octavia). Amphora implements load
balancers by spawning small VMs (called… Amphoras) with HAProxy and a tiny
agent on them. Octavia will then call the agent each time the HAProxy
configuration needs to be adjusted, and HAProxy will obviously serve as the LB
itself. The idea is simple, but the downside is obvious - VMs consume vCPUs
and memory.&lt;/p&gt;

&lt;p&gt;A modern way to tackle the problem comes with OVN. If your cloud uses it as
the Neutron backend, you can also use the Octavia OVN provider. Instead of
spawning VMs it will set up load balancers in OVN, greatly reducing the
overhead they create. The downside here is that OVN LBs lack any L7
capabilities and in general have only recently started to catch up with
Amphora’s features.&lt;/p&gt;
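&lt;p&gt;The provider backing a given load balancer is chosen at creation time. A hedged sketch - the LB names and the subnet are placeholders:&lt;/p&gt;

```shell
# Check which providers your cloud actually offers.
openstack loadbalancer provider list

# Amphora (usually the default): a VM will be booted for this LB.
openstack loadbalancer create --name lb-amphora \
  --vip-subnet-id my-subnet

# OVN provider: the LB is programmed into OVN, no VM is created.
# Requires the cloud to run OVN as the Neutron backend.
openstack loadbalancer create --name lb-ovn \
  --vip-subnet-id my-subnet --provider ovn
```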

&lt;h2 id=&quot;typeloadbalancer-with-cloud-provider-openstack&quot;&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type=LoadBalancer&lt;/code&gt; with cloud-provider-openstack&lt;/h2&gt;

&lt;p&gt;Kubernetes needs to do some trickery to actually use the cloud’s load
balancers to expose &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Services&lt;/code&gt;. First of all there’s the concept of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NodePort&lt;/code&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Services&lt;/code&gt;,
which expose a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Service&lt;/code&gt;’s pods on a random port of every node (from a specified
range). This means that you can call any node on that port and the traffic
will be redirected to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Service&lt;/code&gt;’s pods. This allows the cloud provider to create
an LB and add all the K8s nodes as its members. External traffic will reach the
LB, get redirected to one of the nodes and the node will direct it into a
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Service&lt;/code&gt; pod. But what if that particular node doesn’t have any matching pod?&lt;/p&gt;

&lt;h3 id=&quot;externaltrafficpolicy&quot;&gt;ExternalTrafficPolicy&lt;/h3&gt;

&lt;p&gt;By default in the case described above, the traffic gets redirected to a node
that has a matching pod. This creates a problem - the traffic loses its source
IP, which may be important for some applications. K8s solves this by introducing
the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ExternalTrafficPolicy&lt;/code&gt; setting on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Services&lt;/code&gt;. If you set it to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Local&lt;/code&gt;,
you’re guaranteed that the traffic will never get redirected. But you cannot
expect all the nodes to have a pod of your &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Service&lt;/code&gt;, so how to deal with
that? Kubernetes requires the LB to healthcheck the members of the LB. That way
even with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ETP=Local&lt;/code&gt; you’re guaranteed to hit the nodes that actually hold the
correct pods.&lt;/p&gt;
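&lt;p&gt;Switching the policy is a one-liner on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Service&lt;/code&gt; itself; the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;my-service&lt;/code&gt; name below is made up:&lt;/p&gt;

```shell
# Make the Service keep client source IPs by never redirecting
# traffic to another node.
kubectl patch service my-service \
  -p '{"spec": {"externalTrafficPolicy": "Local"}}'

# Verify the policy and the healthCheckNodePort Kubernetes allocates,
# which the cloud's health monitors are expected to probe.
kubectl get service my-service \
  -o jsonpath='{.spec.externalTrafficPolicy} {.spec.healthCheckNodePort}'
```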

&lt;p&gt;In Octavia healthchecks are called health monitors and you can easily
configure your cloud-provider-openstack to create them:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-ini&quot; data-lang=&quot;ini&quot;&gt;&lt;span class=&quot;nn&quot;&gt;[LoadBalancer]&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;use-octavia&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;True&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;create-monitor&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;True&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;monitor-delay&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;3&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;monitor-max-retries&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;3&lt;/span&gt;
&lt;span class=&quot;py&quot;&gt;monitor-timeout&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;1&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;So what’s the problem here? The OVN Octavia provider didn’t support health
monitors &lt;a href=&quot;https://docs.openstack.org/releasenotes/ovn-octavia-provider/wallaby.html#new-features&quot;&gt;until OpenStack
Wallaby&lt;/a&gt;.
In case of RH OSP that means until the release of version 17.0. This means that
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ETP=Local&lt;/code&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Services&lt;/code&gt; will be unreliable there.&lt;/p&gt;
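&lt;p&gt;On clouds where the provider does support them, an equivalent health monitor can also be created by hand, mirroring the settings above; the pool name is a placeholder:&lt;/p&gt;

```shell
# Find the pool backing your Service's load balancer.
openstack loadbalancer pool list

# Create a TCP health monitor with the same delay/timeout/retries
# as in the cloud provider configuration above.
openstack loadbalancer healthmonitor create my-service-pool \
  --type TCP --delay 3 --timeout 1 --max-retries 3
```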

&lt;h3 id=&quot;other-issues&quot;&gt;Other issues&lt;/h3&gt;

&lt;p&gt;There are other problems related to the integration between Octavia and
Kubernetes.
&lt;a href=&quot;/2022/07/14/capo-mapo-cloud-provider.html#cloud-provider-openstack-in-openshift&quot;&gt;Legacy in-tree cloud provider&lt;/a&gt;
doesn’t support protocols other than TCP. This is because the legacy provider
was developed with Neutron LBaaS in mind, which didn’t support other
protocols&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. Another deficiency here is that the in-tree cloud provider won’t use
the Octavia bulk APIs, and anyone who has worked with Octavia and Amphora knows
that it’s slow to apply changes. This means that if you have a high number of
nodes, creation of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type=LoadBalancer&lt;/code&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Services&lt;/code&gt; will take long, because members are
added one by one, each time waiting for the LB to become ACTIVE again. We’ve
got reports about it taking around an hour for 80 nodes to get added. And note
that the floating IP is attached as the last operation of LB creation, so until
that finishes you cannot call your &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Service&lt;/code&gt; from the outside. Using the OVN
Octavia provider helps quite a bit because it’s significantly faster in terms
of time to configure an LB.&lt;/p&gt;

&lt;p&gt;Using the modern out-of-tree cloud-provider-openstack helps quite a bit here.
It’ll allow you to create UDP and even SCTP LBs and will use the bulk APIs to
speed up LB creation. But we’ve still found
&lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=2042976&quot;&gt;some&lt;/a&gt;
&lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=2100135&quot;&gt;issues&lt;/a&gt; with it. It
turns out it wasn’t tested much with the OVN Octavia provider, and the bulk
APIs it uses weren’t functioning correctly in that provider. This is fixed now,
but you need to make sure the OpenStack cloud you’re using is upgraded.&lt;/p&gt;

&lt;h2 id=&quot;kuryr-kubernetes-and-lbs&quot;&gt;Kuryr-Kubernetes and LBs&lt;/h2&gt;

&lt;p&gt;You might know that Kuryr-Kubernetes is a CNI plugin option that ties
Kubernetes networking very closely to Neutron. In general it provides
networking by creating a subport in Neutron for every pod and connecting it to
the node’s port, which is a trunk port. This way isolation is achieved using
VLAN tags. It also means that the Octavia load balancers for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Services&lt;/code&gt; can have
pods added as members directly, without relying on NodePorts. This alone
solves the issue of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ETP=Local&lt;/code&gt;, as Kuryr will make sure only the correct
addresses are added to the LB, but it also greatly speeds up LB creation,
because there’s no need to add all the nodes to the LB as members. All that’s
needed for a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Service&lt;/code&gt; to become &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type=LoadBalancer&lt;/code&gt; is to attach a floating IP to
it.&lt;/p&gt;
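&lt;p&gt;That last step is plain Neutron. A sketch of exposing such an LB manually - the external network name, port ID and IP are placeholders:&lt;/p&gt;

```shell
# Allocate a floating IP from the external network.
openstack floating ip create public

# Associate it with the load balancer's VIP port; from this point
# the Service is reachable from outside the cloud.
openstack floating ip set --port lb-vip-port-uuid 203.0.113.10
```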

&lt;p&gt;Kuryr is also actively maintained, meaning that it implements newer APIs,
enabling UDP and SCTP load balancing even for older OpenShift versions.&lt;/p&gt;

&lt;p&gt;It’s worth saying that using Kuryr with OpenShift will greatly increase the
number of LBs created in Octavia, as they’re created not only for
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type=LoadBalancer&lt;/code&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Services&lt;/code&gt;, but also for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type=ClusterIP&lt;/code&gt; ones. In a bare
OpenShift installation that’s around 50 LBs, so Amphora is not recommended as
the provider due to its resource consumption. Please also note that there are
scalability issues when a high number of pods is created at once, as that puts
a lot of stress on the Neutron API.&lt;/p&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;I hope this helps you decide which load balancing model is better for your
use case. To overcome the listed Octavia deficiencies you need a version of
OpenShift allowing you to use the external cloud provider and a recent
OpenStack (even more recent if you’re using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ETP=Local&lt;/code&gt;). On the other hand, using Kuryr will
put a lot of stress onto your OpenStack cluster, as Neutron will be used to
network not only the VMs but also all the pods and load balancers.&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;In fact we’re pretty lucky here: the legacy cloud provider uses a
  gophercloud (the OpenStack client for golang) version that doesn’t have a
  notion of Octavia. Only a clever hack and the fact that the Octavia v1 API is
  identical to the Neutron LBaaS one allow it to work with Octavia. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name>Michał Dulko</name><email>michal.dulko@gmail.com</email></author><category term="openshift" /><category term="openstack" /><category term="kubernetes" /><category term="kuryr" /><category term="loadbalancer" /><category term="octavia" /><category term="ovn" /><summary type="html">type=LoadBalancer Services are the backbone of many applications based on Kubernetes. They allow using cloud’s load balancer implementation to expose a Service to the outside world. An alternative is using Ingress, but that might feel more complicated, requires running Ingress Controller and only works with HTTP, so often times you’re forced to use the hero of this post. Let’s then dive in into how it works with OpenStack as Kubernetes or OpenShift platform.</summary></entry><entry><title type="html">CCM, cloud provider, CAPO, MAPO - OpenStack in OpenShift explained</title><link href="https://dulek.github.io/2022/07/14/capo-mapo-cloud-provider.html" rel="alternate" type="text/html" title="CCM, cloud provider, CAPO, MAPO - OpenStack in OpenShift explained" /><published>2022-07-14T13:40:52+00:00</published><updated>2022-07-14T13:40:52+00:00</updated><id>https://dulek.github.io/2022/07/14/capo-mapo-cloud-provider</id><content type="html" xml:base="https://dulek.github.io/2022/07/14/capo-mapo-cloud-provider.html">&lt;p&gt;Integrating an app with a cloud is a difficult thing in general, but when the app is called OpenShift and implements a whole container platform abstracting the underlying clouds, the task becomes a serious challenge. That’s why when researching the topic you might feel lost hearing acronyms like &lt;em&gt;CAPO&lt;/em&gt;, &lt;em&gt;MAPO&lt;/em&gt; or &lt;em&gt;OCCM&lt;/em&gt;. This blog post’s goal is to explain roles of these integration points between OpenShift and OpenStack, the components implementing that integration and why this ended up so complicated. 
For a more detailed discussion of the integration you can check out &lt;a href=&quot;https://www.youtube.com/watch?v=ue0JE4SewCY&quot;&gt;the talk by my colleague Matt Booth presented at OpenInfra Summit in Berlin.&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;integration-components&quot;&gt;Integration components&lt;/h2&gt;

&lt;p&gt;First of all it’s important to understand the relationship between vanilla Kubernetes and OpenShift. In general OpenShift uses upstream Kubernetes code, but may carry additional patches or even completely diverge from upstream. Later in this article we’ll see that this has an undesirable cost, but obviously it happens sometimes.&lt;/p&gt;

&lt;p&gt;In general Kubernetes has two integration components with the underlying cloud - the &lt;em&gt;cloud provider&lt;/em&gt; and the &lt;em&gt;cluster API provider&lt;/em&gt;. As a rule of thumb you can assume that the cloud provider serves the K8s cluster’s users, while the cluster API provider is what the cluster admin interacts with.&lt;/p&gt;

&lt;h3 id=&quot;cloud-provider&quot;&gt;Cloud provider&lt;/h3&gt;

&lt;p&gt;The cloud provider is responsible for serving the workloads running on the Kubernetes platform. For example, when you’re creating a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Service&lt;/code&gt; of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type=LoadBalancer&lt;/code&gt;, the cloud provider will make sure to create a load balancer on the cloud that will serve traffic for that Service. In the case of OpenStack, it also manages Node IP addresses, provides an Ingress controller based on Octavia and hosts the Manila and Cinder CSI plugins.&lt;/p&gt;

&lt;p&gt;The cloud provider implementation for OpenStack clouds is called &lt;a href=&quot;https://github.com/kubernetes/cloud-provider-openstack&quot;&gt;cloud-provider-openstack&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;cluster-api-provider&quot;&gt;Cluster API provider&lt;/h3&gt;

&lt;p&gt;The &lt;em&gt;&lt;a href=&quot;https://cluster-api.sigs.k8s.io/&quot;&gt;cluster API&lt;/a&gt;&lt;/em&gt; is a Kubernetes extension that allows Kubernetes to manage its own cluster lifecycle. The &lt;em&gt;cluster API providers&lt;/em&gt; are what implement that API for a particular cloud. This means a provider deploys, monitors and removes the VMs on which Kubernetes nodes are running. The cluster API provider acts, for example, when you scale down your Kubernetes cluster. The cluster API provider implementation for OpenStack clouds is called &lt;a href=&quot;https://github.com/kubernetes-sigs/cluster-api-provider-openstack&quot;&gt;cluster-api-provider-openstack&lt;/a&gt;, or CAPO.&lt;/p&gt;
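&lt;p&gt;In practice, the cluster API (or, in OpenShift, the machine API) provider is what reacts when you resize a worker pool, e.g. with commands like the following sketch (the MachineSet name is hypothetical):&lt;/p&gt;

```shell
# List the MachineSets describing groups of worker nodes.
oc get machineset -n openshift-machine-api

# Scaling one changes the desired number of Machines; the provider
# then creates or deletes the matching OpenStack VMs.
oc scale machineset my-cluster-worker-0 \
  -n openshift-machine-api --replicas=2
```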

&lt;h2 id=&quot;cloud-provider-openstack-in-openshift&quot;&gt;cloud-provider-openstack in OpenShift&lt;/h2&gt;

&lt;p&gt;So far, so good, right? Well, it’s a bit more complicated when it comes to OpenShift. Cloud providers weren’t always in neat, separate repos. In the past Kubernetes kept them in the main repo and today these are called &lt;em&gt;in-tree cloud providers&lt;/em&gt;. Up to 4.11 OpenShift only supported the &lt;em&gt;&lt;a href=&quot;https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/legacy-cloud-providers/openstack&quot;&gt;in-tree OpenStack cloud provider&lt;/a&gt;&lt;/em&gt;. While these are still supported, the Kubernetes community is no longer  allowing new changes to these legacy &lt;em&gt;in-tree&lt;/em&gt; cloud providers. Instead, all new feature development is taking place in the &lt;em&gt;new&lt;/em&gt; cloud providers, which can be called either &lt;em&gt;external&lt;/em&gt;, &lt;em&gt;out-of-tree&lt;/em&gt; or &lt;em&gt;CCM&lt;/em&gt;, for &lt;em&gt;Cloud Controller Manager.&lt;/em&gt; The OpenStack external cloud provider is only supported in tech preview in OpenShift 4.11 and is planned to GA in a future release of OpenShift.&lt;/p&gt;

&lt;p&gt;In practice this means that OpenShift’s cloud-provider-openstack has several limitations compared to Kubernetes running with the external cloud provider, e.g. doesn’t support UDP when it comes to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Services&lt;/code&gt; of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;type=LoadBalancer&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Why have we ended up here, using the legacy provider for so long? Well, making the switch is not trivial. The external OpenStack cloud provider has slightly different configuration options (a problem when considering upgrades), made some breaking changes (e.g. regarding node IP management) and uses different Octavia calls that aren’t fully tested with the OVN Octavia provider. But hey, at least OpenShift’s version of the legacy OpenStack cloud provider had not diverged too much from upstream, so making the switch is possible in general. The same cannot be said of the cluster API provider.&lt;/p&gt;

&lt;h2 id=&quot;capo-mapo-and-how-we-got-there&quot;&gt;CAPO, MAPO and how we got there&lt;/h2&gt;

&lt;p&gt;In the case of CAPO, the OpenShift version evolved independently of the upstream version. While this allowed OpenShift to move forward more quickly, it also means that API design of the Machine and MachineSet objects diverged from what’s in the upstream Kubernetes. Such a situation is hard to maintain as any bugfix made upstream most likely needs to be adapted to a different API on the OpenShift side. The team decided to act to solve the issue.&lt;/p&gt;

&lt;p&gt;What we needed was basically a translation layer between OpenShift CAPO and the upstream Kubernetes version of it. We’ve called it MAPO - &lt;em&gt;machine-api-provider-openstack&lt;/em&gt;. MAPO is built in a pretty clever way - it watches for the OpenShift version of Machine objects, translates them into Kubernetes Machine objects and uses them to execute upstream CAPO functions that it is vendoring. An alternative approach could just be to run upstream CAPO normally and let MAPO create the K8s version of Machine objects from the OpenShift ones, but that could lead to users being confused by seeing two versions of Machine objects.&lt;/p&gt;

&lt;p&gt;It’s important to say that MAPO is deployed by default starting from OpenShift 4.11 and currently it’s the only supported option. For users nothing serious should change, as it supports all the APIs that the diverged OpenShift CAPO did.&lt;/p&gt;

&lt;h2 id=&quot;whats-next-in-openshift-on-openstack&quot;&gt;What’s next in OpenShift on OpenStack?&lt;/h2&gt;

&lt;p&gt;So what’s going to happen in these topics in the near future? In the case of MAPO the answer is pretty simple - we’ll just make sure to keep the vendored CAPO up to date and implement any new upstream API in our translation layer.&lt;/p&gt;

&lt;p&gt;The situation with switching out of legacy cloud provider is more complicated. The OCCM support was initially planned to be completed in 4.11, but we’ve hit unforeseen upgrade issues. The upgrade is complicated as we cannot have two controllers running concurrently and processing events as we would end up with duplicated Octavia load-balancers or see &lt;a href=&quot;https://github.com/kubernetes/kubernetes/issues/109793&quot;&gt;weird behaviors of Node objects&lt;/a&gt;. At the moment the migration path is planned to be completed and GA in one of the upcoming OpenShift releases.&lt;/p&gt;</content><author><name>Michał Dulko</name><email>michal.dulko@gmail.com</email></author><category term="openshift" /><category term="openstack" /><category term="kubernetes" /><category term="capo" /><category term="mapo" /><summary type="html">Integrating an app with a cloud is a difficult thing in general, but when the app is called OpenShift and implements a whole container platform abstracting the underlying clouds, the task becomes a serious challenge. That’s why when researching the topic you might feel lost hearing acronyms like CAPO, MAPO or OCCM. This blog post’s goal is to explain roles of these integration points between OpenShift and OpenStack, the components implementing that integration and why this ended up so complicated. 
For a more detailed discussion of the integration you can check out the talk by my colleague Matt Booth presented at OpenInfra Summit in Berlin.</summary></entry><entry><title type="html">First post</title><link href="https://dulek.github.io/blog/2022/07/14/welcome-to-jekyll.html" rel="alternate" type="text/html" title="First post" /><published>2022-07-14T09:48:52+00:00</published><updated>2022-07-14T09:48:52+00:00</updated><id>https://dulek.github.io/blog/2022/07/14/welcome-to-jekyll</id><content type="html" xml:base="https://dulek.github.io/blog/2022/07/14/welcome-to-jekyll.html">&lt;p&gt;Alright, so this is the first, test entry on the blog. Welcome, more content to
follow soon. Meanwhile I’ll just put a random code snippet here to have a cheat
sheet for later, and also to see what it looks like rendered.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-golang&quot; data-lang=&quot;golang&quot;&gt;&lt;span class=&quot;c&quot;&gt;/// [Actuator]&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;// Actuator controls machines on a specific infrastructure. All&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;// methods should be idempotent unless otherwise specified.&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;type&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Actuator&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;interface&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;c&quot;&gt;// Create the machine.&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;Create&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;machinev1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Machine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;error&lt;/span&gt;
	&lt;span class=&quot;c&quot;&gt;// Delete the machine. If no error is returned, it is assumed that all dependent resources have been cleaned up.&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;Delete&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;machinev1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Machine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;error&lt;/span&gt;
	&lt;span class=&quot;c&quot;&gt;// Update the machine to the provided definition.&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;Update&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;machinev1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Machine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;error&lt;/span&gt;
	&lt;span class=&quot;c&quot;&gt;// Checks if the machine currently exists.&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;Exists&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;machinev1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Machine&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;bool&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;error&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;</content><author><name>Michał Dulko</name><email>michal.dulko@gmail.com</email></author><category term="blog" /><summary type="html">Alright, so this is the first, test entry on the blog. Welcome, more content to follow soon. Meanwhile I’ll just put a random code snippet here to have a cheat sheet for later. And also to see how it looks like rendered.</summary></entry></feed>