<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://torlenor.org/feed.xml" rel="self" type="application/atom+xml" /><link href="https://torlenor.org/" rel="alternate" type="text/html" /><updated>2026-03-07T15:38:25+00:00</updated><id>https://torlenor.org/feed.xml</id><title type="html">Torlenor.org</title><subtitle>My private page where I am blogging about my coding adventures.</subtitle><entry><title type="html">Building Chirm: A calmer way to share information across a company</title><link href="https://torlenor.org/chirm/2026/03/05/chirm_1.html" rel="alternate" type="text/html" title="Building Chirm: A calmer way to share information across a company" /><published>2026-03-05T22:00:00+00:00</published><updated>2026-03-05T22:00:00+00:00</updated><id>https://torlenor.org/chirm/2026/03/05/chirm_1</id><content type="html" xml:base="https://torlenor.org/chirm/2026/03/05/chirm_1.html"><![CDATA[<p style="text-align: center;"><img src="/assets/img/chirm/Chirm-1.png" alt="Chirm overview" style="max-width: 40%; height: auto;" /></p>

<h1 id="introduction">Introduction</h1>

<p>In many companies, important operational updates are spread across too many places. Some things land in chat, some in email, some are mentioned in meetings and some are simply learned by accident. This gets especially messy once a company grows beyond a few tightly connected teams, or when remote work becomes the norm, as it has since Covid.</p>

<p>Chat tools are great for fast back-and-forth communication, but they are not necessarily a good place for information that should remain visible, easy to catch up on, and available to the right people later on. A release update, a process change, a location-specific announcement or maybe a very helpful tool that was built in a team should not have to drown in a river of chat messages or Teams teams.</p>

<p>This is one of the reasons why I started building Chirm, an internal communication platform for calmer, asynchronous company-wide updates. The project is available at <a href="https://chirm.app/en">https://chirm.app/en</a>.</p>

<p>A mildly amusing detail is that this did not start as that grand product plan. At first, it was mainly a playground for microservice architecture and I thought that a microblogging-style application would be a good idea for experimenting with service boundaries, timelines, events and the general joys and pains of distributed systems. Only while building it did I notice that the thing I was building maps surprisingly well to solving a real communication problem inside companies. That is the point where the project started turning from a technical playground into what Chirm is now.</p>

<p>In this article I want to describe the problem Chirm is trying to solve, what kind of system I ended up building and a bit of the technical setup behind it. I may come up with some more technical articles later.</p>

<h1 id="setup">Setup</h1>

<p>The basic observation behind Chirm is simple: Many companies do not have a good, calm, asynchronous place for exchanging information.</p>

<p>There are of course many tools around already. Slack and Microsoft Teams are widely used, but they are primarily optimized for collaboration, direct interaction and quick exchange. That is useful, but it also creates a mode of communication where everything competes for immediate attention and the last chat message wins. Important updates can easily be buried under ongoing conversations, side discussions, quick questions and the usual background noise of daily work.</p>

<p>At the same time, alternatives on the more social-intranet side often try to do many things at once. That can be useful too, but it also increases complexity.</p>

<p>What I wanted instead was something much narrower: One central place for company communication, built around a personal timeline, where relevant updates can be read asynchronously, reacted to and discussed without creating the feeling that one has to constantly watch a chat window. Basically a social microblogging service inside a company.</p>

<p>And I came up with the core principle that guides the development: <em>The user’s time is precious, do not steal it.</em></p>

<p>This also means that Chirm is not supposed to replace every communication or collaboration tool inside a company. It is not a project management system, not a wiki and not a general-purpose chat tool. The goal is much more focused: Provide a clean channel for internal company communication that remains readable and useful as an organization grows.</p>

<p>Typical examples that should go over Chirm would be company-wide operational updates, release notes for internal teams, process changes, announcements from leadership, updates for specific departments or locations, and compliance-relevant information that should clearly reach the correct audience. But it should also work for smaller, useful things that normally stay trapped inside one team. A team might build a small internal tool, automate a repetitive task, improve a workflow or discover a useful practice. In many companies, that kind of progress remains local because nobody outside the team ever hears about it, especially since remote work has increased and the chance encounters at the coffee machine do not happen that often anymore. In Chirm, sharing that should be as easy as writing a short post, so that other teams can react, comment and potentially adapt the idea for their own work.</p>

<p>This is where a timeline-oriented model becomes interesting. Instead of forcing all communication into chat threads or emails, updates can be posted into a structured shared space and consumed when people actually have time for it. And for the really important things, there are mechanisms that allow you to request confirmation, which also makes it a very natural fit for areas like quality management (QM).</p>

<h1 id="what-did-we-do">What did we do?</h1>

<p>From the beginning, I wanted Chirm to feel closer to a microblogging platform than to a classic enterprise chat tool, because the goal was not only to distribute important information, but also to make useful work inside a company more visible across team boundaries.</p>

<p>That means a few things. A post is the central object. It can belong to a group, have a topic, receive replies and reactions and remain visible in a timeline. The timeline itself is the main interface. Ideally, most of the daily interaction with the system happens there. Users should not need to jump through ten different menus to understand what is going on or to look into various chat channels or Teams teams to find something.</p>

<p>This also means the system should support both broad and restricted visibility. Some updates are relevant for everyone in a company, while others only make sense within a department, team or location. Therefore group handling and access control are part of the core model.</p>
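<p>To make this concrete, the post-and-visibility model described above could be sketched like this (a purely illustrative toy model; the names and types are invented for this article and do not reflect Chirm's actual implementation):</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Purely illustrative model, invented for this article.
#[derive(Debug, PartialEq)]
enum Visibility {
    Company,       // visible to everyone in the tenant
    Group(String), // restricted to a single group
}

struct Post {
    author: String,
    topic: Option&lt;String&gt;,
    body: String,
    visibility: Visibility,
}

// A post shows up in a user's timeline only if its visibility matches.
fn can_see(post: &amp;Post, user_groups: &amp;[String]) -&gt; bool {
    match &amp;post.visibility {
        Visibility::Company =&gt; true,
        Visibility::Group(g) =&gt; user_groups.contains(g),
    }
}

fn main() {
    let post = Post {
        author: "alice".into(),
        topic: Some("release".into()),
        body: "v2.3 of the internal tool is out".into(),
        visibility: Visibility::Group("platform-team".into()),
    };
    let groups = vec!["platform-team".to_string()];
    assert!(can_see(&amp;post, &amp;groups));
    assert!(!can_see(&amp;post, &amp;[]));
}
</code></pre></div></div>

<p>The real system of course needs tenant separation and richer access rules on top, but the core idea stays this small: visibility is a property of the post, and the timeline is just a filtered view over posts.</p>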

<p>Another requirement was that the platform should work well in a multi-tenant setup. Chirm is meant to be used by different organizations, each with their own users, groups, permissions and data boundaries. So tenant separation was very important.</p>

<p>On the product side, the first goal was to make posting and reading updates feel straightforward and low effort. A user can create a post, assign it to a group or add a topic, and others can read, react and reply. The timeline then acts as the central view where relevant items come together. This is intentionally simple.</p>

<p>The result is a system where the same basic model can cover different scenarios. A leadership update, a team announcement, a release note, a department-specific operational notice or a short post about a useful internal tool all use the same underlying mechanism. That matters because products become confusing quite quickly when each use case feels like a separate application.</p>

<p>To support all of this technically, the backend was split into services with clearly separated responsibilities, without over-complicating things. There is an API gateway in front of the whole system, an authentication layer that supports SSO and identity providers, and behind it several domain-specific services. For example, posts are handled by the post service, while timeline generation is handled by a timeline service.</p>

<p>I only want to sketch the architecture here at a high level. I will go into more detail about the backend design and the reasoning behind specific architectural choices in a later post. The main point I want to make here is that the service structure follows the communication model quite directly. Writing and retrieving posts, generating timelines, searching content, sending notifications and handling tenant context all have different requirements.</p>

<p>It also supports AI summaries, to help users get a quick overview of activity in their timeline. This becomes more useful once a timeline grows denser or when people return after being away. I wanted summarization to be a distinct capability in the architecture rather than something mixed awkwardly into the rest of the system.</p>

<p>Because this is built for companies and large enterprises, tenants and groups are treated as first-class concepts from the start. Tenant handling determines which organization a user belongs to and what data context is active. Groups define smaller visibility domains inside that tenant. This is what makes it possible to support both broad communication and more focused internal channels without turning everything into one giant company-wide feed, and it is very easy to build company structure into the message flow structure.</p>

<p>The product philosophy and the architecture are therefore connected quite directly. If the goal is calmer communication, the system needs predictable timelines, good search, reliable notifications, clear visibility rules and a user interface that does not constantly fragment attention.</p>

<h1 id="summary-and-discussion">Summary and discussion</h1>

<p>Chirm started from a fairly ordinary observation: Companies often lack a good place for operational communication that is visible, structured and not overly noisy.</p>

<p>The default answer is often to use the tools that are already there, especially chat. That works up to a point, but it also creates a mode of communication where important updates are mixed with everything else and easily get lost.</p>

<p>But the issue is not only that important company information gets buried. Another problem is that useful work often stays invisible outside the team that did it. A small internal tool, an improved workflow, a practical script or a process improvement may be very valuable to others, but in many organizations nobody hears about it unless it happens to come up by chance. That means useful ideas stay local when they could have spread.</p>

<p>This is one of the things Chirm is meant to improve. It should not only support structured top-down communication, but also make it easier for teams to share useful things with the rest of the organization in a lightweight and visible way. A short post should be enough to make others aware, let them react, ask questions and reuse what is relevant for them.</p>

<p>Instead of trying to replace every existing tool, Chirm focuses on asynchronous company communication through a simple timeline-based model. The technical design follows from that goal. It should make important updates easier to find, but it should also help ideas, tools and practices travel further across the company.</p>

<p>There is of course still a lot of room for improvement. Internal communication is one of those domains that looks simple at first, but it easily turns into a mess. Building a system that allows communication to remain clear, relevant and pleasant across different teams and organizations is the harder part.</p>

<p>There is also a broader product question underneath all of this. Many software tools optimize for engagement, speed and constant activity, especially classical social media. But for companies this is often the wrong optimization target. Constant noise costs time, attention and therefore money. What matters more is that the right people get the right information at the right time, and that useful knowledge does not remain trapped inside isolated teams. This is what I want to achieve with Chirm.</p>

<h1 id="next-steps">Next Steps</h1>

<p>The immediate next steps for Chirm are not about adding more and more features, but about showing what this communication model can do in practice, both for important operational updates and for lightweight sharing across teams.</p>

<p>For that I continue improving the experience around targeted updates, making it easier to curate relevant information, expanding the document-sharing side and refining the timeline experience further. It also means validating the product in realistic organizational settings.</p>

<p>The overall direction is clear: Building a system that helps companies communicate important things clearly, without adding even more noise to the day.</p>

<p>If you want to see the current product direction or try the demo instance, you can find Chirm at <a href="https://chirm.app/en">https://chirm.app/en</a>.</p>

<p style="text-align: center;"><img src="/assets/img/chirm/use_case_synergien_1_screenshot_chirm_post.png" alt="" /></p>
<p style="text-align: center;"><em>Screenshot of Chirm</em></p>]]></content><author><name></name></author><category term="Chirm" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Using SDL2 in Rust</title><link href="https://torlenor.org/rust/graphics/gamedev/2023/09/16/sdl2_with_rust.html" rel="alternate" type="text/html" title="Using SDL2 in Rust" /><published>2023-09-16T22:00:00+00:00</published><updated>2023-09-16T22:00:00+00:00</updated><id>https://torlenor.org/rust/graphics/gamedev/2023/09/16/sdl2_with_rust</id><content type="html" xml:base="https://torlenor.org/rust/graphics/gamedev/2023/09/16/sdl2_with_rust.html"><![CDATA[<h1 id="introduction">Introduction</h1>

<p>Simple DirectMedia Layer, or just SDL, is a cross-platform library used for accessing video, audio and input devices like keyboards, mice or joysticks, in addition to providing some networking abstractions. Despite being around 25 years old, it is still used extensively for games and other multimedia software as an abstraction layer, either to directly draw graphics and play sounds or as a lower-level library on which game engines are built.</p>

<p>While SDL is written mainly in C, many language bindings have been created, and one of them is the Rust binding <a href="https://github.com/Rust-SDL2/rust-sdl2">rust-sdl2</a>, which we will introduce here. We will show how to open a window, draw something simple and use SDL2’s event system to handle keyboard input.</p>

<h1 id="linux-setup">Linux setup</h1>

<p>To get started with SDL2 in Rust, you first need the SDL2 library and headers installed on your system (the Rust crate has a <code class="language-plaintext highlighter-rouge">bundled</code> feature that compiles SDL2 from source, but we will not cover that here). Install the library using the appropriate package manager for your distribution.</p>

<p>Ubuntu:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt-get <span class="nb">install </span>libsdl2-dev
</code></pre></div></div>

<p>Fedora:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>dnf <span class="nb">install </span>SDL2-devel
</code></pre></div></div>

<p>Arch:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>pacman <span class="nt">-S</span> sdl2
</code></pre></div></div>

<h1 id="windows-setup">Windows setup</h1>

<p>We assume you are using MSVC as your C++ compiler environment. If you are using MinGW, please see the crate documentation on how to proceed.</p>

<ol>
  <li>
    <p>Download the MSVC version of SDL2 from <a href="http://www.libsdl.org/">http://www.libsdl.org/</a> (usually named something like <em>SDL2-devel-2.x.x-VC.zip</em>).</p>
  </li>
  <li>
    <p>Unzip <em>SDL2-devel-2.x.x-VC.zip</em>.</p>
  </li>
  <li>
    <p>Copy all <em>.lib</em> files from <code class="language-plaintext highlighter-rouge">SDL2-2.x.x\lib\x64\</code> to <code class="language-plaintext highlighter-rouge">%userprofile%\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib\rustlib\x86_64-pc-windows-msvc\lib\</code></p>
  </li>
  <li>
    <p>Copy <code class="language-plaintext highlighter-rouge">SDL2-2.x.x\lib\x64\SDL2.dll</code> into your project directory or into any directory that is in your PATH. When you distribute the compiled application, make sure to ship <code class="language-plaintext highlighter-rouge">SDL2.dll</code> right next to the executable, or it may not run on machines that do not have <code class="language-plaintext highlighter-rouge">SDL2.dll</code> available.</p>
  </li>
</ol>

<h1 id="creating-a-rust-project">Creating a Rust project</h1>

<p>If you haven’t installed Rust yet, follow the <a href="https://www.rust-lang.org/tools/install">“Install Rust”</a> guide.</p>

<p>Create a new Rust project by typing</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cargo new sdl2-example
</code></pre></div></div>

<p>Then change into the newly created directory and type</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cargo add sdl2 <span class="nt">-F</span> unsafe_textures
</code></pre></div></div>

<p>to add the SDL2 Rust bindings.</p>

<p>We are going to use the <code class="language-plaintext highlighter-rouge">unsafe_textures</code> feature, even though we are not going to use any textures here. The reason: if you do use textures, you will notice that without this feature you run into a lot of Rust <a href="https://doc.rust-lang.org/rust-by-example/scope/lifetime.html">lifetime</a> issues. The downside is that you then have to manage the texture objects yourself and make sure to destroy them once you no longer need them. For more information about this feature, see the <a href="https://github.com/Rust-SDL2/rust-sdl2">rust-sdl2 documentation</a>.</p>

<p>You can also add other features corresponding to the optional SDL2 libraries like gfx, mixer or ttf.</p>

<p>Once this succeeds, you are ready to start developing with SDL2!</p>

<h1 id="opening-a-window">Opening a window</h1>

<p>Let’s open our <code class="language-plaintext highlighter-rouge">main.rs</code> file and start by creating a window inside the main function (note that we adapt the return type of the main function slightly):</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">sdl2</span><span class="p">::{</span><span class="nn">event</span><span class="p">::</span><span class="n">Event</span><span class="p">,</span> <span class="nn">keyboard</span><span class="p">::</span><span class="n">Keycode</span><span class="p">};</span>

<span class="k">pub</span> <span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="p">(),</span> <span class="nb">String</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">sdl_context</span> <span class="o">=</span> <span class="nn">sdl2</span><span class="p">::</span><span class="nf">init</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>
    <span class="k">let</span> <span class="n">video_subsystem</span> <span class="o">=</span> <span class="n">sdl_context</span><span class="nf">.video</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>

    <span class="k">let</span> <span class="n">window</span> <span class="o">=</span> <span class="n">video_subsystem</span>
        <span class="nf">.window</span><span class="p">(</span><span class="s">"rust-sdl2 example"</span><span class="p">,</span> <span class="mi">800</span><span class="p">,</span> <span class="mi">600</span><span class="p">)</span>
        <span class="nf">.opengl</span><span class="p">()</span>
        <span class="nf">.build</span><span class="p">()</span>
        <span class="nf">.map_err</span><span class="p">(|</span><span class="n">e</span><span class="p">|</span> <span class="n">e</span><span class="nf">.to_string</span><span class="p">())</span><span class="o">?</span><span class="p">;</span>

    <span class="k">let</span> <span class="k">mut</span> <span class="n">event_pump</span> <span class="o">=</span> <span class="n">sdl_context</span><span class="nf">.event_pump</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>

    <span class="nv">'main</span><span class="p">:</span> <span class="k">loop</span> <span class="p">{</span>
        <span class="k">for</span> <span class="n">event</span> <span class="k">in</span> <span class="n">event_pump</span><span class="nf">.poll_iter</span><span class="p">()</span> <span class="p">{</span>
            <span class="k">match</span> <span class="n">event</span> <span class="p">{</span>
                <span class="nn">Event</span><span class="p">::</span><span class="n">Quit</span> <span class="p">{</span> <span class="o">..</span> <span class="p">}</span>
                <span class="p">|</span> <span class="nn">Event</span><span class="p">::</span><span class="n">KeyDown</span> <span class="p">{</span>
                    <span class="n">keycode</span><span class="p">:</span> <span class="nf">Some</span><span class="p">(</span><span class="nn">Keycode</span><span class="p">::</span><span class="n">Escape</span><span class="p">),</span>
                    <span class="o">..</span>
                <span class="p">}</span> <span class="k">=&gt;</span> <span class="p">{</span>
                    <span class="k">break</span> <span class="nv">'main</span>
                <span class="p">},</span>
                <span class="n">_</span> <span class="k">=&gt;</span> <span class="p">{}</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="nf">Ok</span><span class="p">(())</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Now let’s try to run the program</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cargo run
</code></pre></div></div>

<p>If everything compiles and runs successfully, you will see an empty window. That is fine; we are not drawing anything yet. But you should be able to close the program with the ESC key.</p>

<p>Here you can already see that SDL2 is not a full game engine: you really have to do a lot of things yourself, like maintaining an event loop and mapping the events that SDL2 captures to something meaningful, like closing the program.</p>
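<p>Stripped of SDL2 entirely, that event-mapping pattern can be sketched in plain Rust (a toy model written for this article, not part of the sdl2 crate):</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Library-free sketch of the event-mapping pattern: raw events come in
// and we decide which of them should end the main loop.
#[derive(Debug, PartialEq)]
enum Keycode {
    Escape,
    Other,
}

#[derive(Debug, PartialEq)]
enum Event {
    Quit,
    KeyDown(Option&lt;Keycode&gt;),
}

// Mirrors the two match arms that `break 'main` in the SDL2 example.
fn should_exit(event: &amp;Event) -&gt; bool {
    matches!(event, Event::Quit | Event::KeyDown(Some(Keycode::Escape)))
}

fn main() {
    let events = [
        Event::KeyDown(Some(Keycode::Other)),
        Event::KeyDown(Some(Keycode::Escape)),
    ];
    for event in &amp;events {
        if should_exit(event) {
            println!("exiting main loop");
            break;
        }
    }
}
</code></pre></div></div>

<p>SDL2 plays the role of the event source here; everything else, deciding what an event means and when to stop, is up to your code.</p>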

<h1 id="drawing-a-rectangle">Drawing a rectangle</h1>

<p>We are going to use the built-in drawing functionality of SDL2. It is suitable for drawing simple primitives, and we will use it to draw a rectangle.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">sdl2</span><span class="p">::{</span><span class="nn">event</span><span class="p">::</span><span class="n">Event</span><span class="p">,</span> <span class="nn">keyboard</span><span class="p">::</span><span class="n">Keycode</span><span class="p">,</span> <span class="nn">pixels</span><span class="p">::</span><span class="n">Color</span><span class="p">,</span> <span class="nn">rect</span><span class="p">::</span><span class="n">Rect</span><span class="p">};</span>

<span class="k">pub</span> <span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="p">(),</span> <span class="nb">String</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">sdl_context</span> <span class="o">=</span> <span class="nn">sdl2</span><span class="p">::</span><span class="nf">init</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>
    <span class="k">let</span> <span class="n">video_subsystem</span> <span class="o">=</span> <span class="n">sdl_context</span><span class="nf">.video</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>

    <span class="k">let</span> <span class="n">window</span> <span class="o">=</span> <span class="n">video_subsystem</span>
        <span class="nf">.window</span><span class="p">(</span><span class="s">"rust-sdl2 example"</span><span class="p">,</span> <span class="mi">800</span><span class="p">,</span> <span class="mi">600</span><span class="p">)</span>
        <span class="nf">.opengl</span><span class="p">()</span>
        <span class="nf">.build</span><span class="p">()</span>
        <span class="nf">.map_err</span><span class="p">(|</span><span class="n">e</span><span class="p">|</span> <span class="n">e</span><span class="nf">.to_string</span><span class="p">())</span><span class="o">?</span><span class="p">;</span>

    <span class="k">let</span> <span class="k">mut</span> <span class="n">event_pump</span> <span class="o">=</span> <span class="n">sdl_context</span><span class="nf">.event_pump</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>

    <span class="k">let</span> <span class="k">mut</span> <span class="n">canvas</span> <span class="o">=</span> <span class="n">window</span><span class="nf">.into_canvas</span><span class="p">()</span><span class="nf">.build</span><span class="p">()</span><span class="nf">.map_err</span><span class="p">(|</span><span class="n">e</span><span class="p">|</span> <span class="n">e</span><span class="nf">.to_string</span><span class="p">())</span><span class="o">?</span><span class="p">;</span>

    <span class="nv">'main</span><span class="p">:</span> <span class="k">loop</span> <span class="p">{</span>
        <span class="k">for</span> <span class="n">event</span> <span class="k">in</span> <span class="n">event_pump</span><span class="nf">.poll_iter</span><span class="p">()</span> <span class="p">{</span>
            <span class="k">match</span> <span class="n">event</span> <span class="p">{</span>
                <span class="nn">Event</span><span class="p">::</span><span class="n">Quit</span> <span class="p">{</span> <span class="o">..</span> <span class="p">}</span>
                <span class="p">|</span> <span class="nn">Event</span><span class="p">::</span><span class="n">KeyDown</span> <span class="p">{</span>
                    <span class="n">keycode</span><span class="p">:</span> <span class="nf">Some</span><span class="p">(</span><span class="nn">Keycode</span><span class="p">::</span><span class="n">Escape</span><span class="p">),</span>
                    <span class="o">..</span>
                <span class="p">}</span> <span class="k">=&gt;</span> <span class="k">break</span> <span class="nv">'main</span><span class="p">,</span>
                <span class="n">_</span> <span class="k">=&gt;</span> <span class="p">{}</span>
            <span class="p">}</span>
        <span class="p">}</span>

        <span class="c1">// Set the background</span>
        <span class="n">canvas</span><span class="nf">.set_draw_color</span><span class="p">(</span><span class="nn">Color</span><span class="p">::</span><span class="nf">RGB</span><span class="p">(</span><span class="mi">255</span><span class="p">,</span> <span class="mi">200</span><span class="p">,</span> <span class="mi">0</span><span class="p">));</span>
        <span class="n">canvas</span><span class="nf">.clear</span><span class="p">();</span>

        <span class="c1">// Draw a red rectangle</span>
        <span class="n">canvas</span><span class="nf">.set_draw_color</span><span class="p">(</span><span class="nn">Color</span><span class="p">::</span><span class="nf">RGB</span><span class="p">(</span><span class="mi">255</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">));</span>
        <span class="n">canvas</span><span class="nf">.fill_rect</span><span class="p">(</span><span class="nn">Rect</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="mi">100</span><span class="p">,</span> <span class="mi">100</span><span class="p">,</span> <span class="mi">600</span><span class="p">,</span> <span class="mi">400</span><span class="p">))</span><span class="o">?</span><span class="p">;</span>

        <span class="c1">// Show it on the screen</span>
        <span class="n">canvas</span><span class="nf">.present</span><span class="p">();</span>
    <span class="p">}</span>

    <span class="nf">Ok</span><span class="p">(())</span>
<span class="p">}</span>
</code></pre></div></div>

<p>And that’s it!</p>

<h1 id="summary">Summary</h1>

<p>As you can see, using SDL2 in Rust is as straightforward as in C, and I hope this small introduction can serve as a starting point for your SDL2 and Rust adventures.</p>

<p>And here is a screenshot of our running program:</p>

<p><img src="/assets/img/sdl2_rust/image.png" alt="rust2-sdl example window" /></p>]]></content><author><name></name></author><category term="Rust" /><category term="Graphics" /><category term="GameDev" /><summary type="html"><![CDATA[Introduction]]></summary></entry><entry><title type="html">Ordinary least squares linear regression in Rust</title><link href="https://torlenor.org/machine%20learning/rust/2022/03/29/linear_regression_in_rust_from_scratch.html" rel="alternate" type="text/html" title="Ordinary least squares linear regression in Rust" /><published>2022-03-29T16:25:00+00:00</published><updated>2022-03-29T16:25:00+00:00</updated><id>https://torlenor.org/machine%20learning/rust/2022/03/29/linear_regression_in_rust_from_scratch</id><content type="html" xml:base="https://torlenor.org/machine%20learning/rust/2022/03/29/linear_regression_in_rust_from_scratch.html"><![CDATA[<ul id="markdown-toc">
  <li><a href="#introduction" id="markdown-toc-introduction">Introduction</a></li>
  <li><a href="#linear-regression" id="markdown-toc-linear-regression">Linear Regression</a>    <ul>
      <li><a href="#what-is-linear-regression" id="markdown-toc-what-is-linear-regression">What is linear regression</a></li>
      <li><a href="#solution-of-the-equation" id="markdown-toc-solution-of-the-equation">Solution of the equation</a></li>
    </ul>
  </li>
  <li><a href="#implementation-in-rust" id="markdown-toc-implementation-in-rust">Implementation in Rust</a></li>
  <li><a href="#example-diabetes-dataset" id="markdown-toc-example-diabetes-dataset">Example: Diabetes dataset</a></li>
  <li><a href="#summary" id="markdown-toc-summary">Summary</a></li>
  <li><a href="#next-steps" id="markdown-toc-next-steps">Next steps</a></li>
  <li><a href="#references" id="markdown-toc-references">References</a></li>
</ul>

<h1 id="introduction">Introduction</h1>

<p>In contrast to the widespread use of Python and common machine learning packages like scikit-learn <a href="#r1">[1]</a>, there is value in doing things from scratch: learning how things work helps you choose the right algorithm for the job later on. We will start with the simplest and maybe most commonly used machine learning algorithm: linear regression. In this article we are going to implement so-called ordinary least squares (OLS) <a href="#r2">[2]</a> linear regression <a href="#r3">[3]</a> in Rust <a href="#r4">[4]</a>. We will show that with just a few lines of code it is possible to implement this algorithm from scratch. We will then work through an example and compare it with known results. Along the way we will gain a better understanding of the concept behind the algorithm and learn about the Rust package nalgebra <a href="#r5">[5]</a>, which will cover our linear algebra needs.</p>

<h1 id="linear-regression">Linear Regression</h1>

<h2 id="what-is-linear-regression">What is linear regression</h2>

<p>Linear regression <a href="#r3">[3]</a> is used to model the relationship between a response/target variable and one or more explanatory variables. It is called linear because the model is linear in its coefficients. A linear model for the target variable $y_i$ can be written in the form</p>

\[y_i = \beta_{0} + \beta_{1} x_{i1} + \cdots + \beta_{p} x_{ip} + \varepsilon_i \, ,
 \qquad i = 1, \ldots, n \; ,\]

<p>where $x_{ip}$ are the explanatory variables and $\beta_{p}$ are unknown coefficients. The $\varepsilon_i$ are called error terms or noise and they capture all the other information that we cannot explain with the linear model.</p>

<p>It is much easier to work with these equations if one writes them in matrix form as</p>

\[\mathbf{y} = X\boldsymbol\beta + \boldsymbol\varepsilon \, ,\]

<p>where all $n$ equations are collected into a single matrix equation. As we will see below, this notation is useful for deriving our method of determining the parameters $\boldsymbol\beta$. Note: here we absorbed $\beta_0$ into $\boldsymbol\beta$, so $\boldsymbol\beta$ is now a $(p+1)$-dimensional vector and $\mathbf{X}$ is an $n \times (p+1)$ matrix with a constant first column, i.e., $x_{i0}=1$ for $i = 1, \ldots, n$.</p>

<p>The goal is to get values for all the $\boldsymbol\beta$ fulfilling the equation above.</p>

<h2 id="solution-of-the-equation">Solution of the equation</h2>

<p>Ordinary least squares (OLS) <a href="#r2">[2]</a> is, as the name suggests, a least squares method for finding the unknown parameters $\boldsymbol\beta$ of a linear regression model. The idea is to minimize the sum of squared differences between the observed target values and the values predicted by the linear model.</p>

<p>For the linear case the minimization problem possesses a unique global minimum and its solution can be expressed by an explicit formula for the coefficients $\boldsymbol\beta$:</p>

\[\boldsymbol\beta = (\mathbf{X}^\mathbf{T}\mathbf{X})^{-1}\mathbf{X}^\mathbf{T}\mathbf{y}\]

<p>As we can see here, we have to calculate a matrix inverse and we have to make some assumptions on the input values to guarantee that the solution exists and the matrix is invertible. One of these assumptions is, for example, that the column vectors in $\mathbf{X}$ are linearly independent.</p>

<p>If you are interested in the derivation of this solution, please take a look at the linked Wikipedia page or any good statistics book.</p>
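<p>For the curious, the derivation is short (this is the standard textbook argument). One minimizes the residual sum of squares</p>

\[S(\boldsymbol\beta) = (\mathbf{y} - \mathbf{X}\boldsymbol\beta)^\mathbf{T}(\mathbf{y} - \mathbf{X}\boldsymbol\beta) \; .\]

<p>Setting the gradient with respect to $\boldsymbol\beta$ to zero gives the so-called normal equations,</p>

\[\frac{\partial S}{\partial \boldsymbol\beta} = -2\,\mathbf{X}^\mathbf{T}(\mathbf{y} - \mathbf{X}\boldsymbol\beta) = 0
\quad\Longrightarrow\quad
\mathbf{X}^\mathbf{T}\mathbf{X}\,\boldsymbol\beta = \mathbf{X}^\mathbf{T}\mathbf{y} \; ,\]

<p>and solving for $\boldsymbol\beta$ (assuming $\mathbf{X}^\mathbf{T}\mathbf{X}$ is invertible) yields exactly the formula above.</p>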

<h1 id="implementation-in-rust">Implementation in Rust</h1>

<p>Finally, we are at the point where we can start implementing the algorithm. As it basically consists of a few matrix/vector multiplications and an inversion, we have two choices: either we implement the matrix/vector operations ourselves or we use a library. As I want to focus on the algorithm itself, I chose to use a library (nalgebra <a href="#r5">[5]</a>).</p>

<p>Before we start, we need to bring in the nalgebra functions that we are going to use</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">extern</span> <span class="k">crate</span> <span class="n">nalgebra</span> <span class="k">as</span> <span class="n">na</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">ops</span><span class="p">::</span><span class="nb">Mul</span><span class="p">;</span>

<span class="k">use</span> <span class="nn">na</span><span class="p">::{</span><span class="n">DMatrix</span><span class="p">,</span> <span class="n">DVector</span><span class="p">};</span>
</code></pre></div></div>

<p>Then we define the x values and the y values that we want to fit</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">x_values</span> <span class="o">=</span> <span class="nn">na</span><span class="p">::</span><span class="nd">dmatrix!</span><span class="p">[</span>
    <span class="mf">1.0f64</span><span class="p">,</span> <span class="mf">3.0f64</span><span class="p">;</span>
    <span class="mf">2.0f64</span><span class="p">,</span> <span class="mf">1.0f64</span><span class="p">;</span>
    <span class="mf">3.0f64</span><span class="p">,</span> <span class="mf">8.0f64</span><span class="p">;</span>
<span class="p">];</span>
<span class="k">let</span> <span class="n">y_values</span> <span class="o">=</span> <span class="nn">na</span><span class="p">::</span><span class="nd">dvector!</span><span class="p">[</span><span class="mf">2.0f64</span><span class="p">,</span> <span class="mf">3.0f64</span><span class="p">,</span> <span class="mf">4.0f64</span><span class="p">];</span>
</code></pre></div></div>
<p>As you can see, we use two x variables. Note that this bare-bones snippet does not add a constant column to the data, so it fits a model without an intercept term,</p>

\[y = \beta_{1} x_{1} + \beta_{2} x_{2} \; .\]

<p>Fitting the model can be performed as follows:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">beta</span> <span class="o">=</span> <span class="n">x_values</span>
    <span class="nf">.tr_mul</span><span class="p">(</span><span class="o">&amp;</span><span class="n">x_values</span><span class="p">)</span>
    <span class="nf">.try_inverse</span><span class="p">()</span>
    <span class="nf">.unwrap</span><span class="p">()</span>
    <span class="nf">.mul</span><span class="p">(</span><span class="n">x_values</span><span class="nf">.transpose</span><span class="p">())</span>
    <span class="nf">.mul</span><span class="p">(</span><span class="n">y_values</span><span class="p">);</span>
</code></pre></div></div>
<p>This also tries to calculate the inverse of the $X^TX$ matrix. Since this can fail if the matrix is not invertible (for example, because the columns of $X$ are not linearly independent), production code should handle the failure gracefully instead of calling unwrap. For our coding example, however, it is good enough.</p>
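<p>To illustrate how the unwrap could be avoided, here is a sketch of the pattern using a stand-in <code class="language-plaintext highlighter-rouge">invert</code> function in place of nalgebra’s <code class="language-plaintext highlighter-rouge">try_inverse</code> (the function and error message are our own dependency-free stand-ins, not part of the article’s code):</p>

```rust
/// Stand-in for a fallible matrix inversion: returns None when the
/// "matrix" (here just a scalar determinant) is singular, mirroring
/// the Option returned by nalgebra's try_inverse.
fn invert(determinant: f64) -> Option<f64> {
    if determinant.abs() < f64::EPSILON {
        None // singular, no inverse exists
    } else {
        Some(1.0 / determinant)
    }
}

/// Instead of unwrap, turn the Option into a Result with a
/// descriptive error and let the caller decide how to react.
fn fit(determinant: f64) -> Result<f64, String> {
    invert(determinant)
        .ok_or_else(|| "X^T X is not invertible; check for collinear columns".to_string())
}

fn main() {
    assert_eq!(fit(2.0), Ok(0.5));
    assert!(fit(0.0).is_err());
}
```

<p>The same <code class="language-plaintext highlighter-rouge">ok_or_else</code> pattern applies one-to-one to the <code class="language-plaintext highlighter-rouge">try_inverse()</code> call in the snippet above.</p>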

<p>Getting predictions is then as simple as</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="n">prediction</span> <span class="o">=</span> <span class="n">x_values</span><span class="nf">.mul</span><span class="p">(</span><span class="n">beta</span><span class="p">);</span>
</code></pre></div></div>

<p>The full source code of LinearRegression is shown below (excuse the unwraps). It contains a bit more boilerplate, but it also supports fitting without the intercept term, which can be useful if the data is already centered around its mean (in which case the intercept would be zero).</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">ops</span><span class="p">::</span><span class="nb">Mul</span><span class="p">;</span>

<span class="k">use</span> <span class="nn">na</span><span class="p">::{</span><span class="n">DMatrix</span><span class="p">,</span> <span class="n">DVector</span><span class="p">};</span>

<span class="cd">/// Ordinary least squares linear regression</span>
<span class="cd">///</span>
<span class="cd">/// It fits a linear model of the form y = b_0 + b_1*x_1 + b_2*x_2 + ...</span>
<span class="cd">/// which minimizes the residual sum of squares between the observed targets</span>
<span class="cd">/// and the predicted targets.</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">LinearRegression</span> <span class="p">{</span>
    <span class="n">w</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="n">DVector</span><span class="o">&lt;</span><span class="nb">f64</span><span class="o">&gt;&gt;</span><span class="p">,</span>
    <span class="n">fit_intercept</span><span class="p">:</span> <span class="nb">bool</span><span class="p">,</span>
<span class="p">}</span>

<span class="k">impl</span> <span class="n">LinearRegression</span> <span class="p">{</span>
    <span class="cd">/// Returns a linear regressor using ordinary least squares linear regression</span>
    <span class="cd">///</span>
    <span class="cd">/// # Arguments</span>
    <span class="cd">///</span>
    <span class="cd">/// * `fit_intercept` - Whether to fit an intercept for this model.</span>
    <span class="cd">///     If false assume that the data is centered, i.e., intercept is 0.</span>
    <span class="cd">///</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">new</span><span class="p">(</span><span class="n">fit_intercept</span><span class="p">:</span> <span class="nb">bool</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">LinearRegression</span> <span class="p">{</span>
        <span class="n">LinearRegression</span> <span class="p">{</span>
            <span class="n">w</span><span class="p">:</span> <span class="nb">None</span><span class="p">,</span>
            <span class="n">fit_intercept</span><span class="p">,</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="cd">/// Fit the model</span>
    <span class="cd">///</span>
    <span class="cd">/// # Arguments</span>
    <span class="cd">///</span>
    <span class="cd">/// * `x_values` - parameters of shape (n_samples, n_features)</span>
    <span class="cd">/// * `y_values` - target values of shape (n_samples)</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">fit</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span> <span class="n">x_values</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">DMatrix</span><span class="o">&lt;</span><span class="nb">f64</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">y_values</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">DVector</span><span class="o">&lt;</span><span class="nb">f64</span><span class="o">&gt;</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">if</span> <span class="k">self</span><span class="py">.fit_intercept</span> <span class="p">{</span>
            <span class="k">let</span> <span class="n">x_values</span> <span class="o">=</span> <span class="n">x_values</span><span class="nf">.clone</span><span class="p">()</span><span class="nf">.insert_column</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">);</span>
            <span class="k">self</span><span class="nf">._fit</span><span class="p">(</span><span class="o">&amp;</span><span class="n">x_values</span><span class="p">,</span> <span class="n">y_values</span><span class="p">);</span>
        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
            <span class="k">self</span><span class="nf">._fit</span><span class="p">(</span><span class="n">x_values</span><span class="p">,</span> <span class="n">y_values</span><span class="p">);</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="k">fn</span> <span class="nf">_fit</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">,</span> <span class="n">x_values</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">DMatrix</span><span class="o">&lt;</span><span class="nb">f64</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">y_values</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">DVector</span><span class="o">&lt;</span><span class="nb">f64</span><span class="o">&gt;</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">self</span><span class="py">.w</span> <span class="o">=</span> <span class="nf">Some</span><span class="p">(</span>
            <span class="n">x_values</span>
                <span class="nf">.tr_mul</span><span class="p">(</span><span class="o">&amp;</span><span class="n">x_values</span><span class="p">)</span>
                <span class="nf">.try_inverse</span><span class="p">()</span>
                <span class="nf">.unwrap</span><span class="p">()</span>
                <span class="nf">.mul</span><span class="p">(</span><span class="n">x_values</span><span class="nf">.transpose</span><span class="p">())</span>
                <span class="nf">.mul</span><span class="p">(</span><span class="n">y_values</span><span class="p">),</span>
        <span class="p">);</span>
    <span class="p">}</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">coef</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="o">&amp;</span><span class="nb">Option</span><span class="o">&lt;</span><span class="n">DVector</span><span class="o">&lt;</span><span class="nb">f64</span><span class="o">&gt;&gt;</span> <span class="p">{</span>
        <span class="c1">// TODO: Do not return 0th entry if fit_intercept is active</span>
        <span class="k">return</span> <span class="o">&amp;</span><span class="k">self</span><span class="py">.w</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">intercept</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="nb">f64</span><span class="p">,</span> <span class="nb">String</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">if</span> <span class="k">self</span><span class="py">.w</span><span class="nf">.is_none</span><span class="p">()</span> <span class="p">{</span>
            <span class="k">return</span> <span class="nf">Err</span><span class="p">(</span><span class="s">"Model was not fitted"</span><span class="nf">.to_string</span><span class="p">());</span>
        <span class="p">}</span>
        <span class="k">if</span> <span class="k">self</span><span class="py">.fit_intercept</span> <span class="p">{</span>
            <span class="k">return</span> <span class="nf">Ok</span><span class="p">(</span><span class="k">self</span><span class="py">.w</span><span class="nf">.as_ref</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">()[</span><span class="mi">0</span><span class="p">]);</span>
        <span class="p">}</span>
        <span class="k">return</span> <span class="nf">Err</span><span class="p">(</span><span class="s">"Model was not fitted with intercept"</span><span class="nf">.to_string</span><span class="p">());</span>
    <span class="p">}</span>

    <span class="cd">/// Returns the predictions using the provided parameters `x_values`</span>
    <span class="cd">///</span>
    <span class="cd">/// # Arguments</span>
    <span class="cd">///</span>
    <span class="cd">/// * `x_values` - parameters of shape (n_samples, n_features)</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="nf">predict</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">x_values</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">DMatrix</span><span class="o">&lt;</span><span class="nb">f64</span><span class="o">&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Result</span><span class="o">&lt;</span><span class="n">DVector</span><span class="o">&lt;</span><span class="nb">f64</span><span class="o">&gt;</span><span class="p">,</span> <span class="nb">String</span><span class="o">&gt;</span> <span class="p">{</span>
        <span class="k">if</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">w</span><span class="p">)</span> <span class="o">=</span> <span class="k">self</span><span class="py">.w</span><span class="nf">.as_ref</span><span class="p">()</span> <span class="p">{</span>
            <span class="k">if</span> <span class="k">self</span><span class="py">.fit_intercept</span> <span class="p">{</span>
                <span class="k">let</span> <span class="n">x_values</span> <span class="o">=</span> <span class="n">x_values</span><span class="nf">.clone</span><span class="p">()</span><span class="nf">.insert_column</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">);</span>
                <span class="k">let</span> <span class="n">res</span> <span class="o">=</span> <span class="n">x_values</span><span class="nf">.mul</span><span class="p">(</span><span class="n">w</span><span class="p">);</span>
                <span class="k">return</span> <span class="nf">Ok</span><span class="p">(</span><span class="n">res</span><span class="p">);</span>
            <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
                <span class="k">let</span> <span class="n">res</span> <span class="o">=</span> <span class="n">x_values</span><span class="nf">.mul</span><span class="p">(</span><span class="n">w</span><span class="p">);</span>
                <span class="k">return</span> <span class="nf">Ok</span><span class="p">(</span><span class="n">res</span><span class="p">);</span>
            <span class="p">}</span>
        <span class="p">}</span>
        <span class="nf">Err</span><span class="p">(</span><span class="s">"Model was not fitted."</span><span class="nf">.to_string</span><span class="p">())</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This is a crude version of the implementation and assumes fixed data types. Making it more convenient to use would require a bit more work. If I decide to upload it to my GitHub page, I will update this blog post.</p>
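<p>As a dependency-free sanity check of the general formula, the single-feature case with intercept has a well-known closed form: $\beta_1 = \mathrm{cov}(x, y) / \mathrm{var}(x)$ and $\beta_0 = \bar{y} - \beta_1 \bar{x}$. A minimal sketch in plain Rust (this helper is our own addition, not part of the LinearRegression code above):</p>

```rust
/// Simple (single-feature) OLS with intercept via the closed form
/// beta1 = cov(x, y) / var(x), beta0 = mean(y) - beta1 * mean(x).
fn simple_ols(x: &[f64], y: &[f64]) -> (f64, f64) {
    let n = x.len() as f64;
    let mean_x = x.iter().sum::<f64>() / n;
    let mean_y = y.iter().sum::<f64>() / n;
    // The common 1/n factors of covariance and variance cancel in the ratio.
    let cov: f64 = x.iter().zip(y).map(|(xi, yi)| (xi - mean_x) * (yi - mean_y)).sum();
    let var: f64 = x.iter().map(|xi| (xi - mean_x).powi(2)).sum();
    let beta1 = cov / var;
    let beta0 = mean_y - beta1 * mean_x;
    (beta0, beta1)
}

fn main() {
    // y = 1 + 2x exactly, so we expect intercept 1 and slope 2.
    let x = [1.0, 2.0, 3.0, 4.0];
    let y = [3.0, 5.0, 7.0, 9.0];
    let (beta0, beta1) = simple_ols(&x, &y);
    assert!((beta0 - 1.0).abs() < 1e-12);
    assert!((beta1 - 2.0).abs() < 1e-12);
}
```

<p>For noise-free data the closed form reproduces the generating line exactly, which makes it a convenient test case for the general matrix implementation.</p>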

<h1 id="example-diabetes-dataset">Example: Diabetes dataset</h1>

<p>As an example we will follow the scikit-learn example about <a href="https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html">OLS</a>. There they use a feature of the diabetes dataset <a href="#r6">[6]</a> and perform linear regression on it. In addition, they calculate the mean squared error and the $R^2$ score. We will do the same and also try to fit the model using more features and see if it increases/decreases the score.</p>

<p>The first thing we try is to take the full dataset (442 records; 10 feature variables $x$, standardized to have mean 0 and $\sum_i x_i^2=1$; the last column is the variable that we want to predict) and fit it with our code.</p>

<p>We decide, as in the scikit-learn example, that we want to take <em>bmi</em> (3rd column in the dataset) as our feature variable and we want to predict y (the last column in the dataset). After reading them into an nalgebra matrix and vector, we can fit the model with</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">let</span> <span class="k">mut</span> <span class="n">model</span> <span class="o">=</span> <span class="nn">LinearRegression</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="k">true</span><span class="p">);</span>
<span class="n">model</span><span class="nf">.fit</span><span class="p">(</span><span class="o">&amp;</span><span class="n">x_values</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">y_values</span><span class="p">);</span>
</code></pre></div></div>
<p>where the parameter <code class="language-plaintext highlighter-rouge">true</code> indicates that we want to fit the intercept.</p>

<p>The result of the fit provides us with the model parameters</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Coeffs: 949.435260383949
Intercept: 152.1334841628967
</code></pre></div></div>
<p>We can also calculate the scores based on the true values, which gives us</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MSE of fit: 3890.456585461273
R^2 of fit: 0.3439237602253802
</code></pre></div></div>
<p>and we can plot the regression line as seen in Fig. 1.</p>

<p style="text-align: center;"><img src="/assets/img/ols/results_diabetes_full_data.png" alt="" /></p>
<p style="text-align: center;"><em>Figure 1: Fit to the complete diabetes dataset. The purple circles indicate the true data points and the red line indicates the linear model.</em></p>
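<p>The two scores used here are easy to compute by hand; a minimal sketch in plain Rust (the helper names are ours, not part of the code above):</p>

```rust
/// Mean squared error between true and predicted targets.
fn mse(y_true: &[f64], y_pred: &[f64]) -> f64 {
    y_true.iter().zip(y_pred)
        .map(|(t, p)| (t - p).powi(2))
        .sum::<f64>() / y_true.len() as f64
}

/// Coefficient of determination: R^2 = 1 - SS_res / SS_tot.
fn r2(y_true: &[f64], y_pred: &[f64]) -> f64 {
    let mean = y_true.iter().sum::<f64>() / y_true.len() as f64;
    let ss_res: f64 = y_true.iter().zip(y_pred).map(|(t, p)| (t - p).powi(2)).sum();
    let ss_tot: f64 = y_true.iter().map(|t| (t - mean).powi(2)).sum();
    1.0 - ss_res / ss_tot
}

fn main() {
    let y_true = [1.0, 2.0, 3.0];
    // A perfect prediction gives MSE 0 and R^2 1.
    assert!(mse(&y_true, &[1.0, 2.0, 3.0]) == 0.0);
    assert!((r2(&y_true, &[1.0, 2.0, 3.0]) - 1.0).abs() < 1e-12);
    // Predicting the mean everywhere gives R^2 = 0.
    assert!(r2(&y_true, &[2.0, 2.0, 2.0]).abs() < 1e-12);
}
```

<p>An $R^2$ of 0 means the model is no better than always predicting the mean, which is why a value of roughly 0.34 for a single feature is modest but not useless.</p>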

<p>We can also try it with additional features, let’s say taking not only the <em>bmi</em>, but also the cholesterol values <em>ldl</em> and <em>hdl</em>. For that model we get the scores</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MSE of fit: 3669.2644919453955
R^2 of fit: 0.3812250059259711
</code></pre></div></div>
<p>which is a slight improvement; the model now seems to explain a bit more of the variance in the data.</p>

<p>Since one usually splits the data into a training and a test set, we will do the same, following the scikit-learn example exactly. This will serve as the final test: can we reproduce the scikit-learn results with our code? In their example, they take the first 422 records as the training set and the last 20 records as the test set. We will do the same!</p>
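<p>The split itself is a one-liner with slices; a small sketch, assuming the records are already loaded into a <code class="language-plaintext highlighter-rouge">Vec</code> (variable names are our own):</p>

```rust
fn main() {
    // Stand-in for the 442 diabetes records; here just the record indices.
    let records: Vec<usize> = (0..442).collect();

    // First 422 records for training, last 20 for testing,
    // exactly as in the scikit-learn example.
    let (train, test) = records.split_at(422);

    assert_eq!(train.len(), 422);
    assert_eq!(test.len(), 20);
}
```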

<p>After reading the data and performing the split, we train our model again using the <em>bmi</em>. We find as the model parameters</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Coeffs: 938.2378612512634
Intercept: 152.91886182616173
</code></pre></div></div>
<p>and as scores</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MSE of fit: 2548.072398725972
R^2 of fit: 0.472575447982271
</code></pre></div></div>
<p>which is exactly as in the scikit-learn example. We can also reproduce the plot that they show (see Fig. 2).</p>

<p style="text-align: center;"><img src="/assets/img/ols/results_diabetes_scikit_learn_example.png" alt="" /></p>
<p style="text-align: center;"><em>Figure 2: Fit to the diabetes dataset using the train/test split from the scikit-learn example. The purple circles indicate the true data points and the red line indicates the linear model. Shown are the results on the test (validation) set.</em></p>

<p>So, we did it. We just implemented a (I have to admit, quite simple) machine learning algorithm ourselves and made sure that the results exactly match those of one of the most widely used Python libraries out there.</p>

<h1 id="summary">Summary</h1>

<p>This concludes our brief excursion into linear regression. It demonstrates that some things are not as complicated as they seem, and that it can make sense to implement such algorithms from scratch to better understand what they do and what is behind all the magic.</p>

<h1 id="next-steps">Next steps</h1>

<p>Possible next steps from here could be:</p>
<ul>
  <li>Implementation of (stochastic) gradient descent linear regression</li>
  <li>Look into classification models (perceptron, k-nearest neighbors, logistic regression)</li>
  <li>Explore further supervised methods, like support vector machines (SVMs)</li>
</ul>

<h1 id="references">References</h1>

<p>[1]<a name="r1"></a> <a href="https://scikit-learn.org/stable/">https://scikit-learn.org/stable/</a><br />
[2]<a name="r2"></a> <a href="https://en.wikipedia.org/wiki/Ordinary_least_squares">https://en.wikipedia.org/wiki/Ordinary_least_squares</a><br />
[3]<a name="r3"></a> <a href="https://en.wikipedia.org/wiki/Linear_regression">https://en.wikipedia.org/wiki/Linear_regression</a><br />
[4]<a name="r4"></a> <a href="https://www.rust-lang.org/">https://www.rust-lang.org/</a><br />
[5]<a name="r5"></a> <a href="https://nalgebra.org/">https://nalgebra.org/</a><br />
[6]<a name="r6"></a> Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) “Least Angle Regression,” Annals of Statistics (with discussion), 407-499, <a href="https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html">https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html</a></p>]]></content><author><name></name></author><category term="Machine Learning" /><category term="Rust" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">A CI/CD pipeline with GitLab and Kubernetes - the simple way</title><link href="https://torlenor.org/linux/k8s/devops/2020/12/23/deployment_on_k8s_from_gitlab_cicd.html" rel="alternate" type="text/html" title="A CI/CD pipeline with GitLab and Kubernetes - the simple way" /><published>2020-12-23T11:00:00+00:00</published><updated>2020-12-23T11:00:00+00:00</updated><id>https://torlenor.org/linux/k8s/devops/2020/12/23/deployment_on_k8s_from_gitlab_cicd</id><content type="html" xml:base="https://torlenor.org/linux/k8s/devops/2020/12/23/deployment_on_k8s_from_gitlab_cicd.html"><![CDATA[<p>To speed up the development process for a new project, we were investigating the possibility to integrating our Kubernetes (k8s) cluster into our GitLab instance. It turned out, that all of the examples and tutorials we found were either way to complicated (examples repos, Medium articles), or not helpful at all because they omitted crucial parts (the GitLab documentation on deployment). So we decided to write an up-to-date tutorial.</p>

<p>This tutorial will cover how to integrate a running k8s cluster into GitLab (as a cluster not managed by GitLab), how to install the runner and, most importantly, how to write a <code class="language-plaintext highlighter-rouge">.gitlab-ci.yml</code> file which builds a Docker image, pushes it into the GitLab Container Registry and does the deployment. What we will not cover is the installation of the cluster or of the GitLab instance.</p>

<p>Do not fear, it is much easier than you think!</p>

<p>Disclaimer: we do not take any responsibility for bricked GitLab instances or k8s clusters!</p>

<h1 id="requirements">Requirements</h1>

<ul>
  <li>An up and running Kubernetes cluster and admin rights on it.</li>
  <li>A current installation of GitLab (tested on 13.6 and 13.7) and a user with Admin permissions.</li>
  <li>GitLab must be able to reach the Kubernetes API port.</li>
  <li>An example project to build and deploy on the cluster with an initial k8s deployment ready (we will also provide an example deployment yaml for k8s if you shouldn’t have one).</li>
</ul>

<h1 id="connecting-gitlab-with-the-k8s-cluster">Connecting GitLab with the k8s cluster</h1>

<p>The first step is to enable GitLab to speak with our k8s cluster.</p>

<p>The following steps assume you are using GitLab 13.6 or 13.7.</p>

<ol>
  <li>First go to the <strong>Admin area</strong> in your GitLab instance and then navigate to the <strong>Kubernetes</strong> section.</li>
  <li>Click on <strong>Connect cluster with certificate</strong>.</li>
  <li>Switch to the <strong>Connect existing cluster</strong> tab.</li>
  <li>Enter your desired name for the cluster. This name will be used through GitLab to identify the cluster.</li>
  <li>(Optional) Specify which <strong>Environment scope</strong> the cluster is used for. This lets you split testing/staging/production environments into separate k8s clusters. Keep the default “*” if you are unsure.</li>
  <li>Enter the <strong>API URL</strong>. It usually has the form <code class="language-plaintext highlighter-rouge">https://some_host_name_or_address:6443</code>.</li>
  <li>On your k8s cluster type <code class="language-plaintext highlighter-rouge">kubectl get secrets</code> and find the line with the name of your default token. It has the form of <code class="language-plaintext highlighter-rouge">default-token-&lt;something&gt;</code>.</li>
  <li>Enter
    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code> kubectl get secret default-token-&lt;something&gt; <span class="nt">-o</span> <span class="nv">jsonpath</span><span class="o">=</span><span class="s2">"{['data']['ca</span><span class="se">\.</span><span class="s2">crt']}"</span> | <span class="nb">base64</span> <span class="nt">--decode</span>
</code></pre></div>    </div>
    <p>where you replace <em>default-token-&lt;something&gt;</em> with what you found with the command above.</p>
  </li>
  <li>You should get an output like that:
    <div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code> -----BEGIN CERTIFICATE-----
 A LOT OF CHARACTERS
 -----END CERTIFICATE-----
</code></pre></div>    </div>
    <p>Copy the whole output (including the “—” lines) and paste it into the <strong>CA Certificate</strong> field.</p>
  </li>
  <li>Now we have to create a service account for GitLab on the cluster. Create a file <em>gitlab-admin-service-account.yaml</em> with the following contents:
    <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">ServiceAccount</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">gitlab</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">kube-system</span>
<span class="s">---</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">rbac.authorization.k8s.io/v1beta1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">ClusterRoleBinding</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">gitlab-admin</span>
<span class="na">roleRef</span><span class="pi">:</span>
  <span class="na">apiGroup</span><span class="pi">:</span> <span class="s">rbac.authorization.k8s.io</span>
  <span class="na">kind</span><span class="pi">:</span> <span class="s">ClusterRole</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">cluster-admin</span>
<span class="na">subjects</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">kind</span><span class="pi">:</span> <span class="s">ServiceAccount</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">gitlab</span>
    <span class="na">namespace</span><span class="pi">:</span> <span class="s">kube-system</span>
</code></pre></div>    </div>
    <p>and type</p>
    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl apply <span class="nt">-f</span> gitlab-admin-service-account.yaml
</code></pre></div>    </div>
    <p>to apply it. The expected output is</p>
    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>serviceaccount <span class="s2">"gitlab"</span> created
clusterrolebinding <span class="s2">"gitlab-admin"</span> created
</code></pre></div>    </div>
  </li>
  <li>Type
    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl <span class="nt">-n</span> kube-system describe secret <span class="si">$(</span>kubectl <span class="nt">-n</span> kube-system get secret | <span class="nb">grep </span>gitlab | <span class="nb">awk</span> <span class="s1">'{print $1}'</span><span class="si">)</span>
</code></pre></div>    </div>
    <p>to get the token for that newly created account. Paste everything from the <em>token</em> key into the <strong>Service Token</strong> field.</p>
  </li>
  <li>Unselect <strong>GitLab-managed cluster</strong>, since we want to manage the cluster ourselves.</li>
  <li>Finally click on <strong>Add Kubernetes cluster</strong> and GitLab should now be able to talk to k8s.</li>
</ol>

<h1 id="install-the-gitlab-runner-onto-the-k8s-cluster">Install the GitLab Runner onto the k8s cluster</h1>

<p>This step is easy: Go to <strong>Admin Area</strong> - <strong>Kubernetes</strong> and click on your cluster’s name. On the <strong>Applications</strong> tab search for GitLab Runner and click <strong>Install</strong>. After a few seconds you should have an installed and fully integrated shared runner in your GitLab instance.</p>

<h1 id="writing-a-gitlab-cicd-configuration-for-deployment-on-the-k8s-cluster">Writing a GitLab CI/CD configuration for deployment on the k8s cluster</h1>

<p>For the next step you need an example project which you can pack into a Docker image and deploy on your cluster. Because we want to manage our deployments and yaml files for k8s in a separate repository, we create the deployment for the application once by hand and then use GitLab to modify the deployment to roll out the newest version of the application.</p>

<h2 id="creating-the-initial-deployment-from-a-yaml-file">Creating the initial deployment from a yaml file</h2>

<p>The following snippet is a deployment declaration for a simple, generic application. We omitted all the additional things you may need, like a service or an ingress, because they are beyond the scope of this article.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">apps/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Deployment</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">my-app</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">my-app-namespace</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">replicas</span><span class="pi">:</span> <span class="m">1</span>
  <span class="na">selector</span><span class="pi">:</span>
    <span class="na">matchLabels</span><span class="pi">:</span>
      <span class="na">app</span><span class="pi">:</span> <span class="s">my-app</span>
  <span class="na">template</span><span class="pi">:</span>
    <span class="na">metadata</span><span class="pi">:</span>
      <span class="na">labels</span><span class="pi">:</span>
        <span class="na">app</span><span class="pi">:</span> <span class="s">my-app</span>
        <span class="na">name</span><span class="pi">:</span> <span class="s">my-app</span>
    <span class="na">spec</span><span class="pi">:</span>
      <span class="na">containers</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">image</span><span class="pi">:</span> <span class="s">myapp:latest</span>
        <span class="na">name</span><span class="pi">:</span> <span class="s">my-app</span>
</code></pre></div></div>

<p>This assumes there is a namespace <em>my-app-namespace</em> where you can deploy to and that no image pull secrets are needed (see <a href="https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/">https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/</a> if you should need them).</p>

<p>After applying this deployment, we are good to go to create the CI/CD pipeline (the central part of this article).</p>

<h2 id="creating-a-cicd-gitlab-pipeline-including-k8s-deployment">Creating a CI/CD GitLab pipeline including k8s deployment</h2>

<p>In contrast to all the examples we found, it is very easy to deploy a new version via a CI/CD pipeline if you have a GitLab-integrated k8s cluster, because GitLab provides the pipeline with the necessary credentials to deploy to the cluster.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">stages</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="s">build_image</span>
  <span class="pi">-</span> <span class="s">deploy</span>

<span class="na">create_docker_image</span><span class="pi">:</span>
  <span class="na">stage</span><span class="pi">:</span> <span class="s">build_image</span>
  <span class="na">image</span><span class="pi">:</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">gcr.io/kaniko-project/executor:debug</span>
    <span class="na">entrypoint</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">"</span><span class="pi">]</span>
  <span class="na">script</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s">mkdir -p /kaniko/.docker</span>
    <span class="pi">-</span> <span class="s">echo "{\"auths\":{\"$CI_REGISTRY\":{\"username\":\"$CI_REGISTRY_USER\",\"password\":\"$CI_REGISTRY_PASSWORD\"}}}" &gt; /kaniko/.docker/config.json</span>
    <span class="pi">-</span> <span class="s">/kaniko/executor --context $CI_PROJECT_DIR --dockerfile $CI_PROJECT_DIR/Dockerfile --destination $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA</span>

<span class="na">deploy_production</span><span class="pi">:</span>
  <span class="na">stage</span><span class="pi">:</span> <span class="s">deploy</span>
  <span class="na">when</span><span class="pi">:</span> <span class="s">manual</span>
  <span class="na">dependencies</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s">create_docker_image</span>
  <span class="na">image</span><span class="pi">:</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">bitnami/kubectl:latest</span>
    <span class="na">entrypoint</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">"</span><span class="pi">]</span>
  <span class="na">environment</span><span class="pi">:</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">production</span>
    <span class="na">url</span><span class="pi">:</span> <span class="s">https://my-app.com</span>
    <span class="na">kubernetes</span><span class="pi">:</span>
      <span class="na">namespace</span><span class="pi">:</span> <span class="s">my-app</span>
  <span class="na">script</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s">kubectl set image deployment/my-app my-app=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA</span>
    <span class="pi">-</span> <span class="s">kubectl rollout status deployment/my-app --timeout=10s</span>
</code></pre></div></div>

<p>This pipeline definition contains two very useful examples: The first is how to build a Docker image without Docker-in-Docker, Docker-from-Docker or any bare metal Docker installation and without any superuser rights. The awesome kaniko project provides a Docker-compatible way to build Docker images from a Dockerfile inside a k8s cluster without compromising security. Here it is used to build the image for our application and to automatically push it into the GitLab Container Registry.</p>
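<p>Hand-escaping the quotes in the <em>echo</em> line of the kaniko job is error-prone. As an illustration, the same registry auth file can be generated with a few lines of Python; the function name and the argument values below are placeholders standing in for GitLab’s <em>CI_REGISTRY</em>, <em>CI_REGISTRY_USER</em> and <em>CI_REGISTRY_PASSWORD</em> variables:</p>

```python
import json

# Build the registry auth config that the pipeline's echo line writes to
# /kaniko/.docker/config.json. The arguments are placeholders standing in
# for GitLab's CI_REGISTRY, CI_REGISTRY_USER and CI_REGISTRY_PASSWORD.
def kaniko_docker_config(registry, user, password):
    return json.dumps({"auths": {registry: {"username": user, "password": password}}})

print(kaniko_docker_config("registry.example.com", "gitlab-ci-token", "s3cret"))
```

<p>Letting a JSON serializer do the quoting guarantees that the resulting file is valid JSON, no matter what characters the credentials contain.</p>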

<p>The second part is the deployment part. We are using the bitnami/kubectl image which provides us with the kubectl command. The actual deployment is just two lines! How is that possible? Well, in contrast to many of the examples we found, you do not need to worry about the k8s connection and credentials anymore, because GitLab provides a fully working KUBECONFIG as an environment variable and kubectl automatically uses it to connect to the cluster. GitLab also makes sure that you only modify the namespace defined in the environment section of your yaml file. If you need to modify deployments in other namespaces, you will have to go through the ordeal of providing your own credentials for the cluster.</p>

<p>Feel free to omit the second script line or to increase the timeout. The <em>rollout status</em> command is useful because it makes the pipeline fail when the deployment fails. If you are using a lot of replicas, large images or other settings which make the rollout much slower, you will have to increase the timeout or the pipeline step will fail.</p>

<h1 id="running-the-pipeline">Running the pipeline</h1>

<p>When you push something to the project repository, the first part, creating the Docker image, will always run. The second part, the deployment, is marked as manual, i.e., it has to be triggered by hand via GitLab (Fig. 1). This is useful for production deployments. For testing you could deploy automatically, if you want.</p>

<p style="text-align: center;"><img src="/assets/img/gitlab_k8s/pipeline_1.png" alt="" /></p>
<p style="text-align: center;"><em>Figure 1: Finished first step of the pipeline, building the Docker image.</em></p>

<p>To start the deployment click on the Play symbol on the right-hand side and then select the stage you want to run. In our case this is “deploy_production” (see Fig. 2). This will start the deployment on the cluster.</p>

<p style="text-align: center;"><img src="/assets/img/gitlab_k8s/pipeline_2.png" alt="" /></p>
<p style="text-align: center;"><em>Figure 2: Starting the deployment.</em></p>

<p>The output of the job should look similar to the output in the following image (Fig. 3).</p>

<p style="text-align: center;"><img src="/assets/img/gitlab_k8s/pipeline_3.png" alt="" /></p>
<p style="text-align: center;"><em>Figure 3: Deployment job output.</em></p>

<h1 id="summary">Summary</h1>

<p>We have shown how to integrate an existing k8s cluster into GitLab and how to use it for building and deploying an application. Contrary to common belief, this is much easier than doing it on, for example, a bare-metal Docker installation. K8s already has a lot of advantages, and together with GitLab it becomes very simple to automate deployments and build complete CI/CD pipelines.</p>]]></content><author><name></name></author><category term="Linux" /><category term="k8s" /><category term="DevOps" /><summary type="html"><![CDATA[To speed up the development process for a new project, we were investigating the possibility of integrating our Kubernetes (k8s) cluster into our GitLab instance. It turned out that all of the examples and tutorials we found were either way too complicated (example repos, Medium articles) or not helpful at all because they omitted crucial parts (the GitLab documentation on deployment). So we decided to write an up-to-date tutorial.]]></summary></entry><entry><title type="html">Teaching an AI how to play the classic game Snake</title><link href="https://torlenor.org/machine%20learning/reinforcement%20learning/2020/11/22/machine_learning_reinforcement_learning_snake.html" rel="alternate" type="text/html" title="Teaching an AI how to play the classic game Snake" /><published>2020-11-22T19:30:00+00:00</published><updated>2020-11-22T19:30:00+00:00</updated><id>https://torlenor.org/machine%20learning/reinforcement%20learning/2020/11/22/machine_learning_reinforcement_learning_snake</id><content type="html" xml:base="https://torlenor.org/machine%20learning/reinforcement%20learning/2020/11/22/machine_learning_reinforcement_learning_snake.html"><![CDATA[<ul id="markdown-toc">
  <li><a href="#introduction" id="markdown-toc-introduction">Introduction</a></li>
  <li><a href="#snake" id="markdown-toc-snake">Snake</a></li>
  <li><a href="#the-snake-environment" id="markdown-toc-the-snake-environment">The Snake environment</a></li>
  <li><a href="#training-of-the-agent" id="markdown-toc-training-of-the-agent">Training of the agent</a></li>
  <li><a href="#using-it-to-play-the-game" id="markdown-toc-using-it-to-play-the-game">Using it to play the game</a></li>
  <li><a href="#summary" id="markdown-toc-summary">Summary</a></li>
  <li><a href="#references" id="markdown-toc-references">References</a></li>
</ul>

<h1 id="introduction">Introduction</h1>

<p>In this article we are going to use reinforcement learning (RL) <a href="#r1">[1]</a> to teach a computer to play the classic game Snake <a href="#r2">[2]</a> (remember the good old Nokia phones?). The game is implemented from scratch using Python including a visualization with PySDL2 <a href="#r3">[3]</a>. We are going to use TensorFlow <a href="#r4">[4]</a> to implement the actor-critic algorithm <a href="#r5">[5]</a> which is then used to learn playing the game.</p>

<p>We will show that, with moderate effort, an agent can be trained which plays the game reasonably well.</p>

<p style="text-align: center;"><img src="/assets/img/snake/snake_demo.gif" alt="" /></p>
<p style="text-align: center;"><em>Figure 1: The agent playing the game on a 10x10 board for 500 steps.</em></p>

<h1 id="snake">Snake</h1>

<p>Snake is a name for a series of video games where the player controls a growing “snake”-like line. According to Wikipedia the game concept dates back to 1976, and because it is so easy to implement (but still fun!) a ton of different implementations exist for nearly every computer platform. Many of you will probably know Snake from Nokia phones. Nokia started putting a variant of Snake onto their mobile phones in 1998, which brought a lot of new attention to the game. I, for my part, have to admit to having spent too much time trying to feed that little snake on my Nokia 6310.</p>

<p>The gameplay is simple: The player controls a dot, square, or something similar in a 2D world. As it moves, it leaves a tail behind, resembling a snake. Usually the length of the tail depends on the amount of food the snake has eaten. As the goal is to eat as much food as possible to increase your score, the length of the snake keeps increasing. The player loses when the snake runs into itself or into the screen border.</p>

<p>The game can easily be implemented in a few lines of Python code, and when you throw in a couple more, you can even make a simple visualization in PySDL2. Therefore, we implemented the game ourselves, as relying on existing implementations, or evaluating them, might have taken more time than just doing it ourselves. And: it’s a fun little project to code.</p>

<p>We modified the traditional behavior: instead of moving automatically, the snake only moves one step when it receives the next action. This may make the game boring for a human player, but it saves the boilerplate code needed to discretize the game output into separate observations. We believe the agent is fast enough to handle the time constraint, should you really want to use it in the self-moving game.</p>

<p>In our implementation, exactly one piece of food is placed randomly onto an empty field of the game. The initial length of the snake is one tile; therefore, the maximum score for a given game field size of $N_x \times N_y$ is $N_xN_y - 1$.</p>
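<p>As a quick sanity check, the maximum score can be computed for the field sizes that appear later in this article (the function name below is just for illustration):</p>

```python
# Every tile of an Nx x Ny field can hold food except one, because the
# snake starts with a length of one tile.
def max_score(nx, ny):
    return nx * ny - 1

print(max_score(4, 4))    # -> 15
print(max_score(10, 10))  # -> 99
```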

<h1 id="the-snake-environment">The Snake environment</h1>

<p>In addition to the game itself, it is necessary to encapsulate the game in an environment suitable for machine learning. We need to be able to tell the game what to do (which step to perform next) and we need a way to get an observation describing the current game state.</p>

<p>It is necessary to have four discrete actions to control the snake:</p>

<ul>
  <li>Move up</li>
  <li>Move left</li>
  <li>Move down</li>
  <li>Move right</li>
</ul>

<p>Determining the best state representation for our agent took more experimenting: The first approach was to encode all the tiles of the field with their type in a one-dimensional tensor. While this did work well, we soon found out that this is not easily generalizable, because the observation space grows or shrinks with the size of the game board and one cannot just train one model and reuse it on other board sizes. Then we came up with a different idea: Restrict the snake’s visibility range to a certain number of tiles around its head. This reduces the state space dramatically, speeds up the learning and lifts the restriction to a fixed field size. We decided, arbitrarily, to restrict its view to four tiles in each of the possible movement directions (for an example see the blue tiles in Figure 2).</p>

<p style="text-align: center;"><img src="/assets/img/snake/snake_visibility_range.png" alt="" /></p>
<p style="text-align: center;"><em>Figure 2: The agent playing the game on a 16x16 board. The tiles in the agent’s view are colored in blue.</em></p>
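<p>The cross-shaped observation described above can be sketched in a few lines of Python. This is an illustration only; the tile encoding and the function name are assumptions for this sketch, not taken from the actual implementation:</p>

```python
# Illustration of the cross-shaped view: the snake sees up to four tiles in
# each movement direction from its head. Assumed tile encoding for this
# sketch: 0 = empty, 2 = food, -1 = wall/outside the field.
def cross_view(board, head, view_range=4):
    rows, cols = len(board), len(board[0])
    hx, hy = head
    view = []
    for dx, dy in [(-1, 0), (0, -1), (1, 0), (0, 1)]:  # up, left, down, right
        for step in range(1, view_range + 1):
            x, y = hx + dx * step, hy + dy * step
            if 0 <= x < rows and 0 <= y < cols:
                view.append(board[x][y])
            else:
                view.append(-1)  # everything outside the field looks like a wall
    return view

board = [[0] * 6 for _ in range(6)]
board[1][2] = 2  # one piece of food two tiles above the head
print(cross_view(board, head=(3, 2)))  # always 16 values, regardless of board size
```

<p>Because the view always contains 4 directions times 4 tiles, i.e. 16 values, the same model input size works for any board size, which is exactly what makes the trained model reusable across fields.</p>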

<p>For the rewards per step we first tried it with</p>

<ul>
  <li><strong>+10</strong> when food was eaten</li>
  <li><strong>-0.5</strong> when no food was eaten</li>
  <li><strong>-100</strong> when game over (hitting a wall or itself)</li>
</ul>

<p>which turned out to be a bad decision. Most of the time the agent just tried to stay on the same spot, being “afraid” of hitting a wall or itself, because that penalty was much larger than the slight penalty for not eating.</p>

<p>We then changed the rewards per step to</p>

<ul>
  <li><strong>+1</strong> when food was eaten</li>
  <li><strong>-0.01</strong> when no food was eaten</li>
  <li><strong>-1</strong> when game over (hitting a wall or itself)</li>
</ul>

<p>which worked reasonably well.</p>
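<p>Expressed as a small function (the event names here are made up for the sketch; the actual environment may encode them differently), the second reward scheme reads:</p>

```python
# The reward scheme that worked reasonably well, as a simple lookup.
# Event names ("food", "step", "game_over") are illustrative only.
def reward(event):
    return {"food": 1.0, "step": -0.01, "game_over": -1.0}[event]

# An episode with nine empty steps, one piece of food, then a game over:
episode = ["step"] * 9 + ["food", "game_over"]
print(round(sum(reward(e) for e in episode), 2))  # -> -0.09
```

<p>Note how the penalties are now on the same scale as the food reward, so avoiding death no longer dominates over actually eating.</p>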

<h1 id="training-of-the-agent">Training of the agent</h1>

<p>After a bit of experimentation, we decided on a model with one hidden layer of 512 neurons. We were able to use a learning rate of $10^{-3}$ throughout the whole training without running into instabilities. The discount factor was set to $\gamma=0.995$. Usually we stopped an episode when the agent reached 200 steps; in that case the episode ended without a negative reward.</p>

<p>We trained the model in four phases: In the first run, we used a 4x4 field and ran for 100k episodes to see whether the agent was improving. Next we continued training on the same field size for an additional 500k episodes. Then we switched to a larger game field of 8x8 and trained for another 500k episodes. In the final training phase we increased the maximum number of steps per episode to 400 and the field size to 10x10.</p>

<p>In Figure 3 the evolution of the total moves per episode is shown. The number of moves tends to go up to the maximum of 200/400, but with strong fluctuations during the first two training runs, which took place on the small game field. The fluctuations shrink in the third training run, where we switched to the larger board. In the last training run a drop in the total moves can be observed after 155k episodes.</p>

<p style="text-align: center;"><img src="/assets/img/snake/snake_total_moves_training.png" alt="" /></p>
<p style="text-align: center;"><em>Figure 3: Total moves the agent reached during the episodes of training (note: the episode stopped automatically after 200 moves in the first 3 runs and after 400 moves in the last run). Plots taken from TensorBoard. The different colors of the lines correspond to the different phases of training: orange - first run, dark blue - second run, red - third run and light blue - final run. The data is smoothed using the TensorBoard smoothing value of 0.9.</em></p>

<p>The total reward per episode (Figure 4) shows a clear trend of improvement during training, but also that training beyond a certain number of episodes does not increase the total rewards any further, due to saturation on the small game field. Increasing the game field allowed the total rewards to rise again. The final run, on an even larger field, seemed to show the same behavior up to about 155k episodes, but then a massive drop in total rewards could be observed.</p>

<p style="text-align: center;"><img src="/assets/img/snake/snake_total_reward_training.png" alt="" /></p>
<p style="text-align: center;"><em>Figure 4: Total reward the agent reached during the episodes of training. Plots taken from TensorBoard. The different colors of the lines correspond to the different phases of training: orange - first run, dark blue - second run, red - third run and light blue - final run.</em></p>

<p>The final plot shows the running reward over the course of the training (Figure 5). The running reward $R$ at episode $i$ is calculated as</p>

\[R_{i} = 0.01 R^{(e)}_i + 0.99 R_{i-1}\]

<p>where $R^{(e)}_i$ is the reward of the current episode $i$ and $R_{i-1}$ is the previous running reward. Due to its definition it is much smoother and a better indicator for the training progress of the model. The saturation due to the limited field size is much clearer here: the running reward saturates between 11 and 12 for the 4x4 field, around 16 for the 8x8 field and around 21 for the 10x10 field. Given that the maximum score for the agent on a 4x4 field is 15 (taking into account the initial size of one tile for the snake), this is a good value, especially as our food placement is completely random and does not take into account whether the food can be reached by the snake or not.</p>

<p style="text-align: center;"><img src="/assets/img/snake/snake_running_reward_training.png" alt="" /></p>
<p style="text-align: center;"><em>Figure 5: Running reward during training. Plots taken from TensorBoard. The different colors of the lines correspond to the different phases of training: orange - first run, dark blue - second run, red - third run and light blue - final run.</em></p>
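<p>The running reward is a simple exponential moving average, which can be sketched as follows (a minimal illustration, not the actual training code):</p>

```python
# Each new episode reward contributes 1%, while 99% of the previous running
# reward is kept, which smooths out strong episode-to-episode fluctuations.
def running_reward(episode_rewards, start=0.0):
    r = start
    history = []
    for episode_reward in episode_rewards:
        r = 0.01 * episode_reward + 0.99 * r
        history.append(r)
    return history

# With a constant episode reward the running reward slowly approaches it:
print(running_reward([15.0] * 5))
```

<p>Starting from zero, the first value is $0.01 \cdot 15 = 0.15$, and the sequence climbs slowly towards 15, which is why the running reward reacts with a long delay to changes in the agent’s performance.</p>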

<p>The most noticeable event happened during the fourth run. It seems we encountered the effect of catastrophic forgetting after approximately 150k episodes. Because of that we decided to take the model at episode 150k as the final model.</p>

<h1 id="using-it-to-play-the-game">Using it to play the game</h1>

<p>Already from the training it was clear that the agent had learned to play the game quite well. It learned to cycle across the game field; when it encounters food in its vision range, it tries to catch it while taking into account that it is not allowed to move over its own tail.</p>

<center>
<iframe width="560" height="315" src="https://www.youtube.com/embed/GUY7ishJip8" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen=""></iframe>
</center>
<p style="text-align: center;">Figure 6: A video of the agent playing snake on a 16x16 game field.</p>

<p>However, for larger game fields the agent can become stuck in a loop while searching for food: if it cycles across the field without encountering food in its view, it will just continue cycling. But up to 16x16 boards, the agent works quite well.</p>

<h1 id="summary">Summary</h1>

<p>We showed that for the simple game Snake a well-working agent can be trained using the simple actor-critic algorithm with one hidden layer of 512 neurons. Once trained, the agent can play on different game field sizes from 4x4 up to 16x16. Increasing the field size further may lead to the agent getting stuck in a loop, depending on the (random) placement of the food on the field.</p>

<p>Ways to improve the agent would be changing the view from a cross to a square, so that food diagonally away from the agent can also be seen. For larger fields an easy way to increase its efficiency is to extend the view range beyond the currently used four tiles in each direction. However, that will require longer training. Maybe a better solution can be found which lets the agent explore the field more efficiently.</p>

<p>Other ideas for continuing on that project would be to introduce more than one food on the field or walls inside the field. Both are easy to implement and, with enough training, the agent should be able to overcome these additional difficulties.</p>

<p>If you want to give it a try, the code is available on GitHub: <a href="https://github.com/torlenor/rlsnake">https://github.com/torlenor/rlsnake</a></p>

<h1 id="references">References</h1>

<p>[1]<a name="r1"></a> <a href="https://en.wikipedia.org/wiki/Reinforcement_learning">https://en.wikipedia.org/wiki/Reinforcement_learning</a><br />
[2]<a name="r2"></a> <a href="https://en.wikipedia.org/wiki/Snake_(video_game_genre)">https://en.wikipedia.org/wiki/Snake_(video_game_genre)</a><br />
[3]<a name="r3"></a> <a href="https://github.com/marcusva/py-sdl2">https://github.com/marcusva/py-sdl2</a><br />
[4]<a name="r4"></a> <a href="https://www.tensorflow.org/">https://www.tensorflow.org/</a><br />
[5]<a name="r5"></a> A. Barto, R. Sutton, and C. Anderson, Neuron-like elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man and Cybernetics, 13 (1983), pp. 835–846.</p>]]></content><author><name></name></author><category term="Machine Learning" /><category term="Reinforcement Learning" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">How to install Linux Mint 20 on a Dell XPS 13 (9310)</title><link href="https://torlenor.org/linux/2020/10/31/mint_on_dell_xps_13.html" rel="alternate" type="text/html" title="How to install Linux Mint 20 on a Dell XPS 13 (9310)" /><published>2020-10-31T11:00:00+00:00</published><updated>2020-10-31T11:00:00+00:00</updated><id>https://torlenor.org/linux/2020/10/31/mint_on_dell_xps_13</id><content type="html" xml:base="https://torlenor.org/linux/2020/10/31/mint_on_dell_xps_13.html"><![CDATA[<p>The new late-2020 Dell XPS 13 (9310) is one of the first notebooks based on Intel’s new Evo platform with the Intel Core ix-11xx series, which features the new integrated graphics architecture Iris Xe. Like the older Dell XPS 13 models, this new one can also be ordered preinstalled with Ubuntu Linux. This time, even in Austria! It comes preinstalled with Ubuntu 20.04, which may already be a good fit for many people. In my case, however, I had grown fond of the Cinnamon desktop and, therefore, wanted to install Linux Mint on my new workhorse. It turned out to work quite smoothly, also due to the fact that Ubuntu did a good job providing packages to support the new Intel platform.</p>

<p>Specs of my model:</p>

<ul>
  <li>Intel(R) Core(TM) i7-1165G7 (12 MB Cache, up to 4,7 GHz)</li>
  <li>16 GB RAM</li>
  <li>1 TB M.2-PCIe-NVMe-SSD</li>
  <li>Killer(TM) Wi-Fi 6 AX1650 and Bluetooth 5.1 1</li>
  <li>Non-glare InfinityEdge-Display without touch, 13,4” FHD+ (1.920 x 1.200) and 500 cd/m²</li>
</ul>

<h1 id="preparations">Preparations</h1>

<p style="text-align: center;"><img src="/assets/img/xps13/default_boot.jpg" alt="" /></p>
<p style="text-align: center;"><em>First boot of the Dell XPS 13 (9310) Developer Edition with Ubuntu preinstalled.</em></p>

<p>Before I went to install Mint 20, I booted up the installed Ubuntu 20.04 and made sure everything was working. The system was pre-configured quite well and after answering questions about my name and the desired user name, I got logged in and it was ready to be used. I made sure wireless LAN was working, updated everything to the latest versions and generated the recovery image for the pre-installed Ubuntu via the Dell Recovery application.</p>

<p>In addition, I backed up the directories</p>
<ul>
  <li>/etc</li>
  <li>/usr/local</li>
  <li>/opt</li>
</ul>

<p>Especially backing up /etc is useful, because it contains the enabled package repositories, which we are going to use later.</p>

<p>Also downloading the current <a href="https://linuxmint.com/">Linux Mint</a> image and creating a USB flash drive were part of my preparations.</p>

<p>It is also important to go to
<a href="http://archive.ubuntu.com/ubuntu/pool/main/l/linux-meta-oem-5.6/">http://archive.ubuntu.com/ubuntu/pool/main/l/linux-meta-oem-5.6/</a>
and
<a href="http://archive.ubuntu.com/ubuntu/pool/main/l/linux-firmware/">http://archive.ubuntu.com/ubuntu/pool/main/l/linux-firmware/</a>
and download the packages</p>

<ul>
  <li>linux-headers-oem-20.04_5.6.0.1032.28_amd64.deb</li>
  <li>linux-image-oem-20.04_5.6.0.1032.28_amd64.deb</li>
  <li>linux-oem-20.04_5.6.0.1032.28_amd64.deb</li>
  <li>linux-oem-5.6-headers-5.6.0-1032_5.6.0-1032.33_all.deb</li>
  <li>linux-firmware_1.187.3_all.deb</li>
</ul>

<p>or their respective latest versions. We are going to need them to get WiFi working.</p>

<h1 id="installing-linux-mint">Installing Linux Mint</h1>

<p>Plug in the USB flash drive, if necessary using the USB-C adapter delivered with your XPS 13, and boot up the laptop. Shortly after pressing the power-on button, keep hammering on the F12 key to get into the boot menu. Select your USB drive and boot up the Linux Mint live system. Then install Linux Mint either by replacing Ubuntu or, as I did, alongside Ubuntu 20.04. During the installation Linux Mint will ask you if you want to install 3rd-party multimedia libraries. If you do, you will have to set a secure boot password, which you will be asked for the next time you reboot. Do that and remember the password.</p>

<p>When the installation is finished, reboot, and after acknowledging the secure boot changes by entering the password you set during installation, you should be prompted with the Linux Mint login.</p>

<h1 id="getting-wifi-to-work-and-install-oem-components">Getting WiFi to work and install OEM components</h1>

<p>The first step will be to get WiFi to work, because what use is the best laptop if you cannot watch cat videos on YouTube with it?</p>

<p>Now the previously downloaded files come into play: Get the downloaded files onto your laptop, for example via USB flash drive, and open a terminal. Change into the directory where you copied the files to and type</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt <span class="nb">install</span> ./linux-<span class="k">*</span>
</code></pre></div></div>
<p>to install all the packages. When the installation is finished, reboot the laptop, and when it comes back up you should have wireless LAN.</p>

<p>After connecting to your WiFi you should add a few OEM repositories coming from Canonical to enable full support of the new hardware. If you copied the etc directory of your Ubuntu installation, or if you still have the Ubuntu installation on a separate partition, copy the files</p>

<ul>
  <li>somerville-dla-team-ubuntu-ppa-bionic.list</li>
  <li>focal-oem.list</li>
  <li>oem-somerville-bulbasaur-meta.list</li>
</ul>

<p>from <em>_your_etc_backup_/apt/sources.list.d/</em> to <em>/etc/apt/sources.list.d/</em>.</p>

<p>If you do not have the files backed up, no problem, here is their content:</p>

<p><strong>/etc/apt/sources.list.d/somerville-dla-team-ubuntu-ppa-bionic.list:</strong></p>
<pre><code class="language-t"> deb http://ppa.launchpad.net/somerville-dla-team/ppa/ubuntu bionic main
 # deb-src http://ppa.launchpad.net/somerville-dla-team/ppa/ubuntu bionic main
 # deb-src http://ppa.launchpad.net/somerville-dla-team/ppa/ubuntu bionic main
</code></pre>

<p><strong>/etc/apt/sources.list.d/focal-oem.list:</strong></p>
<pre><code class="language-t"> deb http://oem.archive.canonical.com/ focal oem
 #deb-src http://oem.archive.canonical.com/ focal oem
</code></pre>

<p><strong>/etc/apt/sources.list.d/oem-somerville-bulbasaur-meta.list:</strong></p>
<pre><code class="language-t"> deb http://dell.archive.canonical.com/ focal somerville
 # deb-src http://dell.archive.canonical.com/ focal somerville
 deb http://dell.archive.canonical.com/ focal somerville-bulbasaur
 # deb-src http://dell.archive.canonical.com/ focal somerville-bulbasaur
</code></pre>

<p>When you have copied or created the files, type</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt-key adv <span class="nt">--keyserver</span> keyserver.ubuntu.com <span class="nt">--recv-keys</span> F992900E3BBF9275 F9FDA6BED73CDC22 F9FDA6BED73CDC22 78BD65473CB3BD13
<span class="nb">sudo </span>apt-get update
<span class="nb">sudo </span>apt-get dist-upgrade
<span class="nb">sudo </span>apt-get <span class="nb">install </span>ubuntu-oem-keyring oem-somerville-meta oem-somerville-bulbasaur-meta
</code></pre></div></div>

<p>to install the rest of the OEM packages for your laptop and update everything to the latest version.</p>

<p>For good measure, reboot once more. Your laptop should now be ready to go with Linux Mint 20, and everything, including WiFi and support for the Iris Xe graphics, should work as it did with the pre-installed Ubuntu 20.04 installation.</p>

<p>Enjoy!</p>

<p style="text-align: center;"><img src="/assets/img/xps13/linux_mint.png" alt="" /></p>
<p style="text-align: center;"><em>My finished Linux Mint 20 desktop on the Dell XPS 13 (9310).</em></p>]]></content><author><name></name></author><category term="Linux" /><summary type="html"><![CDATA[The new late-2020 Dell XPS 13 (9310) is one of the first notebooks based on Intel’s new Evo platform with Intel Core ix-11xx series which features the new integrated graphics architecture Iris Xe. Like the older Dell XPS 13 models, also this new one can be ordered preinstalled with Ubuntu Linux. This time, even in Austria! It comes preinstalled with Ubuntu 20.04, which may be already a good fit for many people. In my case, however, I got fond of the Cinnamon desktop and, therefore, I wanted to install Linux Mint on my new workhorse. It turned out to work quite smoothly, also due to the fact, that Ubuntu did a good job providing packages to support the new Intel platform.]]></summary></entry><entry><title type="html">Tackling the game Kalah using reinforcement learning - Part 1</title><link href="https://torlenor.org/machine%20learning/reinforcement%20learning/2020/10/23/machine_learning_reinforment_learning_kalah_part1.html" rel="alternate" type="text/html" title="Tackling the game Kalah using reinforcement learning - Part 1" /><published>2020-10-23T08:00:00+00:00</published><updated>2020-10-23T08:00:00+00:00</updated><id>https://torlenor.org/machine%20learning/reinforcement%20learning/2020/10/23/machine_learning_reinforment_learning_kalah_part1</id><content type="html" xml:base="https://torlenor.org/machine%20learning/reinforcement%20learning/2020/10/23/machine_learning_reinforment_learning_kalah_part1.html"><![CDATA[<p><strong>Update (2020-11-03):</strong> The code is now available on GitHub: <a href="https://github.com/torlenor/kalah">https://github.com/torlenor/kalah</a></p>

<ul id="markdown-toc">
  <li><a href="#introduction" id="markdown-toc-introduction">Introduction</a></li>
  <li><a href="#kalah" id="markdown-toc-kalah">Kalah</a></li>
  <li><a href="#classic-agents" id="markdown-toc-classic-agents">Classic agents</a>    <ul>
      <li><a href="#random-agent" id="markdown-toc-random-agent">Random agent</a></li>
      <li><a href="#maxscore-agent" id="markdown-toc-maxscore-agent">MaxScore agent</a></li>
      <li><a href="#maxscorerepeat-agent" id="markdown-toc-maxscorerepeat-agent">MaxScoreRepeat agent</a></li>
      <li><a href="#minimax-agent" id="markdown-toc-minimax-agent">Minimax agent</a></li>
    </ul>
  </li>
  <li><a href="#reinforcement-learning-agents" id="markdown-toc-reinforcement-learning-agents">Reinforcement learning agents</a>    <ul>
      <li><a href="#reinforce-algorithm" id="markdown-toc-reinforce-algorithm">REINFORCE algorithm</a></li>
      <li><a href="#actor-critic-algorithm" id="markdown-toc-actor-critic-algorithm">Actor-critic algorithm</a></li>
    </ul>
  </li>
  <li><a href="#training-of-the-rl-agents" id="markdown-toc-training-of-the-rl-agents">Training of the RL agents</a></li>
  <li><a href="#comparison" id="markdown-toc-comparison">Comparison</a></li>
  <li><a href="#summary" id="markdown-toc-summary">Summary</a></li>
  <li><a href="#outlook" id="markdown-toc-outlook">Outlook</a></li>
  <li><a href="#references" id="markdown-toc-references">References</a></li>
</ul>

<p>In this article series we are going to talk about reinforcement learning (RL) <a href="#r1">[1]</a>, one of the three major branches of machine learning, alongside supervised learning (see <a href="/machine/learning/2020/07/11/machine_learning_lol_10min_match_predictions.html">Predicting the outcome of a League of Legends match</a> for an example) and unsupervised learning. The idea behind RL is to train a model, usually called an agent, to take actions in an environment such that the cumulative reward over time (which does not necessarily mean real time) is maximized. In contrast to supervised learning, the agent is not fed labels and is not told what the “correct” move is; instead, it learns by itself in the given environment solely from an observation/state of the environment and the reward gained or lost after each action it takes.</p>

<p>Here we will use this approach to tackle the game Kalah <a href="#r2">[2]</a>. To mix things up a little, this time we are going to use PyTorch <a href="#r3">[3]</a> as our library of choice.</p>

<p>We will show that it is possible to train an RL agent to play better than hard-coded approaches. In the last section we will give an outlook on improvements to the algorithms and what other approaches we could use.</p>

<h1 id="introduction">Introduction</h1>

<p>Reinforcement learning (RL) is one of the exciting fields of machine learning (ML) which has gained popularity in recent years due to advances in computer performance and algorithms, and because of the involvement of big technology companies. The research on learning ATARI games by Google’s DeepMind Technologies <a href="#r4">[4,5]</a> and their subsequent proof that RL can beat humans in Go <a href="#r6">[6]</a>, chess and shogi showed that RL can be a powerful tool which will find its way out of toy models into real-world applications sooner or later. Even learning complex computer games, like Dota 2 <a href="#r7">[7]</a> or StarCraft II <a href="#r8">[8]</a>, is no longer just a vision; under certain controlled conditions it is already possible.</p>

<p>Here we will first introduce the game Kalah, followed by implementations of some classic agents for the game, which will serve as our baseline and as sparring partners for our machine learning models. Afterwards, we will present two simple RL agents for Kalah, show how to train them and compare them to the classic agents. We will show that simple reinforcement learning models can learn the game Kalah and outperform the classic agents, though currently only on smaller game boards.</p>

<h1 id="kalah">Kalah</h1>

<p>Kalah <a href="#r2">[2]</a> is a two-player game in the Mancala family invented by William Julius Champion, Jr. in 1940.</p>

<p>The game is played on a board and with a number of “seeds”. The board has a certain number of small pits, called houses, on each side (usually 6, but we will also use 4) and a big pit, called the end zone, at each end. The objective of the game is to capture more seeds than your opponent.</p>

<p style="text-align: center;"><img src="/assets/img/kalah_board.jpg" alt="" /></p>
<p style="text-align: center;"><em>Figure 1: A Kalah board $(6,6)$ in the start configuration.</em></p>

<p>There are various rule sets available and we will take the rule set which is considered standard. It is summarized in the following:</p>

<p>1) Each player starts with 4 or 6 (or whatever number you agree on) seeds in each of their pits.</p>

<p>2) The players take turns “sowing” their seeds. The current player takes all the seeds from one of their pits and places them, one by one, counter-clockwise into each of the following pits, including their own end zone pit, but not the opponent’s end zone pit.</p>

<p>3) If the last sown seed lands in an empty house owned by the current player, and if the opposite house contains seeds, all the seeds in the pit where the last seed was placed, together with the seeds in the opposite pit, belong to the player and are placed into their end zone.</p>

<p>4) If the last sown seed lands in the player’s end zone, the player can take an additional move.</p>

<p>5) When a player has no more seeds in their pits, the game ends and the opposing player takes all their remaining seeds and places them in their end zone.</p>

<p>6) The player with the most seeds in their end zone wins.</p>

<p>For many variants of the game it has been shown that the first player has a strong advantage when both players play perfectly. For the $(N_\text{pits}, N_\text{seeds})  = (6,6)$ variant, however, it is not yet clear how big this advantage is. There are also additional rules which can mitigate the advantage, but we will not go into detail here; if you are interested, feel free to consult Wikipedia.</p>

<p>In this article we are going to play with the $(4,4)$, $(6,4)$ and $(6,6)$ variants.</p>

<h1 id="classic-agents">Classic agents</h1>

<p>Before we talk about reinforcement learning approaches to playing Kalah, we first present classic agents which will serve as our baseline in the comparisons and which will be used for training. For most of its variations Kalah is a solved game where the first player would win if we used pre-computed move databases with perfect moves, but we are not going to use them here.</p>

<h2 id="random-agent">Random agent</h2>

<p>This agent, as the name suggests, will randomly choose a move out of all valid moves. This is the simplest approach we can take and it can be implemented essentially with just one line of Python code.</p>
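<p>A minimal sketch of such an agent; the list of currently valid pit indices is assumed to be supplied by the (hypothetical) game environment:</p>

```python
import random

def random_agent(valid_moves):
    """Pick one of the currently valid moves uniformly at random.

    `valid_moves` is assumed to be a non-empty list of pit indices.
    """
    return random.choice(valid_moves)

# Example: on a (4,4) board player 1 can initially choose any of its 4 pits.
move = random_agent([0, 1, 2, 3])
```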

<h2 id="maxscore-agent">MaxScore agent</h2>

<p>The idea behind this agent is that it always takes the move which gives it the highest score. This can either be a move which lets it sow a seed into its own end zone, or, ideally, a move where it can steal the opponent’s seeds by hitting an empty pit on its own side of the board.</p>

<h2 id="maxscorerepeat-agent">MaxScoreRepeat agent</h2>

<p>The base strategy for this agent is the same as for the MaxScore agent. The difference is that it prefers a move where it hits its own end zone with its last seed, meaning that it can take another move. This is implemented to exploit the possibility of having more than one additional move if the board permits it. It can easily be implemented by looking at the possible moves starting from the left of the board going right and picking the first one where a repeating play is possible.</p>
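<p>Both greedy strategies can be sketched as follows; the <code>score_of</code> and <code>repeats</code> callbacks are assumptions standing in for the Kalah-specific move evaluation:</p>

```python
def max_score_move(moves, score_of):
    """MaxScore: pick the valid move with the highest immediate score gain."""
    return max(moves, key=score_of)

def max_score_repeat_move(moves, score_of, repeats):
    """MaxScoreRepeat: prefer the leftmost move that ends in the own end zone
    (granting another turn); otherwise fall back to MaxScore."""
    for m in moves:  # moves are scanned from left to right
        if repeats(m):
            return m
    return max_score_move(moves, score_of)

# Toy example: move 2 scores the most, but move 1 grants a repeat turn.
moves = [0, 1, 2, 3]
score = {0: 0, 1: 1, 2: 3, 3: 1}.get
best = max_score_move(moves, score)                                  # -> 2
best_repeat = max_score_repeat_move(moves, score, lambda m: m == 1)  # -> 1
```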

<h2 id="minimax-agent">Minimax agent</h2>

<p>The minimax algorithm <a href="#r9">[9]</a> is a very common decision rule in game theory, statistics and many other fields. One tries to minimize the possible loss for a worst case (maximum loss) scenario.</p>

<p>The pseudo code for the algorithm (taken from Wikipedia) is given by:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>function minimax(node, depth, maximizingPlayer) is
    if depth = 0 or node is a terminal node then
        return the heuristic value of node
    if maximizingPlayer then
        value := −∞
        for each child of node do
            value := max(value, minimax(child, depth − 1, FALSE))
        return value
    else (* minimizing player *)
        value := +∞
        for each child of node do
            value := min(value, minimax(child, depth − 1, TRUE))
        return value
</code></pre></div></div>

<p>If not otherwise specified, we will use a minimax depth of $D_{max}=4$. In addition we implement alpha-beta pruning <a href="#r10">[10]</a> to speed up the calculations.</p>
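<p>A possible Python version of the pseudo code above, extended with alpha-beta pruning. The <code>children</code> and <code>heuristic</code> callbacks are placeholders for the Kalah-specific game logic, and the toy tree at the end is only for illustration:</p>

```python
import math

def minimax(node, depth, alpha, beta, maximizing, children, heuristic):
    """Minimax with alpha-beta pruning. `children(node)` returns a list of
    successor states; `heuristic(node)` scores a position for the
    maximizing player."""
    successors = children(node)
    if depth == 0 or not successors:
        return heuristic(node)
    if maximizing:
        value = -math.inf
        for child in successors:
            value = max(value, minimax(child, depth - 1, alpha, beta,
                                       False, children, heuristic))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cutoff: the opponent will avoid this branch
        return value
    else:
        value = math.inf
        for child in successors:
            value = min(value, minimax(child, depth - 1, alpha, beta,
                                       True, children, heuristic))
            beta = min(beta, value)
            if alpha >= beta:
                break  # alpha cutoff
        return value

# Tiny toy tree: integer leaves are their own heuristic values.
tree = {"root": ["a", "b"], "a": [3, 5], "b": [2, 9]}
val = minimax("root", 2, -math.inf, math.inf, True,
              lambda n: tree.get(n, []),
              lambda n: n if isinstance(n, int) else 0)  # -> 3
```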

<h1 id="reinforcement-learning-agents">Reinforcement learning agents</h1>

<p style="text-align: center;"><img src="/assets/img/Reinforcement_learning_diagram.svg.png" alt="" /></p>
<p style="text-align: center;"><em>Figure 2: Reinforcement learning. Courtesy of Wikipedia.</em></p>

<p>Reinforcement learning (RL) is a branch of machine learning dealing with the maximization of cumulative rewards in a given environment. When talking about RL models running in such an environment, one usually speaks of agents, a notion we already introduced in the sections above. Reinforcement learning does not need labelled inputs/outputs, and the environment is typically modeled as a Markov decision process (MDP).</p>

<p>The way RL usually works is shown in Figure 2: An agent takes an action in a given environment; the action leads to a reward (positive or negative) and a new representation of the state of the environment (in our case the Kalah board). The reward and the state are fed back into the agent model.</p>

<h2 id="reinforce-algorithm">REINFORCE algorithm</h2>

<p>There are many different approaches to reinforcement learning. In our case, we will take what is, in my opinion, the most straightforward and easiest-to-grasp approach: policy gradients.</p>

<p>In the policy gradient method, we directly try to find the best policy (something which tells us what action to choose in each step of the problem). The algorithm we are going to apply is named REINFORCE; it was described in <a href="#r11">[11]</a>, and a good explanation and implementation can be found in <a href="#r12">[12]</a>. Additionally, a good overview of different algorithms, including REINFORCE, is presented at: <a href="https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html#reinforce">https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html#reinforce</a></p>

<p>Here we are going to briefly outline the idea behind the algorithm:</p>

<p>1) Initialize the network with random weights.</p>

<p>2) Play an episode and save its $(s, a, r, s’)$ transitions.</p>

<p>3) For every step $t=1,2,…,T$: Calculate the discounted reward/return</p>

\[Q_t=\sum^\infty_{k=0}\gamma^kR_{t+k+1}\]

<p>where $\gamma$ is the discount factor. $\gamma = 1$ means no discounting (all time steps count the same), while $\gamma &lt; 1$ discounts future rewards more strongly.</p>

<p>4) Calculate the loss function</p>

\[L=-\sum_tQ_t\ln(\pi(s_t,a_t))\]

<p>5) Calculate the gradients, use stochastic gradient descent and update the weights of the model, minimizing the loss (this is why we need the minus sign in front of the sum in step 4).</p>

<p>6) Repeat from step 2 until problem is considered solved.</p>

<p>$s$ is a state, $s’$ is the new state after taking action $a$ and $r$ is the reward obtained at a specific time step.</p>
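<p>The discounted return in step 3 can be computed efficiently by walking backwards through the episode’s rewards. The following is a library-free sketch; the reward and probability values are made up for illustration:</p>

```python
import math

def discounted_returns(rewards, gamma):
    """Compute Q_t = sum_k gamma^k * R_{t+k+1} for each step of an episode,
    working backwards through the recorded rewards."""
    returns = []
    q = 0.0
    for r in reversed(rewards):
        q = r + gamma * q
        returns.append(q)
    returns.reverse()
    return returns

# Episode with three rewards and gamma = 0.5:
qs = discounted_returns([1.0, 0.0, 2.0], gamma=0.5)
# Q_3 = 2, Q_2 = 0 + 0.5*2 = 1, Q_1 = 1 + 0.5*1 = 1.5  ->  [1.5, 1.0, 2.0]

# Step 4: the loss combines the returns with ln pi(s_t, a_t).
log_probs = [math.log(0.5), math.log(0.25), math.log(0.8)]  # made-up values
loss = -sum(q * lp for q, lp in zip(qs, log_probs))
```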

<p>An example implementation in PyTorch can be found <a href="https://github.com/pytorch/examples/blob/master/reinforcement_learning/reinforce.py">here</a>, solving the CartPole problem.</p>

<h2 id="actor-critic-algorithm">Actor-critic algorithm</h2>

<p style="text-align: center;"><img src="/assets/img/actor_critic.png" alt="" /></p>
<p style="text-align: center;"><em>Figure 3: Sketch of the actor-critic model structure.</em></p>

<p>In the case of the actor-critic algorithm <a href="#r13">[13]</a> a value function is learned in addition to the policy. This helps reduce the gradient variance. Actor-critic methods consist of two models, which may optionally share parameters:</p>

<ul>
  <li>The Critic updates the value function $V_\omega$ parameters $\omega$.</li>
  <li>The Actor updates the policy parameters $\theta$ for $\pi_\theta(s,a)$ in the direction suggested by the critic.</li>
</ul>
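<p>The core of an actor-critic update can be sketched for a single transition as follows (plain Python, no autograd; the numeric inputs are made up for illustration):</p>

```python
import math

def actor_critic_losses(q_t, v_t, log_pi_t):
    """One-step actor-critic losses for a single transition.
    The critic's value estimate v_t serves as a baseline, so the actor is
    updated with the advantage A_t = Q_t - V(s_t) instead of the raw return,
    which reduces the variance of the gradient."""
    advantage = q_t - v_t
    actor_loss = -advantage * log_pi_t   # policy gradient term
    critic_loss = advantage ** 2         # value regression (MSE) term
    return actor_loss, critic_loss

a_loss, c_loss = actor_critic_losses(q_t=1.5, v_t=1.0, log_pi_t=math.log(0.5))
```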

<p>An example implementation in PyTorch can be found <a href="https://github.com/pytorch/examples/blob/master/reinforcement_learning/actor_critic.py">here</a>.</p>

<h1 id="training-of-the-rl-agents">Training of the RL agents</h1>

<p>Training the RL agents turned out to be a challenge. After tuning $\gamma$, the learning rate and the rewards we were finally able to get an improving REINFORCE agent with win rates over 80%. The agent usually had no problem learning which moves are invalid, keeping its invalid-move rate below $5\%$, but it had trouble learning a good policy for actually winning games against the classic agents. With the actor-critic agent it was easier to find parameters for which the algorithm converged, at least on $(4,4)$ boards.</p>

<p style="text-align: center;"><img src="/assets/img/ac_4_4_g0.99_s1_solved_98_lr0.001_n512_evalgames200.png" alt="" /></p>
<p style="text-align: center;"><em>Figure 4: Example for the evolution of win rate as a function of the training episode during training of the actor-critic agent on a $(4,4)$ board.</em></p>

<p>For the rewards we settled in the end on</p>
<ul>
  <li>the number of seeds placed into the own end zone as the reward, minus 0.1 (to make moves that gain no points less favorable)</li>
  <li>+10 for a win</li>
  <li>-10 for a loss</li>
  <li>-5 for an invalid move, which also ends the game</li>
</ul>
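<p>Combined into a single reward function, this might look like the following sketch; whether the win/loss bonus stacks with the per-move seed reward in the actual implementation is an assumption:</p>

```python
def reward(seeds_gained, outcome):
    """Reward shaping used during training. `outcome` is one of
    'ongoing', 'win', 'loss', 'invalid' (hypothetical labels)."""
    if outcome == "invalid":
        return -5.0          # illegal move, and the game is over
    r = seeds_gained - 0.1   # small penalty discourages scoreless moves
    if outcome == "win":
        r += 10.0
    elif outcome == "loss":
        r -= 10.0
    return r
```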

<p>It also turned out to be hard to train against the random agent. Training worked best against the MaxScore and MaxScoreRepeat agents, and in the end we settled on the MaxScoreRepeat agent for training the AC and REINFORCE agents.</p>

<p>Training on larger boards or boards with more seeds, i.e., $(6,4)$ and $(6,6)$, did not lead to a high enough win rate with either the AC or the REINFORCE agent, even after tuning the parameters or trying various random seeds. We may need improvements to the models, which we are going to discuss in the Outlook section; hopefully we will then be able to produce well-trained agents for the larger boards, too.</p>

<h1 id="comparison">Comparison</h1>

<p>For the comparison we let every agent play $N=1000$ games against every other agent, including itself, with the exception of the RL agents, as they can currently only play as player 1. Updating the environment so that it is possible to play as player 2 is part of the planned improvements. Draws are not taken into account when calculating the win rate.</p>

<p>In Table 1 we compare the classic agents against the RL agents on a $(4,4)$ board. Of the classic agents the random agent performed worst, but a slight advantage for player 1 can be seen, which may be related to the first-player advantage in Kalah. The MaxScore agent already performed reasonably well with just a few lines of code. It can easily beat random chance, and when played against itself a slight advantage for player 1 is also visible. The MaxScoreRepeat agent improved the scores even further and is beaten more often only by the Minimax agent. The Minimax agent is clearly the best classic agent, winning most of the games against the other agents. The reinforcement learning agents performed reasonably well themselves. Especially the AC agent was able to outperform the classic agents, including the Minimax agent.</p>

<table>
  <caption>Table 1: Comparison of classic and RL agents on a $(4,4)$ board. Shown is the average win percentage of player 1 (rows) vs. player 2 (columns) after playing $N=1000$ games.</caption>
  
    
    <tr>
      
        <th>vs</th>
      
        <th>Random</th>
      
        <th>MaxScore</th>
      
        <th>MaxScoreRepeat</th>
      
        <th>Minimax</th>
      
    </tr>
    

    <tr class="row1">
<td class="col1">
      Random
    </td><td class="col2">
      50.54
    </td><td class="col3">
      21.37
    </td><td class="col4">
      15.13
    </td><td class="col5">
      17.63
    </td></tr>

  
    

    <tr class="row1">
<td class="col1">
      MaxScore
    </td><td class="col2">
      82.84
    </td><td class="col3">
      53.02
    </td><td class="col4">
      23.68
    </td><td class="col5">
      19.58
    </td></tr>

  
    

    <tr class="row1">
<td class="col1">
      MaxScoreRepeat
    </td><td class="col2">
      87.94
    </td><td class="col3">
      84.71
    </td><td class="col4">
      67.56
    </td><td class="col5">
      48.23
    </td></tr>

  
    

    <tr class="row1">
<td class="col1">
      Minimax
    </td><td class="col2">
      86.57
    </td><td class="col3">
      82.87
    </td><td class="col4">
      74.68
    </td><td class="col5">
      59.15
    </td></tr>

  
    

    <tr class="row1">
<td class="col1">
      Reinforce
    </td><td class="col2">
      84.01
    </td><td class="col3">
      87.50
    </td><td class="col4">
      77.64
    </td><td class="col5">
      39.77
    </td></tr>

  
    

    <tr class="row1">
<td class="col1">
      ActorCritic
    </td><td class="col2">
      89.60
    </td><td class="col3">
      90.60
    </td><td class="col4">
      88.88
    </td><td class="col5">
      64.36
    </td></tr>

  
</table>

<p>The comparison on the larger board with six pits on each side and four seeds in each pit, i.e., $(6,4)$ in our notation, must be done without the RL agents because, as discussed in the previous section, we were not able to train a well-performing RL agent for larger boards. However, we still compare the classic agents on the larger boards. The biggest difference to the smaller board is that player 1 has a much higher win rate for the first three agent types. For minimax it is not so clear; its performance seems to be on par with the performance on the smaller board, with the exception of the matchup against the MaxScoreRepeat agent, where the Minimax agent performed worse, while still winning more than half of the games.</p>

<table>
    <caption>Table 2: Comparison of classic agents on a $(6,4)$ board. Shown is the average win percentage of player 1 (rows) vs. player 2 (columns) after playing $N=1000$ games.</caption>
  
    
    <tr>
      
        <th>vs</th>
      
        <th>Random</th>
      
        <th>MaxScore</th>
      
        <th>MaxScoreRepeat</th>
      
        <th>Minimax</th>
      
    </tr>
    

    <tr class="row1">
<td class="col1">
      Random
    </td><td class="col2">
      50.05
    </td><td class="col3">
      3.68
    </td><td class="col4">
      0.51
    </td><td class="col5">
      3.13
    </td></tr>

  
    

    <tr class="row1">
<td class="col1">
      MaxScore
    </td><td class="col2">
      96.74
    </td><td class="col3">
      56.94
    </td><td class="col4">
      5.25
    </td><td class="col5">
      12.83
    </td></tr>

  
    

    <tr class="row1">
<td class="col1">
      MaxScoreRepeat
    </td><td class="col2">
      99.30
    </td><td class="col3">
      97.78
    </td><td class="col4">
      74.05
    </td><td class="col5">
      73.57
    </td></tr>

  
    

    <tr class="row1">
<td class="col1">
      Minimax
    </td><td class="col2">
      98.19
    </td><td class="col3">
      91.91
    </td><td class="col4">
      59.69
    </td><td class="col5">
      64.05
    </td></tr>

  
    

    <tr class="row1">
<td class="col1">
      Reinforce
    </td><td class="col2">
      N/A
    </td><td class="col3">
      N/A
    </td><td class="col4">
      N/A
    </td><td class="col5">
      N/A
    </td></tr>

  
    

    <tr class="row1">
<td class="col1">
      ActorCritic
    </td><td class="col2">
      N/A
    </td><td class="col3">
      N/A
    </td><td class="col4">
      N/A
    </td><td class="col5">
      N/A
    </td></tr>

  
</table>

<p>On the $(6,6)$ board the results, Table 3, look more similar to the $(4,4)$ board again, except for the random agent. The Minimax agent still performed well against the other agents despite the depth of only $D_{max}=4$ which we used.</p>

<p>To see how a larger depth for the minimax algorithm changes things, we did another calculation on a $(6,6)$ board, this time with $D_{max}=6$. This version of the Minimax agent is denoted “Minimax 6” in Table 3. It drastically increased the calculation time, but further improved the win percentage of the Minimax agent.</p>

<table>
  <caption>Table 3: Comparison of classic agents on a $(6,6)$ board. Shown is the average win percentage of player 1 (rows) vs. player 2 (columns) after playing $N=1000$ games. Minimax 6 uses a maximum depth of 6 for the minimax algorithm.</caption>
  
    
    <tr>
      
        <th>vs</th>
      
        <th>Random</th>
      
        <th>MaxScore</th>
      
        <th>MaxScoreRepeat</th>
      
        <th>Minimax</th>
      
        <th>Minimax 6</th>
      
    </tr>
    

    <tr class="row1">
<td class="col1">
      Random
    </td><td class="col2">
      49.95
    </td><td class="col3">
      2.83
    </td><td class="col4">
      1.30
    </td><td class="col5">
      0.70
    </td><td class="col6">
      1.80
    </td></tr>

  
    

    <tr class="row1">
<td class="col1">
      MaxScore
    </td><td class="col2">
      97.39
    </td><td class="col3">
      53.06
    </td><td class="col4">
      14.81
    </td><td class="col5">
      14.51
    </td><td class="col6">
      14.42
    </td></tr>

  
    

    <tr class="row1">
<td class="col1">
      MaxScoreRepeat
    </td><td class="col2">
      99.20
    </td><td class="col3">
      93.47
    </td><td class="col4">
      64.42
    </td><td class="col5">
      50.05
    </td><td class="col6">
      42.20
    </td></tr>

  
    

    <tr class="row1">
<td class="col1">
      Minimax
    </td><td class="col2">
      97.90
    </td><td class="col3">
      91.47
    </td><td class="col4">
      76.51
    </td><td class="col5">
      65.13
    </td><td class="col6">
      N/A
    </td></tr>

  
    

    <tr class="row1">
<td class="col1">
      Minimax 6
    </td><td class="col2">
      99.40
    </td><td class="col3">
      94.24
    </td><td class="col4">
      82.47
    </td><td class="col5">
      N/A
    </td><td class="col6">
      67.43
    </td></tr>

  
    

    <tr class="row1">
<td class="col1">
      Reinforce
    </td><td class="col2">
      N/A
    </td><td class="col3">
      N/A
    </td><td class="col4">
      N/A
    </td><td class="col5">
      N/A
    </td><td class="col6">
      N/A
    </td></tr>

  
    

    <tr class="row1">
<td class="col1">
      ActorCritic
    </td><td class="col2">
      N/A
    </td><td class="col3">
      N/A
    </td><td class="col4">
      N/A
    </td><td class="col5">
      N/A
    </td><td class="col6">
      N/A
    </td></tr>

  
</table>

<h1 id="summary">Summary</h1>

<p>In this first part of what is, hopefully, going to be a series of posts, we discussed how to play the board game Kalah with classic agents and how reinforcement learning can be used to successfully learn the game and win against classic agents, at least on small enough game boards. We also found that training a reinforcement learning model tends to be hard and that a lot of hyperparameter tuning, i.e., fiddling with parameters including the discount factor, rewards and learning rates, can be necessary. However, even though we implemented only two simple reinforcement learning algorithms, REINFORCE and actor-critic, it worked quite well. There is also lots of room for improvement, which may lead to even better-performing RL agents.</p>

<h1 id="outlook">Outlook</h1>

<p>The next step will be implementing improved versions of REINFORCE. In particular, we want to batch episodes together in the update step, which should reduce the variance, i.e., allow for a much more stable model over training time, and hopefully lead to improved performance and an easier-to-train model. In addition, we will look into improvements to the actor-critic method. In particular, we will see how advantage actor-critic (A2C) and asynchronous advantage actor-critic (A3C) models are implemented and how they perform in comparison to our current agents.</p>

<p>For the training process itself we are considering moving away from always training against one type of classic agent towards a heterogeneous approach where we train against various types of agents, which should hopefully improve the overall performance of the RL agents.</p>

<p>From the implementation point of view, improvements to the environment and to the Kalah board will be made to allow the RL agent to play as the second player. We should also refactor the current code base so that it is easier to plot various metrics of the machine learning process and to export these plots.</p>

<p>A distant goal is to make this implementation more user friendly so that a human player can easily play against the agents, or, even better, to add a graphical/web interface.</p>

<h1 id="references">References</h1>

<p>[1]<a name="r1"></a> <a href="https://en.wikipedia.org/wiki/Reinforcement_learning">https://en.wikipedia.org/wiki/Reinforcement_learning</a><br />
[2]<a name="r2"></a> <a href="https://en.wikipedia.org/wiki/Kalah">https://en.wikipedia.org/wiki/Kalah</a><br />
[3]<a name="r3"></a> <a href="https://pytorch.org/">https://pytorch.org/</a><br />
[4]<a name="r4"></a> Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Graves, Alex; Antonoglou, Ioannis; Wierstra, Daan; Riedmiller, Martin (19 December 2013). “Playing Atari with Deep Reinforcement Learning”. arXiv:1312.5602.<br />
[5]<a name="r5"></a> Adrià Puigdomènech Badia; Piot, Bilal; Kapturowski, Steven; Sprechmann, Pablo; Vitvitskyi, Alex; Guo, Daniel; Blundell, Charles (30 March 2020). “Agent57: Outperforming the Atari Human Benchmark”. arXiv:2003.13350.<br />
[6]<a name="r6"></a> Silver, David; Huang, Aja; Maddison, Chris J.; Guez, Arthur; Sifre, Laurent; Driessche, George van den; Schrittwieser, Julian; Antonoglou, Ioannis; Panneershelvam, Veda; Lanctot, Marc; Dieleman, Sander; Grewe, Dominik; Nham, John; Kalchbrenner, Nal; Sutskever, Ilya; Lillicrap, Timothy; Leach, Madeleine; Kavukcuoglu, Koray; Graepel, Thore; Hassabis, Demis (28 January 2016). “Mastering the game of Go with deep neural networks and tree search”. Nature. 529 (7587): 484–489. https://doi.org/10.1038/nature16961.<br />
[7]<a name="r7"></a> Berner, C., Brockman, G., Chan, B., Cheung, V., Debiak, P., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., Hesse, C., Józefowicz, R., Gray, S., Olsson, C., Pachocki, J.W., Petrov, M., Pinto, H.P., Raiman, J., Salimans, T., Schlatter, J., Schneider, J., Sidor, S., Sutskever, I., Tang, J., Wolski, F., &amp; Zhang, S. (2019). Dota 2 with Large Scale Deep Reinforcement Learning. ArXiv:1912.06680. <br />
[8]<a name="r8"></a> Vinyals, O., Babuschkin, I., Czarnecki, W.M. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019). https://doi.org/10.1038/s41586-019-1724-z <br />
[9]<a name="r9"></a> <a href="https://en.wikipedia.org/wiki/Minimax">https://en.wikipedia.org/wiki/Minimax</a><br />
[10]<a name="r10"></a> <a href="https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning">https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning</a><br />
[11]<a name="r11"></a> Williams, Ronald J. “Simple statistical gradient-following algorithms for connectionist reinforcement learning.” Reinforcement Learning. Springer, Boston, MA, 1992. 5-32.<br />
[12]<a name="r12"></a> Lapan, Maxim. “Deep Reinforcement Learning Hands-On”, Second Edition, Packt, Birmingham, UK, 2020, 286-308.<br />
[13]<a name="r13"></a> A. Barto, R. Sutton, and C. Anderson, Neuron-like elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man and Cybernetics, 13 (1983), pp. 835–846.</p>]]></content><author><name></name></author><category term="Machine Learning" /><category term="Reinforcement Learning" /><summary type="html"><![CDATA[Update (2020-11-03): The code is now available on GitHub: https://github.com/torlenor/kalah]]></summary></entry><entry><title type="html">Predicting the outcome of a League of Legends match 10 minutes into the game with 70% accuracy</title><link href="https://torlenor.org/machine/learning/2020/07/11/machine_learning_lol_10min_match_predictions.html" rel="alternate" type="text/html" title="Predicting the outcome of a League of Legends match 10 minutes into the game with 70% accuracy" /><published>2020-07-11T14:00:00+00:00</published><updated>2020-07-11T14:00:00+00:00</updated><id>https://torlenor.org/machine/learning/2020/07/11/machine_learning_lol_10min_match_predictions</id><content type="html" xml:base="https://torlenor.org/machine/learning/2020/07/11/machine_learning_lol_10min_match_predictions.html"><![CDATA[<ul id="markdown-toc">
  <li><a href="#introductory-remarks" id="markdown-toc-introductory-remarks">Introductory remarks</a></li>
  <li><a href="#retrieving-data-from-mongodb-and-data-preprocessing" id="markdown-toc-retrieving-data-from-mongodb-and-data-preprocessing">Retrieving data from MongoDB and data preprocessing</a></li>
  <li><a href="#model-definition-and-fitting" id="markdown-toc-model-definition-and-fitting">Model definition and fitting</a></li>
  <li><a href="#predicting-winloss" id="markdown-toc-predicting-winloss">Predicting Win/Loss</a></li>
  <li><a href="#summary-and-discussion" id="markdown-toc-summary-and-discussion">Summary and discussion</a></li>
  <li><a href="#appendix" id="markdown-toc-appendix">Appendix</a></li>
  <li><a href="#references" id="markdown-toc-references">References</a></li>
</ul>

<p>Predicting the outcome of a sports match just a few minutes into the game is an intriguing topic. Wouldn’t it be great to know for certain that your favorite football team will win before the game is over? Or to bet on your Formula 1 driver and be right about it most of the time? While this is not so easily done for regular sports, it can be done for games whose outcome depends heavily on the history of the current match, i.e., on things which happened earlier in the match. Our most promising candidate for such a study is Riot Games’ League of Legends (LoL).</p>

<p>In this article we will show how to access data from a MongoDB which was filled from Riot’s LoL API, how to process it so that it is usable for modeling, how to define useful features, and how to train an eXtreme Gradient Boosting (XGBoost) model [1]. Using that model we will show that it is possible to predict the outcome of a 5v5 Solo Queue match played on Summoner’s Rift, just 10 minutes into the match, with 70 % accuracy. We will use data from game version 10.13.</p>

<p>The article is written in a hands-on way and we are going to show code examples. As the language of choice we use Python 3 with the libraries matplotlib [2], numpy [3], pandas [4], pymongo [5], seaborn [6], sklearn [7] and xgboost [8].</p>

<h1 id="introductory-remarks">Introductory remarks</h1>

<p><a href="https://leagueoflegends.com/">League of Legends</a> (LoL) is a multiplayer online battle arena (MOBA) game. Players compete in matches (in the mode we are looking at, 5 vs 5) which last from 20 to 50 minutes on average (see <a href="#appendix">Appendix</a> for more details on that). Teams have to work together to achieve victory by destroying the core building (called the Nexus) of the enemy team. Until they get there, they have to destroy towers and get past the defense lines of the enemy team without losing their own Nexus in the process.</p>

<p>The players control characters called champions, which are picked at the beginning of a match from a rich pool of different champions, each with its own set of unique abilities. During the match the champions level up and gain additional abilities. They also have to accumulate gold to buy equipment. If a champion is killed it does not die permanently, but is just removed from the battlefield for a certain amount of time (which grows longer the longer the match has been running).</p>

<p>To fetch the data we are using <a href="https://github.com/torlenor/alolstats">alolstats</a>, which provides functionality to fetch match data from <a href="https://developer.riotgames.com/apis">Riot’s API</a> and to store it in a MongoDB collection. It also features basic statistical calculations and a convenient REST API, but for this project only the ability to fetch and store match data is of importance.</p>

<p>The match data, besides other information, contains timeline information in 0-10 min, 10-20 min, 20-30 min and 30-end min slots for each participant (10 in total, 5 for each team), and we are going to use this data as features in the modeling approach. The prediction target is going to be whether team 1 wins the game or not.</p>
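<p>Schematically, a participant entry in a match document contains these per-time-slot deltas as nested fields. The structure below is inferred from the flattened feature names used later in this article; the field values are purely illustrative:</p>

```python
# Illustrative shape of one participant entry (values made up);
# the real documents come from Riot's API via alolstats.
participant = {
    "participantid": 1,
    "teamid": 100,  # 100 = team 1, 200 = team 2
    "timeline": {
        "creepspermindeltas": {"0-10": 6.1, "10-20": 7.4},
        "xppermindeltas": {"0-10": 266.1, "10-20": 401.3},
        "goldpermindeltas": {"0-10": 230.2, "10-20": 310.5},
        "damagetakenpermindeltas": {"0-10": 341.0, "10-20": 512.7},
    },
}

# After flattening, this yields column names like
# "timeline.goldpermindeltas.0-10"
print(participant["timeline"]["goldpermindeltas"]["0-10"])  # 230.2
```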

<p>As a model we are going to use XGBoost [1], trained on approx. 50,000 matches.</p>

<h1 id="retrieving-data-from-mongodb-and-data-preprocessing">Retrieving data from MongoDB and data preprocessing</h1>

<p>Fetching data from a MongoDB is really simple with Python. With just a few lines of code you receive a cursor pointing to the data, which can be used to iterate through the results. We take the results and put them directly into pandas DataFrames, which may not be ideal for a very large collection, but it will do for our data set.</p>

<p>Fetching meta information about the matches from the MongoDB collection (we are filtering for the correct <em>mapid</em>, <em>queueid</em> and <em>gameversion</em> here) can be done via:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">pymongo</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>

<span class="n">game_version</span> <span class="o">=</span> <span class="s">"10.13.326.4870"</span>

<span class="n">connection</span> <span class="o">=</span> <span class="n">pymongo</span><span class="p">.</span><span class="n">MongoClient</span><span class="p">(</span><span class="s">"mongodb://[redacted]:[redacted]@localhost/alolstats"</span><span class="p">)</span>
<span class="n">db</span> <span class="o">=</span> <span class="n">connection</span><span class="p">.</span><span class="n">alolstats</span>

<span class="n">matches_meta</span> <span class="o">=</span> <span class="n">db</span><span class="p">.</span><span class="n">matches</span><span class="p">.</span><span class="n">aggregate</span><span class="p">([</span>
    <span class="p">{</span> <span class="s">"$match"</span><span class="p">:</span> <span class="p">{</span><span class="s">"gameversion"</span><span class="p">:</span> <span class="n">game_version</span><span class="p">,</span> <span class="s">"mapid"</span><span class="p">:</span> <span class="mi">11</span><span class="p">,</span> <span class="s">"queueid"</span><span class="p">:</span> <span class="mi">420</span><span class="p">}},</span>
    <span class="p">{</span> <span class="s">"$unset"</span><span class="p">:</span> <span class="p">[</span><span class="s">"teams"</span><span class="p">,</span><span class="s">"participants"</span><span class="p">,</span> <span class="s">"participantidentities"</span><span class="p">]</span> <span class="p">},</span>
<span class="p">])</span>

<span class="n">df_matches_meta</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">matches_meta</span><span class="p">))</span>
<span class="n">df_matches_meta</span> <span class="o">=</span> <span class="n">df_matches_meta</span><span class="p">.</span><span class="n">set_index</span><span class="p">(</span><span class="s">"gameid"</span><span class="p">)</span>
</code></pre></div></div>

<p>We will perform the same for the timeline data, but this needs a bit more effort as we have to flatten the embedded documents that we are receiving from our MongoDB collection:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">flatten_nested_json_df</span><span class="p">(</span><span class="n">df</span><span class="p">):</span>
    <span class="c1"># Thanks to random StackOverflow user for that piece of code
</span>    <span class="n">df</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="n">reset_index</span><span class="p">()</span>

    <span class="c1"># search for columns to explode/flatten
</span>    <span class="n">s</span> <span class="o">=</span> <span class="p">(</span><span class="n">df</span><span class="p">.</span><span class="n">applymap</span><span class="p">(</span><span class="nb">type</span><span class="p">)</span> <span class="o">==</span> <span class="nb">list</span><span class="p">).</span><span class="nb">all</span><span class="p">()</span>
    <span class="n">list_columns</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">s</span><span class="p">].</span><span class="n">index</span><span class="p">.</span><span class="n">tolist</span><span class="p">()</span>

    <span class="n">s</span> <span class="o">=</span> <span class="p">(</span><span class="n">df</span><span class="p">.</span><span class="n">applymap</span><span class="p">(</span><span class="nb">type</span><span class="p">)</span> <span class="o">==</span> <span class="nb">dict</span><span class="p">).</span><span class="nb">all</span><span class="p">()</span>
    <span class="n">dict_columns</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">s</span><span class="p">].</span><span class="n">index</span><span class="p">.</span><span class="n">tolist</span><span class="p">()</span>

    <span class="k">while</span> <span class="nb">len</span><span class="p">(</span><span class="n">list_columns</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">0</span> <span class="ow">or</span> <span class="nb">len</span><span class="p">(</span><span class="n">dict_columns</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">new_columns</span> <span class="o">=</span> <span class="p">[]</span>

        <span class="k">for</span> <span class="n">col</span> <span class="ow">in</span> <span class="n">dict_columns</span><span class="p">:</span>
            <span class="c1"># explode dictionaries horizontally, adding new columns
</span>            <span class="n">horiz_exploded</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">json_normalize</span><span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="n">col</span><span class="p">]).</span><span class="n">add_prefix</span><span class="p">(</span><span class="sa">f</span><span class="s">'</span><span class="si">{</span><span class="n">col</span><span class="si">}</span><span class="s">.'</span><span class="p">)</span>
            <span class="n">horiz_exploded</span><span class="p">.</span><span class="n">index</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="n">index</span>
            <span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">concat</span><span class="p">([</span><span class="n">df</span><span class="p">,</span> <span class="n">horiz_exploded</span><span class="p">],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">).</span><span class="n">drop</span><span class="p">(</span><span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="n">col</span><span class="p">])</span>
            <span class="n">new_columns</span><span class="p">.</span><span class="n">extend</span><span class="p">(</span><span class="n">horiz_exploded</span><span class="p">.</span><span class="n">columns</span><span class="p">)</span> <span class="c1"># inplace
</span>
        <span class="k">for</span> <span class="n">col</span> <span class="ow">in</span> <span class="n">list_columns</span><span class="p">:</span>
            <span class="c1"># explode lists vertically, adding new columns
</span>            <span class="n">df</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="n">drop</span><span class="p">(</span><span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="n">col</span><span class="p">]).</span><span class="n">join</span><span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="n">col</span><span class="p">].</span><span class="n">explode</span><span class="p">().</span><span class="n">to_frame</span><span class="p">())</span>
            <span class="n">new_columns</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">col</span><span class="p">)</span>

        <span class="c1"># check if there are still dict or list fields to flatten
</span>        <span class="n">s</span> <span class="o">=</span> <span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="n">new_columns</span><span class="p">].</span><span class="n">applymap</span><span class="p">(</span><span class="nb">type</span><span class="p">)</span> <span class="o">==</span> <span class="nb">list</span><span class="p">).</span><span class="nb">all</span><span class="p">()</span>
        <span class="n">list_columns</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">s</span><span class="p">].</span><span class="n">index</span><span class="p">.</span><span class="n">tolist</span><span class="p">()</span>

        <span class="n">s</span> <span class="o">=</span> <span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="n">new_columns</span><span class="p">].</span><span class="n">applymap</span><span class="p">(</span><span class="nb">type</span><span class="p">)</span> <span class="o">==</span> <span class="nb">dict</span><span class="p">).</span><span class="nb">all</span><span class="p">()</span>
        <span class="n">dict_columns</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">s</span><span class="p">].</span><span class="n">index</span><span class="p">.</span><span class="n">tolist</span><span class="p">()</span>
        
    <span class="k">return</span> <span class="n">df</span>

<span class="n">df_matches_participant</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">10</span><span class="p">,</span><span class="mi">1</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="s">"Fetching general infos for participant "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="s">" of 10"</span><span class="p">)</span>
    <span class="n">m</span> <span class="o">=</span> <span class="n">db</span><span class="p">.</span><span class="n">matches</span><span class="p">.</span><span class="n">aggregate</span><span class="p">([</span>
        <span class="p">{</span> <span class="s">"$match"</span><span class="p">:</span> <span class="p">{</span><span class="s">"gameversion"</span><span class="p">:</span> <span class="n">game_version</span><span class="p">,</span> <span class="s">"mapid"</span><span class="p">:</span> <span class="mi">11</span><span class="p">,</span> <span class="s">"queueid"</span><span class="p">:</span> <span class="mi">420</span><span class="p">}},</span>
        <span class="p">{</span> <span class="s">"$addFields"</span><span class="p">:</span> <span class="p">{</span> <span class="s">"participants.gameid"</span><span class="p">:</span> <span class="s">"$gameid"</span> <span class="p">}</span> <span class="p">},</span>
        <span class="p">{</span> <span class="s">"$replaceRoot"</span><span class="p">:</span> <span class="p">{</span> <span class="s">"newRoot"</span><span class="p">:</span> <span class="p">{</span><span class="s">"$arrayElemAt"</span><span class="p">:</span> <span class="p">[</span> <span class="s">"$participants"</span><span class="p">,</span> <span class="n">i</span><span class="p">]</span> <span class="p">}</span>  <span class="p">}</span>  <span class="p">},</span>
        <span class="p">{</span> <span class="s">"$sort"</span> <span class="p">:</span> <span class="p">{</span> <span class="s">"gameid"</span> <span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s">"participantid"</span><span class="p">:</span> <span class="mi">1</span> <span class="p">}</span> <span class="p">},</span>
    <span class="p">],</span> <span class="n">allowDiskUse</span> <span class="o">=</span> <span class="bp">True</span> <span class="p">)</span>
    <span class="n">df_matches_participant</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">flatten_nested_json_df</span><span class="p">(</span><span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">m</span><span class="p">))).</span><span class="n">set_index</span><span class="p">(</span><span class="s">"gameid"</span><span class="p">))</span>
</code></pre></div></div>

<p>We end up with data for each participant of the match, which we can further process to keep only the required columns and to limit our features to the timeline fields for 0-10 min:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Join all participants data into columns so that we have one line per game
</span><span class="n">X_participants</span> <span class="o">=</span> <span class="n">df_matches_participant</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">join</span><span class="p">(</span><span class="n">df_matches_participant</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">lsuffix</span><span class="o">=</span><span class="s">"_p0"</span><span class="p">,</span> <span class="n">rsuffix</span><span class="o">=</span><span class="s">"_p1"</span><span class="p">)</span>
<span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">10</span><span class="p">,</span><span class="mi">1</span><span class="p">):</span>
    <span class="n">X_participants</span> <span class="o">=</span> <span class="n">X_participants</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">df_matches_participant</span><span class="p">[</span><span class="n">p</span><span class="p">],</span> <span class="n">rsuffix</span><span class="o">=</span><span class="s">"_p"</span><span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">p</span><span class="p">))</span>

<span class="n">X_participants_timeline_0_10</span> <span class="o">=</span> <span class="n">X_participants</span><span class="p">.</span><span class="nb">filter</span><span class="p">(</span><span class="n">regex</span><span class="o">=</span><span class="p">(</span><span class="s">"teamid|timeline.*0-10.*"</span><span class="p">))</span>

<span class="c1"># Drop all Diffs between the players on the same lane, we do not want them
</span><span class="n">X_participants_timeline_0_10</span> <span class="o">=</span> <span class="n">X_participants_timeline_0_10</span><span class="p">[</span><span class="n">X_participants_timeline_0_10</span><span class="p">.</span><span class="n">columns</span><span class="p">.</span><span class="n">drop</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">X_participants_timeline_0_10</span><span class="p">.</span><span class="nb">filter</span><span class="p">(</span><span class="n">regex</span><span class="o">=</span><span class="s">'diff'</span><span class="p">)))]</span>

<span class="n">y</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">df_matches_team1</span><span class="p">[</span><span class="n">df_matches_team1</span><span class="p">[</span><span class="s">"teamid"</span><span class="p">]</span> <span class="o">==</span> <span class="mi">100</span><span class="p">][</span><span class="s">"win"</span><span class="p">])</span>
<span class="n">y</span><span class="p">.</span><span class="n">rename</span><span class="p">(</span><span class="n">columns</span><span class="o">=</span><span class="p">{</span><span class="s">"win"</span><span class="p">:</span> <span class="s">"team1_did_win"</span><span class="p">},</span> <span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">Xy</span> <span class="o">=</span> <span class="n">X_participants_timeline_0_10</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
<span class="n">Xy</span> <span class="o">=</span> <span class="n">Xy</span><span class="p">[</span><span class="n">Xy</span><span class="p">[</span><span class="s">"team1_did_win"</span><span class="p">].</span><span class="n">isnull</span><span class="p">()</span> <span class="o">==</span> <span class="bp">False</span><span class="p">]</span>

<span class="c1"># Final data set for prediction variable...
</span><span class="n">y_final</span><span class="o">=</span> <span class="n">Xy</span><span class="p">[</span><span class="s">"team1_did_win"</span><span class="p">]</span>
<span class="c1"># ... and for the features; above we dropped all rows where we do not know who won
</span><span class="n">X_final</span> <span class="o">=</span> <span class="n">Xy</span><span class="p">.</span><span class="n">drop</span><span class="p">(</span><span class="s">'team1_did_win'</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</code></pre></div></div>

<p>The final data sets look like this now:</p>

<ul>
  <li>
    <p><strong>X_final (first 5 lines):</strong></p>

    <style scoped="">
      .dataframe tbody tr th:only-of-type {
          vertical-align: middle;
      }

      .dataframe tbody tr th {
          vertical-align: top;
      }

      .dataframe thead th {
          text-align: right;
      }
      .overflow {
                      overflow-x: scroll;
      }
  </style>

    <div class="overflow">
  <table border="1" class="dataframe">
        <thead>
      <tr style="text-align: right;">
      <th></th>
      <th>teamid_p0</th>
      <th>timeline.creepspermindeltas.0-10_p0</th>
      <th>timeline.xppermindeltas.0-10_p0</th>
      <th>timeline.goldpermindeltas.0-10_p0</th>
      <th>timeline.damagetakenpermindeltas.0-10_p0</th>
      <th>teamid_p1</th>
      <th>timeline.creepspermindeltas.0-10_p1</th>
      <th>timeline.xppermindeltas.0-10_p1</th>
      <th>timeline.goldpermindeltas.0-10_p1</th>
      <th>timeline.damagetakenpermindeltas.0-10_p1</th>
      <th>teamid</th>
      <th>timeline.creepspermindeltas.0-10</th>
      <th>timeline.xppermindeltas.0-10</th>
      <th>timeline.goldpermindeltas.0-10</th>
      <th>timeline.damagetakenpermindeltas.0-10</th>
      <th>teamid_p3</th>
      <th>timeline.creepspermindeltas.0-10_p3</th>
      <th>timeline.xppermindeltas.0-10_p3</th>
      <th>timeline.goldpermindeltas.0-10_p3</th>
      <th>timeline.damagetakenpermindeltas.0-10_p3</th>
      <th>teamid_p4</th>
      <th>timeline.creepspermindeltas.0-10_p4</th>
      <th>timeline.xppermindeltas.0-10_p4</th>
      <th>timeline.goldpermindeltas.0-10_p4</th>
      <th>timeline.damagetakenpermindeltas.0-10_p4</th>
      <th>teamid_p5</th>
      <th>timeline.creepspermindeltas.0-10_p5</th>
      <th>timeline.xppermindeltas.0-10_p5</th>
      <th>timeline.goldpermindeltas.0-10_p5</th>
      <th>timeline.damagetakenpermindeltas.0-10_p5</th>
      <th>teamid_p6</th>
      <th>timeline.creepspermindeltas.0-10_p6</th>
      <th>timeline.xppermindeltas.0-10_p6</th>
      <th>timeline.goldpermindeltas.0-10_p6</th>
      <th>timeline.damagetakenpermindeltas.0-10_p6</th>
      <th>teamid_p7</th>
      <th>timeline.creepspermindeltas.0-10_p7</th>
      <th>timeline.xppermindeltas.0-10_p7</th>
      <th>timeline.goldpermindeltas.0-10_p7</th>
      <th>timeline.damagetakenpermindeltas.0-10_p7</th>
      <th>teamid_p8</th>
      <th>timeline.creepspermindeltas.0-10_p8</th>
      <th>timeline.xppermindeltas.0-10_p8</th>
      <th>timeline.goldpermindeltas.0-10_p8</th>
      <th>timeline.damagetakenpermindeltas.0-10_p8</th>
      <th>teamid_p9</th>
      <th>timeline.creepspermindeltas.0-10_p9</th>
      <th>timeline.xppermindeltas.0-10_p9</th>
      <th>timeline.goldpermindeltas.0-10_p9</th>
      <th>timeline.damagetakenpermindeltas.0-10_p9</th>
      </tr>
      <tr>
      <th>gameid</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      </tr>
  </thead>
        <tbody>
      <tr>
      <th>317415113</th>
      <td>100</td>
      <td>1.5</td>
      <td>230.2</td>
      <td>157.1</td>
      <td>318.2</td>
      <td>100</td>
      <td>0.0</td>
      <td>377.4</td>
      <td>324.2</td>
      <td>849.2</td>
      <td>100</td>
      <td>6.1</td>
      <td>266.1</td>
      <td>230.2</td>
      <td>341.0</td>
      <td>100</td>
      <td>6.3</td>
      <td>368.7</td>
      <td>281.9</td>
      <td>663.4</td>
      <td>100</td>
      <td>7.2</td>
      <td>431.5</td>
      <td>258.7</td>
      <td>482.1</td>
      <td>200</td>
      <td>5.4</td>
      <td>398.7</td>
      <td>233.3</td>
      <td>492.3</td>
      <td>200</td>
      <td>7.3</td>
      <td>511.6</td>
      <td>388.1</td>
      <td>450.8</td>
      <td>200</td>
      <td>7.8</td>
      <td>330.8</td>
      <td>443.3</td>
      <td>351.9</td>
      <td>200</td>
      <td>0.1</td>
      <td>335.3</td>
      <td>342.0</td>
      <td>776.1</td>
      <td>200</td>
      <td>1.3</td>
      <td>328.2</td>
      <td>235.2</td>
      <td>205.3</td>
      </tr>
      <tr>
      <th>317416566</th>
      <td>100</td>
      <td>7.6</td>
      <td>338.3</td>
      <td>263.9</td>
      <td>205.6</td>
      <td>100</td>
      <td>0.3</td>
      <td>274.1</td>
      <td>155.2</td>
      <td>171.7</td>
      <td>100</td>
      <td>4.6</td>
      <td>354.8</td>
      <td>200.6</td>
      <td>419.5</td>
      <td>100</td>
      <td>0.2</td>
      <td>260.6</td>
      <td>230.4</td>
      <td>441.6</td>
      <td>100</td>
      <td>8.3</td>
      <td>499.8</td>
      <td>394.3</td>
      <td>385.4</td>
      <td>200</td>
      <td>7.0</td>
      <td>299.2</td>
      <td>240.6</td>
      <td>157.8</td>
      <td>200</td>
      <td>0.1</td>
      <td>278.3</td>
      <td>132.2</td>
      <td>153.1</td>
      <td>200</td>
      <td>3.7</td>
      <td>371.2</td>
      <td>201.5</td>
      <td>502.1</td>
      <td>200</td>
      <td>8.3</td>
      <td>517.9</td>
      <td>335.8</td>
      <td>211.8</td>
      <td>200</td>
      <td>0.5</td>
      <td>298.1</td>
      <td>258.8</td>
      <td>694.8</td>
      </tr>
      <tr>
      <th>317418523</th>
      <td>100</td>
      <td>5.9</td>
      <td>314.5</td>
      <td>215.4</td>
      <td>397.8</td>
      <td>100</td>
      <td>7.4</td>
      <td>481.4</td>
      <td>259.2</td>
      <td>146.6</td>
      <td>100</td>
      <td>0.0</td>
      <td>310.0</td>
      <td>307.0</td>
      <td>758.9</td>
      <td>100</td>
      <td>0.7</td>
      <td>218.8</td>
      <td>153.1</td>
      <td>241.2</td>
      <td>100</td>
      <td>5.0</td>
      <td>398.7</td>
      <td>196.5</td>
      <td>434.8</td>
      <td>200</td>
      <td>7.3</td>
      <td>489.0</td>
      <td>304.2</td>
      <td>399.2</td>
      <td>200</td>
      <td>1.3</td>
      <td>263.4</td>
      <td>201.9</td>
      <td>269.0</td>
      <td>200</td>
      <td>6.8</td>
      <td>425.8</td>
      <td>301.1</td>
      <td>287.3</td>
      <td>200</td>
      <td>0.6</td>
      <td>364.5</td>
      <td>365.7</td>
      <td>576.5</td>
      <td>200</td>
      <td>9.0</td>
      <td>352.6</td>
      <td>353.6</td>
      <td>259.6</td>
      </tr>
      <tr>
      <th>317419849</th>
      <td>100</td>
      <td>7.3</td>
      <td>423.2</td>
      <td>249.0</td>
      <td>395.0</td>
      <td>100</td>
      <td>0.0</td>
      <td>276.4</td>
      <td>144.3</td>
      <td>24.8</td>
      <td>100</td>
      <td>6.9</td>
      <td>298.6</td>
      <td>270.8</td>
      <td>268.5</td>
      <td>100</td>
      <td>0.4</td>
      <td>311.2</td>
      <td>358.8</td>
      <td>864.9</td>
      <td>100</td>
      <td>5.9</td>
      <td>461.4</td>
      <td>400.6</td>
      <td>519.4</td>
      <td>200</td>
      <td>1.2</td>
      <td>241.1</td>
      <td>167.3</td>
      <td>219.3</td>
      <td>200</td>
      <td>7.5</td>
      <td>344.2</td>
      <td>292.1</td>
      <td>252.3</td>
      <td>200</td>
      <td>3.9</td>
      <td>348.6</td>
      <td>237.9</td>
      <td>635.3</td>
      <td>200</td>
      <td>0.2</td>
      <td>373.8</td>
      <td>366.0</td>
      <td>616.8</td>
      <td>200</td>
      <td>6.1</td>
      <td>435.1</td>
      <td>315.6</td>
      <td>148.0</td>
      </tr>
      <tr>
      <th>317425382</th>
      <td>100</td>
      <td>0.2</td>
      <td>183.0</td>
      <td>177.8</td>
      <td>250.7</td>
      <td>100</td>
      <td>7.5</td>
      <td>499.0</td>
      <td>304.2</td>
      <td>436.4</td>
      <td>100</td>
      <td>0.2</td>
      <td>344.9</td>
      <td>348.8</td>
      <td>671.2</td>
      <td>100</td>
      <td>7.0</td>
      <td>444.3</td>
      <td>235.5</td>
      <td>310.3</td>
      <td>100</td>
      <td>4.0</td>
      <td>341.9</td>
      <td>385.6</td>
      <td>438.5</td>
      <td>200</td>
      <td>0.4</td>
      <td>362.0</td>
      <td>269.1</td>
      <td>676.1</td>
      <td>200</td>
      <td>0.8</td>
      <td>271.6</td>
      <td>286.5</td>
      <td>334.9</td>
      <td>200</td>
      <td>5.7</td>
      <td>273.7</td>
      <td>290.1</td>
      <td>550.2</td>
      <td>200</td>
      <td>3.8</td>
      <td>365.2</td>
      <td>177.4</td>
      <td>485.9</td>
      <td>200</td>
      <td>8.4</td>
      <td>539.3</td>
      <td>294.1</td>
      <td>284.2</td>
      </tr>
  </tbody>
      </table>
  </div>
  </li>
  <li>
    <p><strong>y_final (first 5 lines):</strong></p>
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  gameid
  317415113    Fail
  317416566    Fail
  317418523    Fail
  317419849    Fail
  317425382    Fail
  Name: team1_did_win, dtype: object
</code></pre></div>    </div>
  </li>
</ul>
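<p>Note that the target column holds the strings <code>Win</code>/<code>Fail</code> (dtype <code>object</code>). Before fitting, these have to be mapped to a binary label at some point; a minimal way to do this (shown on a small made-up series standing in for <code>y_final</code>) is:</p>

```python
import pandas as pd

# Hypothetical stand-in for y_final; the real series comes from the
# MongoDB pipeline above and is indexed by gameid.
y_final = pd.Series(["Fail", "Win", "Fail", "Win", "Win"], name="team1_did_win")

# Map "Win" -> 1, "Fail" -> 0
y_binary = (y_final == "Win").astype(int)
print(y_binary.tolist())  # [0, 1, 0, 1, 1]
```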

<p>We want to split our data set into training, validation and test sets, so that we can validate the model and later test it on unseen data. This is easily accomplished with sklearn’s train_test_split function:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">sklearn.model_selection</span> <span class="kn">import</span> <span class="n">train_test_split</span>
<span class="n">X_tmp</span><span class="p">,</span> <span class="n">X_test</span><span class="p">,</span> <span class="n">y_tmp</span><span class="p">,</span> <span class="n">y_test</span> <span class="o">=</span> <span class="n">train_test_split</span><span class="p">(</span><span class="n">X_final</span><span class="p">,</span> <span class="n">y_final</span><span class="p">,</span> <span class="n">train_size</span><span class="o">=</span><span class="mf">0.8</span><span class="p">,</span> <span class="n">test_size</span><span class="o">=</span><span class="mf">0.2</span><span class="p">,</span> <span class="n">random_state</span> <span class="o">=</span> <span class="mi">0</span><span class="p">)</span>
<span class="n">X_train</span><span class="p">,</span> <span class="n">X_valid</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span> <span class="n">y_valid</span> <span class="o">=</span> <span class="n">train_test_split</span><span class="p">(</span><span class="n">X_tmp</span><span class="p">,</span> <span class="n">y_tmp</span><span class="p">,</span> <span class="n">train_size</span><span class="o">=</span><span class="mf">0.8</span><span class="p">,</span> <span class="n">test_size</span><span class="o">=</span><span class="mf">0.2</span><span class="p">,</span> <span class="n">random_state</span> <span class="o">=</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">del</span> <span class="n">X_tmp</span><span class="p">,</span> <span class="n">y_tmp</span>
</code></pre></div></div>
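
<p>Chaining two 80/20 splits like this yields roughly 64% training, 16% validation and 20% test data. A quick sanity check on dummy data (the array of 1000 rows here is purely illustrative):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000).reshape(-1, 1)
y = np.arange(1000)

# First split off 20% as test, then 20% of the remainder as validation
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, train_size=0.8, test_size=0.2, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X_tmp, y_tmp, train_size=0.8, test_size=0.2, random_state=0)

print(len(X_train), len(X_valid), len(X_test))  # 640 160 200
</code></pre></div></div>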

<p>No data set is perfect: ours contains NaN values which we have to fill (the alternative would be to drop the affected columns entirely, but then we would lose a lot of data). It turns out that Riot seems to set certain fields to NaN when a metric could not be determined for a player in that time frame. Such data will not be normally distributed, so we should not fill the missing values with the mean. The median would be an option, but simply setting the value to zero works even better here. We use sklearn’s SimpleImputer to perform this step:</p>
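
<p>Before settling on a fill strategy it can help to inspect how many NaN values each column actually contains. A small sketch on a toy frame (in the post this check would be run on X_train; the column names here are made up):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
import pandas as pd

# Toy frame standing in for X_train
df = pd.DataFrame({
    "gold_per_min": [250.7, np.nan, 436.4],
    "creeps_per_min": [6.1, 7.5, np.nan],
    "damage_per_min": [334.9, 550.2, 485.9],
})

# Count missing values per column, most affected columns first
print(df.isna().sum().sort_values(ascending=False))
</code></pre></div></div>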

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">sklearn.impute</span> <span class="kn">import</span> <span class="n">SimpleImputer</span>

<span class="n">my_imputer</span> <span class="o">=</span> <span class="n">SimpleImputer</span><span class="p">(</span><span class="n">strategy</span><span class="o">=</span><span class="s">'constant'</span><span class="p">,</span> <span class="n">fill_value</span><span class="o">=</span><span class="mf">0.0</span><span class="p">)</span>
<span class="n">imputed_X_train</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">my_imputer</span><span class="p">.</span><span class="n">fit_transform</span><span class="p">(</span><span class="n">X_train</span><span class="p">))</span>
<span class="n">imputed_X_valid</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">my_imputer</span><span class="p">.</span><span class="n">transform</span><span class="p">(</span><span class="n">X_valid</span><span class="p">))</span>
<span class="n">imputed_X_test</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">my_imputer</span><span class="p">.</span><span class="n">transform</span><span class="p">(</span><span class="n">X_test</span><span class="p">))</span>

<span class="c1"># Imputation removed column names; put them back
</span><span class="n">imputed_X_train</span><span class="p">.</span><span class="n">columns</span> <span class="o">=</span> <span class="n">X_train</span><span class="p">.</span><span class="n">columns</span>
<span class="n">imputed_X_valid</span><span class="p">.</span><span class="n">columns</span> <span class="o">=</span> <span class="n">X_valid</span><span class="p">.</span><span class="n">columns</span>
<span class="n">imputed_X_test</span><span class="p">.</span><span class="n">columns</span> <span class="o">=</span> <span class="n">X_test</span><span class="p">.</span><span class="n">columns</span>

<span class="c1"># Imputation removed indices; put them back
</span><span class="n">imputed_X_train</span><span class="p">.</span><span class="n">index</span> <span class="o">=</span> <span class="n">X_train</span><span class="p">.</span><span class="n">index</span>
<span class="n">imputed_X_valid</span><span class="p">.</span><span class="n">index</span> <span class="o">=</span> <span class="n">X_valid</span><span class="p">.</span><span class="n">index</span>
<span class="n">imputed_X_test</span><span class="p">.</span><span class="n">index</span> <span class="o">=</span> <span class="n">X_test</span><span class="p">.</span><span class="n">index</span>
</code></pre></div></div>

<p>The last remaining step is to encode our prediction target, which is “Win” or “Fail”, into something numeric that can be used in our model. We perform this encoding with the LabelEncoder:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">sklearn.preprocessing</span> <span class="kn">import</span> <span class="n">LabelEncoder</span>
<span class="n">label_encoder</span> <span class="o">=</span> <span class="n">LabelEncoder</span><span class="p">()</span>
<span class="n">label_y_train</span> <span class="o">=</span> <span class="n">label_encoder</span><span class="p">.</span><span class="n">fit_transform</span><span class="p">(</span><span class="n">y_train</span><span class="p">)</span>
<span class="n">label_y_valid</span> <span class="o">=</span> <span class="n">label_encoder</span><span class="p">.</span><span class="n">transform</span><span class="p">(</span><span class="n">y_valid</span><span class="p">)</span>
</code></pre></div></div>
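
<p>LabelEncoder assigns the integer labels in sorted order of the class names, so in our case “Fail” maps to 0 and “Win” maps to 1:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
encoded = label_encoder.fit_transform(["Win", "Fail", "Fail", "Win"])

print(list(label_encoder.classes_))  # ['Fail', 'Win']
print(list(encoded))                 # [1, 0, 0, 1]
</code></pre></div></div>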

<h1 id="model-definition-and-fitting">Model definition and fitting</h1>

<p>The features we are going to use for the model are:</p>

<table>
  <thead>
    <tr>
      <th>Feature</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Team id</td>
      <td>The Team ID of that participant (either 100 or 200).</td>
    </tr>
    <tr>
      <td>Creeps per minute 0-10min</td>
      <td>The NPC creatures killed per minute during the time of 0 to 10 minutes into the game.</td>
    </tr>
    <tr>
      <td>Gold per minute 0-10min</td>
      <td>The gold earned per minute during the time of 0 to 10 minutes into the game.</td>
    </tr>
    <tr>
      <td>Damage taken per minute 0-10min</td>
      <td>Damage taken per minute during the time of 0 to 10 minutes into the game.</td>
    </tr>
  </tbody>
</table>

<p>These features will occur 10 times in our data set, once for each player in the match.</p>
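
<p>Flattening the ten per-participant feature sets into one row per game can be sketched with a pandas pivot. The frame and column names below are illustrative, not the exact ones used for this post:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import pandas as pd

# Hypothetical per-participant metrics for one match
# (only 2 of the 10 participants are shown)
participants = pd.DataFrame({
    "gameid": [317415113, 317415113],
    "participant": [1, 2],
    "gold_per_min_0_10": [250.7, 436.4],
})

# One row per game, one column per participant and feature
wide = participants.pivot(index="gameid", columns="participant")
wide.columns = [f"{feature}_p{p}" for feature, p in wide.columns]
print(wide.columns.tolist())  # ['gold_per_min_0_10_p1', 'gold_per_min_0_10_p2']
</code></pre></div></div>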

<p>We define two functions, one for the model and one for judging the quality of the model:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">sklearn.ensemble</span> <span class="kn">import</span> <span class="n">RandomForestRegressor</span>
<span class="kn">from</span> <span class="nn">sklearn.metrics</span> <span class="kn">import</span> <span class="n">mean_absolute_error</span>
<span class="kn">from</span> <span class="nn">xgboost</span> <span class="kn">import</span> <span class="n">XGBRegressor</span>

<span class="k">def</span> <span class="nf">fit_xgboost_model</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span> <span class="n">X_valid</span><span class="p">,</span> <span class="n">y_valid</span><span class="p">,</span> <span class="n">learning_rate</span><span class="o">=</span><span class="mf">0.01</span><span class="p">,</span> <span class="n">n_estimators</span><span class="o">=</span><span class="mi">500</span><span class="p">,</span> <span class="n">early_stopping_rounds</span><span class="o">=</span><span class="mi">5</span><span class="p">):</span>
    <span class="n">model</span> <span class="o">=</span> <span class="n">XGBRegressor</span><span class="p">(</span><span class="n">n_estimators</span><span class="o">=</span><span class="n">n_estimators</span><span class="p">,</span> <span class="n">learning_rate</span><span class="o">=</span><span class="n">learning_rate</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="mi">8</span><span class="p">)</span>
    <span class="n">model</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span>
              <span class="n">early_stopping_rounds</span><span class="o">=</span><span class="n">early_stopping_rounds</span><span class="p">,</span> 
              <span class="n">eval_set</span><span class="o">=</span><span class="p">[(</span><span class="n">X_valid</span><span class="p">,</span> <span class="n">y_valid</span><span class="p">)],</span> 
              <span class="n">verbose</span><span class="o">=</span><span class="bp">False</span>
             <span class="p">)</span>
    <span class="k">return</span> <span class="n">model</span>

<span class="k">def</span> <span class="nf">score_dataset</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">X_valid</span><span class="p">,</span> <span class="n">y_valid</span><span class="p">):</span>
    <span class="n">preds</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">predict</span><span class="p">(</span><span class="n">X_valid</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">mean_absolute_error</span><span class="p">(</span><span class="n">y_valid</span><span class="p">,</span> <span class="n">preds</span><span class="p">)</span>
</code></pre></div></div>

<p>To find the best XGBoost parameters we loop over the number of estimators (n_estimators) and the learning rate (learning_rate) and pick the combination which minimizes the mean absolute error on the validation set.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>

<span class="n">best_learning_rate</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">best_n_estimators</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">best_mae</span> <span class="o">=</span> <span class="mi">100000</span>
<span class="k">for</span> <span class="n">learning_rate</span> <span class="ow">in</span> <span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="mf">0.004</span><span class="p">,</span> <span class="mf">0.05</span><span class="p">,</span> <span class="mf">0.002</span><span class="p">):</span>
    <span class="k">for</span> <span class="n">n_estimators</span> <span class="ow">in</span> <span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">400</span><span class="p">,</span> <span class="mi">1600</span><span class="p">,</span> <span class="mi">200</span><span class="p">):</span>
        <span class="n">model</span> <span class="o">=</span> <span class="n">fit_xgboost_model</span><span class="p">(</span><span class="n">imputed_X_train</span><span class="p">,</span> <span class="n">label_y_train</span><span class="p">,</span> <span class="n">imputed_X_valid</span><span class="p">,</span> <span class="n">label_y_valid</span><span class="p">,</span> <span class="n">learning_rate</span><span class="p">,</span> <span class="n">n_estimators</span><span class="p">)</span>
        <span class="n">mae</span> <span class="o">=</span> <span class="n">score_dataset</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">imputed_X_valid</span><span class="p">,</span> <span class="n">label_y_valid</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">mae</span> <span class="o">&lt;</span> <span class="n">best_mae</span><span class="p">:</span>
            <span class="n">best_learning_rate</span> <span class="o">=</span> <span class="n">learning_rate</span>
            <span class="n">best_n_estimators</span> <span class="o">=</span> <span class="n">n_estimators</span>
            <span class="n">best_mae</span> <span class="o">=</span> <span class="n">mae</span>
</code></pre></div></div>
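
<p>The two nested loops can equivalently be written with sklearn’s ParameterGrid, which keeps the search self-describing. In this sketch the actual model fit is replaced by a toy scoring function; in the post this would call fit_xgboost_model and score_dataset instead:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
from sklearn.model_selection import ParameterGrid

grid = ParameterGrid({
    "learning_rate": np.arange(0.004, 0.05, 0.002),
    "n_estimators": np.arange(400, 1600, 200),
})

# Toy stand-in for "fit the model, return the validation MAE"
def validation_mae(params):
    return abs(params["learning_rate"] - 0.01) + params["n_estimators"] / 1e6

best = min(grid, key=validation_mae)
print(best["n_estimators"])  # 400
</code></pre></div></div>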

<p>The search stores the best-performing parameters for the data set used in this study in <code class="language-plaintext highlighter-rouge">best_learning_rate</code> and <code class="language-plaintext highlighter-rouge">best_n_estimators</code>; the early stopping rounds were kept at a value of 5.</p>

<p>Using these parameters we perform one final fit of the model which we are going to use for prediction on the test set.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model</span> <span class="o">=</span> <span class="n">fit_xgboost_model</span><span class="p">(</span><span class="n">imputed_X_train</span><span class="p">,</span> <span class="n">label_y_train</span><span class="p">,</span> <span class="n">imputed_X_valid</span><span class="p">,</span> <span class="n">label_y_valid</span><span class="p">,</span> <span class="n">best_learning_rate</span><span class="p">,</span> <span class="n">best_n_estimators</span><span class="p">)</span>
</code></pre></div></div>

<h1 id="predicting-winloss">Predicting Win/Loss</h1>

<p>As we know the real outcome of the matches in the test set, we can compare the predictions</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">preds_test</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">predict</span><span class="p">(</span><span class="n">imputed_X_test</span><span class="p">)</span>
</code></pre></div></div>

<p>with the actual results</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">output</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s">"gameid"</span><span class="p">:</span> <span class="n">imputed_X_test</span><span class="p">.</span><span class="n">index</span><span class="p">,</span> <span class="s">"team1_did_win"</span><span class="p">:</span> <span class="n">preds_test</span><span class="p">})</span>
<span class="n">output</span><span class="p">[</span><span class="s">"team1_did_win"</span><span class="p">]</span> <span class="o">=</span> <span class="n">output</span><span class="p">[</span><span class="s">"team1_did_win"</span><span class="p">]</span> <span class="o">&gt;</span> <span class="mf">0.5</span>
<span class="n">output</span> <span class="o">=</span> <span class="n">output</span><span class="p">.</span><span class="n">set_index</span><span class="p">(</span><span class="s">"gameid"</span><span class="p">)</span>
</code></pre></div></div>

<p>and calculate the accuracy of our predictions</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">check</span> <span class="o">=</span> <span class="n">output</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">y_test</span><span class="p">,</span> <span class="n">lsuffix</span><span class="o">=</span><span class="s">"_pred"</span><span class="p">,</span> <span class="n">rsuffix</span><span class="o">=</span><span class="s">"_test"</span><span class="p">)</span>
<span class="n">check</span><span class="p">[</span><span class="s">"team1_did_win_test"</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">check</span><span class="p">[</span><span class="s">"team1_did_win_test"</span><span class="p">]</span> <span class="o">==</span> <span class="s">"Win"</span><span class="p">)</span>

<span class="n">check</span><span class="p">[</span><span class="s">"equal"</span><span class="p">]</span> <span class="o">=</span> <span class="n">check</span><span class="p">[</span><span class="s">"team1_did_win_test"</span><span class="p">]</span> <span class="o">==</span> <span class="n">check</span><span class="p">[</span><span class="s">"team1_did_win_pred"</span><span class="p">]</span>
<span class="n">test_pred_accuracy_percent</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">check</span><span class="p">[</span><span class="n">check</span><span class="p">[</span><span class="s">"equal"</span><span class="p">]</span> <span class="o">==</span> <span class="bp">True</span><span class="p">])</span> <span class="o">/</span> <span class="nb">len</span><span class="p">(</span><span class="n">check</span><span class="p">)</span> <span class="o">*</span> <span class="mi">100</span>
<span class="k">print</span><span class="p">(</span><span class="s">"The accuracy on the test set is "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">test_pred_accuracy_percent</span><span class="p">)</span> <span class="o">+</span> <span class="s">" %"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>The accuracy on the test set is 70.67495737639153 %
</code></pre></div></div>

<p>As can be seen, we reach an accuracy of roughly 70% with this simple modeling approach, taking into account only data from the first 10 minutes of the match. This is quite remarkable: it means we can correctly predict the outcome of a match after the first 10 minutes more than 2 out of 3 times.</p>
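
<p>As an aside, the threshold-and-compare steps above can be expressed more concisely with sklearn’s accuracy_score. A self-contained sketch with made-up regressor outputs and encoded labels (in the post these would be preds_test and the encoded y_test):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
from sklearn.metrics import accuracy_score

# Made-up raw regressor outputs and true labels (1 = Win, 0 = Fail)
raw_preds = np.array([0.8, 0.3, 0.6, 0.1])
true_labels = np.array([1, 0, 0, 0])

# Threshold at 0.5 and compare against the true labels
accuracy_percent = accuracy_score(true_labels, raw_preds > 0.5) * 100
print(accuracy_percent)  # 75.0
</code></pre></div></div>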

<h1 id="summary-and-discussion">Summary and discussion</h1>

<p>We used the Riot Games API to fetch match data for approx. 50,000 matches for game version 10.13 5v5 Solo Queue on Summoner’s Rift. From the fetched match data we extracted features for the first 10 minutes of each match. Using these features we were able to train a model which predicts the winner of a game with an accuracy of 70% by looking only at the first 10 minutes of that match.</p>

<h1 id="appendix">Appendix</h1>

<p>We will take a look at additional metrics which are part of the match data we fetched. We are especially interested in the win rates based on “first” objectives in the match: first tower, first Baron, first dragon and first blood.</p>

<p>Towers are important defensive structures in the game and losing one opens up possibilities for the other team. As can be seen in the next figure, losing the first tower can indeed be relevant.</p>

<p><img src="/assets/img/output_33_1.png" alt="" />
<em>Figure 1: First Tower Win vs Fail for different regions</em></p>

<p>Of course this can also be interpreted the other way around: everything that happened earlier in the match tilted it into one team’s favor, making it easier for them to take down a tower. Nevertheless, the win chance and the first tower kill are clearly highly correlated.</p>

<p>In addition, Baron is a very important objective in the game. Not only does it provide a large boost for the team taking it, but it usually also indicates that the match is already going in that team’s favor. In the Win/Fail rate for the first Baron kill there is a clear tendency to Win for the team taking the Baron, and it is nearly impossible to turn a match around if the opposing team takes a Baron.</p>

<p><img src="/assets/img/output_34_1.png" alt="" />
<em>Figure 2: First Baron Win vs Fail for different regions</em></p>

<p>Note: Here the Win/Fail rates do not sum up to 100%, as there are games where neither team takes the Baron. In an extended analysis one should only consider matches where a Baron was actually taken for that plot. Nevertheless, it is clear from these numbers that it is quite hard to turn a game around if the opposing team was able to secure the first Baron of the match.</p>
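
<p>Restricting such a plot to matches in which a Baron was actually taken is a simple pandas filter. The frame and column names here are hypothetical stand-ins; the real match data encodes this information differently:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import pandas as pd

# Hypothetical summary frame: which team took the first Baron (0 = none)
# and which team won the match
matches = pd.DataFrame({
    "first_baron_team": [100, 200, 0, 100, 0],
    "winning_team":     [100, 200, 100, 200, 200],
})

# Only keep matches where a Baron was taken at all
baron_games = matches[matches["first_baron_team"] != 0]
win_rate = (baron_games["first_baron_team"] == baron_games["winning_team"]).mean()
print(round(win_rate, 2))  # 0.67
</code></pre></div></div>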

<p>The dragon is an earlier objective and usually less decisive for the outcome of a match. Nevertheless, here too it can be seen that taking the first dragon indicates that a team is well on its way to ending the match victorious:</p>

<p><img src="/assets/img/output_36_1.png" alt="" />
<em>Figure 3: First Dragon Win vs Fail for different regions</em></p>

<p>First blood (which team first killed a champion of the enemy team) can happen quite early in a match, and of all the investigated metrics it is the one which indicates least clearly whether a team will win. It still gives some indication of which team is going to win, however:</p>

<p><img src="/assets/img/output_35_1.png" alt="" />
<em>Figure 4: First Blood Win vs Fail for different regions</em></p>

<p>The last analysis we are going to perform concerns the average game length. Using the data of all the matches we fetched for game version 10.13, played on Summoner’s Rift in solo 5v5 games, we find that, independent of the region, the game duration is close to 30 minutes. We also find the interesting phenomenon that shortly after it becomes possible to surrender a match, more matches end, which indicates that some teams already consider a match lost just 15 minutes in. But as we are able to predict the outcome of a game with just 10 minutes of data, those players may indeed be right.</p>

<p><img src="/assets/img/output_37_1.png" alt="" />
<em>Figure 5: Distribution of game length for various regions</em></p>

<h1 id="references">References</h1>

<p>[1] Chen, Tianqi, and Carlos Guestrin. “XGBoost.” Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016), <a href="https://arxiv.org/abs/1603.02754">https://arxiv.org/abs/1603.02754</a>.<br />
[2] <a href="https://matplotlib.org/">https://matplotlib.org/</a><br />
[3] <a href="https://numpy.org/">https://numpy.org/</a><br />
[4] <a href="https://pandas.pydata.org/">https://pandas.pydata.org/</a><br />
[5] <a href="https://pymongo.readthedocs.io/en/stable/">https://pymongo.readthedocs.io/en/stable/</a><br />
[6] <a href="https://seaborn.pydata.org/">https://seaborn.pydata.org/</a><br />
[7] <a href="https://scikit-learn.org/stable/">https://scikit-learn.org/stable/</a><br />
[8] <a href="https://xgboost.readthedocs.io/en/latest/">https://xgboost.readthedocs.io/en/latest/</a></p>]]></content><author><name></name></author><category term="Machine" /><category term="Learning" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Lili’s Quest | Week 5</title><link href="https://torlenor.org/lili's/quest/2020/06/27/lilis_quest_week_5.html" rel="alternate" type="text/html" title="Lili’s Quest | Week 5" /><published>2020-06-27T09:45:00+00:00</published><updated>2020-06-27T09:45:00+00:00</updated><id>https://torlenor.org/lili&apos;s/quest/2020/06/27/lilis_quest_week_5</id><content type="html" xml:base="https://torlenor.org/lili&apos;s/quest/2020/06/27/lilis_quest_week_5.html"><![CDATA[<p>This week was heavily driven by refactoring. A lot more entity properties got moved into components, which only hold data. Currently the references to these components are still part of the entity, so for now the entity is still more than just an ID as one would usually have in a strict entity-component-system sense.</p>

<p>During that refactoring the parsing of the entity definitions (JSON files) also got much easier. Utilizing Go’s awesome marshalling/unmarshalling interface, the parse functions got shorter, and types like MutationEffect now know themselves how to unmarshal from a JSON string into an actual MutationEffect.</p>

<p>In terms of gameplay I thought about which Mutations I want to add in a first iteration and which functionality the game needs to support them. For example, it would be nice to be able to dig through a wall, but for that I need destructible walls. In a similar way, I want to have force fields, for which I need to be able to construct walls!</p>

<p>Apart from that I added a lot of TODOs. The next step will be to prioritize them. As gameplay should be my focus, I will probably start by implementing the functionality needed for the mutations. UI will then be next on the list.</p>]]></content><author><name></name></author><category term="Lili&apos;s" /><category term="quest" /><summary type="html"><![CDATA[This week was heavily driven by refactoring. A lot more entity properties got moved into components, which only hold data. Currently the references to these components are still part of the entity, so for now the entity is still more than just an ID as one would usually have in a strict entity-component-system sense.]]></summary></entry><entry><title type="html">Lili’s Quest | Week 4</title><link href="https://torlenor.org/lili's/quest/2020/06/21/lilis_quest_week_4.html" rel="alternate" type="text/html" title="Lili’s Quest | Week 4" /><published>2020-06-21T06:25:00+00:00</published><updated>2020-06-21T06:25:00+00:00</updated><id>https://torlenor.org/lili&apos;s/quest/2020/06/21/lilis_quest_week_4</id><content type="html" xml:base="https://torlenor.org/lili&apos;s/quest/2020/06/21/lilis_quest_week_4.html"><![CDATA[<p>During this week I worked on refactoring the input handling, to make it easier adding new keyboard shortcuts. This also has the advantage, that I can disentangle the SDL events from the actual input handling in the program. Later on I have to think about a system on how to have different input key bindings/behaviors depending on the state of the game (main menu, options menu, inventory modal open, for example), but this is something I have to think of when I actually have additional game states.</p>

<p>In addition, I was not happy with the rendering. I wanted the ability to render directly onto a grid, and therefore I added a console. For now there is only MatrixConsole, which can work with square or rectangular fonts (e.g., 12x12 or 6x12) that form a grid. In the same manner as with libtcod you can then put chars onto those grids and customize their foreground and background colors. The game map uses this now, which makes rendering much simpler. Due to that refactoring/rewrite I also moved the rendering of the entities into the game map, which makes more sense to me than having it in the actual game logic.</p>

<p>A simple main menu is now implemented. It can only start the game or quit the application and has placeholders for Options and Load Game. It uses the same MatrixConsole as the map, just with a different font texture. Ideally I can use a Text Console later on, but for now it is good enough to get the logic of the menus set up.</p>

<p>Oh, and here is a new GIF:</p>

<p><img src="https://i.imgur.com/P0M4eYA.gif" alt="" /></p>]]></content><author><name></name></author><category term="Lili&apos;s" /><category term="quest" /><summary type="html"><![CDATA[During this week I worked on refactoring the input handling, to make it easier adding new keyboard shortcuts. This also has the advantage, that I can disentangle the SDL events from the actual input handling in the program. Later on I have to think about a system on how to have different input key bindings/behaviors depending on the state of the game (main menu, options menu, inventory modal open, for example), but this is something I have to think of when I actually have additional game states.]]></summary></entry></feed>