Thesis projects

Would you like to do your bachelor’s or master’s thesis research in our group?

We list thesis topics below, organised by the senior supervisor. Most projects will be offered with another group member as a daily supervisor. Reach out to the senior supervisor offering the project if you are interested. In your email, please mention which project you’re interested in and include a CV and grade list.

You are also welcome to propose thesis topics yourself, or ask if we can define a project for you!

Our main research includes the following topics:

Automated Machine Learning (AutoML): automatic design and configuration of machine learning models and algorithms. For instance, hyperparameter optimisation and neural architecture search.
Neural Network Verification: ensuring that trained neural networks are robust and reliable.
Earth Observation: designing new algorithms for Earth Observation data, such as satellite images.
Causal Machine Learning: machine learning models and data mining techniques that go beyond statistical correlations and venture into causality.
Time-Series Data: time-series classification, forecasting, and other tasks involving a temporal dimension (e.g., ECG data, temperature measurements, economic indicators).
Spatial and Spatio-Temporal Data: classification, regression, interpolation and other tasks on data with a spatial and/or temporal dimension (e.g., Earth Observation data, air quality data, ecological data).
Trajectory Data: data mining and machine learning techniques applied to data containing the movement paths of objects through space over time (e.g., road traffic data, social network data).

Master’s thesis topics

Overview

Currently available projects with Mitra Baratchi:

Currently available projects with Jan van Rijn:

TBD (for now, contact directly)

Mitra Baratchi

Data augmentation for satellite images

Tags: Earth Observation data

Supervisors: Julia Wąsala, Mitra Baratchi

Problem description: Data augmentations are transformations we apply to images to train models that are less sensitive to data variations. For example, images can be flipped, discoloured, or parts can even be erased. Many deep learning techniques, such as contrastive or self-supervised learning, depend on these operations to help the model learn robust representations. However, some data augmentations do not make sense for satellite data. For instance, rotating satellite images is meaningless because there is no inherent up or down. Other augmentations may work for some tasks but not for others.

Project aim: Design a new data augmentation that overcomes an issue with augmentations designed for natural images. To do that, you’ll first analyse the effectiveness of data augmentations for different types of satellite images and tasks to understand how different augmentations designed for natural images affect satellite image tasks.
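
To make the augmentations mentioned above concrete, here is a minimal NumPy sketch of two of them: flipping and erasing. It is an illustration only, not the project's method. The tile shape, the number of bands, the value range, and the function names `horizontal_flip` and `random_erase` are all assumptions chosen for the example.

```python
import numpy as np


def horizontal_flip(image):
    """Mirror an image of shape (bands, height, width) left to right."""
    return image[:, :, ::-1].copy()


def random_erase(image, patch_size, rng):
    """Set one randomly placed square patch to zero in every band."""
    _, height, width = image.shape
    top = rng.integers(0, height - patch_size + 1)
    left = rng.integers(0, width - patch_size + 1)
    erased = image.copy()
    erased[:, top:top + patch_size, left:left + patch_size] = 0.0
    return erased


rng = np.random.default_rng(seed=0)
# Hypothetical 4-band, 8x8 tile with strictly positive, reflectance-like values.
tile = rng.uniform(0.1, 1.0, size=(4, 8, 8))
flipped = horizontal_flip(tile)
erased = random_erase(tile, patch_size=3, rng=rng)
```

Whether such transformations help or hurt is likely to depend on the sensor and the task, which is the kind of question the analysis phase of this project would look into.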

Denoising satellite data

Tags: Earth Observation data

Supervisors: Laurens Arp, Mitra Baratchi

Problem description: We keep track of the status of the environment using data measured constantly by satellites orbiting the Earth. However, these satellites are essentially fancy cameras, and cannot directly measure what we are actually interested in: variables describing our environment, such as the health of ecosystems or the amount of pollution. The light data measured by satellites must, therefore, be processed to produce any meaningful insights we can do something about. The ecological research community has been putting much effort lately into applying ever more complex AI models to map this light data to meaningful variables. However, recent research implies that this mapping is not actually that challenging, and can be achieved nearly perfectly with perfect data. Instead, flaws (especially noise) in the light data itself result in the biggest uncertainties and highest errors for this task, meaning that AI methods improving data quality could be much more impactful.

Project aim: Apply state-of-the-art methods in machine learning and deep learning (e.g., denoising autoencoders) to this problem to reduce noise in satellite data and improve data quality. You will first familiarise yourself with a typical setup to derive environmental variables from satellite data, and see the impact of noise for yourself. After this, you can proceed by training a (self-supervised) denoising model, and aim to improve the performance of the environmental monitoring task.
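
As a rough illustration of how input noise can dominate the error of an otherwise easy mapping, the sketch below uses purely synthetic data. It assumes a linear relation between the bands and the target variable, Gaussian noise, and a helper named `fit_and_score`; none of these come from the project itself, and real retrieval setups are more involved.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_samples, n_bands = 500, 10

# Synthetic "clean" spectra and a target that depends on them linearly (assumption).
clean = rng.uniform(0.0, 1.0, size=(n_samples, n_bands))
weights = rng.normal(size=n_bands)
target = clean @ weights


def fit_and_score(inputs, target):
    """Fit ordinary least squares and return the RMSE on the same data."""
    design = np.column_stack([inputs, np.ones(len(inputs))])
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = design @ coeffs - target
    return float(np.sqrt(np.mean(residuals ** 2)))


# The same spectra with additive Gaussian sensor noise (assumed noise model).
noisy = clean + rng.normal(scale=0.2, size=clean.shape)

rmse_clean = fit_and_score(clean, target)
rmse_noisy = fit_and_score(noisy, target)
print(f"RMSE with clean inputs: {rmse_clean:.2e}")
print(f"RMSE with noisy inputs: {rmse_noisy:.2e}")
```

In this toy setting the clean inputs are fitted almost exactly, while the noisy inputs leave a clearly larger error. A denoising model would aim to close part of that gap.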

Note: This project would likely involve an additional supervisor from the Environmental Institute (CML) of Leiden University, who has expertise on the domain side of this problem.

Scientifically Relevant Representation Learning for Satellite Data

Tags: Earth Observation data

Supervisors: Laurens Arp, Mitra Baratchi

Problem description: Recent advances in deep learning have caused a boom in representation learning for the Earth Observation (EO) data collected by satellites. Representation learning is a technique in deep learning where the model learns to automatically extract features from raw data without needing ground-truth data. Although these techniques result in powerful models that can be used for a variety of tasks, the learned features can usually only be used for computer-vision-like tasks. However, in many applications using EO data, we need features that are relevant for scientific tasks. Therefore, we need a method to automatically extract scientifically relevant features, which can then be used for a variety of important tasks related to the environment, climate, or disaster monitoring.

Project aim: Develop a self-supervised representation learning network to learn scientifically relevant features. First, you would work on the technical challenges of this approach: selecting a suitable base architecture, formulating suitable proxy tasks, and successfully training the model on your custom proxy tasks. Next, you can evaluate your model by checking whether your custom features are more effective for scientific applications using EO data than the features learned through conventional representation learning techniques.

Note: This project would likely involve an additional supervisor from the Environmental Institute (CML) of Leiden University, who has expertise on the domain side of this problem.

Normalisation of multimodal satellite data for computer vision

Tags: Earth Observation data

Supervisors: Julia Wąsala, Mitra Baratchi

Problem description: Standard RGB images have predefined value ranges: the values of pixels in each channel range from 0 to 255. Therefore, it is common in computer vision to simply use the ImageNet normalisation strategy because all RGB images have the same possible values. However, pixels in satellite images can have very different values (reaching into the thousands), and bands or channels of a single image do not necessarily have the same range. This problem becomes even worse if we combine different satellite images, like optical and radar (Sentinel-2 and Sentinel-1), because the value ranges of the pixels do not need to match at all and problems like long-tail distributions make it difficult to design good image normalisation. As a result, we need to come up with new normalisation schemes for satellite images, especially when fusing data from different sensors.

Project aim: Investigate the effect of different normalisations on different satellite datasets and tasks, ideally including multi-modal tasks. Identify a research question or knowledge gap in the normalisation of satellite images for deep learning (e.g., for a specific task or dataset) and design a new normalisation procedure. Examples: blog, dataset with custom normalisation.
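
One simple baseline for the range mismatch described above is per-band percentile clipping followed by rescaling to [0, 1]. The sketch below is only an illustration. The function name `percentile_normalise`, the 2nd and 98th percentile cut-offs, and the synthetic value ranges standing in for optical and radar bands are all assumptions, not a recommended scheme.

```python
import numpy as np


def percentile_normalise(image, lower=2.0, upper=98.0):
    """Clip each band of a (bands, height, width) array to its own
    percentile range and rescale that range to [0, 1]."""
    low = np.percentile(image, lower, axis=(1, 2), keepdims=True)
    high = np.percentile(image, upper, axis=(1, 2), keepdims=True)
    scale = np.maximum(high - low, 1e-12)  # guard against constant bands
    return np.clip((image - low) / scale, 0.0, 1.0)


rng = np.random.default_rng(seed=0)
# Made-up value ranges: one band in the thousands, one with negative values.
optical_like = rng.uniform(0.0, 10000.0, size=(1, 16, 16))
radar_like = rng.uniform(-25.0, 0.0, size=(1, 16, 16))
stacked = np.concatenate([optical_like, radar_like], axis=0)

normalised = percentile_normalise(stacked)
```

Because the statistics are computed per band, both bands end up on the same scale even though their raw ranges differ by orders of magnitude. Long-tailed distributions may still call for something more careful than fixed percentiles.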

Rashomon Sets for Reliable Machine Learning

Tags: machine learning, optimisation, reliability

Supervisors: Mitra Baratchi, Laurens Arp, Elena Raponi

Project description: Generally, the algorithms used for training machine learning models return only a single model. However, for a given dataset, many models exist that perform equally well. Identifying models with similar performance provides the opportunity to select better models for a given task (e.g., faster, more explainable). The set of almost optimal models with similar performance is called a Rashomon set. Identifying the Rashomon set for a given problem is not trivial, as Rashomon sets can be large and complicated in nature. This makes it extremely challenging to find this set for non-linear models such as neural networks. Existing research has designed strategies for identifying the Rashomon set of decision tree algorithms. In this research, we want to explore the use of efficient optimisation techniques, such as Bayesian optimisation, for extracting the Rashomon set of a potentially larger class of machine learning models.
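
For a finite pool of candidate models, the idea of a Rashomon set can be sketched as a simple filter. The example below assumes an additive tolerance `epsilon` on the loss, which is one common way to define "almost optimal" (other definitions are possible). The model names and loss values are hypothetical.

```python
def rashomon_set(losses, epsilon):
    """Return the names of models whose loss is within epsilon of the best loss."""
    best = min(losses.values())
    return sorted(name for name, loss in losses.items() if loss <= best + epsilon)


# Hypothetical validation losses for a small pool of candidate models.
candidate_losses = {
    "tree_depth_2": 0.125,
    "tree_depth_3": 0.120,
    "linear": 0.122,
    "knn": 0.180,
}

near_optimal = rashomon_set(candidate_losses, epsilon=0.01)
print(near_optimal)
```

The hard part addressed by this project is that, for most model classes, the candidates cannot simply be enumerated like this, so the set has to be searched for efficiently.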

Related references:

Xin, Rui, et al. “Exploring the whole Rashomon set of sparse decision trees.” Advances in Neural Information Processing Systems 35 (2022): 14071-14084.
Rudin, Cynthia, et al. “Amazing things come from having many good models.” arXiv preprint arXiv:2407.04846 (2024).

Jan van Rijn

Neural network verification

Tags: Neural Network Verification

Bachelor thesis topics

We welcome bachelor students. This page will be updated soon. Please reach out to one of the supervisors if you’d like to do a project with us, even if no topics are listed here yet!

TBA
