Oxford Young Statisticians Seminar
Department of Statistics, University of Oxford
The Oxford Young Statisticians Seminar (OxYSS) is a series of junior seminars in which PhD students and postdocs from the University of Oxford working on statistics and machine learning present their research in an accessible manner. The aim is to provide insight into each other’s work and foster potential collaborations.
Seminar’s Schedule
For any questions or requests, please contact Valentin Kilian.
You can also subscribe to our Google Calendar.
2026
| Section | Date | Time | Location | Title | Speaker |
|---|---|---|---|---|---|
| Distinguished speaker | 19/01 | 16:00 | Small Lecture Theatre, Department of Statistics | Score-based generative emulation of impact-relevant Earth system model outputs<br><br>Policy targets evolve faster than Coupled Model Intercomparison Project (CMIP) cycles, complicating adaptation and mitigation planning that must often contend with outdated projections. Climate model output emulators address this gap by offering inexpensive surrogates that can rapidly explore alternative futures while staying close to Earth System Model (ESM) behavior. We focus on emulators designed to provide inputs to impact models. Using monthly ESM fields of near-surface temperature, precipitation, relative humidity, and wind speed, we show that deep generative models have the potential to jointly model the distribution of impact-relevant variables. The specific model we propose uses score-based diffusion on a spherical mesh and runs on a single mid-range graphics processing unit. We introduce a thorough suite of diagnostics to compare emulator outputs with their parent ESMs, including probability densities, cross-variable correlations, time of emergence, and tail behavior. We evaluate performance across three distinct ESMs in both pre-industrial and forced regimes. The results show that the emulator produces distributions that closely match the ESM outputs and captures key forced responses. They also reveal important failure cases, notably for variables with a strong regime shift in the seasonal cycle. Although the emulator does not match the ESM perfectly, its inaccuracies are small relative to the scale of internal variability in ESM projections. We therefore argue that it shows potential to support impact assessment. We discuss priorities for future development toward daily resolution, finer spatial scales, and bias-aware training. (An illustrative sketch of the generic diffusion setup follows this table.) | Shahine Bouabid (MIT) |
| Oxford Young Statistician | 04/02 | 16:30 | Large Lecture Theatre, Department of Statistics | Guidance for Diffusion Sampling with Applications to Black Hole Imaging | Christopher Williams |
| Distinguished speaker | 11/02 | 16:00 | Small Lecture Theatre, Department of Statistics | From $1/\sqrt{n}$ to $1/n$: Accelerating SDE Simulation with Cubature Formulae<br><br>Monte Carlo sampling is the standard approach for estimating properties of solutions to stochastic differential equations (SDEs), but its error decays only as $1/\sqrt{n}$, requiring huge sample sizes. Lyons and Victoir (2004) proposed replacing independently sampled Brownian driving paths with "cubature formulae", deterministic weighted sets of paths that match Brownian "signature moments" up to some degree $D$. They prove that cubature formulae exist for arbitrary $D$, but explicit constructions are difficult and have only reached $D=7$, too small for practical use. We present an algorithm that efficiently and automatically constructs cubature formulae of arbitrary degree, reproducing $D=7$ in seconds and reaching $D=19$ within hours on modest hardware. In simulations across multiple SDEs, our cubature formulae achieve an error roughly of order $1/n$, orders of magnitude smaller than Monte Carlo with the same number of paths. (An illustrative Monte Carlo baseline sketch follows this table.) | Peter Koepernik (OpenAI) |
| | 20/02 | 16:00 | Small Lecture Theatre, Department of Statistics | Efficient Two-Sample Instrumental Variable Estimation and Over-Identification Testing<br><br>Two-sample instrumental variables estimation arises when the first-stage relationship between endogenous regressors and instruments, and the reduced-form equation for the outcome, are observed in different samples, a setting that is increasingly common in empirical work combining multiple data sources. Using the two-sample IV framework of Inoue and Solon (2010), I compare one-step and two-step GMM estimators based on two-sample moment conditions. I show that the standard two-sample 2SLS estimator is not generally efficient, while a two-step GMM estimator that uses the correct moment variance achieves asymptotic efficiency.<br><br>I then develop a valid over-identification test for the two-sample setting. I derive a Hansen–Sargan-type J-statistic that accounts for sampling variability from both samples and show that it converges to a chi-square distribution with degrees of freedom equal to the number of over-identifying restrictions. The results clarify how classical IV testing procedures extend to two-sample designs and highlight the importance of using the appropriate asymptotic variance for valid inference. (A schematic form of the J-statistic follows this table.) | Fatima Kasenally |
| | 20/02 | 16:30 | Small Lecture Theatre, Department of Statistics | BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design<br><br>We propose a general-purpose approach for improving the ability of large language models (LLMs) to intelligently and adaptively gather information from a user or other external source using the framework of sequential Bayesian experimental design (BED). This enables LLMs to act as effective multi-turn conversational agents and interactively interface with external environments. Our approach, which we call BED-LLM (Bayesian experimental design with large language models), is based on iteratively choosing questions or queries that maximize the expected information gain (EIG) about the task of interest given the responses gathered previously. We show how this EIG can be formulated (and then estimated) in a principled way using a probabilistic model derived from the LLM's predictive distributions and provide detailed insights into key decisions in its construction and updating procedure. We find that BED-LLM achieves substantial gains in performance across a wide range of tests based on the 20 Questions game and using the LLM to actively infer user preferences, compared to direct prompting of the LLM and other adaptive design strategies. (The EIG objective is sketched after this table.) | Deepro Choudhury |
| | 04/03 | 15:00 | Second floor Open Research Area, Department of Statistics | A Friendly Talk on Bandit Convex Optimization in Changing Environments<br><br>We present Bandit Convex Optimization in non-stationary environments, where the learner selects actions from a continuous domain and observes losses at only one point per round. We aim to minimize regret under three natural non-stationarity measures: the number of switches $S$, total variation $V$, and path-length $P$. We propose TEWA-SE (Tilted Exponentially Weighted Average with Sleeping Experts), a polynomial-time algorithm adapting the sleeping-experts framework to the bandit setting. For strongly convex losses, TEWA-SE achieves minimax-optimal regret with respect to $S$ and $V$, with matching upper and lower bounds. For general convex losses, we introduce cExO (clipped Exploration by Optimization), which achieves minimax-optimal regret for $S$ and $V$, and improves existing bounds for $P$. Time permitting, we will also discuss related parameter-free approaches based on coin betting. We will keep the focus on intuition and high-level ideas, with the goal of distilling principles that are broadly applicable to sequential decision-making under non-stationarity. (Standard definitions of these measures are sketched after this table.) | Xiaoqi (Shirley) Liu |
| | 04/03 | 15:30 | Second floor Open Research Area, Department of Statistics | Calibrated Predictive Lower Bounds on Time-to-Unsafe-Sampling in LLMs<br><br>We introduce time-to-unsafe-sampling, a novel safety measure for generative models, defined as the number of generations required by a large language model (LLM) to trigger an unsafe (e.g., toxic) response. While providing a new dimension for prompt-adaptive safety evaluation, quantifying time-to-unsafe-sampling is challenging: unsafe outputs are often rare in well-aligned models and thus may not be observed under any feasible sampling budget. To address this challenge, we frame this estimation problem as one of survival analysis. We build on recent developments in conformal prediction and propose a novel calibration technique to construct a lower predictive bound (LPB) on the time-to-unsafe-sampling of a given prompt with rigorous coverage guarantees. Our key technical innovation is an optimized sampling-budget allocation scheme that improves sample efficiency while maintaining distribution-free guarantees. Experiments on both synthetic and real data support our theoretical results and demonstrate the practical utility of our method for safety risk assessment in generative AI models. (The form of the coverage guarantee is sketched after this table.) | Hen Davidov |
| OxCSML x OxYSS | 06/03 | 15:30 | Small Lecture Theatre, Department of Statistics | Recent Advances in Conformal Prediction with E-Values<br><br>Conformal prediction has become a versatile framework for distribution-free uncertainty quantification, offering coverage guarantees under minimal assumptions. Traditionally, these methods rely on p-values to ensure marginal coverage when all data are exchangeable. More recently, e-values have emerged as a powerful and flexible tool in statistics. Their integration into conformal prediction has opened the door to constructing valid prediction sets in more complex and challenging settings. In this talk, I will provide an overview of these advances, explain the key ideas behind using e-values in conformal prediction, and highlight examples that demonstrate both their promise and the open questions they raise. (A one-line e-value coverage argument follows this table.) | Etienne Gauthier (INRIA) |
| Statistics meet Mathematics | 11/03 | 16:00 | Small Lecture Theatre, Department of Statistics | A hierarchical modelling approach for Bayesian Causal Forests on longitudinal data<br><br>Imaging-Derived Phenotypes (IDPs), such as brain volume change, provide sensitive longitudinal markers of disease progression and treatment response across clinical trials. However, drawing causal conclusions from longitudinal IDPs is statistically challenging: follow-up times are irregular, repeated measurements induce within-individual correlation, and scanner-related variability introduces substantial non-biological heterogeneity. Bayesian Additive Regression Trees (BART) and their extension, Bayesian Causal Forests (BCF), provide flexible, nonparametric tools for estimating heterogeneous treatment effects in complex settings. Yet, both models are inherently cross-sectional, assuming independence across observations and therefore failing to account for within-individual correlation over time. Motivated by the NO.MS dataset, the largest and most comprehensive clinical trial dataset in Multiple Sclerosis (MS), we develop BCFLong, a hierarchical extension of BCF for longitudinal analysis, which preserves the flexibility of BART while explicitly modelling irregular follow-up and scanner-related heterogeneity. Inspired by BCF, we decompose the fixed-effect mean into two components, using the first to isolate non-biological scanner effects and the second to model the treatment effect, and we introduce individual-specific random effects, including a random intercept and a time-dependent slope, with a sparsity-inducing horseshoe prior. Simulations confirm BCFLong’s superior performance and robustness to sparsity, and on the NO.MS dataset, BCFLong reveals clinically meaningful longitudinal treatment effects on brain volume change that cannot be recovered by existing cross-sectional or trial-level methods. (A schematic of this kind of decomposition follows this table.) | Emma Prevot (Statistics) |
| | 11/03 | 16:30 | Small Lecture Theatre, Department of Statistics | Advances in Neural Controlled Differential Equations<br><br>Many real-world systems evolve continuously, yet most machine learning models interpret time series as discrete sequences. Continuous-time approaches instead treat time series as samples from an underlying input path, a formulation that naturally accommodates irregularly sampled or oversampled data. Among these, Neural Controlled Differential Equations (NCDEs) are a maximally expressive class of models that parametrise a vector field using a neural network and evolve their hidden state by solving a dynamical system driven by the input path. This talk presents three contributions that improve the training, scalability, and interpretability of NCDEs. First, building on neural rough differential equations, Log-NCDEs apply the Log-ODE method to efficiently approximate an NCDE's solution during training, improving both computational speed and empirical performance. Second, Linear NCDEs replace the non-linear vector field with a linear one, enabling closed-form solutions and parallel-in-time computation without sacrificing theoretical expressivity. Third, Structured Linear NCDEs use structured linear vector fields to further enhance efficiency while maintaining theoretical expressiveness and empirical performance. Collectively, these methods reduce the time per training step for NCDEs by up to three orders of magnitude while achieving state-of-the-art performance across diverse time series benchmarks. (The NCDE hidden-state equation is sketched after this table.) | Benjamin Walker (Mathematical Institute) |
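For Bouabid's emulation talk, the following is the generic continuous-time score-based diffusion setup that "score-based diffusion" refers to, in the standard Song et al.-style notation. This is background only, with notation assumed rather than taken from the work; the talk's spherical-mesh model is not reproduced here.

```latex
% Generic score-based diffusion: data are noised by a forward SDE and
% samples are drawn from the reverse-time SDE, with the unknown score
% \nabla_x \log p_t(x) replaced by a network s_\theta(x, t) trained by
% score matching. Standard formulation only, not the talk's model.
\mathrm{d}x_t = f(x_t, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w_t
  \quad \text{(forward noising)}
\qquad
\mathrm{d}x_t = \bigl[ f(x_t, t) - g(t)^2 \nabla_x \log p_t(x_t) \bigr]\,\mathrm{d}t
  + g(t)\,\mathrm{d}\bar{w}_t
  \quad \text{(reverse-time sampling)}
```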
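To make the $1/\sqrt{n}$ baseline in Koepernik's abstract concrete, here is a minimal Monte Carlo experiment estimating $\mathbb{E}[\exp(W_1)] = e^{1/2}$ for a Brownian motion $W$. The root-mean-square error decays at roughly $n^{-1/2}$, which is the rate cubature formulae improve to roughly $1/n$. This is an illustrative baseline only, not an implementation of the cubature construction from the talk.

```python
import numpy as np

# Plain Monte Carlo estimate of E[exp(W_1)] = exp(1/2), where W is a
# Brownian motion (so W_1 ~ N(0, 1)). Illustrates the ~n^{-1/2} error
# decay of Monte Carlo -- the baseline that cubature formulae beat.
rng = np.random.default_rng(0)
true_value = np.exp(0.5)

for n in [10**2, 10**4, 10**6]:
    # Repeat the experiment to estimate the root-mean-square error.
    estimates = np.array(
        [np.mean(np.exp(rng.standard_normal(n))) for _ in range(50)]
    )
    rmse = np.sqrt(np.mean((estimates - true_value) ** 2))
    print(f"n = {n:>7}: RMSE = {rmse:.5f}  (c.f. n**-0.5 = {n**-0.5:.5f})")
```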
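The over-identification test in Kasenally's abstract takes the familiar Hansen–Sargan form. The schematic below uses generic GMM notation, which is an assumption of this sketch rather than the talk's own: $m$ moment conditions, $k$ parameters, and $\hat{\Omega}$ a consistent estimate of the moment variance that, in the two-sample setting, must combine sampling variability from both samples.

```latex
% Hansen-Sargan-type J-statistic (schematic GMM form; generic notation
% assumed). \bar{g}_n is the sample average of the stacked two-sample
% moment conditions evaluated at the GMM estimate \hat{\theta}.
J = n\, \bar{g}_n(\hat{\theta})^{\top} \hat{\Omega}^{-1} \bar{g}_n(\hat{\theta})
  \;\xrightarrow{\,d\,}\; \chi^2_{m-k}
```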
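The criterion driving BED-LLM in Choudhury's talk is the standard sequential Bayesian experimental design objective: choose the next query $\xi$ maximizing the expected information gain about the quantity of interest $\theta$. The generic notation below is assumed here, not copied from the paper.

```latex
% Expected information gain given the interaction history h_t, with the
% predictive and posterior distributions induced by the probabilistic
% model built from the LLM (generic sequential-BED notation assumed).
\mathrm{EIG}(\xi) = \mathbb{E}_{y \sim p(y \mid \xi, h_t)}
  \Bigl[ H\bigl[ p(\theta \mid h_t) \bigr]
       - H\bigl[ p(\theta \mid \xi, y, h_t) \bigr] \Bigr]
```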
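For Liu's bandit talk, the non-stationarity measures $S$, $V$, and $P$ have standard definitions along the following lines; the exact conventions (norms, indexing) are assumptions of this sketch and may differ in the underlying paper.

```latex
% Dynamic regret against per-round minimisers x_t^* of the losses f_t,
% and the usual non-stationarity measures (conventions assumed here).
R_T = \sum_{t=1}^{T} \bigl( f_t(x_t) - f_t(x_t^*) \bigr), \quad
S = 1 + \sum_{t=2}^{T} \mathbb{1}\{ f_t \neq f_{t-1} \}, \quad
V = \sum_{t=2}^{T} \sup_x \lvert f_t(x) - f_{t-1}(x) \rvert, \quad
P = \sum_{t=2}^{T} \lVert x_t^* - x_{t-1}^* \rVert
```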
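The guarantee targeted by Davidov's calibrated lower predictive bound has the standard distribution-free form below (notation assumed): for a test prompt $X$ with true time-to-unsafe-sampling $T$, the bound $\hat{L}$ under-covers with probability at most $\alpha$.

```latex
% Lower predictive bound with distribution-free coverage: the probability
% (over calibration and test data) that the true time-to-unsafe-sampling
% falls below the bound is at most \alpha. Generic form, notation assumed.
\mathbb{P}\bigl( T_{\mathrm{test}} \geq \hat{L}(X_{\mathrm{test}}) \bigr) \geq 1 - \alpha
```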
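A compact anchor for Gauthier's talk: an e-value is a non-negative statistic with expectation at most one when evaluated at the true label, so Markov's inequality alone yields a valid prediction set. The construction below is the generic one, not a specific method from the talk.

```latex
% If E(x, y) \geq 0 and \mathbb{E}[E(X, Y)] \leq 1 at the true pair (X, Y),
% then \mathbb{P}(E(X, Y) \geq 1/\alpha) \leq \alpha by Markov's inequality,
% so the set below has marginal coverage at least 1 - \alpha.
C(x) = \bigl\{ y : E(x, y) < 1/\alpha \bigr\},
\qquad \mathbb{P}\bigl( Y \in C(X) \bigr) \geq 1 - \alpha
```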
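A schematic of the kind of hierarchical decomposition Prevot's abstract describes. The notation and the exact split below are illustrative assumptions, not the BCFLong specification as presented in the talk.

```latex
% Illustrative BCF-style longitudinal decomposition: a mean component \mu
% absorbing covariates x_{ij} and scanner effects s_{ij}, a treatment-effect
% component \tau (z_i = treatment indicator), individual random intercepts
% b_{0i} and time-dependent slopes b_{1i} under a horseshoe prior, and
% residual noise. Schematic only -- not the exact BCFLong model.
y_{ij} = \mu(x_{ij}, s_{ij}) + \tau(x_{ij})\, z_i
       + b_{0i} + b_{1i}\, t_{ij} + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```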
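Walker's abstract centres on the NCDE hidden-state dynamics. In the standard formulation (Kidger et al.-style; notation assumed), a hidden state $h$ is driven by the interpolated input path $X$ through a neural vector field $f_\theta$; Linear NCDEs then make $f_\theta$ linear in $h$, which is what enables closed-form solutions and parallel-in-time evaluation.

```latex
% Neural CDE: the hidden state h evolves under a neural vector field
% f_\theta driven by the input path X; predictions are read out from h
% via a linear map \ell. Standard formulation, notation assumed.
h_t = h_{t_0} + \int_{t_0}^{t} f_\theta(h_s)\, \mathrm{d}X_s,
\qquad y_t = \ell(h_t)
```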
2025
| Section | Date | Time | Location | Title | Speaker |
|---|---|---|---|---|---|
| Keynote talk | 05/03 | 11:00 | Second floor Open Research Area, Department of Statistics | The predictive approach to uncertainty quantification | Vik Shirvaikar |
| Random Graphs | 12/03 | 16:00 | Roy Griffiths Room, Keble College | Introduction to parameter estimation on random graphs | Adrian Fischer |
| | 12/03 | 16:30 | Roy Griffiths Room, Keble College | Modelling Extremely Sparse Networks with Random Measures | Valentin Kilian |
| Diffusion Models | 26/03 | 17:00 | Ground floor Social Area, Department of Statistics | Diffusing through life, mindless and careless | Chris Williams |
| | 26/03 | 17:30 | Ground floor Social Area, Department of Statistics | Understanding generalisation in diffusion models | Tyler Farghly |
| Keynote talk | 01/05 | 11:00 | Second floor Open Research Area, Department of Statistics | The interplay of scaling and generalization | Amitis Shidani |
| Distinguished speaker | 30/05 | 17:00 | Ground floor Social Area, Department of Statistics | Beating the odds: flexible models for predicting football scores | Nick Zhang (University College Dublin) |
| Statistics meet Probability | 18/06 | 11:30 | Second floor Open Research Area, Department of Statistics | Introduction to the Parabolic Anderson Model | Léo Tyrpak (Probability) |
| | 18/06 | 12:00 | Second floor Open Research Area, Department of Statistics | Spectral Clustering for Directed Graphs | Ning Zhang (Statistics) |
| Distinguished speaker: Causal Inference | 09/07 | 17:00 | Ground floor Social Area, Department of Statistics | A vibes-based introduction to causal inference | Daniel Manela |
| | 09/07 | 17:30 | Ground floor Social Area, Department of Statistics | A latent causal inference framework for ordinal variables | Martina Scauda (University of Cambridge) |
| Distinguished speaker | 21/07 | 16:00 | Second floor Open Research Area, Department of Statistics | Partial order hierarchies | Jessie Jiang (Google) |
| Distinguished speaker | 20/08 | 11:00 | Second floor Open Research Area, Department of Statistics | Testing Symmetry on the Torus: Le Cam Theory Meets Stein’s Method<br><br>Many complex real-world datasets can be viewed as points on the hyper-torus, the Cartesian product of circles. Over the past few years, this has motivated new proposals of distributions on the torus, both (pointwise) symmetric and sine-skewed asymmetric. In practice, it is relevant to know whether one should use the simpler symmetric models or the more convoluted yet more general asymmetric ones. So far, only parametric likelihood ratio tests have been defined to distinguish between a symmetric density and its sine-skewed counterpart. In this talk, optimal tests for symmetry on the hyper-dimensional torus are presented, built by leveraging Le Cam’s methodology.<br><br>Both the scenario where the center of symmetry is known and the one where it is unknown are addressed. These tests are valid not only under a given parametric hypothesis but under a very broad class of symmetric distributions. The asymptotic behavior of the proposed tests is studied under both the null hypothesis and local alternatives, with a focus on deriving, via Stein’s method, quantitative bounds on the distributional distance between the exact (unknown) distribution of the test statistic and its asymptotic counterpart. The finite-sample performance of the tests is evaluated through simulation studies, and their practical utility is demonstrated via an application to protein folding data.<br><br>This is joint work with A. Anastasiou and C. Ley. (The sine-skewing construction is sketched after this table.) | Sophia Loizidou (University of Luxembourg) |
| Keynote talk | 22/10 | 17:00 | Large Lecture Theatre, Department of Statistics | Attention to Experimentation Is All You Need: What To Do When You Run Out Of Data to Train Your Models With<br><br>Machine Learning is, on average, a miracle: it identifies patterns in high-dimensional data and yields inter- and extrapolation often considered better than human judgement. The big-data era held the promise that we could mine all collected data almost indefinitely for patterns and make ever better predictions. While this worked out well for ad-targeting, meme recommendations, and possibly finance, the most crucial and pressing issues simply don’t have an abundance of data. This talk explores the question of what to do when you have run out of data to train on, or when you are starting in a low-data regime in the first place. We discuss Active Learning, Bayesian Optimisation, and how these are part of a growing revolution in applied science across the UK and the globe. (A minimal active-learning loop is sketched after this table.) | Jakob Zeitler |
| Keynote talk | 05/11 | 16:30 | Large Lecture Theatre, Department of Statistics | How to train your | Silvia Sapora |
| Keynote talk | 19/11 | 16:30 | Large Lecture Theatre, Department of Statistics | Optimising Optimisation: A New | Kevin Lam |
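Background for Loizidou's talk above: on the circle, the sine-skewed construction perturbs a base density $f_0$ that is symmetric about $\mu$, and testing symmetry amounts to testing $\lambda = 0$. The parameterization below follows the common Abe–Pewsey-style convention and is an assumption of this sketch, not taken from the talk.

```latex
% Sine-skewed perturbation of a symmetric base density f_0: the skewness
% parameter \lambda \in [-1, 1] controls asymmetry, and \lambda = 0
% recovers the symmetric model (common convention; assumed here).
f(\theta) = f_0(\theta)\,\bigl( 1 + \lambda \sin(\theta - \mu) \bigr),
\qquad \theta \in [-\pi, \pi)
```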
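For Zeitler's keynote, a minimal uncertainty-sampling active-learning loop is sketched below: starting from a small labelled set, the model repeatedly queries the pool point it is least sure about. The dataset, model, and query rule are placeholder choices for illustration, not methods from the talk.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Minimal active-learning loop with uncertainty sampling. Purely
# illustrative: dataset, model, and query rule are placeholder choices.
rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Seed the labelled set with a few points from each class.
labelled = list(rng.choice(np.where(y == 0)[0], 5, replace=False)) \
         + list(rng.choice(np.where(y == 1)[0], 5, replace=False))
pool = [i for i in range(len(X)) if i not in labelled]

model = LogisticRegression(max_iter=1000)
for _ in range(20):
    model.fit(X[labelled], y[labelled])
    proba = model.predict_proba(X[pool])[:, 1]          # P(class 1)
    query = pool[int(np.argmin(np.abs(proba - 0.5)))]   # most uncertain point
    labelled.append(query)   # the oracle labels the queried point (here: y)
    pool.remove(query)

print("accuracy with", len(labelled), "labels:", round(model.score(X, y), 3))
```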