Ahmed T. Hammad

🛰️ Research

Introduction

My research is concerned with the theoretical and applied foundations of causal machine learning: the problem of identifying and estimating causal effects from data using flexible, data-driven methods. A central motivation is that observational data, the dominant substrate for empirical work across the social, environmental, and health sciences, rarely satisfies the assumptions classical causal inference requires. Confounding is high-dimensional, treatment assignment is complex, and the population of interest is seldom homogeneous. Machine learning offers tools to relax these assumptions, but doing so rigorously, in ways that preserve valid statistical inference rather than merely improving predictive fit, is the core methodological challenge I address.

My work spans three interconnected research directions: causal effect estimation in sequential and streaming settings, heterogeneous treatment effect (HTE) estimation in large-scale experimental data, and synthetic counterfactual methods for policy evaluation in observational time series.

Sequential Causal Inference

The dominant paradigm in causal ML assumes access to a complete, i.i.d. dataset. This assumption is incompatible with a wide class of practically important problems, including adaptive clinical trials, dynamic pricing experiments, and environmental monitoring, where observations arrive sequentially and decisions must be made before the full data are observed.

Extending causal inference to the sequential setting raises fundamental challenges that do not arise in the static case. Standard online learning objectives, such as minimising cumulative outcome prediction error, are misaligned with causal targets: the quantity of interest is not the conditional mean of the outcome but the conditional average treatment effect (CATE), and the two diverge whenever effect heterogeneity exists. This misalignment propagates through tree-based methods, where the splitting criterion used in standard Hoeffding trees or random forests selects features that predict outcomes well rather than features along which treatment effects vary, producing biased CATE estimates even when outcome prediction is accurate.
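To make the misalignment concrete, here is a toy sketch (illustrative data and criteria, not the methods developed in this work): a covariate that shifts the outcome level wins under a standard variance-reduction split criterion, while a covariate that only modulates the treatment effect wins under a between-child effect-variance criterion.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
x1 = rng.integers(0, 2, n)   # strongly predicts the outcome level
x2 = rng.integers(0, 2, n)   # modulates the treatment effect only
t = rng.integers(0, 2, n)    # randomised binary treatment

# Outcome: x1 shifts the level by 5; the true effect is 1 + 2*x2.
y = 5.0 * x1 + (1.0 + 2.0 * x2) * t + rng.normal(0, 1, n)

def outcome_variance_reduction(x):
    # CART-style criterion: outcome variance explained by splitting on x.
    return y.var() - np.mean([y[x == v].var() for v in (0, 1)])

def cate_split_gain(x):
    # Between-child variance of the difference-in-means effect estimate.
    effects = [y[(x == v) & (t == 1)].mean() - y[(x == v) & (t == 0)].mean()
               for v in (0, 1)]
    return np.var(effects)

# The outcome criterion prefers x1; the causal criterion prefers x2.
print(outcome_variance_reduction(x1), outcome_variance_reduction(x2))
print(cate_split_gain(x1), cate_split_gain(x2))
```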

My work in this area develops online estimators, including streaming variants of IPW, Augmented IPW, Overlap Weighting, and meta-learner frameworks, alongside a novel Causal Hoeffding Tree with a split criterion designed to maximise between-child CATE variance. A further complication is distributional shift: in long-running experiments, both the propensity model and the outcome model may become stale, and the resulting bias in effect estimates is difficult to detect without purpose-built diagnostics. I address this through adaptive drift detection and sequential balance monitoring as integral components of the estimation procedure, rather than external checks.
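As a minimal illustration of the streaming idea (a sketch only, not the estimators described above): an online IPW estimate of the ATE that updates one observation at a time, with a running-mean propensity standing in for an incrementally updated propensity model and clipping to stabilise the weights.

```python
import numpy as np

class StreamingIPW:
    """Online IPW estimate of the ATE. The propensity is tracked as a
    running mean, a simplified stand-in for an incrementally updated
    propensity model."""
    def __init__(self, eps=0.01):
        self.n = 0
        self.t_sum = 0.0
        self.score_sum = 0.0
        self.eps = eps  # clip propensities away from 0 and 1

    def update(self, t, y):
        self.n += 1
        self.t_sum += t
        e = np.clip(self.t_sum / self.n, self.eps, 1 - self.eps)
        # One-observation IPW score for the ATE
        self.score_sum += t * y / e - (1 - t) * y / (1 - e)
        return self.score_sum / self.n

rng = np.random.default_rng(1)
est = StreamingIPW()
for _ in range(50_000):
    t = int(rng.integers(0, 2))
    y = 2.0 * t + rng.normal()   # true ATE = 2
    ate = est.update(t, y)
print(round(ate, 2))
```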

Heterogeneous Treatment Effects in Experimental Data

Average treatment effect estimation has a mature theoretical basis. The practical demand, however, has shifted toward heterogeneous treatment effects: understanding for whom a treatment works, how the effect varies across observed covariates, and how this variation can inform targeting decisions. This shift creates both statistical and infrastructural challenges.

Statistically, the identification of CATEs requires stronger assumptions than ATE estimation and is subject to the curse of dimensionality in high-dimensional covariate spaces. Methods such as Double Machine Learning (DML) and Causal Forests provide semiparametrically efficient estimates under appropriate regularity conditions, but their finite-sample behaviour, particularly the calibration of individual-level CATE confidence intervals, remains an active area of research.
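The DML recipe, cross-fitting nuisance models and then regressing residuals on residuals, can be sketched as follows (a partially linear simulation with lasso nuisance learners as illustrative stand-ins; the data and names are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(2)
n = 2000
X = rng.normal(size=(n, 5))
t = X[:, 0] + rng.normal(size=n)                   # treatment confounded by X[:, 0]
y = 1.5 * t + 2.0 * X[:, 0] + rng.normal(size=n)   # true effect: 1.5

# Cross-fitting: nuisances trained on one fold, residualised on the other.
y_res, t_res = np.zeros(n), np.zeros(n)
for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    g = LassoCV().fit(X[train], y[train])   # model for E[y | X]
    m = LassoCV().fit(X[train], t[train])   # model for E[t | X]
    y_res[test] = y[test] - g.predict(X[test])
    t_res[test] = t[test] - m.predict(X[test])

# Residual-on-residual regression recovers the effect in the partially
# linear model, orthogonal to errors in the nuisance estimates.
theta = (t_res @ y_res) / (t_res @ t_res)
print(round(theta, 2))  # should be close to the true effect of 1.5
```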

Infrastructurally, experiment data in applied settings is typically stored in analytical data warehouses and is subject to governance constraints that make extraction for offline analysis impractical. My work explores how causal ML pipelines can be designed to operate within the warehouse layer, executing data preparation and feature construction via SQL while reserving the statistical estimation step for a lightweight downstream layer, without sacrificing the validity of the causal estimates. This design imposes non-trivial constraints on which pre-aggregations are safe under identification assumptions, a question that has received little formal treatment in the literature.
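A minimal sketch of this split, using SQLite as a stand-in for the warehouse (table and column names are hypothetical): the warehouse layer reduces the experiment table to per-(segment, arm) counts and sums, which under randomisation are sufficient statistics for within-segment differences in means, and the downstream layer computes effects from those aggregates alone.

```python
import sqlite3
import numpy as np

rng = np.random.default_rng(3)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE exposures (user_id INT, segment TEXT, arm INT, y REAL)")

# Simulated randomised experiment: effect 1.0 for 'new', 3.0 for 'returning'.
rows = []
for i in range(20_000):
    seg = "new" if i % 2 == 0 else "returning"
    arm = int(rng.integers(0, 2))
    effect = 1.0 if seg == "new" else 3.0
    rows.append((i, seg, arm, float(effect * arm + rng.normal())))
con.executemany("INSERT INTO exposures VALUES (?, ?, ?, ?)", rows)

# Warehouse layer: pre-aggregate to sufficient statistics. Under
# randomisation this aggregation preserves identification of the
# within-segment difference in means.
stats = con.execute("""
    SELECT segment, arm, COUNT(*) AS n, SUM(y) AS s
    FROM exposures GROUP BY segment, arm
""").fetchall()

# Lightweight downstream layer: effects from the aggregates alone.
means = {(seg, arm): s / n for seg, arm, n, s in stats}
effects = {seg: means[(seg, 1)] - means[(seg, 0)]
           for seg in ("new", "returning")}
print(effects)
```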

Synthetic Counterfactual Methods for Policy Evaluation

A recurring problem in program evaluation is the estimation of treatment effects in settings where neither randomisation nor a clean control group is available, and where the outcome of interest is a time series. The synthetic control method addresses this by constructing a weighted combination of untreated units whose pre-treatment trajectory closely matches that of the treated unit, and using the post-treatment divergence as the effect estimate.
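The classical construction can be sketched as a constrained least-squares fit on the pre-treatment window (a noiseless simulation with assumed dimensions, not a real application):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
T0, T1, J = 40, 10, 8          # pre-periods, post-periods, donor units

# Donor trajectories; the treated unit is an (unknown) convex mix of
# donors pre-treatment and receives an effect of +2 post-treatment.
donors = rng.normal(size=(T0 + T1, J)).cumsum(axis=0)
w_true = np.array([0.5, 0.3, 0.2] + [0.0] * (J - 3))
treated = donors @ w_true
treated[T0:] += 2.0

# Fit convex weights (non-negative, summing to 1) on the pre-treatment
# window only.
def loss(w):
    return np.sum((treated[:T0] - donors[:T0] @ w) ** 2)

res = minimize(loss, np.full(J, 1 / J), method="SLSQP",
               bounds=[(0, 1)] * J,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
w = res.x

# Effect estimate: post-treatment gap between treated and synthetic paths.
gap = treated[T0:] - donors[T0:] @ w
print(round(gap.mean(), 2))  # close to 2.0
```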

Standard synthetic control methods rely on a convex combination of donor units and are sensitive to the choice of outcome predictors. I extend this framework by replacing the donor-weighting step with a machine learning model, specifically a modified robust random forest that accounts for the autocorrelation structure of the data through block bootstrapping. This allows the counterfactual to incorporate a richer feature space than the standard approach while maintaining temporal validity. Uncertainty quantification is handled through quantile regression forests, yielding marginal prediction intervals on the counterfactual trajectory and, by extension, on the estimated treatment effect at each post-treatment time point.
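The block-bootstrap component, resampling contiguous windows so that short-range autocorrelation survives within each resample, can be sketched with a standard moving-block bootstrap (a generic illustration, not the modified forest itself):

```python
import numpy as np

def moving_block_bootstrap(n, block_len, rng):
    """Indices for one moving-block bootstrap resample of a length-n
    series: contiguous blocks preserve short-range autocorrelation."""
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    idx = np.concatenate([np.arange(s, s + block_len) for s in starts])
    return idx[:n]   # trim the final block to the series length

rng = np.random.default_rng(5)
idx = moving_block_bootstrap(100, 10, rng)
print(len(idx), idx[:10])
```

Each bootstrap replicate refits the forest on the resampled series, giving autocorrelation-aware variability for the counterfactual fit.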

This methodology has been applied to the evaluation of environmental interventions and climate policies, where the absence of randomisation and the non-stationarity of climate time series make conventional difference-in-differences estimators unreliable.

Open Questions

Several problems cut across these research directions and remain incompletely resolved:

  • Sensitivity analysis for sequential estimators. Classical tools for assessing sensitivity to unmeasured confounding, such as Rosenbaum bounds, do not extend naturally to online settings where the propensity model is updated incrementally. Developing analogous sensitivity analyses for streaming causal estimators is an open problem.
  • Coverage guarantees for individual CATEs. Causal forest implementations typically provide asymptotically valid confidence intervals for the ATE or for conditional effects averaged over subgroups. Reliable marginal coverage for individual CATE estimates, which is what personalised decision-making requires, is not yet well understood, either in finite samples or under model misspecification.
  • Causal identifiability under data aggregation. Pre-aggregating individual-level data is a standard warehouse engineering practice, but many aggregations are not innocuous under causal identification assumptions. Formalising which aggregations preserve identification and which introduce bias is an underexplored question at the boundary of causal inference and database theory.