🎙️Blog

A Fresh Perspective on Treatment Effects - Beyond the Average and Into the Tails

Most evaluation of interventions — policies, programs, experiments — centers on the Average Treatment Effect (ATE). Did the treated group do better on average than the…

All About MLflow

If you’ve spent any time doing machine learning seriously, you’ve run into this problem: you trained a model last week that performed better than anything you have now, and…

Contextual Multi-Armed Bandit: Maximizing Rewards with Intelligent Decision-Making

Picture a row of slot machines. Each has its own payout probability, and you don’t know any of them upfront. Your goal is simple: walk away with as much money as possible.…

Data Science Books

Books I actually recommend to people, with honest takes on what each one is good for.

Data Science project Boilerplate

Every time I start a new data science project, I go through the same setup steps. Create folders, set up the virtual environment, add a .gitignore, write the Dockerfile. It…

Developing in a Docker container

I develop inside Docker containers. Not because it’s trendy, but because it solves a real problem: my local machine stays clean, the project environment is reproducible, and…

Embracing Change: Incremental vs. Batch Machine Learning

Most machine learning tutorials train a model, evaluate it, and stop. That’s fine for a homework assignment. In production, it’s usually not how things work. Real systems…

From ATE to Uplift Modeling

In a randomised controlled trial (RCT) the standard output is the Average Treatment Effect (ATE): one number telling you how much the treatment moved the outcome on average.…

How I Created R-Genius: A Journey into Empowering R Users with AI

R is one of the most powerful tools in data science. It’s also one of the most unforgiving. Error messages that reveal nothing, package ecosystems that overlap in confusing…

Logistic Regression and Marginal Effects

Logistic regression is everywhere in applied data science — binary outcomes, classification problems, probability estimation. Most people know how to fit one. Fewer know how…

Machine Learning, Copula and Synthetic Data

Synthetic data is one of those ideas that sounds like it shouldn’t work — if the data is fake, how can a model trained on it generalize to real data? The answer is that…

On Learning Methods in the Age of AI

In recent years, something subtle has changed in how students approach programming. What used to begin with a blank script and a vague idea now often begins with a prompt. A…

Online Uplift Modelling with River

The usual workflow is:

Probability Box with Kernel Density Estimation

Weather forecasts got me thinking about data. A simple historical table — temperature, humidity, rain — and the question: given all this data, what can I actually say about…

Quantile Random Forest

Most regression models give you a single number: the expected value of the target given the inputs. That’s often what you want, but it throws away information. Quantile…

The 3 + 1 pillars of data science

A few weeks ago, one of my students posed a question I wasn’t expecting:

The Beauty of Soft Decision Trees

Decision trees work by making hard choices at each node: if \(X_1 > 5\), go left; otherwise, go right. It’s clean, interpretable, and brittle. A point sitting right on the…

The R `rgcapi` library

Algorithmic trading has always interested me. The abundance of time-series data, the clear feedback loop, the challenge of building and testing strategies — it’s a domain…

Understanding Stationarity: Concepts, Implications, and Approaches

Time series analysis runs through economics, finance, engineering, and the natural sciences — any domain where observations are indexed in time and the ordering matters.…

Unraveling the Power of Causal Machine Learning

Standard machine learning is very good at finding patterns. Given enough data, a model can identify that A and B tend to co-occur, that X is predictive of Y, that a certain…

Updating knowledge with Bayes

I’ve explained Bayesian updating many times over the years — to students, to colleagues, to people with no statistics background at all. The examples I find work best are…

When in doubt, just model it. Modelling uncertainty

There’s a pattern I’ve noticed across projects: when something is hard to measure, people tend to either ignore it or collapse it into a single number. Both choices feel…

pbox: Exploring Multivariate Spaces with Probability Boxes

In a previous post I introduced the idea of a “probability box” — turning a dataset into a queryable probability space using Kernel Density Estimation. That was the…