Event study designs estimate period-specific treatment effects β̂_t with known standard errors from two-way fixed effects regressions. We treat the β̂_t sequence as observations from a local linear trend state-space model and apply the Rauch–Tung–Striebel (Kalman) smoother to recover the treatment effect trajectory and its derivative. The approach uses the known, heteroskedastic regression standard errors as observation noise — a structural advantage over generic smoothing methods.
| Metric | Improvement |
|---|---|
| Level MSE (β̂_t vs true β_t) | 80% reduction vs raw estimates |
| Derivative MSE (Δβ̂_t vs true Δβ_t) | 98% reduction vs raw estimates |
| Parallel trends power (anticipation) | 13.3% vs 9.0% (raw Wald), bootstrap-calibrated at 5% |
| Parallel trends power (small pretrend) | 6.3% vs 4.8% (raw Wald) |
| Size under null | 4.2% (correctly sized with bootstrap calibration) |
- The Kalman gain adapts to local precision. When a period's β̂_t has a large SE (few observations, noisy outcome), the smoother trusts the trend model. When the SE is tight, it trusts the data. Fixed-window smoothers (Savitzky–Golay, splines) cannot do this.
- Joint estimation of level and derivative. The state vector is [β_t, Δβ_t]. The smoother gives you the rate of change of the treatment effect for free, with 98% lower MSE than finite-differencing the raw estimates.
- The derivative-based parallel trends test. Rather than testing whether pre-treatment β̂_t are jointly zero (low power because the estimates are noisy), we test whether the smoothed Δβ̂_t are jointly zero. The test achieves correct size and modest power improvements over the raw Wald test.
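To make the first point above concrete, here is a minimal sketch of how the gain on the level responds to the regression SE when only the level is observed (H = [1, 0]). The function name `level_gain` and the value of `prior_var` are illustrative assumptions, not taken from the paper's code.

```python
def level_gain(prior_var: float, se: float) -> float:
    """Kalman gain on the level for a scalar observation with noise variance se**2.

    prior_var is the predicted (pre-update) variance of beta_t.
    """
    return prior_var / (prior_var + se ** 2)


prior_var = 0.04  # hypothetical predicted variance of beta_t
for se in (0.05, 0.2, 0.8):
    # Small SE -> gain near 1 (trust the data); large SE -> gain near 0 (trust the trend).
    print(f"SE = {se:4.2f} -> gain = {level_gain(prior_var, se):.3f}")
```

A fixed-window smoother applies the same weights regardless of `se`; here the weight on each β̂_t shrinks as its standard error grows.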
ms/
smoother_trends_sharper_tests.tex # LaTeX source
smoother_trends_sharper_tests.pdf # Compiled paper (14 pages, 4 figures, 3 tables)
figs/
paper_fig1.pdf # Fig 1: Illustrative examples (4 DGPs)
paper_fig2.pdf # Fig 2: MSE reduction summary
paper_fig3.pdf # Fig 3: Size and power
paper_fig4.pdf # Fig 4: Sensitivity to Q
scripts/
paper_simulation.py # Reproduces all tables and figures
pip install numpy pandas scipy matplotlib scikit-learn
No R dependencies. No PyTorch. Everything runs in base scientific Python.
cd scripts
python paper_simulation.py
Takes ~5 minutes. Produces:
- tabs/table1_mse.csv, tabs/table2_size_power.csv, tabs/table3_sensitivity.csv
- figs/paper_fig1.pdf through figs/paper_fig4.pdf
To compile the paper:
cd ms
pdflatex smoother_trends_sharper_tests.tex
bibtex smoother_trends_sharper_tests
pdflatex smoother_trends_sharper_tests.tex
pdflatex smoother_trends_sharper_tests.tex
State: x_t = [β_t, Δβ_t]'
Transition: x_{t+1} = F x_t + w_t, w_t ~ N(0, Q)
Observation: β̂_t = H x_t + v_t, v_t ~ N(0, σ̂²_t)
F = [[1, 1], [0, 1]] # local linear trend
H = [1, 0] # observe level only
Q = diag(q_ℓ, q_s) # process noise (tuning parameter)
R_t = σ̂²_t # KNOWN from TWFE regression
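The model above can be sketched in NumPy as a forward Kalman filter followed by a backward Rauch–Tung–Striebel pass. This is an illustrative sketch, not the code in `scripts/paper_simulation.py`: the function name `rts_smooth`, the default values of `q_level` and `q_slope`, and the vague prior `p0` are all assumptions made for the example.

```python
import numpy as np


def rts_smooth(beta_hat, se, q_level=0.01, q_slope=0.001, p0=1e4):
    """Local linear trend filter + RTS smoother with known R_t = se_t**2.

    Returns smoothed states (T x 2: level, slope) and covariances (T x 2 x 2).
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    se = np.asarray(se, dtype=float)
    T = beta_hat.shape[0]
    F = np.array([[1.0, 1.0], [0.0, 1.0]])  # local linear trend transition
    H = np.array([1.0, 0.0])                # observe the level only
    Q = np.diag([q_level, q_slope])         # process noise (tuning parameter)

    x_pred = np.zeros((T, 2)); P_pred = np.zeros((T, 2, 2))
    x_filt = np.zeros((T, 2)); P_filt = np.zeros((T, 2, 2))

    x, P = np.zeros(2), p0 * np.eye(2)      # vague prior (assumption)
    for t in range(T):
        if t > 0:                           # predict step
            x, P = F @ x, F @ P @ F.T + Q
        x_pred[t], P_pred[t] = x, P
        S = H @ P @ H + se[t] ** 2          # innovation variance uses this period's SE
        K = P @ H / S                       # Kalman gain, shape (2,)
        x = x + K * (beta_hat[t] - H @ x)   # update step
        P = P - np.outer(K, H @ P)
        x_filt[t], P_filt[t] = x, P

    # Backward RTS pass: revise each filtered state using later periods.
    x_s, P_s = x_filt.copy(), P_filt.copy()
    for t in range(T - 2, -1, -1):
        C = P_filt[t] @ F.T @ np.linalg.inv(P_pred[t + 1])
        x_s[t] = x_filt[t] + C @ (x_s[t + 1] - x_pred[t + 1])
        P_s[t] = P_filt[t] + C @ (P_s[t + 1] - P_pred[t + 1]) @ C.T
    return x_s, P_s


# Hypothetical input: coefficients on an exact linear path with equal SEs.
t = np.arange(10)
beta_hat = 0.1 * t
se = np.full(10, 0.1)
states, covs = rts_smooth(beta_hat, se)
```

`states[:, 0]` holds the smoothed β_t and `states[:, 1]` the smoothed Δβ_t. Because `se[t]` enters the innovation variance period by period, noisier coefficients are automatically down-weighted.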
The Kalman smoother induces serial correlation in the smoothed estimates, invalidating chi-squared critical values. We use a parametric bootstrap under H₀: β_t = 0:
- Draw β̂_t^(b) ~ N(0, σ̂²_t) for b = 1, ..., B
- Apply Kalman smoother to each draw
- Compute test statistic for each draw
- Use the (1-α) quantile as the critical value
Because the null DGP is fully specified once the σ̂_t are treated as known, this controls size up to Monte Carlo error from the finite number of draws B.
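The bootstrap steps above can be sketched generically: the function takes any statistic as a callable, so a Kalman-smoothed statistic can be plugged in. Everything here is an assumption for illustration: the names `bootstrap_critical_value` and `moving_average_slope_stat`, the example SEs and estimates, and the stand-in statistic, which uses a 3-point moving average rather than the paper's Kalman smoother.

```python
import numpy as np


def bootstrap_critical_value(se, statistic, n_boot=2000, alpha=0.05, seed=0):
    """(1 - alpha) quantile of `statistic` under H0: beta_t = 0 with known SEs."""
    se = np.asarray(se, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row is one draw of the full coefficient vector under the null.
    draws = rng.normal(loc=0.0, scale=se, size=(n_boot, se.size))
    stats = np.array([statistic(d, se) for d in draws])
    return float(np.quantile(stats, 1.0 - alpha)), stats


def moving_average_slope_stat(beta_hat, se):
    """Stand-in statistic (NOT the paper's): sum of squared first differences
    of a 3-point moving average. `se` is unused here; it is kept so that a
    smoother-based statistic with the same signature can be swapped in."""
    smooth = np.convolve(beta_hat, np.ones(3) / 3.0, mode="valid")
    return float(np.sum(np.diff(smooth) ** 2))


se = np.array([0.30, 0.25, 0.20, 0.15, 0.10, 0.10])              # hypothetical pre-period SEs
beta_hat_obs = np.array([0.05, -0.10, 0.02, 0.08, -0.03, 0.01])  # hypothetical estimates

crit, null_stats = bootstrap_critical_value(se, moving_average_slope_stat)
reject = moving_average_slope_stat(beta_hat_obs, se) > crit
```

The key design point is that the same smoothing step is applied to every null draw and to the observed estimates, so whatever serial correlation the smoother induces is reflected in the critical value.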
This paper sits at the intersection of two literatures that haven't talked to each other:
Event study / DiD methods: Roth (2022, AER:I) shows pre-trend tests have low power. Rambachan & Roth (2023, RES) propose sensitivity analysis for bounded violations. Borusyak, Jaravel & Spiess (2024, RES) derive efficient imputation estimators. None use state-space methods.
State-space econometrics: Harvey (1989), Durbin & Koopman (2012) develop the Kalman filter/smoother framework. Harvey (1985) shows the HP filter is a special case. None apply it to event study coefficient sequences.
Our approach is complementary to Rambachan–Roth: they ask "how sensitive are conclusions to bounded violations?" We provide better estimates of the treatment effect trajectory. The Kalman-smoothed estimates could serve as inputs to their sensitivity framework.
- Smoothness assumption. The local linear trend model assumes β_t evolves smoothly. For sharp, immediate treatment effects, the smoother attenuates the jump and propagates some post-treatment signal backward. MSE is still reduced (70% for levels), but researchers should expect this attenuation whenever the effect is discontinuous at treatment onset.
- Independence across periods. We assume β̂_t are independent across t, which holds under standard TWFE with independent clusters. Serial correlation in the errors would require extending the observation noise model.
- Process noise Q. The choice of Q = diag(q_ℓ, q_s) is a tuning parameter analogous to bandwidth in nonparametric regression. We provide recommended defaults and urge sensitivity analysis. Marginal likelihood or cross-validation selection of Q is a natural extension.
- Test power is modest. With proper pre-only smoothing (required to avoid backward propagation of post-treatment signal), the Kalman-based tests achieve only modest power improvements over the raw Wald test. The main contribution is in estimation, not hypothesis testing.
This project started as an exploration of whether the incline package (Savitzky–Golay and spline smoothing for noisy time series) could improve neural network gradient estimation. That exploration led to a systematic comparison of smoothing methods (Kalman, Savitzky–Golay, local polynomial) for online vs. retrospective estimation. The comparison showed that the Kalman filter's adaptive noise weighting is critical for problems where the signal-to-noise ratio varies, and that applied econometrics, despite being full of such problems, barely uses it.
MIT