EfficientDiD: cluster-robust SEs, last-cohort control, Hausman pretest, small cohort warning by igerber · Pull Request #230 · igerber/diff-diff

igerber · 2026-03-22T13:40:21Z

Summary

Add cluster-robust SEs: analytical Liang-Zeger from EIF values + cluster-level multiplier bootstrap
Add control_group="last_cohort" fallback for datasets with no never-treated units
Add hausman_pretest() classmethod for PT-All vs PT-Post assumption selection (Theorem A.1, full covariance)
Add small cohort warnings when cohort has < 2 units or < 1% of sample
Extract shared _cluster_aggregate() / _compute_se_from_eif() helpers; pass cluster info explicitly (no mutable instance state)

Methodology references (required if estimator / math changes)

Method name(s): EfficientDiD (Chen, Sant'Anna & Xie 2025)
Paper / source link(s): Chen, X., Sant'Anna, P. H. C., & Xie, H. (2025). "Efficient Difference-in-Differences and Event Study Estimators"
Any intentional deviations from the source (and why):
- Cluster-robust SEs use Liang-Zeger on EIF values (documented in REGISTRY.md)
- Hausman pretest uses pseudoinverse for non-PSD variance-difference matrix (documented in REGISTRY.md)
- Last-cohort-as-control reclassifies last cohort as pseudo-never-treated (documented in REGISTRY.md)

Validation

Tests added/updated: tests/test_efficient_did.py (+355 lines, 21 new tests across TestHausmanPretest, TestClusterRobustSE, TestLastCohortControl, TestSmallCohortWarning)
Existing validation unchanged: tests/test_efficient_did_validation.py (HRS replication still passes)
Local AI review: ✅ Looks good (no P0/P1 findings on final review)

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

github-actions · 2026-03-22T13:49:03Z

Overall Assessment

⚠️ Needs changes

Executive Summary

The PR’s three stated methodology deviations are all documented in the Methodology Registry, so those deviations themselves are not defects. docs/methodology/REGISTRY.md:L677-L679
Analytical clustered SEs are mis-scaled in the new shared helper: cluster-summed EIF variance is divided by n_units instead of an n_units^2-scaled term, so clustered inference from fit() is wrong for multi-unit clusters. diff_diff/efficient_did.py:L81-L98
control_group="last_cohort" is only applied when there are no never-treated units, but the results object still reports last_cohort on mixed samples. That is an undocumented control-group mismatch. diff_diff/efficient_did.py:L386-L416
hausman_pretest() reconstructs n_units from n_obs // len(time_periods), which becomes wrong after last_cohort trims the usable time periods and silently mis-scales the Hausman covariance/test statistic. diff_diff/efficient_did.py:L1288-L1297
The new tests do not cover the multi-unit cluster formula, invalid cluster inputs, mixed-sample last_cohort semantics, or hausman_pretest(..., control_group="last_cohort"), so they would not catch the issues above. tests/test_efficient_did.py:L760-L799

Methodology

Severity: P3 Impact: The PR’s intentional deviations are properly documented in the registry: Liang-Zeger-on-EIF clustered SEs, pseudoinverse Hausman inversion, and last-cohort pseudo-control. Per the review rules, I am treating those as informational only, not defects. Concrete fix: None. docs/methodology/REGISTRY.md:L667-L679
Severity: P1 Impact: _compute_se_from_eif() does not implement the clustered EIF variance implied by the registry’s analytical SE definition. The unclustered path uses the expected mean(EIF^2) / n = sum(EIF^2) / n^2, but the clustered path computes mean(psi_c^2) / n, which is too large by roughly the average cluster size in balanced data. This affects ATT(g,t), overall ATT, event-study, and group analytical clustered SEs. The same file’s Hausman covariance helper uses the correct n_units**2 denominator, so the PR now has two inconsistent clustered inference formulas. Concrete fix: change the clustered branch to an n_units^2-scaled cluster-sum variance, e.g. correction * np.sum(centered**2) / (n_units**2) (or the algebraically equivalent correction * n_clusters * np.mean(centered**2) / (n_units**2)), then add a hand-computed Liang-Zeger regression test with clusters containing multiple units. diff_diff/efficient_did.py:L81-L98, diff_diff/efficient_did.py:L750-L752, diff_diff/efficient_did.py:L1016-L1018, diff_diff/efficient_did.py:L1353-L1356, docs/methodology/REGISTRY.md:L593-L602, docs/methodology/REGISTRY.md:L677-L679
Severity: P1 Impact: control_group="last_cohort" is not actually honored when never-treated units exist. The code only warns and keeps the never-treated design, but the fitted result still stores control_group="last_cohort" and the summary prints that label. That is an undocumented mismatch between the reported identification strategy and the one actually used. Concrete fix: either implement last-cohort reclassification whenever control_group="last_cohort" is requested, or reject/coerce the mixed-sample case and record/report the actual control group used. Add a test that inspects result.groups, n_control_units, and result.control_group on a sample that contains both never-treated units and a last cohort. diff_diff/efficient_did.py:L386-L416, diff_diff/efficient_did.py:L877-L900, diff_diff/efficient_did_results.py:L189-L190, docs/methodology/REGISTRY.md:L623-L624, docs/methodology/REGISTRY.md:L679-L679
Severity: P1 Impact: hausman_pretest() derives n_units as result_all.n_obs // len(result_all.time_periods). That only works when the stored time_periods still match the original balanced panel. Under control_group="last_cohort", fit() trims time_periods but still stores n_obs=len(df) from the untrimmed panel, so the Hausman covariance matrices are divided by too large an n^2, which inflates H and deflates p-values with no warning. Concrete fix: recover n_units from the EIF vectors themselves (len(next(iter(eif_all.values())))) or from n_treated_units + n_control_units, and add a hausman_pretest(..., control_group="last_cohort") regression test on an all-treated panel. diff_diff/efficient_did.py:L398-L403, diff_diff/efficient_did.py:L884-L886, diff_diff/efficient_did.py:L1288-L1297

Code Quality

Severity: P1 Impact: The new clustered path lacks basic input validation. The cluster column is not included in required_cols; missing cluster IDs are not rejected; .groupby(unit)[cluster].first() silently discards within-unit cluster changes; and a single cluster degenerates to se=0.0 instead of a hard failure. Those are missing assumption checks in a methodology-sensitive inference path. Concrete fix: add shared validation in both fit() and hausman_pretest() for column existence, non-missing values, per-unit constancy, and n_clusters >= 2 before factorizing cluster IDs. diff_diff/efficient_did.py:L317-L320, diff_diff/efficient_did.py:L422-L429, diff_diff/efficient_did.py:L1334-L1341, diff_diff/triple_diff.py:L466-L468, diff_diff/triple_diff.py:L907-L914

Performance

No findings.

Maintainability

Severity: P2 Impact: Cluster preprocessing/covariance logic is duplicated between fit() and hausman_pretest(), and the two copies have already drifted into different formulas/validation behavior. That duplication is what made the clustered fit() SEs and clustered Hausman covariance internally inconsistent in this PR. Concrete fix: after fixing correctness, extract one shared helper for validated unit-level cluster mapping and one shared helper for clustered EIF covariance/SE computation. diff_diff/efficient_did.py:L81-L98, diff_diff/efficient_did.py:L422-L429, diff_diff/efficient_did.py:L1334-L1356

Tech Debt

No new deferrable finding. TODO.md does not track the P1 correctness issues above, and TODO tracking would not mitigate them anyway. TODO.md:L44-L55

Security

No findings.

Documentation/Tests

Severity: P2 Impact: The added tests are too weak to validate the new inference math. test_single_unit_clusters_match_unclustered() only exercises the special case where the current clustered-scaling bug cancels out, and test_clustered_se_at_least_as_large() only checks direction. There is also no test for invalid cluster inputs, no assertion that mixed-sample control_group="last_cohort" actually changes the control composition, and no Hausman test coverage for last_cohort. Concrete fix: add (1) a manual multi-unit cluster Liang-Zeger test, (2) one-cluster / NA-cluster / time-varying-cluster validation tests, (3) a mixed-sample last-cohort semantics test that inspects the actual control composition, and (4) a Hausman+last_cohort regression test. tests/test_efficient_did.py:L760-L799, tests/test_efficient_did.py:L879-L937, tests/test_efficient_did.py:L579-L701

Path to Approval

Fix _compute_se_from_eif() so clustered analytical SEs use an n^2-scaled cluster-sum variance, and add a hand-checked multi-unit cluster test.
Add cluster input validation in both fit() and hausman_pretest() for existence, non-missing IDs, per-unit constancy, and at least 2 clusters.
Make control_group="last_cohort" semantics truthful on mixed samples: either implement it fully there, or reject/coerce the combination and report the actual control group used.
Fix hausman_pretest() to recover n_units from EIF/result counts rather than n_obs // len(time_periods), and add an all-treated last_cohort Hausman regression test.

… cluster validation Address all P1/P2 findings from PR #230 review: P1 fixes: - Fix clustered SE: sum(centered^2)/n^2 instead of mean(centered^2)/n - Fix control_group="last_cohort": always apply reclassification (not just when no never-treated exist) - Fix hausman_pretest n_units: derive from EIF vector length, not n_obs/n_periods (which breaks under last_cohort trimming) - Add cluster input validation: column existence, no NaN, per-unit constancy, n_clusters >= 2 P2 fixes: - Extract shared _validate_and_build_cluster_mapping() used by both fit() and hausman_pretest() - Add 6 new tests: manual Liang-Zeger, cluster validation (4 cases), Hausman+last_cohort, mixed-sample reclassification Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

igerber · 2026-03-22T14:22:33Z

/ai-review

github-actions · 2026-03-22T14:30:28Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 2cda8b9848c24806865c24eb8ea1ddfa4526ad64

Overall Assessment

⚠️ Needs changes

Executive Summary

The PR’s three methodology deviations are documented in the Methodology Registry, so I am treating them as informational only, not defects. docs/methodology/REGISTRY.md:L667-L679
The prior review’s main correctness findings appear resolved: clustered EIF SEs now use the n^2 scaling, last_cohort is actually applied, Hausman now derives n_units from stored EIFs, and cluster validation is centralized. diff_diff/efficient_did.py:L48-L76, diff_diff/efficient_did.py:L112-L129, diff_diff/efficient_did.py:L416-L431, diff_diff/efficient_did.py:L1307-L1368
control_group="last_cohort" still lacks empty-treatment handling: on an all-never-treated sample, the new branch reaches max(treatment_groups) and fails with an unhelpful runtime error instead of a targeted validation error. diff_diff/efficient_did.py:L416-L421
The new last_cohort behavior is not tested across the aggregation and bootstrap code it now feeds, so regressions in aggregate="event_study", aggregate="group", aggregate="all", or n_bootstrap > 0 would not currently be caught. diff_diff/efficient_did.py:L788-L840, diff_diff/efficient_did.py:L1038-L1202, tests/test_efficient_did.py:L951-L1027
Minor result-container drift remains under last_cohort: n_obs still reports the untrimmed input size, and the influence_functions docs still describe the old ndarray contract. diff_diff/efficient_did.py:L423-L426, diff_diff/efficient_did.py:L901-L917, diff_diff/efficient_did_results.py:L82-L109

Static review only: I could not execute the test suite here because pytest is not installed in this environment.

Methodology

Severity: P3 Impact: The PR’s intentional deviations are properly documented: Liang-Zeger clustered EIF SEs, pseudoinverse Hausman inversion, and last-cohort pseudo-control. Per the review rules, those are informational only and do not count as defects. Concrete fix: None. docs/methodology/REGISTRY.md:L667-L679

Code Quality

Severity: P1 Impact: The new control_group="last_cohort" branch assumes there is at least one treated cohort and immediately calls max(treatment_groups). If a filtered dataset contains only never-treated units, this new path raises max() arg is an empty sequence instead of a clear ValueError. That is the empty-result-set failure mode the review checklist calls out for new code paths. Concrete fix: add if not treatment_groups: before max(...), raise a targeted validation error, and add a regression test for an all-never-treated panel using control_group="last_cohort". diff_diff/efficient_did.py:L416-L421

Performance

No findings.

Maintainability

Severity: P2 Impact: Under control_group="last_cohort", estimation trims time_periods, but the result object still stores n_obs=len(df) from the untrimmed panel. That breaks the result container’s own units x periods contract and makes the summary internally inconsistent. Concrete fix: store the effective estimation sample size after trimming, or add separate raw/effective observation fields and update the summary/docstrings accordingly. diff_diff/efficient_did.py:L423-L426, diff_diff/efficient_did.py:L901-L905, diff_diff/efficient_did_results.py:L82-L87, diff_diff/efficient_did_results.py:L181-L185

Tech Debt

Severity: P3 Impact: The previously deferred small-cohort warning item was removed from TODO.md, and the corresponding warning is now implemented in the estimator. This is properly closed tech debt, not a blocker. Concrete fix: None. TODO.md:L48-L55, diff_diff/efficient_did.py:L486-L503

Security

No findings.

Documentation/Tests

Severity: P1 Impact: The new control_group="last_cohort" parameter is only exercised on the analytical aggregate=None path. The added tests do not cover its interaction with aggregate="event_study", aggregate="group", aggregate="all", or n_bootstrap > 0, even though the PR threads trimmed treatment_groups and time_periods through those exact aggregation/bootstrap paths. By the review checklist’s parameter-interaction rule, this is insufficient coverage for a new estimator control. Concrete fix: add regression tests for EfficientDiD(control_group="last_cohort") with each aggregate mode and with bootstrap enabled, asserting that the reclassified last cohort never appears in treated outputs and that aggregated/bootstrap inference remains finite. diff_diff/efficient_did.py:L788-L840, diff_diff/efficient_did.py:L1038-L1202, tests/test_efficient_did.py:L951-L1027
Severity: P3 Impact: The public docs are only partially updated for the new API surface. fit() now accepts store_eif, and EfficientDiDResults.influence_functions now stores a dict keyed by (g, t), but the docstrings still describe the old ndarray-style storage. Concrete fix: document store_eif in EfficientDiD.fit() and update the result-container attribute docs/types to match the new dict contract. diff_diff/efficient_did.py:L293-L304, diff_diff/efficient_did_results.py:L108-L109, diff_diff/efficient_did_results.py:L149-L152

Path to Approval

Add an explicit empty-treatment guard in the last_cohort branch before max(treatment_groups), and cover it with a regression test on an all-never-treated panel.
Add control_group="last_cohort" regression tests for aggregate="event_study", aggregate="group", aggregate="all", and n_bootstrap > 0, verifying both treated-group membership and finite inference outputs.

igerber · 2026-03-22T14:52:52Z

/ai-review

github-actions · 2026-03-22T15:00:18Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 71bf84a1de5da6582a57aab68205dc164c7fd6fd

Overall Assessment

⚠️ Needs changes

Executive Summary

The three explicit methodology deviations in this PR are documented in the Methodology Registry, so they are informational only rather than defects: clustered Liang-Zeger EIF variance, Moore-Penrose inversion in the Hausman test, and control_group="last_cohort". docs/methodology/REGISTRY.md:667 docs/methodology/REGISTRY.md:677
The prior last_cohort blockers appear resolved: the all-never-treated branch now raises a targeted error, and the new tests cover event-study/all/bootstrap interactions for last_cohort. diff_diff/efficient_did.py:418 tests/test_efficient_did.py:1029 tests/test_efficient_did.py:1035 tests/test_efficient_did.py:1069
P1: hausman_pretest() still mishandles partially empty result sets. fit() stores “no valid comparison pairs” cells as effect=np.nan with zero EIFs, but the pretest filters only on EIF finiteness, so a shared NaN-effect cell can poison delta and turn the whole Hausman statistic into NaN even when other common cells are estimable. diff_diff/efficient_did.py:600 diff_diff/efficient_did.py:1321 diff_diff/efficient_did.py:1340 diff_diff/efficient_did.py:1414
P1: the new clustered inference path is only tested on the overall ATT and non-aggregated bootstrap. The event-study/group aggregation code it now changes is still untested. diff_diff/efficient_did.py:808 diff_diff/efficient_did.py:821 diff_diff/efficient_did.py:833 tests/test_efficient_did.py:774
Static review only: I could not run the test suite here because the environment is missing pytest and core runtime dependencies.

Methodology

Severity: P1. Impact: hausman_pretest() does not drop common (g,t) cells whose effect estimate is already non-finite. Because fit() records no-valid-pairs cells as effect=np.nan with a zero EIF vector, those cells survive the EIF-only filter and can make delta, H, and p_value become NaN instead of testing the remaining estimable common cells. This is an unhandled empty-result-set edge case in the new methodology path. Concrete fix: before forming delta, exclude any common (g,t) where either fit has a non-finite effect, warn about the dropped cells, and return an explicit inconclusive result only if no finite common cells remain. diff_diff/efficient_did.py:600 diff_diff/efficient_did.py:1289 diff_diff/efficient_did.py:1321 diff_diff/efficient_did.py:1340
Severity: P3. Impact: The PR’s intended deviations from the paper are documented in REGISTRY.md, so they are informational only under the review policy. Concrete fix: None. docs/methodology/REGISTRY.md:677 docs/methodology/REGISTRY.md:678 docs/methodology/REGISTRY.md:679

Code Quality

Severity: P2. Impact: the _nan_result() helper returns reject=False while defaulting recommendation="pt_post", so inconclusive Hausman outputs can tell different stories depending on which field downstream code reads. Concrete fix: add an explicit inconclusive state, or keep reject and recommendation internally consistent on all NaN-return paths. diff_diff/efficient_did.py:1294

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3. Impact: the broader EfficientDiD public docs/tutorial debt is still tracked in TODO.md, so it is non-blocking under the review rules. Concrete fix: None required in this PR unless you want to close that tracked item too. TODO.md:53

Security

No findings.

Documentation/Tests

Severity: P1. Impact: the new clustered inference path is not exercised through the event-study/group aggregation code it now modifies. Current tests cover clustered overall ATT and clustered non-aggregated bootstrap, but not aggregate="event_study", aggregate="group", or clustered bootstrap on aggregated outputs. Per the parameter-interaction rule in the review checklist, that is insufficient coverage for a new inference path. Concrete fix: add clustered regression tests for aggregate="event_study", aggregate="group", aggregate="all", and clustered bootstrap on aggregated outputs; at minimum assert finite aggregated inference and that the clustered path is actually exercised (for example, single-unit clusters match unclustered aggregates, while multi-unit clustered DGPs change aggregated SEs). diff_diff/efficient_did.py:808 diff_diff/efficient_did.py:821 diff_diff/efficient_did.py:833 tests/test_efficient_did.py:774
Severity: P3. Impact: fit() now exposes store_eif, and the failure modes now include control_group='last_cohort', but the fit() docstring still describes the older contract. Concrete fix: document store_eif and update the Raises text to match the current API. diff_diff/efficient_did.py:293 diff_diff/efficient_did.py:334

Path to Approval

In hausman_pretest(), filter out common (g,t) cells with non-finite effect estimates before building delta/covariances, warn when any are dropped, and make the inconclusive return state internally consistent.
Add clustered aggregation tests for aggregate="event_study", aggregate="group", aggregate="all", and clustered bootstrap on aggregated outputs.

…ohort warning Complete the EfficientDiD roadmap with four features: - Cluster-robust SEs: analytical Liang-Zeger from EIF + cluster-level bootstrap - Last-cohort-as-control: control_group="last_cohort" for all-treated datasets - Hausman pretest: classmethod with full covariance (Theorem A.1) for PT assumption selection - Small cohort warning: warn when cohort < 2 units or < 1% of sample Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Subset the cluster index vector with the row_finite mask before computing clustered covariance matrices in hausman_pretest(). Without this fix, np.bincount would receive mismatched array lengths when non-finite EIF rows are dropped. Add regression test for clustered Hausman pretest. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

… covariance, update docstrings - Remove _cluster_indices, _n_clusters, _store_eif instance attributes; pass cluster info explicitly through aggregation methods and store_eif as a fit() parameter - Extract shared _cluster_aggregate() helper used by both _compute_se_from_eif (scalar SE) and hausman_pretest (covariance matrix) - Update class docstring to document cluster, control_group parameters Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

… cluster validation Address all P1/P2 findings from PR #230 review: P1 fixes: - Fix clustered SE: sum(centered^2)/n^2 instead of mean(centered^2)/n - Fix control_group="last_cohort": always apply reclassification (not just when no never-treated exist) - Fix hausman_pretest n_units: derive from EIF vector length, not n_obs/n_periods (which breaks under last_cohort trimming) - Add cluster input validation: column existence, no NaN, per-unit constancy, n_clusters >= 2 P2 fixes: - Extract shared _validate_and_build_cluster_mapping() used by both fit() and hausman_pretest() - Add 6 new tests: manual Liang-Zeger, cluster validation (4 cases), Hausman+last_cohort, mixed-sample reclassification Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

… n_obs Address rerun review findings: - Guard against empty treatment_groups before max() in last_cohort branch - Fix n_obs to report effective sample size (n_units * n_periods) after trimming - Add last_cohort tests for aggregate=event_study/all and bootstrap - Add all-never-treated + last_cohort regression test - Update influence_functions docstring to match dict contract Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Address rerun review findings: - Filter common (g,t) cells with non-finite effect estimates before building Hausman delta/covariance (prevents NaN poisoning from no-valid-pairs cells) - Make _nan_result return recommendation="inconclusive" for consistency (reject=False + "pt_post" was misleading) - Add clustered SE tests for aggregate=event_study and aggregate=all (exercises cluster path through aggregation methods) Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

igerber · 2026-03-22T16:18:30Z

/ai-review

github-actions · 2026-03-22T16:27:55Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 94be0c0a75f979dbf573efca162e3204df542be3

Overall Assessment

⚠️ Needs changes

Executive Summary

The previous empty-result-set blocker in hausman_pretest() looks fixed: non-finite common cells are now filtered before delta is built, and inconclusive paths return an explicit recommendation. diff_diff/efficient_did.py:L1478, diff_diff/efficient_did.py:L1518
The three explicit EfficientDiD deviations called out in the PR body are documented in the Methodology Registry, so they are informational only: Liang-Zeger EIF clustering, Moore-Penrose inversion for non-PSD V, and control_group="last_cohort". docs/methodology/REGISTRY.md:L682, docs/methodology/REGISTRY.md:L683, docs/methodology/REGISTRY.md:L684
P1: hausman_pretest() still does not implement the cited Theorem A.1. The paper’s theorem is written for the post-treatment event-study vector ES(e), while the code forms the statistic over all common (g,t) cells and therefore also pulls in pre-treatment placebo cells from group_time_effects. diff_diff/efficient_did.py:L682, diff_diff/efficient_did.py:L1457, diff_diff/efficient_did.py:L1518 citeturn2view0turn3view0
P1: cluster is silently ignored when survey_design is present; the new clustered SE path never threads cluster IDs into the survey design and _eif_se() always takes the survey-TSL branch once _unit_resolved_survey exists. diff_diff/efficient_did.py:L351, diff_diff/efficient_did.py:L503, diff_diff/efficient_did.py:L1092, docs/methodology/REGISTRY.md:L1929
P1: the earlier coverage blocker is only partially closed. New tests cover analytical clustered aggregation and overall clustered bootstrap, but not clustered bootstrap on aggregated outputs. diff_diff/efficient_did.py:L952, diff_diff/efficient_did_bootstrap.py:L98, tests/test_efficient_did.py:L806, tests/test_efficient_did.py:L827
Static review only: I could not execute tests here because this environment is missing pytest and core runtime dependencies.

Methodology

Severity: P1. Impact: hausman_pretest() claims Theorem A.1, but Theorem A.1 in Chen, Sant’Anna, and Xie is defined on the post-treatment event-study vector ES(e). The implementation instead intersects all stored group_time_effects, and because fit() estimates every (g,t) cell including pre-treatment placebos, the Hausman statistic is built on raw common (g,t) cells rather than the theorem’s event-study vector. That is an undocumented methodology mismatch from the cited source. Concrete fix: either aggregate to post-treatment ES(e) before constructing H, or explicitly document and rename this as a footnote-22 ATT(g,t) variant and restrict it to post-treatment ATT cells. diff_diff/efficient_did.py:L682, diff_diff/efficient_did.py:L1457, diff_diff/efficient_did.py:L1490, docs/methodology/REGISTRY.md:L683 citeturn2view0turn3view0
Severity: P1. Impact: the new clustered inference path does not interact correctly with survey inference. fit() resolves survey_design first, builds a unit-level resolved survey design with no cluster injection, and _eif_se() then always uses survey TSL whenever _unit_resolved_survey is set. So cluster is silently ignored for survey_design fits, which yields the wrong variance/df behavior for weights-only designs and skips the PSU-vs-cluster resolution behavior already used elsewhere in the library. Concrete fix: follow the existing pattern in diff_diff/twfe.py:L125, diff_diff/twfe.py:L173, and diff_diff/twfe.py:L187 by threading cluster IDs through _resolve_effective_cluster() and _inject_cluster_as_psu() before constructing the unit-level survey design, or raise NotImplementedError for cluster + survey_design until supported. diff_diff/efficient_did.py:L351, diff_diff/efficient_did.py:L503, diff_diff/efficient_did.py:L1092, docs/methodology/REGISTRY.md:L1929
Severity: P3. Impact: the PR’s explicit EfficientDiD deviations are documented in the registry, so they are non-blocking under the review policy. Concrete fix: none. docs/methodology/REGISTRY.md:L682, docs/methodology/REGISTRY.md:L683, docs/methodology/REGISTRY.md:L684

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No new untracked findings.

Security

No findings.

Documentation/Tests

Severity: P1. Impact: clustered bootstrap coverage is still incomplete for the new aggregation/inference combinations. The code now passes cluster_indices into _run_multiplier_bootstrap() and writes bootstrap SE/CI/p-values back into event_study_effects and group_effects, but the new tests stop at analytical clustered aggregation and a non-aggregated clustered bootstrap fit. Per the parameter-interaction rule, that is still insufficient coverage for the new clustered inference path. Concrete fix: add at least one cluster=..., n_bootstrap>0, aggregate="all" regression test that asserts finite bootstrap-updated event_study_effects and group_effects, plus a control case such as cluster_id = unit showing the aggregated clustered bootstrap collapses to the unclustered result. diff_diff/efficient_did.py:L952, diff_diff/efficient_did.py:L978, diff_diff/efficient_did_bootstrap.py:L98, tests/test_efficient_did.py:L806, tests/test_efficient_did.py:L827
Severity: P3. Impact: the public fit() docstring still omits the new store_eif parameter and still describes the old never-treated error condition, even though control_group="last_cohort" now changes that behavior. Concrete fix: document store_eif, update the Raises text, and describe the new control-group behavior in the fit() API docs. diff_diff/efficient_did.py:L294, diff_diff/efficient_did.py:L305, diff_diff/efficient_did.py:L343

Static review only; I could not run the suite because pytest and runtime dependencies are unavailable in this environment.

Path to Approval

Rework hausman_pretest() to match Theorem A.1 on post-treatment ES(e) estimates, or explicitly document and rename the current routine as a footnote-22 ATT(g,t) variant and restrict it to post-treatment ATT cells.
Make cluster and survey_design interact correctly by resolving effective clusters / injecting PSUs before unit-level TSL, or reject that combination with NotImplementedError.
Add clustered n_bootstrap>0 regression coverage for aggregated outputs, preferably one aggregate="all" test that checks bootstrap-updated event-study and group inference fields.

Address rerun-3 review findings: P1 fixes: - Hausman pretest now aggregates to post-treatment event-study ES(e) before computing test statistic, matching Theorem A.1 (was using raw (g,t) cells including pre-treatment placebos) - Raise NotImplementedError when both cluster and survey_design are set (cluster was silently ignored under survey TSL dispatch) - Add clustered bootstrap+aggregate='all' test covering bootstrap-updated event-study and group effect SEs P2: Update REGISTRY.md Hausman note to describe ES(e) aggregation. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

igerber · 2026-03-22T16:49:15Z

/ai-review

github-actions · 2026-03-22T16:58:36Z

🔁 AI review rerun (requested by @igerber)

Head SHA: bf427af3a37d4266d642136afe6db15bf2b7d568

Overall Assessment

⚠️ Needs changes

Executive Summary

This rer-review closes the previous Hausman scope blocker: the pretest now works on event-study horizons rather than raw (g,t) cells.
The previous cluster + survey_design blocker is also closed by an explicit NotImplementedError.
P1: the new Hausman covariance is still mis-specified. hausman_pretest() aggregates ES(e) effects with cohort-share weights, but its ES(e) influence function omits the weight-influence-function correction that the library already uses for analytical ES aggregation. That can make the Hausman statistic, p-value, and PT-All/PT-Post recommendation wrong.
The earlier clustered-bootstrap aggregation coverage gap looks addressed by the new clustered aggregate="all" regression test.
Minor docs remain out of sync around store_eif, inconclusive, and last-cohort wording.

Methodology

Severity: P1. Impact: hausman_pretest() now aggregates to ES(e), but it does not use the full ES(e) EIF when building the Hausman covariance. The analytical ES path adds _compute_wif_contribution() at diff_diff/efficient_did.py:L1298 because cohort-size weights are estimated. In contrast, hausman_pretest() rebuilds ES(e) at diff_diff/efficient_did.py:L1499, accumulates es_eif as a plain weighted sum at diff_diff/efficient_did.py:L1530, and then forms cov_all, cov_post, and V at diff_diff/efficient_did.py:L1577 and diff_diff/efficient_did.py:L1584. The registry also says the Hausman covariance is computed from aggregated ES(e)-level EIF values at docs/methodology/REGISTRY.md:L683. Because the paper’s Hausman test is defined on the ES vector and its covariance matrices, omitting the WIF term makes H, its p-value, and the recommendation unreliable whenever multiple cohorts load into the same post-treatment horizon. Concrete fix: factor out a shared ES aggregation helper and reuse the WIF-adjusted ES(EIF) construction inside hausman_pretest() before computing cov_all, cov_post, and V. citeturn2search12turn0search12
Severity: P3. Impact: the PR’s explicit cluster-robust SE path, pseudoinverse handling for non-PSD Hausman V, and last_cohort control behavior are documented in the Methodology Registry at docs/methodology/REGISTRY.md:L682, docs/methodology/REGISTRY.md:L683, and docs/methodology/REGISTRY.md:L684, so I did not treat those as defects. Concrete fix: none.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No new untracked findings.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: the public docs for the new surface area are still inconsistent. fit() adds store_eif at diff_diff/efficient_did.py:L305 but the parameter block starting at diff_diff/efficient_did.py:L307 does not document it; HausmanPretestResult says recommendation is only "pt_all" or "pt_post" at diff_diff/efficient_did_results.py:L42, but the code also returns "inconclusive" at diff_diff/efficient_did.py:L1468; and REGISTRY.md says last-cohort control works by “dropping last time period” at docs/methodology/REGISTRY.md:L625, while the later note at docs/methodology/REGISTRY.md:L684 and the paper describe dropping periods from when the last cohort is treated onward. Concrete fix: document store_eif, document the inconclusive return state, and align the last-cohort wording across the registry and API docs. citeturn1search12

Path to Approval

Update hausman_pretest() to build ES(e)-level EIFs with the same WIF adjustment used by _aggregate_event_study(), then recompute cov_all, cov_post, V, H, and the recommendation from those corrected EIFs.
Add a regression test on a multi-cohort staggered panel where at least two cohorts contribute to the same post-treatment horizon, and assert the Hausman covariance/statistic matches a direct WIF-adjusted ES(EIF) construction.
Update the public docs to cover store_eif, inconclusive, and the exact last-cohort trimming rule.

Static review only: I could not run the targeted suite here because pytest is unavailable and the local python environment is missing numpy.

P1: hausman_pretest ES(e) EIF now includes the weight influence function correction matching _aggregate_event_study's WIF pattern. Without this, the Hausman covariance was underestimated when multiple cohorts load into the same post-treatment horizon. P3 doc fixes: - Document store_eif parameter in fit() docstring - Document "inconclusive" return state in HausmanPretestResult - Fix REGISTRY.md last-cohort wording to say "from treatment onset onward" instead of "last time period" Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

igerber · 2026-03-22T17:13:02Z

/ai-review

github-actions · 2026-03-22T17:20:48Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 56beb422bcf69b2985a41583b0870e9db1bac3ab

Overall Assessment

✅ Looks good

Executive Summary

This re-review closes the prior Hausman blocker: hausman_pretest() now aggregates to ES(e), adds the cohort-weight WIF correction, and builds the covariance difference from ES-level EIFs in diff_diff/efficient_did.py:L1298-L1307 and diff_diff/efficient_did.py:L1503-L1599.
The new clustered SE path, last-cohort pseudo-control, and Moore-Penrose Hausman handling are documented in the Methodology Registry, so I treated them as documented deviations rather than defects.
I did not find new inline-inference or partial-NaN anti-patterns in the changed EfficientDiD paths; the new inference code consistently goes through safe_inference.
The new edge cases are meaningfully covered in the diff: clustered inference, invalid cluster metadata, last-cohort fallback, Hausman output states, and small-cohort warnings are all exercised in tests/test_efficient_did.py:L580-L920.
Static review only: I could not run the targeted test subset here because pytest is not installed in this environment.

Methodology

Severity: P3. Impact: The prior P1 on Hausman covariance specification is resolved. The implementation now constructs the post-treatment ES(e) vector, applies the cohort-weight WIF correction at the ES level, and then forms V = Cov(ES_post) - Cov(ES_all) in diff_diff/efficient_did.py:L1503-L1599. That matches the paper’s Theorem A.1 framing of the Hausman test on event-study aggregations rather than raw (g,t) cells. Concrete fix: none. (web3.arxiv.org)
Severity: P3. Impact: The PR’s main departures from the paper are explicitly documented in docs/methodology/REGISTRY.md:L669-L684: clustered Liang-Zeger EIF-based SEs with cluster-level multiplier bootstrap, pseudoinverse/effective-rank handling for non-PSD Hausman V, and control_group="last_cohort". Under the review rules, those are informational, not defects. Concrete fix: none.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No new untracked findings.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: The earlier documentation mismatches are fixed in the changed files. store_eif is now documented in diff_diff/efficient_did.py:L305-L339, the "inconclusive" Hausman state is documented in diff_diff/efficient_did_results.py:L20-L45, and the last-cohort wording is aligned in docs/methodology/REGISTRY.md:L625-L684. Concrete fix: none.
No blocking test gaps in the changed EfficientDiD surface. The added coverage in tests/test_efficient_did.py:L580-L920 is appropriate for the new behavior. Concrete fix: none.

…e-existing Hausman bug, add inference tests - Add REGISTRY.md deviation note: CS per-cell ATT(g,t) SEs use influence-function variance (matching R's did), not full TSL with strata/PSU/FPC - Add TODO entries: CS per-cell TSL SE (Medium), Hausman stale n_cl from #230 (Medium) - Add 3 CS survey inference validation tests (scale invariance, per-cell ATT change, survey df effect) - Remove unused survey_weight_type in ImputationDiD Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

The clustered SE formula fix shipped in PR #230 (v2.7.4). The remaining stale-indices issue is still tracked in TODO.md and not fixed in this release. Co-Authored-By: Claude Opus 4.5 <[email protected]>

igerber and others added 6 commits March 22, 2026 11:24

igerber force-pushed the edid-roadmap branch from 71bf84a to 94be0c0 Compare March 22, 2026 16:13

igerber merged commit 937dae6 into main Mar 22, 2026
14 checks passed

igerber deleted the edid-roadmap branch March 22, 2026 17:57

igerber mentioned this pull request Mar 22, 2026

Phase 4: Survey support for ImputationDiD, TwoStageDiD, CallawaySantAnna #233

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EfficientDiD: cluster-robust SEs, last-cohort control, Hausman pretest, small cohort warning#230

EfficientDiD: cluster-robust SEs, last-cohort control, Hausman pretest, small cohort warning#230
igerber merged 8 commits intomainfrom
edid-roadmap

igerber commented Mar 22, 2026

Uh oh!

github-actions bot commented Mar 22, 2026

Uh oh!

igerber commented Mar 22, 2026

Uh oh!

github-actions bot commented Mar 22, 2026

Uh oh!

igerber commented Mar 22, 2026

Uh oh!

github-actions bot commented Mar 22, 2026

Uh oh!

igerber commented Mar 22, 2026

Uh oh!

github-actions bot commented Mar 22, 2026

Uh oh!

igerber commented Mar 22, 2026

Uh oh!

github-actions bot commented Mar 22, 2026

Uh oh!

igerber commented Mar 22, 2026

Uh oh!

github-actions bot commented Mar 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Mar 22, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions bot commented Mar 22, 2026

Uh oh!

igerber commented Mar 22, 2026

Uh oh!

github-actions bot commented Mar 22, 2026

Uh oh!

igerber commented Mar 22, 2026

Uh oh!

github-actions bot commented Mar 22, 2026

Uh oh!

igerber commented Mar 22, 2026

Uh oh!

github-actions bot commented Mar 22, 2026

Uh oh!

igerber commented Mar 22, 2026

Uh oh!

github-actions bot commented Mar 22, 2026

Uh oh!

igerber commented Mar 22, 2026

Uh oh!

github-actions bot commented Mar 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant