Skip to content

EfficientDiD: cluster-robust SEs, last-cohort control, Hausman pretest, small cohort warning#230

Merged
igerber merged 8 commits intomainfrom
edid-roadmap
Mar 22, 2026
Merged

EfficientDiD: cluster-robust SEs, last-cohort control, Hausman pretest, small cohort warning#230
igerber merged 8 commits intomainfrom
edid-roadmap

Conversation

@igerber
Copy link
Owner

@igerber igerber commented Mar 22, 2026

Summary

  • Add cluster-robust SEs: analytical Liang-Zeger from EIF values + cluster-level multiplier bootstrap
  • Add control_group="last_cohort" fallback for datasets with no never-treated units
  • Add hausman_pretest() classmethod for PT-All vs PT-Post assumption selection (Theorem A.1, full covariance)
  • Add small cohort warnings when cohort has < 2 units or < 1% of sample
  • Extract shared _cluster_aggregate() / _compute_se_from_eif() helpers; pass cluster info explicitly (no mutable instance state)

Methodology references (required if estimator / math changes)

  • Method name(s): EfficientDiD (Chen, Sant'Anna & Xie 2025)
  • Paper / source link(s): Chen, X., Sant'Anna, P. H. C., & Xie, H. (2025). "Efficient Difference-in-Differences and Event Study Estimators"
  • Any intentional deviations from the source (and why):
    • Cluster-robust SEs use Liang-Zeger on EIF values (documented in REGISTRY.md)
    • Hausman pretest uses pseudoinverse for non-PSD variance-difference matrix (documented in REGISTRY.md)
    • Last-cohort-as-control reclassifies last cohort as pseudo-never-treated (documented in REGISTRY.md)

Validation

  • Tests added/updated: tests/test_efficient_did.py (+355 lines, 21 new tests across TestHausmanPretest, TestClusterRobustSE, TestLastCohortControl, TestSmallCohortWarning)
  • Existing validation unchanged: tests/test_efficient_did_validation.py (HRS replication still passes)
  • Local AI review: ✅ Looks good (no P0/P1 findings on final review)

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

@github-actions
Copy link

Overall Assessment

⚠️ Needs changes

Executive Summary

  • The PR’s three stated methodology deviations are all documented in the Methodology Registry, so those deviations themselves are not defects. docs/methodology/REGISTRY.md:L677-L679
  • Analytical clustered SEs are mis-scaled in the new shared helper: cluster-summed EIF variance is divided by n_units instead of an n_units^2-scaled term, so clustered inference from fit() is wrong for multi-unit clusters. diff_diff/efficient_did.py:L81-L98
  • control_group="last_cohort" is only applied when there are no never-treated units, but the results object still reports last_cohort on mixed samples. That is an undocumented control-group mismatch. diff_diff/efficient_did.py:L386-L416
  • hausman_pretest() reconstructs n_units from n_obs // len(time_periods), which becomes wrong after last_cohort trims the usable time periods and silently mis-scales the Hausman covariance/test statistic. diff_diff/efficient_did.py:L1288-L1297
  • The new tests do not cover the multi-unit cluster formula, invalid cluster inputs, mixed-sample last_cohort semantics, or hausman_pretest(..., control_group="last_cohort"), so they would not catch the issues above. tests/test_efficient_did.py:L760-L799

Methodology

  • Severity: P3 Impact: The PR’s intentional deviations are properly documented in the registry: Liang-Zeger-on-EIF clustered SEs, pseudoinverse Hausman inversion, and last-cohort pseudo-control. Per the review rules, I am treating those as informational only, not defects. Concrete fix: None. docs/methodology/REGISTRY.md:L667-L679

  • Severity: P1 Impact: _compute_se_from_eif() does not implement the clustered EIF variance implied by the registry’s analytical SE definition. The unclustered path uses the expected mean(EIF^2) / n = sum(EIF^2) / n^2, but the clustered path computes mean(psi_c^2) / n, which is too large by roughly the average cluster size in balanced data. This affects ATT(g,t), overall ATT, event-study, and group analytical clustered SEs. The same file’s Hausman covariance helper uses the correct n_units**2 denominator, so the PR now has two inconsistent clustered inference formulas. Concrete fix: change the clustered branch to an n_units^2-scaled cluster-sum variance, e.g. correction * np.sum(centered**2) / (n_units**2) (or the algebraically equivalent correction * n_clusters * np.mean(centered**2) / (n_units**2)), then add a hand-computed Liang-Zeger regression test with clusters containing multiple units. diff_diff/efficient_did.py:L81-L98, diff_diff/efficient_did.py:L750-L752, diff_diff/efficient_did.py:L1016-L1018, diff_diff/efficient_did.py:L1353-L1356, docs/methodology/REGISTRY.md:L593-L602, docs/methodology/REGISTRY.md:L677-L679

  • Severity: P1 Impact: control_group="last_cohort" is not actually honored when never-treated units exist. The code only warns and keeps the never-treated design, but the fitted result still stores control_group="last_cohort" and the summary prints that label. That is an undocumented mismatch between the reported identification strategy and the one actually used. Concrete fix: either implement last-cohort reclassification whenever control_group="last_cohort" is requested, or reject/coerce the mixed-sample case and record/report the actual control group used. Add a test that inspects result.groups, n_control_units, and result.control_group on a sample that contains both never-treated units and a last cohort. diff_diff/efficient_did.py:L386-L416, diff_diff/efficient_did.py:L877-L900, diff_diff/efficient_did_results.py:L189-L190, docs/methodology/REGISTRY.md:L623-L624, docs/methodology/REGISTRY.md:L679-L679

  • Severity: P1 Impact: hausman_pretest() derives n_units as result_all.n_obs // len(result_all.time_periods). That only works when the stored time_periods still match the original balanced panel. Under control_group="last_cohort", fit() trims time_periods but still stores n_obs=len(df) from the untrimmed panel, so the Hausman covariance matrices are divided by too large an n^2, which inflates H and deflates p-values with no warning. Concrete fix: recover n_units from the EIF vectors themselves (len(next(iter(eif_all.values())))) or from n_treated_units + n_control_units, and add a hausman_pretest(..., control_group="last_cohort") regression test on an all-treated panel. diff_diff/efficient_did.py:L398-L403, diff_diff/efficient_did.py:L884-L886, diff_diff/efficient_did.py:L1288-L1297

Code Quality

Performance

  • No findings.

Maintainability

  • Severity: P2 Impact: Cluster preprocessing/covariance logic is duplicated between fit() and hausman_pretest(), and the two copies have already drifted into different formulas/validation behavior. That duplication is what made the clustered fit() SEs and clustered Hausman covariance internally inconsistent in this PR. Concrete fix: after fixing correctness, extract one shared helper for validated unit-level cluster mapping and one shared helper for clustered EIF covariance/SE computation. diff_diff/efficient_did.py:L81-L98, diff_diff/efficient_did.py:L422-L429, diff_diff/efficient_did.py:L1334-L1356

Tech Debt

  • No new deferrable finding. TODO.md does not track the P1 correctness issues above, and TODO tracking would not mitigate them anyway. TODO.md:L44-L55

Security

  • No findings.

Documentation/Tests

  • Severity: P2 Impact: The added tests are too weak to validate the new inference math. test_single_unit_clusters_match_unclustered() only exercises the special case where the current clustered-scaling bug cancels out, and test_clustered_se_at_least_as_large() only checks direction. There is also no test for invalid cluster inputs, no assertion that mixed-sample control_group="last_cohort" actually changes the control composition, and no Hausman test coverage for last_cohort. Concrete fix: add (1) a manual multi-unit cluster Liang-Zeger test, (2) one-cluster / NA-cluster / time-varying-cluster validation tests, (3) a mixed-sample last-cohort semantics test that inspects the actual control composition, and (4) a Hausman+last_cohort regression test. tests/test_efficient_did.py:L760-L799, tests/test_efficient_did.py:L879-L937, tests/test_efficient_did.py:L579-L701

Path to Approval

  1. Fix _compute_se_from_eif() so clustered analytical SEs use an n^2-scaled cluster-sum variance, and add a hand-checked multi-unit cluster test.
  2. Add cluster input validation in both fit() and hausman_pretest() for existence, non-missing IDs, per-unit constancy, and at least 2 clusters.
  3. Make control_group="last_cohort" semantics truthful on mixed samples: either implement it fully there, or reject/coerce the combination and report the actual control group used.
  4. Fix hausman_pretest() to recover n_units from EIF/result counts rather than n_obs // len(time_periods), and add an all-treated last_cohort Hausman regression test.

igerber added a commit that referenced this pull request Mar 22, 2026
… cluster validation

Address all P1/P2 findings from PR #230 review:

P1 fixes:
- Fix clustered SE: sum(centered^2)/n^2 instead of mean(centered^2)/n
- Fix control_group="last_cohort": always apply reclassification (not
  just when no never-treated exist)
- Fix hausman_pretest n_units: derive from EIF vector length, not
  n_obs/n_periods (which breaks under last_cohort trimming)
- Add cluster input validation: column existence, no NaN, per-unit
  constancy, n_clusters >= 2

P2 fixes:
- Extract shared _validate_and_build_cluster_mapping() used by both
  fit() and hausman_pretest()
- Add 6 new tests: manual Liang-Zeger, cluster validation (4 cases),
  Hausman+last_cohort, mixed-sample reclassification

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Copy link
Owner Author

igerber commented Mar 22, 2026

/ai-review

@github-actions
Copy link

🔁 AI review rerun (requested by @igerber)

Head SHA: 2cda8b9848c24806865c24eb8ea1ddfa4526ad64


Overall Assessment

⚠️ Needs changes

Executive Summary

Static review only: I could not execute the test suite here because pytest is not installed in this environment.

Methodology

  • Severity: P3 Impact: The PR’s intentional deviations are properly documented: Liang-Zeger clustered EIF SEs, pseudoinverse Hausman inversion, and last-cohort pseudo-control. Per the review rules, those are informational only and do not count as defects. Concrete fix: None. docs/methodology/REGISTRY.md:L667-L679

Code Quality

  • Severity: P1 Impact: The new control_group="last_cohort" branch assumes there is at least one treated cohort and immediately calls max(treatment_groups). If a filtered dataset contains only never-treated units, this new path raises max() arg is an empty sequence instead of a clear ValueError. That is the empty-result-set failure mode the review checklist calls out for new code paths. Concrete fix: add if not treatment_groups: before max(...), raise a targeted validation error, and add a regression test for an all-never-treated panel using control_group="last_cohort". diff_diff/efficient_did.py:L416-L421

Performance

  • No findings.

Maintainability

Tech Debt

  • Severity: P3 Impact: The previously deferred small-cohort warning item was removed from TODO.md, and the corresponding warning is now implemented in the estimator. This is properly closed tech debt, not a blocker. Concrete fix: None. TODO.md:L48-L55, diff_diff/efficient_did.py:L486-L503

Security

  • No findings.

Documentation/Tests

  • Severity: P1 Impact: The new control_group="last_cohort" parameter is only exercised on the analytical aggregate=None path. The added tests do not cover its interaction with aggregate="event_study", aggregate="group", aggregate="all", or n_bootstrap > 0, even though the PR threads trimmed treatment_groups and time_periods through those exact aggregation/bootstrap paths. By the review checklist’s parameter-interaction rule, this is insufficient coverage for a new estimator control. Concrete fix: add regression tests for EfficientDiD(control_group="last_cohort") with each aggregate mode and with bootstrap enabled, asserting that the reclassified last cohort never appears in treated outputs and that aggregated/bootstrap inference remains finite. diff_diff/efficient_did.py:L788-L840, diff_diff/efficient_did.py:L1038-L1202, tests/test_efficient_did.py:L951-L1027
  • Severity: P3 Impact: The public docs are only partially updated for the new API surface. fit() now accepts store_eif, and EfficientDiDResults.influence_functions now stores a dict keyed by (g, t), but the docstrings still describe the old ndarray-style storage. Concrete fix: document store_eif in EfficientDiD.fit() and update the result-container attribute docs/types to match the new dict contract. diff_diff/efficient_did.py:L293-L304, diff_diff/efficient_did_results.py:L108-L109, diff_diff/efficient_did_results.py:L149-L152

Path to Approval

  1. Add an explicit empty-treatment guard in the last_cohort branch before max(treatment_groups), and cover it with a regression test on an all-never-treated panel.
  2. Add control_group="last_cohort" regression tests for aggregate="event_study", aggregate="group", aggregate="all", and n_bootstrap > 0, verifying both treated-group membership and finite inference outputs.

@igerber
Copy link
Owner Author

igerber commented Mar 22, 2026

/ai-review

@github-actions
Copy link

🔁 AI review rerun (requested by @igerber)

Head SHA: 71bf84a1de5da6582a57aab68205dc164c7fd6fd


Overall Assessment

⚠️ Needs changes

Executive Summary

Methodology

Code Quality

  • Severity: P2. Impact: the _nan_result() helper returns reject=False while defaulting recommendation="pt_post", so inconclusive Hausman outputs can tell different stories depending on which field downstream code reads. Concrete fix: add an explicit inconclusive state, or keep reject and recommendation internally consistent on all NaN-return paths. diff_diff/efficient_did.py:1294

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity: P3. Impact: the broader EfficientDiD public docs/tutorial debt is still tracked in TODO.md, so it is non-blocking under the review rules. Concrete fix: None required in this PR unless you want to close that tracked item too. TODO.md:53

Security

  • No findings.

Documentation/Tests

  • Severity: P1. Impact: the new clustered inference path is not exercised through the event-study/group aggregation code it now modifies. Current tests cover clustered overall ATT and clustered non-aggregated bootstrap, but not aggregate="event_study", aggregate="group", or clustered bootstrap on aggregated outputs. Per the parameter-interaction rule in the review checklist, that is insufficient coverage for a new inference path. Concrete fix: add clustered regression tests for aggregate="event_study", aggregate="group", aggregate="all", and clustered bootstrap on aggregated outputs; at minimum assert finite aggregated inference and that the clustered path is actually exercised (for example, single-unit clusters match unclustered aggregates, while multi-unit clustered DGPs change aggregated SEs). diff_diff/efficient_did.py:808 diff_diff/efficient_did.py:821 diff_diff/efficient_did.py:833 tests/test_efficient_did.py:774
  • Severity: P3. Impact: fit() now exposes store_eif, and the failure modes now include control_group='last_cohort', but the fit() docstring still describes the older contract. Concrete fix: document store_eif and update the Raises text to match the current API. diff_diff/efficient_did.py:293 diff_diff/efficient_did.py:334

Path to Approval

  1. In hausman_pretest(), filter out common (g,t) cells with non-finite effect estimates before building delta/covariances, warn when any are dropped, and make the inconclusive return state internally consistent.
  2. Add clustered aggregation tests for aggregate="event_study", aggregate="group", aggregate="all", and clustered bootstrap on aggregated outputs.

igerber and others added 6 commits March 22, 2026 11:24
…ohort warning

Complete the EfficientDiD roadmap with four features:
- Cluster-robust SEs: analytical Liang-Zeger from EIF + cluster-level bootstrap
- Last-cohort-as-control: control_group="last_cohort" for all-treated datasets
- Hausman pretest: classmethod with full covariance (Theorem A.1) for PT assumption selection
- Small cohort warning: warn when cohort < 2 units or < 1% of sample

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Subset the cluster index vector with the row_finite mask before computing
clustered covariance matrices in hausman_pretest(). Without this fix,
np.bincount would receive mismatched array lengths when non-finite EIF
rows are dropped.

Add regression test for clustered Hausman pretest.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
… covariance, update docstrings

- Remove _cluster_indices, _n_clusters, _store_eif instance attributes;
  pass cluster info explicitly through aggregation methods and store_eif
  as a fit() parameter
- Extract shared _cluster_aggregate() helper used by both _compute_se_from_eif
  (scalar SE) and hausman_pretest (covariance matrix)
- Update class docstring to document cluster, control_group parameters

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
… cluster validation

Address all P1/P2 findings from PR #230 review:

P1 fixes:
- Fix clustered SE: sum(centered^2)/n^2 instead of mean(centered^2)/n
- Fix control_group="last_cohort": always apply reclassification (not
  just when no never-treated exist)
- Fix hausman_pretest n_units: derive from EIF vector length, not
  n_obs/n_periods (which breaks under last_cohort trimming)
- Add cluster input validation: column existence, no NaN, per-unit
  constancy, n_clusters >= 2

P2 fixes:
- Extract shared _validate_and_build_cluster_mapping() used by both
  fit() and hausman_pretest()
- Add 6 new tests: manual Liang-Zeger, cluster validation (4 cases),
  Hausman+last_cohort, mixed-sample reclassification

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
… n_obs

Address rerun review findings:
- Guard against empty treatment_groups before max() in last_cohort branch
- Fix n_obs to report effective sample size (n_units * n_periods) after trimming
- Add last_cohort tests for aggregate=event_study/all and bootstrap
- Add all-never-treated + last_cohort regression test
- Update influence_functions docstring to match dict contract

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Address rerun review findings:
- Filter common (g,t) cells with non-finite effect estimates before
  building Hausman delta/covariance (prevents NaN poisoning from
  no-valid-pairs cells)
- Make _nan_result return recommendation="inconclusive" for consistency
  (reject=False + "pt_post" was misleading)
- Add clustered SE tests for aggregate=event_study and aggregate=all
  (exercises cluster path through aggregation methods)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Copy link
Owner Author

igerber commented Mar 22, 2026

/ai-review

@github-actions
Copy link

🔁 AI review rerun (requested by @igerber)

Head SHA: 94be0c0a75f979dbf573efca162e3204df542be3


Overall Assessment

⚠️ Needs changes

Executive Summary

Methodology

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No new untracked findings.

Security

No findings.

Documentation/Tests

  • Severity: P1. Impact: clustered bootstrap coverage is still incomplete for the new aggregation/inference combinations. The code now passes cluster_indices into _run_multiplier_bootstrap() and writes bootstrap SE/CI/p-values back into event_study_effects and group_effects, but the new tests stop at analytical clustered aggregation and a non-aggregated clustered bootstrap fit. Per the parameter-interaction rule, that is still insufficient coverage for the new clustered inference path. Concrete fix: add at least one cluster=..., n_bootstrap>0, aggregate="all" regression test that asserts finite bootstrap-updated event_study_effects and group_effects, plus a control case such as cluster_id = unit showing the aggregated clustered bootstrap collapses to the unclustered result. diff_diff/efficient_did.py:L952, diff_diff/efficient_did.py:L978, diff_diff/efficient_did_bootstrap.py:L98, tests/test_efficient_did.py:L806, tests/test_efficient_did.py:L827
  • Severity: P3. Impact: the public fit() docstring still omits the new store_eif parameter and still describes the old never-treated error condition, even though control_group="last_cohort" now changes that behavior. Concrete fix: document store_eif, update the Raises text, and describe the new control-group behavior in the fit() API docs. diff_diff/efficient_did.py:L294, diff_diff/efficient_did.py:L305, diff_diff/efficient_did.py:L343

Static review only; I could not run the suite because pytest and runtime dependencies are unavailable in this environment.

Path to Approval

  1. Rework hausman_pretest() to match Theorem A.1 on post-treatment ES(e) estimates, or explicitly document and rename the current routine as a footnote-22 ATT(g,t) variant and restrict it to post-treatment ATT cells.
  2. Make cluster and survey_design interact correctly by resolving effective clusters / injecting PSUs before unit-level TSL, or reject that combination with NotImplementedError.
  3. Add clustered n_bootstrap>0 regression coverage for aggregated outputs, preferably one aggregate="all" test that checks bootstrap-updated event-study and group inference fields.

Address rerun-3 review findings:

P1 fixes:
- Hausman pretest now aggregates to post-treatment event-study ES(e)
  before computing test statistic, matching Theorem A.1 (was using raw
  (g,t) cells including pre-treatment placebos)
- Raise NotImplementedError when both cluster and survey_design are set
  (cluster was silently ignored under survey TSL dispatch)
- Add clustered bootstrap+aggregate='all' test covering bootstrap-updated
  event-study and group effect SEs

P2: Update REGISTRY.md Hausman note to describe ES(e) aggregation.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Copy link
Owner Author

igerber commented Mar 22, 2026

/ai-review

@github-actions
Copy link

🔁 AI review rerun (requested by @igerber)

Head SHA: bf427af3a37d4266d642136afe6db15bf2b7d568


Overall Assessment

⚠️ Needs changes

Executive Summary

  • This rer-review closes the previous Hausman scope blocker: the pretest now works on event-study horizons rather than raw (g,t) cells.
  • The previous cluster + survey_design blocker is also closed by an explicit NotImplementedError.
  • P1: the new Hausman covariance is still mis-specified. hausman_pretest() aggregates ES(e) effects with cohort-share weights, but its ES(e) influence function omits the weight-influence-function correction that the library already uses for analytical ES aggregation. That can make the Hausman statistic, p-value, and PT-All/PT-Post recommendation wrong.
  • The earlier clustered-bootstrap aggregation coverage gap looks addressed by the new clustered aggregate="all" regression test.
  • Minor docs remain out of sync around store_eif, inconclusive, and last-cohort wording.

Methodology

  • Severity: P1. Impact: hausman_pretest() now aggregates to ES(e), but it does not use the full ES(e) EIF when building the Hausman covariance. The analytical ES path adds _compute_wif_contribution() at diff_diff/efficient_did.py:L1298 because cohort-size weights are estimated. In contrast, hausman_pretest() rebuilds ES(e) at diff_diff/efficient_did.py:L1499, accumulates es_eif as a plain weighted sum at diff_diff/efficient_did.py:L1530, and then forms cov_all, cov_post, and V at diff_diff/efficient_did.py:L1577 and diff_diff/efficient_did.py:L1584. The registry also says the Hausman covariance is computed from aggregated ES(e)-level EIF values at docs/methodology/REGISTRY.md:L683. Because the paper’s Hausman test is defined on the ES vector and its covariance matrices, omitting the WIF term makes H, its p-value, and the recommendation unreliable whenever multiple cohorts load into the same post-treatment horizon. Concrete fix: factor out a shared ES aggregation helper and reuse the WIF-adjusted ES(EIF) construction inside hausman_pretest() before computing cov_all, cov_post, and V. citeturn2search12turn0search12
  • Severity: P3. Impact: the PR’s explicit cluster-robust SE path, pseudoinverse handling for non-PSD Hausman V, and last_cohort control behavior are documented in the Methodology Registry at docs/methodology/REGISTRY.md:L682, docs/methodology/REGISTRY.md:L683, and docs/methodology/REGISTRY.md:L684, so I did not treat those as defects. Concrete fix: none.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No new untracked findings.

Security

No findings.

Documentation/Tests

Path to Approval

  1. Update hausman_pretest() to build ES(e)-level EIFs with the same WIF adjustment used by _aggregate_event_study(), then recompute cov_all, cov_post, V, H, and the recommendation from those corrected EIFs.
  2. Add a regression test on a multi-cohort staggered panel where at least two cohorts contribute to the same post-treatment horizon, and assert the Hausman covariance/statistic matches a direct WIF-adjusted ES(EIF) construction.
  3. Update the public docs to cover store_eif, inconclusive, and the exact last-cohort trimming rule.

Static review only: I could not run the targeted suite here because pytest is unavailable and the local python environment is missing numpy.

P1: hausman_pretest ES(e) EIF now includes the weight influence
function correction matching _aggregate_event_study's WIF pattern.
Without this, the Hausman covariance was underestimated when multiple
cohorts load into the same post-treatment horizon.

P3 doc fixes:
- Document store_eif parameter in fit() docstring
- Document "inconclusive" return state in HausmanPretestResult
- Fix REGISTRY.md last-cohort wording to say "from treatment onset
  onward" instead of "last time period"

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Copy link
Owner Author

igerber commented Mar 22, 2026

/ai-review

@github-actions
Copy link

🔁 AI review rerun (requested by @igerber)

Head SHA: 56beb422bcf69b2985a41583b0870e9db1bac3ab


Overall Assessment

✅ Looks good

Executive Summary

  • This re-review closes the prior Hausman blocker: hausman_pretest() now aggregates to ES(e), adds the cohort-weight WIF correction, and builds the covariance difference from ES-level EIFs in diff_diff/efficient_did.py:L1298-L1307 and diff_diff/efficient_did.py:L1503-L1599.
  • The new clustered SE path, last-cohort pseudo-control, and Moore-Penrose Hausman handling are documented in the Methodology Registry, so I treated them as documented deviations rather than defects.
  • I did not find new inline-inference or partial-NaN anti-patterns in the changed EfficientDiD paths; the new inference code consistently goes through safe_inference.
  • The new edge cases are meaningfully covered in the diff: clustered inference, invalid cluster metadata, last-cohort fallback, Hausman output states, and small-cohort warnings are all exercised in tests/test_efficient_did.py:L580-L920.
  • Static review only: I could not run the targeted test subset here because pytest is not installed in this environment.

Methodology

  • Severity: P3. Impact: The prior P1 on Hausman covariance specification is resolved. The implementation now constructs the post-treatment ES(e) vector, applies the cohort-weight WIF correction at the ES level, and then forms V = Cov(ES_post) - Cov(ES_all) in diff_diff/efficient_did.py:L1503-L1599. That matches the paper’s Theorem A.1 framing of the Hausman test on event-study aggregations rather than raw (g,t) cells. Concrete fix: none. (web3.arxiv.org)
  • Severity: P3. Impact: The PR’s main departures from the paper are explicitly documented in docs/methodology/REGISTRY.md:L669-L684: clustered Liang-Zeger EIF-based SEs with cluster-level multiplier bootstrap, pseudoinverse/effective-rank handling for non-PSD Hausman V, and control_group="last_cohort". Under the review rules, those are informational, not defects. Concrete fix: none.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No new untracked findings.

Security

No findings.

Documentation/Tests

@igerber igerber merged commit 937dae6 into main Mar 22, 2026
14 checks passed
@igerber igerber deleted the edid-roadmap branch March 22, 2026 17:57
igerber added a commit that referenced this pull request Mar 22, 2026
…e-existing Hausman bug, add inference tests

- Add REGISTRY.md deviation note: CS per-cell ATT(g,t) SEs use influence-function
  variance (matching R's did), not full TSL with strata/PSU/FPC
- Add TODO entries: CS per-cell TSL SE (Medium), Hausman stale n_cl from #230 (Medium)
- Add 3 CS survey inference validation tests (scale invariance, per-cell ATT change,
  survey df effect)
- Remove unused survey_weight_type in ImputationDiD

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
igerber added a commit that referenced this pull request Mar 23, 2026
…e-existing Hausman bug, add inference tests

- Add REGISTRY.md deviation note: CS per-cell ATT(g,t) SEs use influence-function
  variance (matching R's did), not full TSL with strata/PSU/FPC
- Add TODO entries: CS per-cell TSL SE (Medium), Hausman stale n_cl from #230 (Medium)
- Add 3 CS survey inference validation tests (scale invariance, per-cell ATT change,
  survey df effect)
- Remove unused survey_weight_type in ImputationDiD

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
igerber added a commit that referenced this pull request Mar 24, 2026
The clustered SE formula fix shipped in PR #230 (v2.7.4). The remaining
stale-indices issue is still tracked in TODO.md and not fixed in this release.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant