
Add doubly robust covariates path to EfficientDiD#225

Merged
igerber merged 12 commits into main from edid-covariates on Mar 22, 2026

Conversation

@igerber
Owner

@igerber igerber commented Mar 21, 2026

Summary

  • Add doubly robust (DR) covariates path to EfficientDiD estimator (Phase 2 of Chen, Sant'Anna & Xie 2025)
  • New module efficient_did_covariates.py with pure-function DR math: outcome regression via OLS, propensity score ratios via logit, per-unit generated outcomes (Eq 4.4), unconditional Omega*, and EIF computation
  • When covariates=["x1", "x2"] is passed to fit(), automatically uses the DR path; covariates=None (default) uses the existing no-covariates closed-form path unchanged
  • Add pscore_trim parameter with overlap diagnostics via _check_propensity_diagnostics()
  • Add estimation_path field to EfficientDiDResults ("nocov" or "dr")
  • Extend DGP helper with covariate generation (including confounding for DR testing)
  • 17 new tests across 5 classes: basic correctness, validation, PT assumptions, edge cases, bootstrap

Methodology references (required if estimator / math changes)

  • Method name(s): Efficient Difference-in-Differences (doubly robust with covariates path)
  • Paper / source link(s): Chen, X., Sant'Anna, P. H. C., & Xie, H. (2025). Efficient Difference-in-Differences and Event Study Estimators. (https://arxiv.org/abs/2506.17729), Equations 4.3-4.4, Section 4
  • Any intentional deviations from the source (and why):
    • Propensity score ratios via standard logistic regression instead of sieve-based convex minimization (Eq 4.1-4.2). Valid under DR property; sieve approach deferred.
    • Unconditional Omega* for efficient weights instead of kernel-smoothed conditional covariance Omega*(X). Gives valid DR estimator with correct SEs but not the full semiparametric efficiency bound. Conditional weights deferred.

Validation

  • Tests added/updated: tests/test_efficient_did.py (17 new tests), tests/helpers/edid_dgp.py (extended)
  • All 85 EDID tests pass, including 7 HRS empirical validation tests (unchanged)
  • Backtest / simulation / notebook evidence: a DGP with confounding validates that the DR path produces different (corrected) estimates than the nocov path

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

Implements Phase 2 of the Chen, Sant'Anna & Xie (2025) methodology:
when covariates are provided, uses outcome regression (OLS) and
propensity score ratios (logit) for doubly robust ATT estimation.
The estimator is consistent if either the outcome model or the
propensity model is correctly specified.

New module efficient_did_covariates.py contains the DR math functions.
The no-covariates closed-form path remains unchanged as the default.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@github-actions

Overall Assessment

⛔ Blocker

Executive Summary

Methodology

  • Severity P0. Impact: Eq. 4.4 requires r_hat_{g,g'}(X) = p_g(X) / p_{g'}(X), but the implementation builds X_combined in original unit order and builds D by concatenating ones(n_g) with zeros(n_gp). Those are aligned only if all g units happen to appear before all g' units. For interleaved/shuffled unit ids, the fitted logit estimates the wrong binary classification problem, so the DR generated outcomes, ATT(g,t), EIF, and both analytical and multiplier-bootstrap inference can be wrong with no diagnostic. Concrete fix: in diff_diff/efficient_did_covariates.py:L132-L166, either stack X_g then X_gp explicitly or define D = mask_g[combined_mask].astype(float) so labels follow the selected rows; also short-circuit mask_g == mask_gp to return an all-ones ratio for r_{g,g}(X).
  • Severity P3. Impact: using standard logit instead of the paper’s sieve ratio estimator, and unconditional Omega* instead of conditional Omega*(X), is documented in the registry and tracked in TODO.md, so these are acceptable deferred deviations rather than review defects. Concrete fix: none required for this PR.
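The suggested label-alignment fix can be sketched as follows (a minimal numpy illustration with hypothetical names; the actual helpers in efficient_did_covariates.py are not reproduced here):

```python
import numpy as np

def propensity_labels(mask_g, mask_gp):
    """Derive the logit labels from the cohort masks themselves, so D
    stays aligned with the selected rows even when cohorts are
    interleaved. Hypothetical helper illustrating the suggested fix."""
    if np.array_equal(mask_g, mask_gp):
        # r_{g,g}(X) is identically 1; no logit fit needed.
        return None, None
    combined_mask = mask_g | mask_gp
    # Labels follow the selected rows, not an assumed block ordering.
    D = mask_g[combined_mask].astype(float)
    return combined_mask, D

# Interleaved cohorts: g at positions 0, 2, 4; g' at 1, 3.
mask_g = np.array([True, False, True, False, True])
mask_gp = np.array([False, True, False, True, False])
combined_mask, D = propensity_labels(mask_g, mask_gp)
print(D)  # [1. 0. 1. 0. 1.]
```

Concatenating `ones(n_g)` with `zeros(n_gp)` would instead label the first three selected rows as g regardless of which cohort they belong to.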

Code Quality

No additional findings beyond the methodology blocker above.

Performance

No separate performance finding. Fixing the r_{g,g} special case in the P0 item will also remove repeated failed logit fits on valid PT-All same-cohort moments.

Maintainability

Tech Debt

  • Severity P3. Impact: the PR properly adds TODO.md entries for the deferred sieve-based propensity ratio estimator and conditional Omega*(X) weighting, so those deferrals are tracked and non-blocking. See TODO.md:L56-L57. Concrete fix: none required for this PR.

Security

No findings.

Documentation/Tests

  • Severity P1. Impact: the new covariate path is only covered by smoke-style tests for plain fit() and one overall bootstrap call; it does not exercise aggregate="event_study", aggregate="group", or aggregate="all" with covariates, and it never shuffles/interleaves unit ordering. Because the helper DGP assigns cohorts in contiguous unit blocks, the current suite is predisposed to miss the row-order bug in the propensity helper. See tests/helpers/edid_dgp.py:L44-L48, tests/test_efficient_did.py:L1093-L1300, and diff_diff/efficient_did.py:L580-L662. Concrete fix: add covariate-path regression tests that:
    1. shuffle or relabel units so cohorts are interleaved,
    2. cover PT-All and PT-Post,
    3. cover aggregate="event_study", aggregate="group", and aggregate="all",
    4. include at least one bootstrap case on an aggregated covariate path.

Path to Approval

  1. Fix diff_diff/efficient_did_covariates.py:L132-L166 so the propensity-ratio labels are aligned with the selected rows, and special-case r_{g,g}(X) = 1.
  2. Add a regression test with shuffled/interleaved unit ordering that fails on the current implementation and passes once the label alignment bug is fixed.
  3. Add covariate-path aggregation/bootstrapping tests for event_study, group, and all, with at least one PT-All bootstrap case.
  4. Thread pscore_trim into EfficientDiDResults so the new estimator parameter is preserved downstream.

…rim to results

P0 fix: estimate_propensity_ratio() now derives D labels from
mask_g[combined_mask] instead of concatenating ones/zeros, ensuring
correct alignment regardless of unit ordering. Also short-circuits
r_{g,g}(X) = 1 for same-cohort comparisons under PT-All.

P1 fix: pscore_trim is now preserved in EfficientDiDResults, matching
the staggered_results pattern for parameter traceability.

P1 fix: add shuffled-unit regression test that would have caught the
label bug, plus aggregation tests (event_study, group, all) and
PT-All bootstrap test for full covariate-path coverage.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 02537476753671800dbd9ef04a2777398ce013af


Overall Assessment

⚠️ Needs changes

Executive Summary

Methodology

  • Severity P1. Impact: estimate_propensity_ratio() switches to a constant cohort-share odds model whenever solve_logit() raises, by setting pscore_all = n_g / (n_g + n_gp) and therefore r_hat_{g,g'}(X) = n_g/n_gp for all X in diff_diff/efficient_did_covariates.py:L154-L169. Eq. 4.4 is written in terms of estimated ratio functions r_{g,g'}(X), and the paper’s Section 4 construction centers nuisance estimation on ratio estimators plus plug-in covariance estimation; this fallback is a real methodology deviation and it is not disclosed in REGISTRY.md. Concrete fix: either raise on logit failure, or add an explicit REGISTRY.md deviation note describing the constant-ratio fallback and its implications. (cowles.yale.edu)
  • Severity P3. Impact: the new logit-for-sieve and unconditional-Omega* substitutions are acceptable documented deviations, so they are not defects. Concrete fix: none required. (cowles.yale.edu)
  • Severity P3. Impact: the public class docstring still says EfficientDiD “Achieves the semiparametric efficiency bound” even when covariates routes to the documented non-efficient approximation with unconditional Omega*. That is user-facing overstatement, but it is mitigated by the registry note. Concrete fix: align diff_diff/efficient_did.py:L46-L54 with docs/methodology/REGISTRY.md:L671-L672.

Code Quality

  • No new findings.

Performance

  • No new findings.

Maintainability

  • No new findings.

Tech Debt

  • Severity P3. Impact: the remaining methodology gaps, sieve-based ratio estimation and conditional Omega*(X) weighting, are properly tracked in TODO.md:L56-L57 and documented in docs/methodology/REGISTRY.md:L656-L672, so they are non-blocking deferred work. Concrete fix: none required.

Security

  • No findings.

Documentation/Tests

  • Severity P2. Impact: the new “confounding” DGP in tests/helpers/edid_dgp.py:L88-L99 adds an x1 * time trend only for ever-treated units, so conditional parallel trends does not actually hold. The associated test in tests/test_efficient_did.py:L1139-L1158 only checks that the DR path runs and differs from nocov, so it does not validate the claimed DR recovery scenario. Concrete fix: generate confounding through X-dependent untreated trends that apply to all groups and let treatment timing depend on X; then assert the DR estimate is closer to the known ATT than the no-covariates estimate.
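The suggested DGP shape can be sketched numerically (a toy numpy version under assumed parameters; names like x1, cohort, and true_att mirror the review's description, not the helper's actual API):

```python
import numpy as np

rng = np.random.default_rng(42)
n_units, n_periods, true_att = 200, 6, 2.0
t = np.arange(n_periods)

# Covariate drives BOTH treatment timing and the untreated trend, and
# the X-dependent trend applies to ALL groups, so conditional parallel
# trends holds while covariates confound the raw comparison.
x1 = rng.normal(size=n_units)
cohort = np.where(x1 + rng.normal(size=n_units) > 0, 3, 0)  # 0 = never-treated
treated_mask = (cohort[:, None] > 0) & (t[None, :] >= cohort[:, None])

y = (
    rng.normal(size=(n_units, 1))        # unit fixed effects
    + 0.5 * np.outer(x1, t)              # X-dependent trend, all groups
    + true_att * treated_mask            # known treatment effect
    + rng.normal(scale=0.1, size=(n_units, n_periods))
)
# A DR estimator conditioning on x1 should recover true_att with less
# bias than the no-covariates path under this design.
```

With the original DGP (trend only for ever-treated units), no conditioning set restores parallel trends, so "DR differs from nocov" is the strongest checkable claim; this version supports a bias-vs-known-ATT assertion.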

Path to Approval

  1. Resolve the propensity-model failure behavior in diff_diff/efficient_did_covariates.py:L154-L169: either raise instead of falling back, or document the fallback explicitly in docs/methodology/REGISTRY.md as a deliberate deviation from the paper.
  2. Add a regression test that forces solve_logit() failure and verifies the chosen behavior end-to-end.

…ng DGP

- Document propensity-ratio constant-fallback as deviation in REGISTRY.md
- Fix confounding DGP: X-dependent trends apply to all groups (not just
  treated), x1 distribution shifts by group; conditional PT now holds
- Test asserts DR bias < nocov bias against known true ATT
- Add logit-failure regression test via mock
- Clarify docstring: covariate path uses unconditional Omega*, does not
  achieve full semiparametric efficiency bound
- Fix pre-existing E402 lint: move edid_dgp import to top of file

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 86492b0272d9d5a9613e6f25f35be9ec7c78b3a4


Overall Assessment

✅ Looks good

Executive Summary

  • Affected method: the PR adds a covariate-enabled doubly robust path for EfficientDiD, specifically the staggered-adoption Section 4 estimator built from Eq. 4.3 / Eq. 4.4 generated outcomes and EIF-based analytical SEs. (ar5iv.org)
  • Re-review result: the prior blocker on undocumented propensity fallback is addressed. The fallback is now recorded in docs/methodology/REGISTRY.md#L621, exercised in tests/test_efficient_did.py#L1311, and pscore_trim / estimation_path are wired through diff_diff/efficient_did.py#L693 and diff_diff/efficient_did_results.py#L107.
  • The earlier DGP/test concern is also addressed: the confounding DGP now applies the x1-dependent untreated trend to all units in tests/helpers/edid_dgp.py#L96, and the corresponding test now checks bias relative to the known ATT in tests/test_efficient_did.py#L1137.
  • No unmitigated P0/P1 findings remain. The two headline deviations from the paper, standard logit for propensity-ratio estimation and unconditional Omega* weights, are now documented in the registry/TODO, and Remark 4.1 / Remark 4.2 make those non-blocking implementation choices rather than methodology defects. (ar5iv.org)
  • Remaining issues are P3 only: the new fallback note/docstring is still slightly imprecise, and the overlap-warning test does not actually assert that a warning fired.
  • Static review only: pytest is not installed here, and numpy / pandas are unavailable, so I could not execute the new test slice.

Methodology

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

  • Severity P3. Impact: the remaining paper-parity work is properly tracked in TODO.md#L56 and TODO.md#L57, so it should not block this PR. Concrete fix: none required.

Security

No findings.

Documentation/Tests

  • Severity P3. Impact: tests/test_efficient_did.py#L1356 is named as a warning test, but it only records warnings and then checks for finite estimates; it never asserts that _check_propensity_diagnostics() in diff_diff/linalg.py#L1271 or the call site in diff_diff/efficient_did_covariates.py#L144 actually emitted a trimming / near-separation warning. A regression could therefore remove the diagnostic without failing CI. Concrete fix: assert that at least one captured warning message mentions trimming, propensity scores, or near-separation.
  • Static note: I could not run the test suite here because pytest is not installed, and numpy / pandas are unavailable.
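A warning test that actually asserts the diagnostic fired might look like this (a sketch using a hypothetical stand-in for _check_propensity_diagnostics, since the real helper's signature is not shown in this thread):

```python
import warnings

def check_propensity_diagnostics(pscores, trim=0.01):
    """Hypothetical stand-in: warn when propensity scores fall outside
    [trim, 1 - trim], signalling trimming / near-separation risk."""
    n_bad = sum(1 for p in pscores if p < trim or p > 1 - trim)
    if n_bad:
        warnings.warn(
            f"{n_bad} propensity scores outside [{trim}, {1 - trim:g}]; "
            "trimming applied (possible near-separation)",
            UserWarning,
        )

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_propensity_diagnostics([0.001, 0.5, 0.999])

# Assert on the message content, not just that something was recorded.
assert any("trimming" in str(w.message) for w in caught)
```

Recording warnings without asserting on them, as the current test does, lets the diagnostic be deleted without failing CI.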

Implements paper-faithful Section 4 nuisance estimation:

Sieve-based propensity ratio (Eq 4.1-4.2):
- Closed-form linear system per polynomial degree K
- AIC/BIC model selection across sieve dimensions
- Ratio clipping to [1/ratio_clip, ratio_clip] for positivity
- Eliminates logit convergence/separation issues entirely

Kernel-smoothed conditional Omega*(X) (Eq 3.12):
- Nadaraya-Watson estimator with Gaussian kernel and local means
- Silverman bandwidth by default, user-overridable
- Per-unit efficient weights w(X_i) from Omega*(X_i) inverse
- Plug-in EIF valid under Neyman orthogonality (Remark 4.2)

New params: sieve_k_max, sieve_criterion, ratio_clip, kernel_bandwidth
Removed: pscore_trim (replaced by ratio_clip)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 74d7723c43c86a6c8783c74abe132d2960f6bda0


Overall Assessment

⛔ Blocker

Executive Summary

Methodology

Affected method: EfficientDiD covariate-enabled doubly robust staggered-adoption path (Eq. 4.3, Eq. 4.4, conditional Omega*_{gt}(X), EIF/SE).

  • [Newly identified] Severity P0. compute_eif_cov() is not the influence function for the estimator actually computed on the DR path. The estimator is att_gt = mean_i[w_i(X)'gen_out_i], but the code returns sum_h w_ih(gen_out_ih - mean_j gen_out_jh), which only centers correctly when weights are global, not unit-specific. Impact: group-time analytical SEs, p-values, and CIs are silently wrong on the covariate path, and the same bad eif_by_gt values are then reused by the multiplier bootstrap. Concrete fix: change the DR EIF to center on the scalar att_gt for the per-unit-weight estimator in diff_diff/efficient_did_covariates.py:L626-L669, update the caller in diff_diff/efficient_did.py:L530-L537, and add a regression test that the sample mean of the returned EIF is approximately zero when weights vary across units.
  • [Newly identified] Severity P1. The new Omega*_{gt}(X) implementation does not match Eq. 3.12 / the registry: it scales every covariance term by unconditional pi_g, pi_inf, and pi_gp rather than conditional p_g(X), p_inf(X), and p_{g'}(X). Impact: the shipped covariate path is not the documented conditional-efficient method, and the class/module docstrings overstate what is implemented in diff_diff/efficient_did.py:L50-L57 and diff_diff/efficient_did_covariates.py:L4-L8. Concrete fix: either implement the per-unit propensity scaling required by Eq. 3.12, or add a **Note (deviation from source)** in the registry and reword the public docs so this path is no longer presented as achieving the paper’s conditional efficiency bound.
  • [Newly identified] Severity P1. The overlap assumption check required by the registry is missing from the DR path. The code estimates/clips ratios but never emits a near-boundary warning, and there is no pscore_trim-style user control or diagnostic equivalent to the existing helper in diff_diff/linalg.py:L1271-L1293. Impact: severe overlap problems are silently masked by clipping, which is exactly the kind of missing assumption check the registry says should warn users. Concrete fix: add an actual diagnostic call on the underlying propensity/ratio estimates before clipping, expose the trimming threshold as a user-facing parameter if intended, and only mark the registry requirement complete once that warning path exists.
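The EIF centering argument can be illustrated numerically (a toy numpy sketch; the shapes and names such as gen_out are assumptions, not the module's actual interface):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3                                  # units x (g, t) columns
weights = rng.normal(1.0, 0.2, size=(n, k))   # per-unit efficient weights
gen_out = rng.normal(size=(n, k))             # Eq 4.4 generated outcomes

contrib = np.einsum("ij,ij->i", weights, gen_out)  # w_i @ gen_out_i
att = contrib.mean()                               # the estimator itself
eif = contrib - att            # center each unit on the scalar ATT
se = eif.std(ddof=1) / np.sqrt(n)   # analytical SE from the EIF

# Mean-zero holds by construction even when weights vary across units;
# centering per comparison column instead breaks this property.
assert abs(eif.mean()) < 1e-12
```

The mean-zero check is exactly the regression test the review asks for: with unit-specific weights, the per-column centering `sum_h w_ih (gen_out_ih - mean_j gen_out_jh)` generally has nonzero sample mean.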

Code Quality

  • No additional findings.

Performance

  • No additional findings.

Maintainability

  • No additional findings.

Tech Debt

  • No additional findings. TODO.md does not currently track the P0/P1 issues above, so they remain unmitigated.

Security

  • No findings.

Documentation/Tests

  • Severity P3. The new tests mostly check finiteness and path selection, so they would not catch either blocking issue above. For example, the near-separation test only asserts finite output in tests/test_efficient_did.py:L1351-L1374, and there is no test that the DR EIF is mean-zero or benchmarked against a known SE. Impact: both inference/assumption regressions can land silently. Concrete fix: add one unit test for compute_eif_cov() with non-constant per-unit weights and one warning assertion for extreme-overlap clipping.
  • Static review only: numpy and pytest are not available in this environment, so I could not execute the new test slice.

Path to Approval

  1. Re-derive the DR EIF for the per-unit-weight estimator and fix diff_diff/efficient_did_covariates.py:L626-L669 plus its caller in diff_diff/efficient_did.py:L530-L537 so the returned influence values are centered at the scalar att_gt; add a regression test that the sample mean of the EIF is approximately zero under varying weights.
  2. In diff_diff/efficient_did_covariates.py:L457-L549, either replace unconditional pi_* scaling with estimated conditional propensities / s_hat_{g'} per Eq. 3.12, or explicitly document the deviation in docs/methodology/REGISTRY.md:L657-L673 and downgrade the public “efficient” claims for the covariate path.
  3. Add real overlap diagnostics to the DR ratio path by reusing the existing warning pattern from diff_diff/linalg.py:L1271-L1293 / diff_diff/staggered.py:L1505-L1534, and add a test that explicitly asserts the warning on a near-separation design.

P0 fix: EIF now centers on scalar ATT (EIF_i = w_i @ gen_out_i - ATT)
instead of per-pair means, ensuring mean(EIF) ≈ 0 when weights vary
by unit. Added unit test verifying mean-zero property.

P1 fix: Add overlap warning when sieve ratios require clipping, with
test asserting the warning fires on near-separation covariates.

P1 fix: Document unconditional-pi Omega* as deviation in REGISTRY.md;
downgrade "efficient" claim in docstring for covariate path.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 2c7770291f394064750cabc739fbafe3165a99bb


Overall Assessment

⚠️ Needs changes

Executive Summary

Methodology

Affected method: EfficientDiD doubly robust staggered-adoption path with covariates (Omega*_{gt}(X), Eq. 4.3, Eq. 4.4, EIF/SE). (arxiv.org)

  • Severity P3. Impact: the remaining source deviation is the unconditional-pi scaling inside the covariate-path Omega*(X) rather than the paper’s conditional propensity scaling from Eq. 3.12. This is now explicitly documented in the registry, so under the review rubric it is informational rather than a defect. Concrete fix: none required for approval; optionally mirror the registry note more explicitly in code-local docstrings to avoid overclaiming full conditional efficiency. docs/methodology/REGISTRY.md:L671-L674 diff_diff/efficient_did.py:L47-L59 (arxiv.org)

Code Quality

  • Severity P1 [Newly identified]. Impact: the new DR tuning-parameter validation is incomplete. kernel_bandwidth is accepted without any finite/positive check, and ratio_clip=np.nan also bypasses the current guard because only <= 1.0 is tested. Those bad values then reach _kernel_weights_matrix() / np.clip() and can poison downstream DR inference with NaN outputs. Concrete fix: in EfficientDiD._validate_params(), require kernel_bandwidth is None or (np.isfinite(kernel_bandwidth) and kernel_bandwidth > 0) and ratio_clip to be finite and > 1.0; while touching the block, it would also be prudent to validate sieve_k_max as None or a positive integer. diff_diff/efficient_did.py:L135-L150 diff_diff/efficient_did_covariates.py:L239-L252 diff_diff/efficient_did_covariates.py:L367-L386 (arxiv.org)
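The tightened guard might look like the following (a sketch; validate_dr_params is a hypothetical free function, whereas the PR's check lives in EfficientDiD._validate_params()):

```python
import numpy as np

def validate_dr_params(kernel_bandwidth=None, ratio_clip=10.0,
                       sieve_k_max=None):
    """Reject NaN/Inf/non-positive tuning values before they reach the
    numerical routines. Hypothetical mirror of the suggested checks."""
    if kernel_bandwidth is not None and not (
        np.isfinite(kernel_bandwidth) and kernel_bandwidth > 0
    ):
        raise ValueError("kernel_bandwidth must be None or finite and > 0")
    if not (np.isfinite(ratio_clip) and ratio_clip > 1.0):
        # Catches ratio_clip=np.nan, which a bare `<= 1.0` test misses.
        raise ValueError("ratio_clip must be finite and > 1.0")
    if sieve_k_max is not None and not (
        isinstance(sieve_k_max, (int, np.integer)) and sieve_k_max > 0
    ):
        raise ValueError("sieve_k_max must be None or a positive integer")

for bad in (0.0, float("nan"), float("inf")):
    try:
        validate_dr_params(kernel_bandwidth=bad)
    except ValueError as exc:
        print(f"rejected {bad}: {exc}")
```

Note that `np.nan <= 1.0` is False, which is why the current guard lets NaN through to `np.clip()`.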

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No mitigating TODO.md entry covers the P1 above, so it remains unmitigated for assessment purposes. TODO.md:L44-L57

Security

  • No findings.

Documentation/Tests

Path to Approval

  1. Tighten DR tuning validation in diff_diff/efficient_did.py:L135-L150 so kernel_bandwidth must be finite and strictly positive (or None) and ratio_clip must be finite and > 1.0.
  2. Add regression tests in tests/test_efficient_did.py that assert ValueError for kernel_bandwidth=0, kernel_bandwidth=np.nan, and kernel_bandwidth=np.inf rather than letting those values reach the numerical routines.

Validate kernel_bandwidth (None or finite > 0), ratio_clip (finite > 1.0),
and sieve_k_max (None or positive int) in _validate_params(). Rejects
NaN, Inf, zero, and negative values with clear ValueError messages.

Add pytest.raises tests for all invalid parameter combinations.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 7608264bf1d0be401abcc8579719b9ea236b1931


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review result: the prior P1 is resolved. _validate_params() now rejects invalid ratio_clip, kernel_bandwidth, and sieve_k_max, and regression coverage was added for those cases in diff_diff/efficient_did.py:L135 and tests/test_efficient_did.py:L1189.
  • Affected methodology: the new covariate/DR EDiD path touches the Section 4 nuisance-estimation workflow, Eq. 4.3 weighted ATT construction, Eq. 4.4 generated outcomes, Eq. 3.12 conditional Omega*, and EIF-based analytical inference.
  • I did not find any new unmitigated P0/P1 issues in inference handling. The new path uses safe_inference() rather than inline t-stat / p-value / CI code at diff_diff/efficient_did.py:L604, diff_diff/efficient_did.py:L626, and diff_diff/efficient_did.py:L677.
  • The only material source deviation I found remains the unconditional-pi scaling in Omega*(X) instead of the paper’s conditional generalized-propensity version; that deviation is explicitly documented in the registry, so under the review rubric it is P3 informational rather than a blocker.
  • Static review only: I could not run the test suite in this environment because numpy/scipy are not installed.

Methodology

  • Severity P3 (informational). Impact: the paper’s conditional Omega*_{gt}(X) construction is tied to conditional generalized propensities / s_hat_{g'} in the Section 4 algorithm, while this implementation intentionally uses unconditional cohort fractions instead. That is already documented in docs/methodology/REGISTRY.md:L674, so it is not a defect for approval; the remaining issue is that the new module/function docstrings still read as if Eq. 3.12 is implemented literally in diff_diff/efficient_did_covariates.py:L1 and diff_diff/efficient_did_covariates.py:L421. Concrete fix: mirror the registry’s deviation note in those code-local docstrings.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: EfficientDiDResults now exposes DR metadata (estimation_path, sieve_k_max, sieve_criterion, ratio_clip, kernel_bandwidth) but the public attribute block still stops at bootstrap_results, so the new results surface is under-documented in diff_diff/efficient_did_results.py:L28 even though the fields exist at diff_diff/efficient_did_results.py:L107. Concrete fix: extend the dataclass attribute docs to cover the new DR fields.
  • Severity P3. Impact: the new tests cover validation, PT variants, aggregation modes, overlap warnings, and bootstrap, but they do not exercise the documented “all sieve fits fail -> constant ratio 1” fallback recorded in docs/methodology/REGISTRY.md:L621. That leaves a documented edge case unguarded by tests. Concrete fix: add a focused unit test around estimate_propensity_ratio_sieve() at diff_diff/efficient_did_covariates.py:L136 that forces all K attempts to fail and asserts the documented fallback behavior.

…ep 4)

Replace unconditional cohort fractions pi_g in Omega*(X) with per-unit
sieve-estimated inverse propensities s_hat_{g'}(X) = 1/p_{g'}(X), matching
Eq 3.12 and the paper's algorithm step 4.

The inverse propensity sieve uses the same polynomial basis and AIC/BIC
selection as the ratio estimator. FOC: (Psi_{g'}' Psi_{g'}) beta =
Psi_all.sum(axis=0) — same closed-form linear system structure.

This eliminates the last documented methodology deviation from the paper.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 16dc2f9f5540bd40ee559f78d9bf3b4198151108


Overall Assessment

⚠️ Needs changes

Executive Summary

Methodology

Affected method(s): Efficient Difference-in-Differences with covariates, especially the DR nuisance-estimation and efficient-weighting path.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

Path to Approval

  1. In estimate_inverse_propensity_sieve(), stop silently switching to unconditional n / n_group weights. Either raise on all-K failure, or emit a UserWarning and document the fallback in REGISTRY.md with a **Note:**.
  2. Add targeted regression tests for both sieve-failure paths: one for estimate_inverse_propensity_sieve() and one for estimate_propensity_ratio_sieve(), using a tiny or collinear group setup that forces all K attempts to fail.
  3. Update the EfficientDiDResults attribute docs to cover estimation_path, sieve_k_max, sieve_criterion, ratio_clip, and kernel_bandwidth.

P1 fix: estimate_inverse_propensity_sieve() now emits UserWarning on
all-K fallback (matching the ratio sieve pattern). Documented in
REGISTRY.md alongside the existing ratio fallback note.

Add TestSieveFallbacks class with targeted tests forcing tiny groups
(n_basis >= n_group) for both ratio and inverse propensity sieve helpers.

Extend EfficientDiDResults attribute docs to cover estimation_path,
sieve_k_max, sieve_criterion, ratio_clip, and kernel_bandwidth.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: f94f499c148485e84b135fd5a25ba19454a43d49


Overall Assessment

⚠️ Needs changes

Executive Summary

  • Re-review result: the prior methodology blocker around the step-4 inverse-propensity fallback is resolved. It is now documented in docs/methodology/REGISTRY.md:L621-L622 and regression-covered in tests/test_efficient_did.py:L1467-L1481.
  • Affected methodology: EfficientDiD with covariates, especially the Eq. 4.2 propensity-ratio nuisance, Eq. 4.4 generated outcomes, and Eq. 3.12 conditional Omega* weighting path. (arxiv.org)
  • Severity P1 [Newly identified]: diff_diff/efficient_did_covariates.py:L197-L254 still silently falls back to a constant ratio of 1 when every sieve candidate fails. Those ratios are consumed in diff_diff/efficient_did.py:L494-L523, so the estimator can quietly lose the ratio arm of the DR correction in the exact edge cases where the nuisance fit is failing. (arxiv.org)
  • I did not find new inline-inference or partial-NaN anti-patterns in the changed estimator path; inference still goes through diff_diff/efficient_did.py:L620.
  • Static review only: I could not execute the test suite here because numpy, pandas, scipy, and pytest are not installed in this environment.

Methodology

  • Severity P1 [Newly identified]. Impact: diff_diff/efficient_did_covariates.py:L197-L254 initializes best_ratio to ones and returns it unchanged when all sieve fits fail, but unlike the inverse-propensity helper there is no best_ic == np.inf warning path. The new registry note says this fallback should emit a UserWarning at docs/methodology/REGISTRY.md:L621-L622, and the returned ratios feed directly into the Eq. 4.4 score via diff_diff/efficient_did.py:L494-L523. I infer from Eq. 4.4 that this silently collapses the ratio-adjustment arm of the DR score to a constant-one fallback, so users can lose the advertised DR protection exactly when the ratio nuisance is not estimable. Concrete fix: add an explicit warnings.warn(...) branch when no ratio sieve candidate succeeds, including the n_basis >= n_gp early-exit case, and keep the constant-one fallback only as an explicit surfaced deviation. (arxiv.org)
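The requested warning branch could be sketched like this (hypothetical interface; the real estimate_propensity_ratio_sieve() loops over sieve degrees K internally rather than taking callables):

```python
import warnings
import numpy as np

def ratio_sieve_with_fallback(fit_attempts):
    """Try each candidate fit; if none succeeds, warn loudly and return
    the constant-one ratio instead of failing silently. Hypothetical
    interface: fit_attempts holds one callable per sieve degree K."""
    for fit in fit_attempts:
        try:
            return fit()
        except np.linalg.LinAlgError:
            continue
    warnings.warn(
        "no sieve degree K succeeded; falling back to constant ratio 1 "
        "(the ratio arm of the DR correction is disabled)",
        UserWarning,
    )
    return np.ones(1)

def always_singular():
    raise np.linalg.LinAlgError("singular basis matrix")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ratio = ratio_sieve_with_fallback([always_singular, always_singular])

assert len(caught) == 1 and ratio[0] == 1.0
```

The same branch should also cover the n_basis >= n_gp early exit, where no candidate is ever attempted.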

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No new untracked findings. The pre-existing EfficientDiD public-doc gap is already tracked in TODO.md:L54, so I did not count it against this PR.

Security

  • No findings.

Documentation/Tests

Path to Approval

  1. Add a UserWarning in diff_diff/efficient_did_covariates.py:L197-L254 when estimate_propensity_ratio_sieve() exits via the constant-one fallback because no K succeeded.
  2. Strengthen tests/test_efficient_did.py:L1451-L1465 to assert both the warning and the constant-one fallback.
  3. Tighten docs/methodology/REGISTRY.md:L621-L622 if needed so the note matches the actual trigger conditions for the fallback.

Symmetric with the inverse propensity fallback: emit UserWarning when
no sieve candidate succeeds and ratio falls back to constant 1.
Updated test to assert the warning fires.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 24b6b2a1ebee8813877cc36803cd03ebeefb37a0


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review result: the prior P1 on the silent ratio-sieve fallback is resolved. The fallback now emits a UserWarning in diff_diff/efficient_did_covariates.py:239 and is regression-covered in tests/test_efficient_did.py:1451.
  • Affected methodology: the PR changes the with-covariates EfficientDiD path, including Eq. 4.4 generated outcomes, sieve ratio and inverse-propensity nuisances, kernel-smoothed conditional Omega*(X), and the Eq. 4.3 plug-in efficient weighting. I did not find a new undocumented mismatch against the paper in the changed estimator, inference, or bootstrap paths. (arxiv.org)
  • Highest remaining issue is P3 only: the new registry note is slightly narrower than the actual fallback trigger conditions.
  • Static review only: I could not run the test suite here because numpy, scipy, and pytest are unavailable in this environment.

Methodology

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. The pre-existing EfficientDiD public-doc gap remains tracked at TODO.md:54 and does not count against this PR.

Security

  • No findings.

Documentation/Tests

  • No findings in the changed tests. The prior re-review target is now covered by tests/test_efficient_did.py:1451. Verification here was static only because the runtime dependencies are unavailable.

Broaden sieve fallback descriptions from "singular basis matrix" /
"exception-triggered" to "when no sieve degree K succeeds" — covers
both the n_basis >= n_group early-exit and singular system cases.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
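
The broadened wording covers two distinct ways a candidate degree K can fail, which a minimal loop sketch makes visible. All names here (`fit_K`, `basis_size`, `fit_sieve_over_degrees`) are hypothetical, assumed for illustration:

```python
def fit_sieve_over_degrees(fit_K, basis_size, candidate_degrees, n_group):
    """Illustrative loop over sieve degrees K showing both failure modes
    the commit message names: (a) the n_basis >= n_group early-exit,
    checked before fitting, and (b) a singular system raised at fit
    time. Returns None when no K succeeds, so the caller can fall back
    (and warn)."""
    for K in candidate_degrees:
        if basis_size(K) >= n_group:
            continue  # early-exit: more basis terms than group observations
        try:
            return fit_K(K)
        except ArithmeticError:
            continue  # singular system for this K: try the next one
    return None  # no K succeeded -> constant-one fallback path
```

The point of the doc change is that "when no sieve degree K succeeds" is the union of both branches, whereas "singular basis matrix" described only the second.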
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 7a86d431267905ac78f18e3cefa02ceeea7bfee2


Overall Assessment

⚠️ Needs changes

Executive Summary

Methodology

  • Severity: P1 [Newly identified]. Impact: the covariate path’s shipped methodology is a DR working-model implementation, not the paper’s generic efficient Section 4 nuisance-estimation path as currently described. The new code fixes m_hat_{g',t,t_pre}(X) to linear OLS in diff_diff/efficient_did_covariates.py:L28-L87, but the estimator docstring still says the class “achieves the semiparametric efficiency bound” and describes the covariate path as the Section 4 efficient estimator in diff_diff/efficient_did.py:L48-L59. The registry’s EfficientDiD entry also still frames Step 1 as sieve/kernel/ML nuisance estimation and has no **Note:** documenting the OLS substitution in docs/methodology/REGISTRY.md:L581-L587 and docs/methodology/REGISTRY.md:L631-L675. The paper itself distinguishes parametric working models from the flexible nonparametric procedure and warns that misspecified parametric models can still induce asymptotic bias. Concrete fix: add a registry **Note:** and align the in-code docstrings to say the current covariate path uses linear OLS working models and is DR, but not generically guaranteed to attain the paper’s semiparametric efficiency bound unless that working model is correctly specified; alternatively, replace estimate_outcome_regression() with a flexible nuisance interface consistent with Section 4. (cowles.yale.edu)

Path to Approval

  1. Add a **Note:** under EfficientDiD in docs/methodology/REGISTRY.md documenting that the shipped covariate path uses linear OLS working models for m_hat, so the implementation is DR but not generically the paper’s flexible semiparametrically efficient covariate estimator.
  2. Update the estimator/module docstrings in diff_diff/efficient_did.py and diff_diff/efficient_did_covariates.py to remove the unconditional efficiency claim for the DR path unless/until a flexible nuisance-estimation interface is implemented.
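
Fix 2's alternative (a flexible nuisance interface) can be read as making the outcome model an argument rather than a hard-coded OLS fit. A hedged sketch, with all names hypothetical (`fit_outcome_nuisance` is not the library's actual `estimate_outcome_regression()` signature):

```python
import numpy as np

def fit_outcome_nuisance(X, y, model=None):
    """Return a predictor X -> m_hat(X). Defaults to the linear OLS
    working model; any learner exposing the same fit interface could be
    swapped in to move toward the paper's Section 4 flexible nuisance
    estimation. Illustrative sketch only."""
    if model is not None:
        return model(X, y)
    # Linear OLS working model: keeps the DR consistency property, but
    # attains the semiparametric efficiency bound only if E[y | X]
    # really is linear in X.
    Xc = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return lambda Xnew: np.column_stack([np.ones(len(Xnew)), Xnew]) @ beta
```

Under this shape, the docstring fix and the interface fix are the same decision surfaced in two places: either the default working model is documented as a working model, or the default stops being the only option.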

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

No findings in the added test coverage. Static review only: I could not execute tests/test_efficient_did.py in this environment because pytest and numpy are unavailable.

…ocstrings

The paper's Section 4 describes flexible nonparametric nuisance estimation
for m_hat, while our implementation uses linear OLS. Add REGISTRY.md note
and align class/module docstrings: the DR property ensures consistency,
but OLS does not generically guarantee the semiparametric efficiency bound
unless the conditional mean is linear in covariates.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 56fa2e74dfa004ef634926ccfa71f7885c2b0d77


Overall Assessment

⚠️ Needs changes

Executive Summary

Methodology

  • Severity: P1 [Newly identified]. Impact: estimate_inverse_propensity_sieve() silently truncates finite s_hat values into [1, n] in diff_diff/efficient_did_covariates.py:278, so weak-overlap cases pass through the blocking inference path with no diagnostic, and the step-4 truncation is not documented in docs/methodology/REGISTRY.md. Concrete fix: warn when best_s hits either clip bound, analogous to the existing ratio-clipping warning, and either document the truncation or replace it with an explicit overlap failure policy.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No additional findings. I did not find a matching mitigation entry under TODO.md:44, so the P1 above remains untracked and unmitigated.

Security

No findings.

Documentation/Tests

  • Severity: P3. Impact: the new tests exercise ratio clipping warnings and total inverse-propensity estimation failure, but they do not cover the intermediate case where s_hat remains finite yet is clipped into [1, n]; that leaves the blocking overlap/inference path unguarded at tests/test_efficient_did.py:1377 and tests/test_efficient_did.py:1469. Concrete fix: add a regression test with near-zero generalized propensities for both target and comparison groups so the ratio stays moderate but estimate_inverse_propensity_sieve() clips and warns.
  • Static review only: I could not run tests/test_efficient_did.py in this environment because the scientific Python stack is missing.

Path to Approval

  1. In diff_diff/efficient_did_covariates.py:278, detect when best_s hits either clip bound and emit a UserWarning analogous to the existing ratio-clipping warning.
  2. Add a **Note:** to docs/methodology/REGISTRY.md:675 if step-4 truncation is intentional, or remove the truncation and replace it with an explicit overlap failure/warning policy.
  3. Add a targeted regression test in tests/test_efficient_did.py:1469 for finite s_hat clipping, not just all-K failure fallback.
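
Step 1's clip-bound warning can be sketched roughly as follows; the helper name `clip_inverse_propensity` is hypothetical, and only the bounds `[1, n]` come from the review text:

```python
import warnings

import numpy as np

def clip_inverse_propensity(s_hat, n):
    """Truncate estimated inverse propensities into [1, n] and warn if
    any finite value actually hits a bound. Illustrative sketch of the
    fix described in step 1, not the library's implementation."""
    s_hat = np.asarray(s_hat, dtype=float)
    clipped = np.clip(s_hat, 1.0, float(n))
    n_clipped = int(np.sum(clipped != s_hat))
    if n_clipped > 0:
        warnings.warn(
            f"{n_clipped} inverse-propensity value(s) clipped to "
            f"[1, {n}]; this may indicate weak covariate overlap.",
            UserWarning,
        )
    return clipped
```

The targeted regression test in step 3 would then construct near-zero generalized propensities so that s_hat stays finite but exceeds n, and assert the warning via `pytest.warns(UserWarning)` rather than only exercising the all-K fallback.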

…string

P1 fix: estimate_inverse_propensity_sieve() now warns when s_hat values
are clipped to [1, n], matching the ratio path's overlap diagnostics.
Documented in REGISTRY.md.

P3 fix: module-level docstring in efficient_did.py now qualifies the
covariate path, consistent with the class docstring and REGISTRY.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: a3a7909307cc27c6afa42f72adbfc8bf527f2fda


Overall Assessment

✅ Looks good

Executive Summary

Methodology

Affected method: EfficientDiD with covariates (doubly robust ATT(g,t) path).

No unmitigated methodology findings. The step-4 clipping behavior is now both warned and documented, so under the review rubric it is no longer an undocumented deviation. The remaining changed estimator pieces I checked are consistent with the source material and the Methodology Registry. (cowles.yale.edu)

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

  • Severity: P3. Impact: public documentation is inconsistent with the merged behavior and with the new REGISTRY.md caveat. Concrete fix: update the API page, estimator-selection guide, and tutorial references noted above so they stop advertising EfficientDiD as Phase-1/no-covariates-only.
  • Severity: P3. Impact: the newly added inverse-propensity clipping warning path is not directly regression-tested. Concrete fix: add a focused test for finite clipping, not just all-K fallback failure.

Validation note: static review only. I could not run the test suite here because numpy, pandas, scipy, and pytest are unavailable in this environment.

@igerber igerber merged commit c88c890 into main Mar 22, 2026
14 checks passed
@igerber igerber deleted the edid-covariates branch March 22, 2026 00:36