
Add doubly robust covariates path to EfficientDiD#225

Merged
igerber merged 12 commits into main from edid-covariates on Mar 22, 2026

Conversation

@igerber
Owner

@igerber igerber commented Mar 21, 2026

Summary

  • Add doubly robust (DR) covariates path to EfficientDiD estimator (Phase 2 of Chen, Sant'Anna & Xie 2025)
  • New module efficient_did_covariates.py with pure-function DR math: outcome regression via OLS, propensity score ratios via logit, per-unit generated outcomes (Eq 4.4), unconditional Omega*, and EIF computation
  • When covariates=["x1", "x2"] is passed to fit(), automatically uses the DR path; covariates=None (default) uses the existing no-covariates closed-form path unchanged
  • Add pscore_trim parameter with overlap diagnostics via _check_propensity_diagnostics()
  • Add estimation_path field to EfficientDiDResults ("nocov" or "dr")
  • Extend DGP helper with covariate generation (including confounding for DR testing)
  • 17 new tests across 5 classes: basic correctness, validation, PT assumptions, edge cases, bootstrap

Methodology references (required if estimator / math changes)

  • Method name(s): Efficient Difference-in-Differences (doubly robust with covariates path)
  • Paper / source link(s): Chen, X., Sant'Anna, P. H. C., & Xie, H. (2025). Efficient Difference-in-Differences and Event Study Estimators. (https://arxiv.org/abs/2506.17729), Equations 4.3-4.4, Section 4
  • Any intentional deviations from the source (and why):
    • Propensity score ratios via standard logistic regression instead of sieve-based convex minimization (Eq 4.1-4.2). Valid under DR property; sieve approach deferred.
    • Unconditional Omega* for efficient weights instead of kernel-smoothed conditional covariance Omega*(X). Gives valid DR estimator with correct SEs but not the full semiparametric efficiency bound. Conditional weights deferred.

Validation

  • Tests added/updated: tests/test_efficient_did.py (17 new tests), tests/helpers/edid_dgp.py (extended)
  • All 85 EDID tests pass, including 7 HRS empirical validation tests (unchanged)
  • Backtest / simulation / notebook evidence: a DGP with confounding validates that the DR path produces different (corrected) estimates than the nocov path

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

Implements Phase 2 of the Chen, Sant'Anna & Xie (2025) methodology:
when covariates are provided, uses outcome regression (OLS) and
propensity score ratios (logit) for doubly robust ATT estimation.
The estimator is consistent if either the outcome model or the
propensity model is correctly specified.

New module efficient_did_covariates.py contains the DR math functions.
The no-covariates closed-form path remains unchanged as the default.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@github-actions

Overall Assessment

⛔ Blocker

Executive Summary

Methodology

  • Severity P0. Impact: Eq. 4.4 requires r_hat_{g,g'}(X) = p_g(X) / p_{g'}(X), but the implementation builds X_combined in original unit order and builds D by concatenating ones(n_g) with zeros(n_gp). Those are aligned only if all g units happen to appear before all g' units. For interleaved/shuffled unit ids, the fitted logit estimates the wrong binary classification problem, so the DR generated outcomes, ATT(g,t), EIF, and both analytical and multiplier-bootstrap inference can be wrong with no diagnostic. Concrete fix: in diff_diff/efficient_did_covariates.py:L132-L166, either stack X_g then X_gp explicitly or define D = mask_g[combined_mask].astype(float) so labels follow the selected rows; also short-circuit mask_g == mask_gp to return an all-ones ratio for r_{g,g}(X).
  • Severity P3. Impact: using standard logit instead of the paper’s sieve ratio estimator, and unconditional Omega* instead of conditional Omega*(X), is documented in the registry and tracked in TODO.md, so these are acceptable deferred deviations rather than review defects. Concrete fix: none required for this PR.
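The suggested label-alignment fix can be sketched as follows (a minimal numpy illustration with hypothetical names; the actual helpers in efficient_did_covariates.py are not reproduced here):

```python
import numpy as np

def propensity_labels(mask_g, mask_gp):
    """Derive the logit labels from the cohort masks themselves, so D
    stays aligned with the selected rows even when cohorts are
    interleaved. Hypothetical helper illustrating the suggested fix."""
    if np.array_equal(mask_g, mask_gp):
        # r_{g,g}(X) is identically 1; no logit fit needed.
        return None, None
    combined_mask = mask_g | mask_gp
    # Labels follow the selected rows, not an assumed block ordering.
    D = mask_g[combined_mask].astype(float)
    return combined_mask, D

# Interleaved cohorts: g at positions 0, 2, 4; g' at 1, 3.
mask_g = np.array([True, False, True, False, True])
mask_gp = np.array([False, True, False, True, False])
combined_mask, D = propensity_labels(mask_g, mask_gp)
print(D)  # [1. 0. 1. 0. 1.]
```

Concatenating `ones(n_g)` with `zeros(n_gp)` would instead label the first three selected rows as g regardless of which cohort they belong to.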

Code Quality

No additional findings beyond the methodology blocker above.

Performance

No separate performance finding. Fixing the r_{g,g} special case in the P0 item will also remove repeated failed logit fits on valid PT-All same-cohort moments.

Maintainability

Tech Debt

  • Severity P3. Impact: the PR properly adds TODO.md entries for the deferred sieve-based propensity ratio estimator and conditional Omega*(X) weighting, so those deferrals are tracked and non-blocking. See TODO.md:L56-L57. Concrete fix: none required for this PR.

Security

No findings.

Documentation/Tests

  • Severity P1. Impact: the new covariate path is only covered by smoke-style tests for plain fit() and one overall bootstrap call; it does not exercise aggregate="event_study", aggregate="group", or aggregate="all" with covariates, and it never shuffles/interleaves unit ordering. Because the helper DGP assigns cohorts in contiguous unit blocks, the current suite is predisposed to miss the row-order bug in the propensity helper. See tests/helpers/edid_dgp.py:L44-L48, tests/test_efficient_did.py:L1093-L1300, and diff_diff/efficient_did.py:L580-L662. Concrete fix: add covariate-path regression tests that:
    1. shuffle or relabel units so cohorts are interleaved,
    2. cover PT-All and PT-Post,
    3. cover aggregate="event_study", aggregate="group", and aggregate="all",
    4. include at least one bootstrap case on an aggregated covariate path.

Path to Approval

  1. Fix diff_diff/efficient_did_covariates.py:L132-L166 so the propensity-ratio labels are aligned with the selected rows, and special-case r_{g,g}(X) = 1.
  2. Add a regression test with shuffled/interleaved unit ordering that fails on the current implementation and passes once the label alignment bug is fixed.
  3. Add covariate-path aggregation/bootstrapping tests for event_study, group, and all, with at least one PT-All bootstrap case.
  4. Thread pscore_trim into EfficientDiDResults so the new estimator parameter is preserved downstream.

…rim to results

P0 fix: estimate_propensity_ratio() now derives D labels from
mask_g[combined_mask] instead of concatenating ones/zeros, ensuring
correct alignment regardless of unit ordering. Also short-circuits
r_{g,g}(X) = 1 for same-cohort comparisons under PT-All.

P1 fix: pscore_trim is now preserved in EfficientDiDResults, matching
the staggered_results pattern for parameter traceability.

P1 fix: add shuffled-unit regression test that would have caught the
label bug, plus aggregation tests (event_study, group, all) and
PT-All bootstrap test for full covariate-path coverage.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 02537476753671800dbd9ef04a2777398ce013af


Overall Assessment

⚠️ Needs changes

Executive Summary

Methodology

  • Severity P1. Impact: estimate_propensity_ratio() switches to a constant cohort-share odds model whenever solve_logit() raises, by setting pscore_all = n_g / (n_g + n_gp) and therefore r_hat_{g,g'}(X) = n_g/n_gp for all X in diff_diff/efficient_did_covariates.py:L154-L169. Eq. 4.4 is written in terms of estimated ratio functions r_{g,g'}(X), and the paper’s Section 4 construction centers nuisance estimation on ratio estimators plus plug-in covariance estimation; this fallback is a real methodology deviation and it is not disclosed in REGISTRY.md. Concrete fix: either raise on logit failure, or add an explicit REGISTRY.md deviation note describing the constant-ratio fallback and its implications. (cowles.yale.edu)
  • Severity P3. Impact: the new logit-for-sieve and unconditional-Omega* substitutions are acceptable documented deviations, so they are not defects. Concrete fix: none required. (cowles.yale.edu)
  • Severity P3. Impact: the public class docstring still says EfficientDiD “Achieves the semiparametric efficiency bound” even when covariates routes to the documented non-efficient approximation with unconditional Omega*. That is user-facing overstatement, but it is mitigated by the registry note. Concrete fix: align diff_diff/efficient_did.py:L46-L54 with docs/methodology/REGISTRY.md:L671-L672.

Code Quality

  • No new findings.

Performance

  • No new findings.

Maintainability

  • No new findings.

Tech Debt

  • Severity P3. Impact: the remaining methodology gaps, sieve-based ratio estimation and conditional Omega*(X) weighting, are properly tracked in TODO.md:L56-L57 and documented in docs/methodology/REGISTRY.md:L656-L672, so they are non-blocking deferred work. Concrete fix: none required.

Security

  • No findings.

Documentation/Tests

  • Severity P2. Impact: the new “confounding” DGP in tests/helpers/edid_dgp.py:L88-L99 adds an x1 * time trend only for ever-treated units, so conditional parallel trends does not actually hold. The associated test in tests/test_efficient_did.py:L1139-L1158 only checks that the DR path runs and differs from nocov, so it does not validate the claimed DR recovery scenario. Concrete fix: generate confounding through X-dependent untreated trends that apply to all groups and let treatment timing depend on X; then assert the DR estimate is closer to the known ATT than the no-covariates estimate.
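The suggested DGP shape can be sketched numerically (a toy numpy version under assumed parameters; names like x1, cohort, and true_att mirror the review's description, not the helper's actual API):

```python
import numpy as np

rng = np.random.default_rng(42)
n_units, n_periods, true_att = 200, 6, 2.0
t = np.arange(n_periods)

# Covariate drives BOTH treatment timing and the untreated trend, and
# the X-dependent trend applies to ALL groups, so conditional parallel
# trends holds while covariates confound the raw comparison.
x1 = rng.normal(size=n_units)
cohort = np.where(x1 + rng.normal(size=n_units) > 0, 3, 0)  # 0 = never-treated
treated_mask = (cohort[:, None] > 0) & (t[None, :] >= cohort[:, None])

y = (
    rng.normal(size=(n_units, 1))        # unit fixed effects
    + 0.5 * np.outer(x1, t)              # X-dependent trend, all groups
    + true_att * treated_mask            # known treatment effect
    + rng.normal(scale=0.1, size=(n_units, n_periods))
)
# A DR estimator conditioning on x1 should recover true_att with less
# bias than the no-covariates path under this design.
```

With the original DGP (trend only for ever-treated units), no conditioning set restores parallel trends, so "DR differs from nocov" is the strongest checkable claim; this version supports a bias-vs-known-ATT assertion.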

Path to Approval

  1. Resolve the propensity-model failure behavior in diff_diff/efficient_did_covariates.py:L154-L169: either raise instead of falling back, or document the fallback explicitly in docs/methodology/REGISTRY.md as a deliberate deviation from the paper.
  2. Add a regression test that forces solve_logit() failure and verifies the chosen behavior end-to-end.

…ng DGP

- Document propensity-ratio constant-fallback as deviation in REGISTRY.md
- Fix confounding DGP: X-dependent trends apply to all groups (not just
  treated), x1 distribution shifts by group; conditional PT now holds
- Test asserts DR bias < nocov bias against known true ATT
- Add logit-failure regression test via mock
- Clarify docstring: covariate path uses unconditional Omega*, does not
  achieve full semiparametric efficiency bound
- Fix pre-existing E402 lint: move edid_dgp import to top of file

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 86492b0272d9d5a9613e6f25f35be9ec7c78b3a4


Overall Assessment

✅ Looks good

Executive Summary

  • Affected method: the PR adds a covariate-enabled doubly robust path for EfficientDiD, specifically the staggered-adoption Section 4 estimator built from Eq. 4.3 / Eq. 4.4 generated outcomes and EIF-based analytical SEs. (ar5iv.org)
  • Re-review result: the prior blocker on undocumented propensity fallback is addressed. The fallback is now recorded in docs/methodology/REGISTRY.md#L621, exercised in tests/test_efficient_did.py#L1311, and pscore_trim / estimation_path are wired through diff_diff/efficient_did.py#L693 and diff_diff/efficient_did_results.py#L107.
  • The earlier DGP/test concern is also addressed: the confounding DGP now applies the x1-dependent untreated trend to all units in tests/helpers/edid_dgp.py#L96, and the corresponding test now checks bias relative to the known ATT in tests/test_efficient_did.py#L1137.
  • No unmitigated P0/P1 findings remain. The two headline deviations from the paper, standard logit for propensity-ratio estimation and unconditional Omega* weights, are now documented in the registry/TODO, and Remark 4.1 / Remark 4.2 make those non-blocking implementation choices rather than methodology defects. (ar5iv.org)
  • Remaining issues are P3 only: the new fallback note/docstring is still slightly imprecise, and the overlap-warning test does not actually assert that a warning fired.
  • Static review only: pytest is not installed here, and numpy / pandas are unavailable, so I could not execute the new test slice.

Methodology

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

  • Severity P3. Impact: the remaining paper-parity work is properly tracked in TODO.md#L56 and TODO.md#L57, so it should not block this PR. Concrete fix: none required.

Security

No findings.

Documentation/Tests

  • Severity P3. Impact: tests/test_efficient_did.py#L1356 is named as a warning test, but it only records warnings and then checks for finite estimates; it never asserts that _check_propensity_diagnostics() in diff_diff/linalg.py#L1271 or the call site in diff_diff/efficient_did_covariates.py#L144 actually emitted a trimming / near-separation warning. A regression could therefore remove the diagnostic without failing CI. Concrete fix: assert that at least one captured warning message mentions trimming, propensity scores, or near-separation.
  • Static note: I could not run the test suite here because pytest is not installed, and numpy / pandas are unavailable.
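A warning test that actually asserts the diagnostic fired might look like this (a sketch using a hypothetical stand-in for _check_propensity_diagnostics, since the real helper's signature is not shown in this thread):

```python
import warnings

def check_propensity_diagnostics(pscores, trim=0.01):
    """Hypothetical stand-in: warn when propensity scores fall outside
    [trim, 1 - trim], signalling trimming / near-separation risk."""
    n_bad = sum(1 for p in pscores if p < trim or p > 1 - trim)
    if n_bad:
        warnings.warn(
            f"{n_bad} propensity scores outside [{trim}, {1 - trim:g}]; "
            "trimming applied (possible near-separation)",
            UserWarning,
        )

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_propensity_diagnostics([0.001, 0.5, 0.999])

# Assert on the message content, not just that something was recorded.
assert any("trimming" in str(w.message) for w in caught)
```

Recording warnings without asserting on them, as the current test does, lets the diagnostic be deleted without failing CI.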

Implements paper-faithful Section 4 nuisance estimation:

Sieve-based propensity ratio (Eq 4.1-4.2):
- Closed-form linear system per polynomial degree K
- AIC/BIC model selection across sieve dimensions
- Ratio clipping to [1/ratio_clip, ratio_clip] for positivity
- Eliminates logit convergence/separation issues entirely

Kernel-smoothed conditional Omega*(X) (Eq 3.12):
- Nadaraya-Watson estimator with Gaussian kernel and local means
- Silverman bandwidth by default, user-overridable
- Per-unit efficient weights w(X_i) from Omega*(X_i) inverse
- Plug-in EIF valid under Neyman orthogonality (Remark 4.2)

New params: sieve_k_max, sieve_criterion, ratio_clip, kernel_bandwidth
Removed: pscore_trim (replaced by ratio_clip)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 74d7723c43c86a6c8783c74abe132d2960f6bda0


Overall Assessment

⛔ Blocker

Executive Summary

Methodology

Affected method: EfficientDiD covariate-enabled doubly robust staggered-adoption path (Eq. 4.3, Eq. 4.4, conditional Omega*_{gt}(X), EIF/SE).

  • [Newly identified] Severity P0. compute_eif_cov() is not the influence function for the estimator actually computed on the DR path. The estimator is att_gt = mean_i[w_i(X)'gen_out_i], but the code returns sum_h w_ih(gen_out_ih - mean_j gen_out_jh), which only centers correctly when weights are global, not unit-specific. Impact: group-time analytical SEs, p-values, and CIs are silently wrong on the covariate path, and the same bad eif_by_gt values are then reused by the multiplier bootstrap. Concrete fix: change the DR EIF to center on the scalar att_gt for the per-unit-weight estimator in diff_diff/efficient_did_covariates.py:L626-L669, update the caller in diff_diff/efficient_did.py:L530-L537, and add a regression test that the sample mean of the returned EIF is approximately zero when weights vary across units.
  • [Newly identified] Severity P1. The new Omega*_{gt}(X) implementation does not match Eq. 3.12 / the registry: it scales every covariance term by unconditional pi_g, pi_inf, and pi_gp rather than conditional p_g(X), p_inf(X), and p_{g'}(X). Impact: the shipped covariate path is not the documented conditional-efficient method, and the class/module docstrings overstate what is implemented in diff_diff/efficient_did.py:L50-L57 and diff_diff/efficient_did_covariates.py:L4-L8. Concrete fix: either implement the per-unit propensity scaling required by Eq. 3.12, or add a **Note (deviation from source)** in the registry and reword the public docs so this path is no longer presented as achieving the paper’s conditional efficiency bound.
  • [Newly identified] Severity P1. The overlap assumption check required by the registry is missing from the DR path. The code estimates/clips ratios but never emits a near-boundary warning, and there is no pscore_trim-style user control or diagnostic equivalent to the existing helper in diff_diff/linalg.py:L1271-L1293. Impact: severe overlap problems are silently masked by clipping, which is exactly the kind of missing assumption check the registry says should warn users. Concrete fix: add an actual diagnostic call on the underlying propensity/ratio estimates before clipping, expose the trimming threshold as a user-facing parameter if intended, and only mark the registry requirement complete once that warning path exists.
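The EIF centering argument can be illustrated numerically (a toy numpy sketch; the shapes and names such as gen_out are assumptions, not the module's actual interface):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3                                  # units x (g, t) columns
weights = rng.normal(1.0, 0.2, size=(n, k))   # per-unit efficient weights
gen_out = rng.normal(size=(n, k))             # Eq 4.4 generated outcomes

contrib = np.einsum("ij,ij->i", weights, gen_out)  # w_i @ gen_out_i
att = contrib.mean()                               # the estimator itself
eif = contrib - att            # center each unit on the scalar ATT
se = eif.std(ddof=1) / np.sqrt(n)   # analytical SE from the EIF

# Mean-zero holds by construction even when weights vary across units;
# centering per comparison column instead breaks this property.
assert abs(eif.mean()) < 1e-12
```

The mean-zero check is exactly the regression test the review asks for: with unit-specific weights, the per-column centering `sum_h w_ih (gen_out_ih - mean_j gen_out_jh)` generally has nonzero sample mean.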

Code Quality

  • No additional findings.

Performance

  • No additional findings.

Maintainability

  • No additional findings.

Tech Debt

  • No additional findings. TODO.md does not currently track the P0/P1 issues above, so they remain unmitigated.

Security

  • No findings.

Documentation/Tests

  • Severity P3. The new tests mostly check finiteness and path selection, so they would not catch either blocking issue above. For example, the near-separation test only asserts finite output in tests/test_efficient_did.py:L1351-L1374, and there is no test that the DR EIF is mean-zero or benchmarked against a known SE. Impact: both inference/assumption regressions can land silently. Concrete fix: add one unit test for compute_eif_cov() with non-constant per-unit weights and one warning assertion for extreme-overlap clipping.
  • Static review only: numpy and pytest are not available in this environment, so I could not execute the new test slice.

Path to Approval

  1. Re-derive the DR EIF for the per-unit-weight estimator and fix diff_diff/efficient_did_covariates.py:L626-L669 plus its caller in diff_diff/efficient_did.py:L530-L537 so the returned influence values are centered at the scalar att_gt; add a regression test that the sample mean of the EIF is approximately zero under varying weights.
  2. In diff_diff/efficient_did_covariates.py:L457-L549, either replace unconditional pi_* scaling with estimated conditional propensities / s_hat_{g'} per Eq. 3.12, or explicitly document the deviation in docs/methodology/REGISTRY.md:L657-L673 and downgrade the public “efficient” claims for the covariate path.
  3. Add real overlap diagnostics to the DR ratio path by reusing the existing warning pattern from diff_diff/linalg.py:L1271-L1293 / diff_diff/staggered.py:L1505-L1534, and add a test that explicitly asserts the warning on a near-separation design.

P0 fix: EIF now centers on scalar ATT (EIF_i = w_i @ gen_out_i - ATT)
instead of per-pair means, ensuring mean(EIF) ≈ 0 when weights vary
by unit. Added unit test verifying mean-zero property.

P1 fix: Add overlap warning when sieve ratios require clipping, with
test asserting the warning fires on near-separation covariates.

P1 fix: Document unconditional-pi Omega* as deviation in REGISTRY.md;
downgrade "efficient" claim in docstring for covariate path.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 2c7770291f394064750cabc739fbafe3165a99bb


Overall Assessment

⚠️ Needs changes

Executive Summary

Methodology

Affected method: EfficientDiD doubly robust staggered-adoption path with covariates (Omega*_{gt}(X), Eq. 4.3, Eq. 4.4, EIF/SE). (arxiv.org)

  • Severity P3. Impact: the remaining source deviation is the unconditional-pi scaling inside the covariate-path Omega*(X) rather than the paper’s conditional propensity scaling from Eq. 3.12. This is now explicitly documented in the registry, so under the review rubric it is informational rather than a defect. Concrete fix: none required for approval; optionally mirror the registry note more explicitly in code-local docstrings to avoid overclaiming full conditional efficiency. docs/methodology/REGISTRY.md:L671-L674 diff_diff/efficient_did.py:L47-L59 (arxiv.org)

Code Quality

  • Severity P1 [Newly identified]. Impact: the new DR tuning-parameter validation is incomplete. kernel_bandwidth is accepted without any finite/positive check, and ratio_clip=np.nan also bypasses the current guard because only <= 1.0 is tested. Those bad values then reach _kernel_weights_matrix() / np.clip() and can poison downstream DR inference with NaN outputs. Concrete fix: in EfficientDiD._validate_params(), require kernel_bandwidth is None or (np.isfinite(kernel_bandwidth) and kernel_bandwidth > 0) and ratio_clip to be finite and > 1.0; while touching the block, it would also be prudent to validate sieve_k_max as None or a positive integer. diff_diff/efficient_did.py:L135-L150 diff_diff/efficient_did_covariates.py:L239-L252 diff_diff/efficient_did_covariates.py:L367-L386 (arxiv.org)
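The tightened guard might look like the following (a sketch; validate_dr_params is a hypothetical free function, whereas the PR's check lives in EfficientDiD._validate_params()):

```python
import numpy as np

def validate_dr_params(kernel_bandwidth=None, ratio_clip=10.0,
                       sieve_k_max=None):
    """Reject NaN/Inf/non-positive tuning values before they reach the
    numerical routines. Hypothetical mirror of the suggested checks."""
    if kernel_bandwidth is not None and not (
        np.isfinite(kernel_bandwidth) and kernel_bandwidth > 0
    ):
        raise ValueError("kernel_bandwidth must be None or finite and > 0")
    if not (np.isfinite(ratio_clip) and ratio_clip > 1.0):
        # Catches ratio_clip=np.nan, which a bare `<= 1.0` test misses.
        raise ValueError("ratio_clip must be finite and > 1.0")
    if sieve_k_max is not None and not (
        isinstance(sieve_k_max, (int, np.integer)) and sieve_k_max > 0
    ):
        raise ValueError("sieve_k_max must be None or a positive integer")

for bad in (0.0, float("nan"), float("inf")):
    try:
        validate_dr_params(kernel_bandwidth=bad)
    except ValueError as exc:
        print(f"rejected {bad}: {exc}")
```

Note that `np.nan <= 1.0` is False, which is why the current guard lets NaN through to `np.clip()`.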

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No mitigating TODO.md entry covers the P1 above, so it remains unmitigated for assessment purposes. TODO.md:L44-L57

Security

  • No findings.

Documentation/Tests

Path to Approval

  1. Tighten DR tuning validation in diff_diff/efficient_did.py:L135-L150 so kernel_bandwidth must be finite and strictly positive (or None) and ratio_clip must be finite and > 1.0.
  2. Add regression tests in tests/test_efficient_did.py that assert ValueError for kernel_bandwidth=0, kernel_bandwidth=np.nan, and kernel_bandwidth=np.inf rather than letting those values reach the numerical routines.

Validate kernel_bandwidth (None or finite > 0), ratio_clip (finite > 1.0),
and sieve_k_max (None or positive int) in _validate_params(). Rejects
NaN, Inf, zero, and negative values with clear ValueError messages.

Add pytest.raises tests for all invalid parameter combinations.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 7608264bf1d0be401abcc8579719b9ea236b1931


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review result: the prior P1 is resolved. _validate_params() now rejects invalid ratio_clip, kernel_bandwidth, and sieve_k_max, and regression coverage was added for those cases in diff_diff/efficient_did.py:L135 and tests/test_efficient_did.py:L1189.
  • Affected methodology: the new covariate/DR EDiD path touches the Section 4 nuisance-estimation workflow, Eq. 4.3 weighted ATT construction, Eq. 4.4 generated outcomes, Eq. 3.12 conditional Omega*, and EIF-based analytical inference.
  • I did not find any new unmitigated P0/P1 issues in inference handling. The new path uses safe_inference() rather than inline t-stat / p-value / CI code at diff_diff/efficient_did.py:L604, diff_diff/efficient_did.py:L626, and diff_diff/efficient_did.py:L677.
  • The only material source deviation I found remains the unconditional-pi scaling in Omega*(X) instead of the paper’s conditional generalized-propensity version; that deviation is explicitly documented in the registry, so under the review rubric it is P3 informational rather than a blocker.
  • Static review only: I could not run the test suite in this environment because numpy/scipy are not installed.

Methodology

  • Severity P3 (informational). Impact: the paper’s conditional Omega*_{gt}(X) construction is tied to conditional generalized propensities / s_hat_{g'} in the Section 4 algorithm, while this implementation intentionally uses unconditional cohort fractions instead. That is already documented in docs/methodology/REGISTRY.md:L674, so it is not a defect for approval; the remaining issue is that the new module/function docstrings still read as if Eq. 3.12 is implemented literally in diff_diff/efficient_did_covariates.py:L1 and diff_diff/efficient_did_covariates.py:L421. Concrete fix: mirror the registry’s deviation note in those code-local docstrings.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: EfficientDiDResults now exposes DR metadata (estimation_path, sieve_k_max, sieve_criterion, ratio_clip, kernel_bandwidth) but the public attribute block still stops at bootstrap_results, so the new results surface is under-documented in diff_diff/efficient_did_results.py:L28 even though the fields exist at diff_diff/efficient_did_results.py:L107. Concrete fix: extend the dataclass attribute docs to cover the new DR fields.
  • Severity P3. Impact: the new tests cover validation, PT variants, aggregation modes, overlap warnings, and bootstrap, but they do not exercise the documented “all sieve fits fail -> constant ratio 1” fallback recorded in docs/methodology/REGISTRY.md:L621. That leaves a documented edge case unguarded by tests. Concrete fix: add a focused unit test around estimate_propensity_ratio_sieve() at diff_diff/efficient_did_covariates.py:L136 that forces all K attempts to fail and asserts the documented fallback behavior.

…ep 4)

Replace unconditional cohort fractions pi_g in Omega*(X) with per-unit
sieve-estimated inverse propensities s_hat_{g'}(X) = 1/p_{g'}(X), matching
Eq 3.12 and the paper's algorithm step 4.

The inverse propensity sieve uses the same polynomial basis and AIC/BIC
selection as the ratio estimator. FOC: (Psi_{g'}' Psi_{g'}) beta =
Psi_all.sum(axis=0) — same closed-form linear system structure.

This eliminates the last documented methodology deviation from the paper.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 16dc2f9f5540bd40ee559f78d9bf3b4198151108


Overall Assessment

⚠️ Needs changes

Executive Summary

Methodology

Affected method(s): Efficient Difference-in-Differences with covariates, especially the DR nuisance-estimation and efficient-weighting path.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

Path to Approval

  1. In estimate_inverse_propensity_sieve(), stop silently switching to unconditional n / n_group weights. Either raise on all-K failure, or emit a UserWarning and document the fallback in REGISTRY.md with a **Note:**.
  2. Add targeted regression tests for both sieve-failure paths: one for estimate_inverse_propensity_sieve() and one for estimate_propensity_ratio_sieve(), using a tiny or collinear group setup that forces all K attempts to fail.
  3. Update the EfficientDiDResults attribute docs to cover estimation_path, sieve_k_max, sieve_criterion, ratio_clip, and kernel_bandwidth.

P1 fix: estimate_inverse_propensity_sieve() now emits UserWarning on
all-K fallback (matching the ratio sieve pattern). Documented in
REGISTRY.md alongside the existing ratio fallback note.

Add TestSieveFallbacks class with targeted tests forcing tiny groups
(n_basis >= n_group) for both ratio and inverse propensity sieve helpers.

Extend EfficientDiDResults attribute docs to cover estimation_path,
sieve_k_max, sieve_criterion, ratio_clip, and kernel_bandwidth.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: f94f499c148485e84b135fd5a25ba19454a43d49


Overall Assessment

⚠️ Needs changes

Executive Summary

  • Re-review result: the prior methodology blocker around the step-4 inverse-propensity fallback is resolved. It is now documented in docs/methodology/REGISTRY.md:L621-L622 and regression-covered in tests/test_efficient_did.py:L1467-L1481.
  • Affected methodology: EfficientDiD with covariates, especially the Eq. 4.2 propensity-ratio nuisance, Eq. 4.4 generated outcomes, and Eq. 3.12 conditional Omega* weighting path. (arxiv.org)
  • Severity P1 [Newly identified]: diff_diff/efficient_did_covariates.py:L197-L254 still silently falls back to a constant ratio of 1 when every sieve candidate fails. Those ratios are consumed in diff_diff/efficient_did.py:L494-L523, so the estimator can quietly lose the ratio arm of the DR correction in the exact edge cases where the nuisance fit is failing. (arxiv.org)
  • I did not find new inline-inference or partial-NaN anti-patterns in the changed estimator path; inference still goes through diff_diff/efficient_did.py:L620.
  • Static review only: I could not execute the test suite here because numpy, pandas, scipy, and pytest are not installed in this environment.

Methodology

  • Severity P1 [Newly identified]. Impact: diff_diff/efficient_did_covariates.py:L197-L254 initializes best_ratio to ones and returns it unchanged when all sieve fits fail, but unlike the inverse-propensity helper there is no best_ic == np.inf warning path. The new registry note says this fallback should emit a UserWarning at docs/methodology/REGISTRY.md:L621-L622, and the returned ratios feed directly into the Eq. 4.4 score via diff_diff/efficient_did.py:L494-L523. I infer from Eq. 4.4 that this silently collapses the ratio-adjustment arm of the DR score to a constant-one fallback, so users can lose the advertised DR protection exactly when the ratio nuisance is not estimable. Concrete fix: add an explicit warnings.warn(...) branch when no ratio sieve candidate succeeds, including the n_basis >= n_gp early-exit case, and keep the constant-one fallback only as an explicit surfaced deviation. (arxiv.org)
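The requested warning branch could be sketched like this (hypothetical interface; the real estimate_propensity_ratio_sieve() loops over sieve degrees K internally rather than taking callables):

```python
import warnings
import numpy as np

def ratio_sieve_with_fallback(fit_attempts):
    """Try each candidate fit; if none succeeds, warn loudly and return
    the constant-one ratio instead of failing silently. Hypothetical
    interface: fit_attempts holds one callable per sieve degree K."""
    for fit in fit_attempts:
        try:
            return fit()
        except np.linalg.LinAlgError:
            continue
    warnings.warn(
        "no sieve degree K succeeded; falling back to constant ratio 1 "
        "(the ratio arm of the DR correction is disabled)",
        UserWarning,
    )
    return np.ones(1)

def always_singular():
    raise np.linalg.LinAlgError("singular basis matrix")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ratio = ratio_sieve_with_fallback([always_singular, always_singular])

assert len(caught) == 1 and ratio[0] == 1.0
```

The same branch should also cover the n_basis >= n_gp early exit, where no candidate is ever attempted.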

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No new untracked findings. The pre-existing EfficientDiD public-doc gap is already tracked in TODO.md:L54, so I did not count it against this PR.

Security

  • No findings.

Documentation/Tests

Path to Approval

  1. Add a UserWarning in diff_diff/efficient_did_covariates.py:L197-L254 when estimate_propensity_ratio_sieve() exits via the constant-one fallback because no K succeeded.
  2. Strengthen tests/test_efficient_did.py:L1451-L1465 to assert both the warning and the constant-one fallback.
  3. Tighten docs/methodology/REGISTRY.md:L621-L622 if needed so the note matches the actual trigger conditions for the fallback.

Symmetric with the inverse propensity fallback: emit UserWarning when
no sieve candidate succeeds and ratio falls back to constant 1.
Updated test to assert the warning fires.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 24b6b2a1ebee8813877cc36803cd03ebeefb37a0


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review result: the prior P1 on the silent ratio-sieve fallback is resolved. The fallback now emits a UserWarning in diff_diff/efficient_did_covariates.py:239 and is regression-covered in tests/test_efficient_did.py:1451.
  • Affected methodology: the PR changes the with-covariates EfficientDiD path, including Eq. 4.4 generated outcomes, sieve ratio and inverse-propensity nuisances, kernel-smoothed conditional Omega*(X), and the Eq. 4.3 plug-in efficient weighting. I did not find a new undocumented mismatch against the paper in the changed estimator, inference, or bootstrap paths. (arxiv.org)
  • Highest remaining issue is P3 only: the new registry note is slightly narrower than the actual fallback trigger conditions.
  • Static review only: I could not run the test suite here because numpy, scipy, and pytest are unavailable in this environment.

Methodology

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. The pre-existing EfficientDiD public-doc gap remains tracked at TODO.md:54 and does not count against this PR.

Security

  • No findings.

Documentation/Tests

  • No findings in the changed tests. The prior re-review target is now covered by tests/test_efficient_did.py:1451. Verification here was static only because the runtime dependencies are unavailable.

Broaden sieve fallback descriptions from "singular basis matrix" /
"exception-triggered" to "when no sieve degree K succeeds" — covers
both the n_basis >= n_group early-exit and singular system cases.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
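
The broadened wording covers two distinct ways a candidate degree K can fail, which a minimal loop sketch makes visible. All names here (`fit_K`, `basis_size`, `fit_sieve_over_degrees`) are hypothetical, assumed for illustration:

```python
def fit_sieve_over_degrees(fit_K, basis_size, candidate_degrees, n_group):
    """Illustrative loop over sieve degrees K showing both failure modes
    the commit message names: (a) the n_basis >= n_group early-exit,
    checked before fitting, and (b) a singular system raised at fit
    time. Returns None when no K succeeds, so the caller can fall back
    (and warn)."""
    for K in candidate_degrees:
        if basis_size(K) >= n_group:
            continue  # early-exit: more basis terms than group observations
        try:
            return fit_K(K)
        except ArithmeticError:
            continue  # singular system for this K: try the next one
    return None  # no K succeeded -> constant-one fallback path
```

The point of the doc change is that "when no sieve degree K succeeds" is the union of both branches, whereas "singular basis matrix" described only the second.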
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 7a86d431267905ac78f18e3cefa02ceeea7bfee2


Overall Assessment

⚠️ Needs changes

Executive Summary

Methodology

  • Severity: P1 [Newly identified]. Impact: the covariate path’s shipped methodology is a DR working-model implementation, not the paper’s generic efficient Section 4 nuisance-estimation path as currently described. The new code fixes m_hat_{g',t,t_pre}(X) to linear OLS in diff_diff/efficient_did_covariates.py:L28-L87, but the estimator docstring still says the class “achieves the semiparametric efficiency bound” and describes the covariate path as the Section 4 efficient estimator in diff_diff/efficient_did.py:L48-L59. The registry’s EfficientDiD entry also still frames Step 1 as sieve/kernel/ML nuisance estimation and has no **Note:** documenting the OLS substitution in docs/methodology/REGISTRY.md:L581-L587 and docs/methodology/REGISTRY.md:L631-L675. The paper itself distinguishes parametric working models from the flexible nonparametric procedure and warns that misspecified parametric models can still induce asymptotic bias. Concrete fix: add a registry **Note:** and align the in-code docstrings to say the current covariate path uses linear OLS working models and is DR, but not generically guaranteed to attain the paper’s semiparametric efficiency bound unless that working model is correctly specified; alternatively, replace estimate_outcome_regression() with a flexible nuisance interface consistent with Section 4. (cowles.yale.edu)

Path to Approval

  1. Add a **Note:** under EfficientDiD in docs/methodology/REGISTRY.md documenting that the shipped covariate path uses linear OLS working models for m_hat, so the implementation is DR but not generically the paper’s flexible semiparametrically efficient covariate estimator.
  2. Update the estimator/module docstrings in diff_diff/efficient_did.py and diff_diff/efficient_did_covariates.py to remove the unconditional efficiency claim for the DR path unless/until a flexible nuisance-estimation interface is implemented.
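
Fix 2's alternative (a flexible nuisance interface) can be read as making the outcome model an argument rather than a hard-coded OLS fit. A hedged sketch, with all names hypothetical (`fit_outcome_nuisance` is not the library's actual `estimate_outcome_regression()` signature):

```python
import numpy as np

def fit_outcome_nuisance(X, y, model=None):
    """Return a predictor X -> m_hat(X). Defaults to the linear OLS
    working model; any learner exposing the same fit interface could be
    swapped in to move toward the paper's Section 4 flexible nuisance
    estimation. Illustrative sketch only."""
    if model is not None:
        return model(X, y)
    # Linear OLS working model: keeps the DR consistency property, but
    # attains the semiparametric efficiency bound only if E[y | X]
    # really is linear in X.
    Xc = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return lambda Xnew: np.column_stack([np.ones(len(Xnew)), Xnew]) @ beta
```

Under this shape, the docstring fix and the interface fix are the same decision surfaced in two places: either the default working model is documented as a working model, or the default stops being the only option.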

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

No findings in the added test coverage. Static review only: I could not execute tests/test_efficient_did.py in this environment because pytest and numpy are unavailable.

…ocstrings

The paper's Section 4 describes flexible nonparametric nuisance estimation
for m_hat, while our implementation uses linear OLS. Add REGISTRY.md note
and align class/module docstrings: the DR property ensures consistency,
but OLS does not generically guarantee the semiparametric efficiency bound
unless the conditional mean is linear in covariates.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 56fa2e74dfa004ef634926ccfa71f7885c2b0d77


Overall Assessment

⚠️ Needs changes

Executive Summary

Methodology

  • Severity: P1 [Newly identified]. Impact: estimate_inverse_propensity_sieve() silently truncates finite s_hat values into [1, n] in diff_diff/efficient_did_covariates.py:278, so weak-overlap cases pass through the blocking inference path with no diagnostic, and the step-4 truncation is not documented in docs/methodology/REGISTRY.md. Concrete fix: warn when best_s hits either clip bound, analogous to the existing ratio-clipping warning, and either document the truncation or replace it with an explicit overlap failure policy.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No additional findings. I did not find a matching mitigation entry under TODO.md:44, so the P1 above remains untracked and unmitigated.

Security

No findings.

Documentation/Tests

  • Severity: P3. Impact: the new tests exercise ratio clipping warnings and total inverse-propensity estimation failure, but they do not cover the intermediate case where s_hat remains finite yet is clipped into [1, n]; that leaves the blocking overlap/inference path unguarded at tests/test_efficient_did.py:1377 and tests/test_efficient_did.py:1469. Concrete fix: add a regression test with near-zero generalized propensities for both target and comparison groups so the ratio stays moderate but estimate_inverse_propensity_sieve() clips and warns.
  • Static review only: I could not run tests/test_efficient_did.py in this environment because the scientific Python stack is missing.

Path to Approval

  1. In diff_diff/efficient_did_covariates.py:278, detect when best_s hits either clip bound and emit a UserWarning analogous to the existing ratio-clipping warning.
  2. Add a **Note:** to docs/methodology/REGISTRY.md:675 if step-4 truncation is intentional, or remove the truncation and replace it with an explicit overlap failure/warning policy.
  3. Add a targeted regression test in tests/test_efficient_did.py:1469 for finite s_hat clipping, not just all-K failure fallback.
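
Step 1's clip-bound warning can be sketched roughly as follows; the helper name `clip_inverse_propensity` is hypothetical, and only the bounds `[1, n]` come from the review text:

```python
import warnings

import numpy as np

def clip_inverse_propensity(s_hat, n):
    """Truncate estimated inverse propensities into [1, n] and warn if
    any finite value actually hits a bound. Illustrative sketch of the
    fix described in step 1, not the library's implementation."""
    s_hat = np.asarray(s_hat, dtype=float)
    clipped = np.clip(s_hat, 1.0, float(n))
    n_clipped = int(np.sum(clipped != s_hat))
    if n_clipped > 0:
        warnings.warn(
            f"{n_clipped} inverse-propensity value(s) clipped to "
            f"[1, {n}]; this may indicate weak covariate overlap.",
            UserWarning,
        )
    return clipped
```

The targeted regression test in step 3 would then construct near-zero generalized propensities so that s_hat stays finite but exceeds n, and assert the warning via `pytest.warns(UserWarning)` rather than only exercising the all-K fallback.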

…string

P1 fix: estimate_inverse_propensity_sieve() now warns when s_hat values
are clipped to [1, n], matching the ratio path's overlap diagnostics.
Documented in REGISTRY.md.

P3 fix: module-level docstring in efficient_did.py now qualifies the
covariate path, consistent with the class docstring and REGISTRY.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@igerber
Owner Author

igerber commented Mar 21, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: a3a7909307cc27c6afa42f72adbfc8bf527f2fda


Overall Assessment

✅ Looks good

Executive Summary

Methodology

Affected method: EfficientDiD with covariates (doubly robust ATT(g,t) path).

No unmitigated methodology findings. The step-4 clipping behavior is now both warned and documented, so under the review rubric it is no longer an undocumented deviation. The remaining changed estimator pieces I checked are consistent with the source material and the Methodology Registry. (cowles.yale.edu)

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

  • Severity: P3. Impact: public documentation is inconsistent with the merged behavior and with the new REGISTRY.md caveat. Concrete fix: update the API page, estimator-selection guide, and tutorial references noted above so they stop advertising EfficientDiD as Phase-1/no-covariates-only.
  • Severity: P3. Impact: the newly added inverse-propensity clipping warning path is not directly regression-tested. Concrete fix: add a focused test for finite clipping, not just all-K fallback failure.

Validation note: static review only. I could not run the test suite here because numpy, pandas, scipy, and pytest are unavailable in this environment.

@igerber igerber merged commit c88c890 into main Mar 22, 2026
14 checks passed
@igerber igerber deleted the edid-covariates branch March 22, 2026 00:36