Migrate to uv, drop 3.9 and 3.10, fix tests#335
Conversation
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
| ], | ||
| _assert_split_unsplit_equivalent( | ||
| resp_split, resp_single, strategy, | ||
| extra_exclude_paths=[r"root\[\d+\]\['element_id'\]"], |
There was a problem hiding this comment.
Silently ignored extra_exclude_paths in hi_res test path
Low Severity
The extra_exclude_paths argument at the test_integration_split_pdf_with_caching call site is dead code. The test is parametrized exclusively with shared.Strategy.HI_RES, so _assert_split_unsplit_equivalent always enters the hi_res branch, which never uses extra_exclude_paths. The element_id exclusion that was applied in the old DeepDiff comparison is silently dropped, giving a misleading impression that it still influences the assertion.
Additional Locations (1)
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
| if timeout_seconds is None: | ||
| task_responses = task_responses_future.result() | ||
| else: | ||
| task_responses = task_responses_future.result(timeout=timeout_seconds + TIMEOUT_BUFFER_SECONDS) |
There was a problem hiding this comment.
Future timeout equals per-request timeout, too short for batched operations
High Severity
The timeout_seconds value (derived from the per-request timeout) is reused as the total operation timeout for task_responses_future.result(). The same value also sets the per-request httpx.Timeout passed to run_tasks. For large PDFs, chunks are processed in sequential batches (limited by the concurrency semaphore), so total wall-clock time can be ceil(num_chunks / concurrency_level) * timeout_seconds. With TIMEOUT_BUFFER_SECONDS of only 5, the future will raise concurrent.futures.TimeoutError well before all batches finish. For example, a 1000-page PDF at concurrency 10 yields ~50 chunks in 5 batches — the operation needs up to 5× the per-request timeout but only gets 1× plus 5 seconds.


Note
Medium Risk
Medium risk due to switching packaging/build tooling from Poetry to
uv/setuptools and changing Python support/CI matrices, which can break installs and releases. Also adjusts split-PDF request/timeout behavior and integration test assertions, which could mask or surface behavior changes in OCR/hi-res outputs.Overview
Migrates the project from Poetry to
uvand setuptools:pyproject.tomlnow uses dynamic versioning, dependency groups, and a setuptools build backend;poetry.lock/poetry.tomlare removed;scripts/publish.shnow builds/publishes viauvwith stricter shell/Python>=3.11 guards.Updates CI and local workflows to match the new tooling and support policy: Python testing is now
3.11–3.13(dropping3.9/3.10), jobs usesetup-uvwith locked installs (UV_LOCKED=1), and integration/contract tests run on a larger runner.Refines split-PDF behavior and tests:
SplitPdfHooknow propagates request timeouts into its internal async client and uses an in-memory no-op request (preserving extensions) to jump toafter_success; integration tests relaxhi_ressplit-vs-unsplit comparisons to “similarity” checks, standardize longer client timeouts, and a new unit test suite adds regeneration/packaging invariants and multipart serialization guards.Written by Cursor Bugbot for commit 1eb4db7. This will update automatically on new commits. Configure here.