Migrate to uv, drop 3.9 and 3.10, fix tests #335

Open
PastelStorm wants to merge 7 commits into main from evoss/migrate-to-uv

PastelStorm commented Apr 3, 2026

Note

Medium Risk
Medium risk due to switching packaging/build tooling from Poetry to uv/setuptools and changing Python support/CI matrices, which can break installs and releases. Also adjusts split-PDF request/timeout behavior and integration test assertions, which could mask or surface behavior changes in OCR/hi-res outputs.

Overview
Migrates the project from Poetry to uv and setuptools: pyproject.toml now uses dynamic versioning, dependency groups, and a setuptools build backend; poetry.lock/poetry.toml are removed; scripts/publish.sh now builds/publishes via uv with stricter shell/Python>=3.11 guards.

Updates CI and local workflows to match the new tooling and support policy: Python testing now covers 3.11 through 3.13 (dropping 3.9/3.10), jobs use setup-uv with locked installs (UV_LOCKED=1), and integration/contract tests run on a larger runner.

Refines split-PDF behavior and tests: SplitPdfHook now propagates request timeouts into its internal async client and uses an in-memory no-op request (preserving extensions) to jump to after_success; integration tests relax hi_res split-vs-unsplit comparisons to “similarity” checks, standardize longer client timeouts, and a new unit test suite adds regeneration/packaging invariants and multipart serialization guards.

Written by Cursor Bugbot for commit 1eb4db7. This will update automatically on new commits.

socket-security bot commented Apr 3, 2026

Review the following changes in direct dependencies.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
| --- | --- | --- | --- | --- | --- | --- |
| Added | github/astral-sh/setup-uv@d0d8abe699bfb85fec6de9f7adb5ae17292296ff | 99 | 100 | 100 | 100 | 100 |

],
_assert_split_unsplit_equivalent(
    resp_split, resp_single, strategy,
    extra_exclude_paths=[r"root\[\d+\]\['element_id'\]"],

Silently ignored extra_exclude_paths in hi_res test path

Low Severity

The extra_exclude_paths argument at the test_integration_split_pdf_with_caching call site is dead code. The test is parametrized exclusively with shared.Strategy.HI_RES, so _assert_split_unsplit_equivalent always enters the hi_res branch, which never uses extra_exclude_paths. The element_id exclusion that was applied in the old DeepDiff comparison is silently dropped, giving a misleading impression that it still influences the assertion.
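One way to avoid this kind of silent drop is to reject the argument in any branch that cannot honor it. The sketch below is hypothetical: `assert_split_unsplit_equivalent`, `extra_exclude_keys`, the `"hi_res"` string, and the 0.9 threshold are illustrative stand-ins, not the repository's actual helper or values, and `difflib` stands in for whatever similarity check the tests use.

```python
from difflib import SequenceMatcher


def assert_split_unsplit_equivalent(split, single, strategy, extra_exclude_keys=None):
    """Hypothetical stand-in for the test helper discussed above."""
    if strategy == "hi_res":
        # This branch only compares text similarity, so exclusions cannot apply.
        # Reject them loudly instead of dropping them silently.
        if extra_exclude_keys:
            raise ValueError("extra_exclude_keys has no effect for hi_res")
        split_text = " ".join(el["text"] for el in split)
        single_text = " ".join(el["text"] for el in single)
        ratio = SequenceMatcher(None, split_text, single_text).ratio()
        assert ratio >= 0.9, f"similarity too low: {ratio:.2f}"
        return

    # Exact comparison for other strategies, ignoring the excluded keys.
    excluded = set(extra_exclude_keys or ())

    def strip(elements):
        return [{k: v for k, v in el.items() if k not in excluded} for el in elements]

    assert strip(split) == strip(single)
```

With a guard like this, a call site that passes exclusions on the hi_res path fails immediately, which makes the dead argument visible. Removing the argument at the call site would be an equally valid fix.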

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

if timeout_seconds is None:
    task_responses = task_responses_future.result()
else:
    task_responses = task_responses_future.result(timeout=timeout_seconds + TIMEOUT_BUFFER_SECONDS)

Future timeout equals per-request timeout, too short for batched operations

High Severity

The timeout_seconds value (derived from the per-request timeout) is reused as the total operation timeout for task_responses_future.result(). The same value also sets the per-request httpx.Timeout passed to run_tasks. For large PDFs, chunks are processed in sequential batches (limited by the concurrency semaphore), so total wall-clock time can reach ceil(num_chunks / concurrency_level) * timeout_seconds. With a TIMEOUT_BUFFER_SECONDS of only 5, the future can raise concurrent.futures.TimeoutError well before all batches finish. For example, a 1000-page PDF at concurrency 10 yields ~50 chunks in 5 batches: the operation needs up to 5× the per-request timeout but only gets 1× plus 5 seconds.
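The arithmetic above can be sketched as a small helper that scales the overall budget by the number of sequential batches. This is an illustrative sketch, not the hook's actual code: `overall_timeout` is a hypothetical name, and the only value taken from the comment is the 5-second buffer.

```python
import math

TIMEOUT_BUFFER_SECONDS = 5  # value quoted in the review comment


def overall_timeout(num_chunks, concurrency_level, timeout_seconds):
    """Worst-case wall-clock budget for all batches, or None for no limit."""
    if timeout_seconds is None:
        # No per-request timeout configured, so wait indefinitely.
        return None
    # Chunks run in sequential batches of at most `concurrency_level` requests.
    batches = max(1, math.ceil(num_chunks / concurrency_level))
    return batches * timeout_seconds + TIMEOUT_BUFFER_SECONDS


# The example from the comment: 50 chunks at concurrency 10, 60 s per request.
print(overall_timeout(50, 10, 60))  # 5 batches * 60 s + 5 s buffer = 305
```

The result could then be passed as `task_responses_future.result(timeout=...)`. `Future.result` accepts `timeout=None` to wait without a limit, so a single call would cover both branches of the excerpt.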
