
Preserve path_column in structure_pdfs results for cache-hit jobs #1272

Open
Reichenbachian wants to merge 2 commits into main from dev/cache_me_if_you_pdf

Conversation

Reichenbachian (Contributor) commented Mar 6, 2026

Original Prompt

pull the sdk and make a pr fixing this

Surprises and Discoveries

  • structure_pdfs was overwriting path_column from a local job_to_pdf_path map keyed by submitted job IDs.
  • On cache-hit rows, returned entities can reference prior job IDs, so this map lookup can miss and null out an otherwise-present pdf_path from entity.properties.
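The lookup miss described above can be reproduced in isolation. This is a minimal sketch under stated assumptions: the Entity class, the job IDs, and the file name are hypothetical stand-ins, not taken from the SDK. Only job_to_pdf_path, pdf_path, and entity.properties come from the description.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Entity:
    """Hypothetical stand-in for an entity returned by the service."""
    job_id: str
    properties: dict[str, Any] = field(default_factory=dict)


# Local map built from the jobs submitted in the current run (assumed shape).
job_to_pdf_path = {"job-new": "report.pdf"}

# A cache-hit entity: it references the earlier job that first produced it,
# but its properties still carry the path.
cached = Entity(job_id="job-old", properties={"pdf_path": "report.pdf"})

# Remapping through the local map misses, because "job-old" was never
# submitted in this run.
remapped = job_to_pdf_path.get(cached.job_id)

# The value was present on the entity all along.
stored = cached.properties.get("pdf_path")
```

Here the remapped value is missing while the stored value is intact, which is the mismatch that caused the column to be nulled.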

Changes Made

  • Updated structure_pdfs row assembly to read path_column directly from entity.properties instead of remapping via job_to_pdf_path.
  • This keeps pdf_path stable for cache-hit entities and avoids dropping values in downstream workflow tables.
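The change can be sketched as row assembly that takes the path straight from each entity's properties. This is an illustrative sketch, not the code in polars.py: assemble_rows, the Entity class, and the sample data are hypothetical names introduced here, and plain dictionaries stand in for the real result rows.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Entity:
    """Hypothetical stand-in for an entity returned by the service."""
    job_id: str
    properties: dict[str, Any] = field(default_factory=dict)


def assemble_rows(entities: list[Entity], path_column: str) -> list[dict[str, Any]]:
    """Build one row per entity, reading path_column from entity.properties.

    No job-ID lookup is involved, so entities that reference a prior
    (cached) job keep their path.
    """
    rows = []
    for entity in entities:
        row = dict(entity.properties)
        # Missing paths stay None rather than raising KeyError.
        row[path_column] = entity.properties.get(path_column)
        rows.append(row)
    return rows


entities = [
    Entity("job-new", {"pdf_path": "a.pdf", "title": "A"}),  # fresh job
    Entity("job-old", {"pdf_path": "b.pdf", "title": "B"}),  # cache hit
]
rows = assemble_rows(entities, "pdf_path")
```

Because the path never passes through a map keyed by submitted job IDs, fresh and cache-hit entities are handled identically.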

Tests

  • Ran uv run python -m py_compile src/structify/resources/polars.py.

Reichenbachian marked this pull request as ready for review March 6, 2026 23:48