Docs versioning: update CI to warn for outdated metadata #3279
Normalize CLI target specs (handle Windows/backslashes, ./, absolute paths) and classify them as file/dir/glob. Implement matching logic and validation (report matched files and unmatched selectors, return rc=2 on unmatched), and apply target specs in scan/update. Add helpers (normalize_target_spec, compile_target_specs, validate_requested_targets, print_target_match_summary, iter_scan_candidate_paths) and adjust main argument help. Update tests to cover directory, glob, Windows-style paths, and CLI reporting.
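A minimal sketch of how selector normalization and classification might look. The function bodies, the `repo_root` parameter, and the rule that a trailing slash marks a directory are assumptions for illustration; the PR's actual `normalize_target_spec` may behave differently.

```python
import posixpath
from typing import Optional


def normalize_target_spec(raw: str, repo_root: str = "/repo") -> Optional[str]:
    """Return a repo-relative, POSIX-style selector, or None if unusable (sketch)."""
    spec = raw.strip().replace("\\", "/")  # accept Windows-style separators
    if not spec:
        return None
    trailing_slash = spec.endswith("/")
    root = repo_root.replace("\\", "/").rstrip("/")
    if spec.startswith(root + "/"):
        spec = spec[len(root) + 1:]  # absolute path inside the repo
    elif spec.startswith("/"):
        return None  # absolute path outside the repo
    spec = posixpath.normpath(spec)  # drops "./" and collapses "a/../b"
    if spec in (".", "..") or spec.startswith("../"):
        return None
    return spec + "/" if trailing_slash else spec


def classify_target_spec(spec: str) -> str:
    """Classify a normalized selector as 'glob', 'dir', or 'file' (assumed rules)."""
    if any(ch in spec for ch in "*?["):
        return "glob"
    if spec.endswith("/"):
        return "dir"
    return "file"
```

For example, `normalize_target_spec(".\\docs\\guide.md")` yields `docs/guide.md` under these assumptions, which then classifies as `file`.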
Add a fail_if_metadata_sync_needed policy flag and utilities to detect/collect files whose embedded metadata last_content_updated differs from the computed git content date. New helpers: record_needs_metadata_sync() (skips non-md/ipynb and meta.ignore), collect_metadata_sync_targets() (returns sorted unique paths), and build_metadata_sync_command() (emits a ready-to-run python command to update targets using --set-content-date-from-git and --ack-meta-commit-marker). The enforce() flow now emits violations for out-of-sync md/ipynb files when the flag is enabled.
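The collection and enforcement steps described above could be sketched roughly as follows. `Rec`, `sync_targets`, and `sync_violations` are hypothetical stand-ins rather than the tool's real types, and the sketch assumes records have already been tagged with a `metadata_sync_needed` warning.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rec:
    """Hypothetical stand-in for the tool's file record."""
    path: str
    warnings: List[str] = field(default_factory=list)


def sync_targets(records: List[Rec]) -> List[str]:
    """Return sorted, de-duplicated paths flagged as needing metadata sync."""
    return sorted({r.path for r in records if "metadata_sync_needed" in r.warnings})


def sync_violations(records: List[Rec], fail_if_metadata_sync_needed: bool) -> List[str]:
    """Emit one violation message per out-of-sync file, only when the flag is on."""
    if not fail_if_metadata_sync_needed:
        return []
    return [
        f"{path}: embedded last_content_updated differs from git content date"
        for path in sync_targets(records)
    ]
```

Keeping the flag check in one place means the report can still list sync targets when enforcement is disabled.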
Introduce actionable metadata-sync guidance for maintainers: rename SCHEMA_VERSION to REPORT_SCHEMA_VERSION, restrict ToolConfig.version to 1, and add metadata_sync_targets and metadata_sync_command to the Report model. Add build_git_add_command helper and mark records requiring metadata sync (metadata_sync_needed) so summaries include that count. Extend markdown output and CLI to show which files need metadata updates and to print suggested commands (sync command, git add and commit) for applying fixes locally.
Update the GitHub Actions workflow to only run docs and notebook staleness checks for changed .md/.markdown/.ipynb files. Adds a step to collect changed docs into tmp/docs_nb_checks/changed_docs.txt and expose the count via outputs; conditionally runs the report and optional policy check only when there are changed files, passing the files as --targets to the checker. Adds a no-op step when no docs changed and uploads the changed_docs.txt alongside other artifacts. Also renames the job to reflect the new behavior to reduce unnecessary scanning and noise.
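The filtering logic behind the "collect changed docs" step can be illustrated in Python. This is only a sketch of the selection rule; the workflow itself presumably does this in a shell step, and `select_changed_docs` is a hypothetical helper name.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Extensions named in the commit message above.
DOC_SUFFIXES = {".md", ".markdown", ".ipynb"}


def select_changed_docs(changed_paths: Iterable[str]) -> List[str]:
    """Keep only doc/notebook paths, de-duplicated and sorted."""
    docs = set()
    for raw in changed_paths:
        path = raw.strip()
        if path and PurePosixPath(path).suffix in DOC_SUFFIXES:
            docs.add(path)
    return sorted(docs)


changed = select_changed_docs(["docs/a.md", "src/x.py", "nb/demo.ipynb", "docs/a.md"])
# An empty list would correspond to the no-op branch of the workflow.
run_checks = len(changed) > 0
```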
Pull request overview
This PR updates the docs/notebooks staleness tooling and CI workflow to reduce unnecessary scanning and to add reporting/enforcement around syncing embedded deeplabcut.last_content_updated metadata with git-derived content dates.
Changes:
- Update the GitHub Actions workflow to detect and scan only changed `.md`/`.ipynb` (and `.markdown`) files, skipping the job work when none changed.
- Extend the staleness report to identify files needing metadata sync and provide suggested local commands to update and commit.
- Add a policy option to optionally fail `check` when metadata sync is needed, and bump the report schema version.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tools/docs_and_notebooks_check.py | Adds metadata-sync detection/reporting, optional enforcement, and updates report schema/version typing. |
| .github/workflows/docs_and_notebooks_checks.yml | Optimizes CI by computing changed doc targets and running the tool only on those files. |
Treat target specs that fail normalization as kind "invalid" so they are not silently ignored. compile_target_specs now appends invalid specs, target_spec_matches_path returns False for invalid kinds, and validate_requested_targets records invalid raw selectors as unmatched (and safely iterates when specs may be None). This ensures malformed or non-normalizable CLI selectors are reported back to the user rather than dropped.
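One way to picture the "invalid" handling is the sketch below. The spec dictionaries with `raw`, `kind`, and `value` keys, and the helper names `spec_matches`, `validate_targets`, and `exit_code`, are assumptions made for illustration and are not taken from the tool.

```python
import fnmatch
from typing import Dict, List, Optional, Sequence, Tuple


def spec_matches(spec: Dict[str, str], path: str) -> bool:
    """Match one repo-relative path against one selector (sketch)."""
    kind, value = spec["kind"], spec["value"]
    if kind == "invalid":
        return False  # malformed selectors never match anything
    if kind == "file":
        return path == value
    if kind == "dir":
        return path.startswith(value.rstrip("/") + "/")
    return fnmatch.fnmatchcase(path, value)  # glob


def validate_targets(
    specs: Optional[Sequence[Dict[str, str]]], paths: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Return (matched paths, raw selectors that matched nothing)."""
    matched, unmatched = set(), []
    for spec in specs or []:  # tolerate specs being None
        hits = [p for p in paths if spec_matches(spec, p)]
        if hits:
            matched.update(hits)
        else:
            unmatched.append(spec["raw"])  # includes invalid selectors
    return sorted(matched), unmatched


def exit_code(unmatched: List[str]) -> int:
    """rc=2 when any selector was unmatched, as the commit message describes."""
    return 2 if unmatched else 0
```

Because an invalid selector can never match, it always lands in the unmatched list and is reported back to the user instead of being dropped.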
Introduce shared pytest fixtures (repo, cfg) and import Callable to reduce repetition of tmp_path/git init logic across tests. Refactor many tests to use the new fixtures and ToolConfig factory, and add/adjust tests covering target validation and scanning edge cases (several validate_requested_targets variations, scan_files with invalid only-targets, and main returning 2 for invalid selector). Overall this centralizes repo/config setup and adds coverage for target handling behavior.
as it is unused
tools/docs_and_notebooks_check.py: Return False from record_needs_metadata_sync when computed timestamp is None to avoid triggering sync for files with no computed last_content_updated. Also import shlex and quote paths in build_metadata_sync_command so generated shell commands are safe for paths with spaces/special chars.
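The quoting concern can be shown with a small sketch. `build_sync_command` is a hypothetical stand-in for `build_metadata_sync_command`; the flag names come from the commit messages in this PR, while the exact argument order is an assumption.

```python
import shlex
from typing import Iterable


def build_sync_command(
    paths: Iterable[str], script: str = "tools/docs_and_notebooks_check.py"
) -> str:
    """Build a copy-pasteable shell command with every path safely quoted."""
    targets = sorted(set(paths))
    if not targets:
        return ""  # nothing to sync, so no command is suggested
    quoted = " ".join(shlex.quote(p) for p in targets)
    return (
        f"python {shlex.quote(script)} update --targets {quoted} "
        "--set-content-date-from-git --ack-meta-commit-marker"
    )


cmd = build_sync_command(["docs/my guide.md", "docs/a.md"])
```

`shlex.quote` wraps `docs/my guide.md` in single quotes, so the shell passes it as one argument rather than splitting it at the space.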
Expand and clarify the --targets argument help strings in tools/docs_and_notebooks_check.py for the update and normalize subcommands. The updated messages document that --targets accepts exact files, directories, and glob patterns (with examples) and note that both '/' and '\\' path separators are accepted. This is a documentation-only change to improve user guidance; no functional behavior is altered.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
    def record_needs_metadata_sync(rec: FileRecord) -> bool:
        if rec.kind not in {"md", "ipynb"}:
            return False
        if rec.meta and rec.meta.ignore:
            return False

        embedded = rec.meta.last_content_updated if rec.meta else None
        computed = rec.last_content_updated

        if computed is None:
            return False

        return embedded != computed
record_needs_metadata_sync() treats any record with rec.meta is None as needing sync (since embedded becomes None). That means files flagged as invalid_metadata will always be reported as metadata_sync_needed, even when deeplabcut.last_content_updated is present but other metadata fields are invalid. This conflates “invalid metadata” with “missing/out-of-sync last_content_updated” and can produce misleading remediation commands. Consider short-circuiting when "invalid_metadata" in rec.warnings (or deriving last_content_updated from the raw metadata/frontmatter when possible) so the sync signal only covers genuinely missing/out-of-sync content dates.
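The short-circuit suggested in this comment might look like the sketch below. `Meta` and `FileRecord` here are simplified stand-ins for the tool's models, with only the fields visible in the reviewed snippet, and the `invalid_metadata` check is the reviewer's proposal rather than merged behavior.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Meta:
    """Simplified stand-in for the embedded metadata model."""
    last_content_updated: Optional[str] = None
    ignore: bool = False


@dataclass
class FileRecord:
    """Simplified stand-in for the tool's file record."""
    kind: str
    last_content_updated: Optional[str] = None
    meta: Optional[Meta] = None
    warnings: List[str] = field(default_factory=list)


def record_needs_metadata_sync(rec: FileRecord) -> bool:
    if rec.kind not in {"md", "ipynb"}:
        return False
    # Proposed short-circuit: invalid metadata is a separate problem,
    # so do not also report it as a sync mismatch.
    if "invalid_metadata" in rec.warnings:
        return False
    if rec.meta and rec.meta.ignore:
        return False
    if rec.last_content_updated is None:
        return False
    embedded = rec.meta.last_content_updated if rec.meta else None
    return embedded != rec.last_content_updated
```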
    if record_needs_metadata_sync(rec):
        rec.warnings.append("metadata_sync_needed")
New behavior adds the metadata_sync_needed warning and the associated report fields/command generation, but there are no tests exercising it (e.g., out-of-sync last_content_updated in frontmatter / notebook metadata should produce the warning, appear in Report.metadata_sync_targets, and generate a non-empty metadata_sync_command). Adding a focused unit test here would prevent regressions in CI reporting/enforcement.
Co-authored-by: Copilot <[email protected]>
…bCut/DeepLabCut into cy/docs-versioning-tweaks
…m/DeepLabCut/DeepLabCut into cy/docs-versioning-tweaks" This reverts commit 46fdb04, reversing changes made to 6d3b1be.
Introduce TypedDict-based types for FileKind and TargetSpec (with TargetKind) and tighten type annotations across functions (compile_target_specs, target_spec_matches_path, target_matches). Replace PurePosixPath-based glob matching with fnmatch.fnmatchcase only to ensure consistent, shell-style pattern behavior across platforms and remove unused imports. Minor cleanups: import TypedDict and update variable type hints for better static checking and readability.
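A rough sketch of what TypedDict-based selector types and `fnmatchcase`-only matching could look like. The field names `raw`, `kind`, and `value` and the helper `glob_matches` are assumptions; only the type names `TargetSpec` and `TargetKind` come from the commit message.

```python
import fnmatch
from typing import Literal, TypedDict

TargetKind = Literal["file", "dir", "glob", "invalid"]


class TargetSpec(TypedDict):
    raw: str          # selector exactly as the user typed it
    kind: TargetKind  # classification after normalization
    value: str        # normalized, POSIX-style selector


def glob_matches(spec: TargetSpec, rel_path: str) -> bool:
    """Case-sensitive, shell-style matching that is the same on every platform."""
    return spec["kind"] == "glob" and fnmatch.fnmatchcase(rel_path, spec["value"])


spec: TargetSpec = {"raw": "docs\\*.md", "kind": "glob", "value": "docs/*.md"}
```

One consequence of `fnmatch.fnmatchcase` worth knowing is that `*` also matches `/`, so `docs/*.md` matches `docs/sub/a.md` as well as `docs/a.md`, and matching stays case-sensitive even on Windows.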
This pull request improves the documentation and notebook staleness check workflow: CI now scans only the documentation files that have changed, and the staleness report gives maintainers example commands for syncing embedded metadata.
Additionally, the policy configuration and enforcement logic have been extended to support stricter checks for metadata synchronization.
TODO
Automated summary
Key improvements include:
- CI Workflow Efficiency: The workflow (`.github/workflows/docs_and_notebooks_checks.yml`) now detects and scans only changed `.md` and `.ipynb` files, skipping the scan if no relevant files were modified. This reduces unnecessary computation and speeds up CI runs.
- Metadata Synchronization Guidance: The tool (`tools/docs_and_notebooks_check.py`) now identifies files where the embedded `deeplabcut.last_content_updated` metadata is missing or out of sync with the git content date. The generated report includes a list of such files and provides maintainers with suggested commands to synchronize metadata and commit changes.
- Policy Configuration and Enforcement: Added a `fail_if_metadata_sync_needed` option to the policy config, allowing the CI to fail if any documentation files require metadata synchronization. The enforcement logic was updated to respect this setting.
- Reporting and Summarization:
- Schema and Type Updates:
These changes streamline the documentation maintenance workflow, improve feedback to contributors, and help keep embedded metadata accurate and up to date.