
Docs versioning: update CI to warn for outdated metadata#3279

Draft
C-Achard wants to merge 20 commits into main from cy/docs-ci-tweaks

Conversation

@C-Achard
Collaborator

@C-Achard C-Achard commented Apr 9, 2026

This pull request improves the documentation and notebook staleness check workflow. The CI process now scans only the documentation files that have changed, and the staleness report gives maintainers guidance on syncing embedded metadata, including example commands.
Additionally, the policy configuration and enforcement logic have been extended to support stricter checks for metadata synchronization.

TODO

Automated summary

Key improvements include:

CI Workflow Efficiency:

  • The GitHub Actions workflow (.github/workflows/docs_and_notebooks_checks.yml) now detects and scans only changed .md and .ipynb files, skipping the scan if no relevant files were modified. This reduces unnecessary computation and speeds up CI runs.
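The changed-file filtering described above can be sketched in Python. This is a minimal illustration rather than the workflow's actual logic: the helper name `filter_changed_docs` and the sample paths are hypothetical, and the suffix set (including `.markdown`) is taken from the commit messages in this PR.

```python
from pathlib import PurePosixPath

# Suffixes treated as documentation; assumed from the PR description.
DOC_SUFFIXES = {".md", ".markdown", ".ipynb"}


def filter_changed_docs(changed_paths):
    """Keep only docs/notebook paths, de-duplicated and sorted."""
    return sorted(
        {p for p in changed_paths if PurePosixPath(p).suffix.lower() in DOC_SUFFIXES}
    )


# Hypothetical list of files changed in a pull request.
changed = ["docs/intro.md", "deeplabcut/core.py", "examples/demo.ipynb", "docs/intro.md"]
targets = filter_changed_docs(changed)

if targets:
    print("scan:", targets)
else:
    print("no docs changed; skipping scan")
```

An empty result corresponds to the case where the workflow skips the scan entirely.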

Metadata Synchronization Guidance:

  • The staleness check tool (tools/docs_and_notebooks_check.py) now identifies files where the embedded deeplabcut.last_content_updated metadata is missing or out of sync with the git content date. The generated report includes a list of such files and provides maintainers with suggested commands to synchronize metadata and commit changes.

Policy Configuration and Enforcement:

  • Added a fail_if_metadata_sync_needed option to the policy config, allowing the CI to fail if any documentation files require metadata synchronization. The enforcement logic was updated to respect this setting.
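As a rough sketch of how such a flag can gate enforcement: only the option name `fail_if_metadata_sync_needed` comes from this PR. `SketchPolicy`, `sketch_enforce`, and the violation message format are hypothetical stand-ins, not the tool's real classes or output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SketchPolicy:
    """Hypothetical stand-in for the tool's policy config."""

    fail_if_metadata_sync_needed: bool = False


def sketch_enforce(policy: SketchPolicy, sync_targets: List[str]) -> List[str]:
    """Return violation messages; an empty list means the check passes."""
    if not policy.fail_if_metadata_sync_needed:
        # Flag off: out-of-sync files are still reported, but never fail CI.
        return []
    return [
        f"{path}: embedded last_content_updated needs sync"
        for path in sorted(set(sync_targets))
    ]


targets = ["docs/b.md", "docs/a.md"]
print(sketch_enforce(SketchPolicy(), targets))
print(sketch_enforce(SketchPolicy(fail_if_metadata_sync_needed=True), targets))
```

Keeping the default at `False` in this sketch means existing configurations would continue to pass unless a maintainer opts in.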

Reporting and Summarization:

  • The summary and markdown report now include a count and section for files needing metadata sync, making it easier to spot and address these issues.

Schema and Type Updates:

  • Bumped schema version and switched to typing.Literal.

These changes streamline the documentation maintenance workflow, improve feedback to contributors, and help keep embedded metadata accurate and up to date.

C-Achard added 4 commits April 9, 2026 15:04
Normalize CLI target specs (handle Windows/backslashes, ./, absolute paths) and classify them as file/dir/glob. Implement matching logic and validation (report matched files and unmatched selectors, return rc=2 on unmatched), and apply target specs in scan/update. Add helpers (normalize_target_spec, compile_target_specs, validate_requested_targets, print_target_match_summary, iter_scan_candidate_paths) and adjust main argument help. Update tests to cover directory, glob, Windows-style paths, and CLI reporting.
Add a fail_if_metadata_sync_needed policy flag and utilities to detect/collect files whose embedded metadata last_content_updated differs from the computed git content date. New helpers: record_needs_metadata_sync() (skips non-md/ipynb and meta.ignore), collect_metadata_sync_targets() (returns sorted unique paths), and build_metadata_sync_command() (emits a ready-to-run python command to update targets using --set-content-date-from-git and --ack-meta-commit-marker). The enforce() flow now emits violations for out-of-sync md/ipynb files when the flag is enabled.
Introduce actionable metadata-sync guidance for maintainers: rename SCHEMA_VERSION to REPORT_SCHEMA_VERSION, restrict ToolConfig.version to 1, and add metadata_sync_targets and metadata_sync_command to the Report model. Add build_git_add_command helper and mark records requiring metadata sync (metadata_sync_needed) so summaries include that count. Extend markdown output and CLI to show which files need metadata updates and to print suggested commands (sync command, git add and commit) for applying fixes locally.
Update the GitHub Actions workflow to only run docs and notebook staleness checks for changed .md/.markdown/.ipynb files. Adds a step to collect changed docs into tmp/docs_nb_checks/changed_docs.txt and expose the count via outputs; conditionally runs the report and optional policy check only when there are changed files, passing the files as --targets to the checker. Adds a no-op step when no docs changed and uploads the changed_docs.txt alongside other artifacts. Also renames the job to reflect the new behavior to reduce unnecessary scanning and noise.
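The target-spec handling described in the first of these commits (normalizing Windows-style paths and a leading `./`, then classifying each selector as a file, directory, or glob) might look roughly like the sketch below. The function `sketch_classify` and its rules are assumptions for illustration: the trailing-slash test for directories is a simplification, and absolute-path handling is omitted.

```python
import posixpath

GLOB_CHARS = set("*?[")


def sketch_classify(raw: str):
    """Return (kind, normalized_spec) for a CLI selector.

    kind is one of "file", "dir", "glob", or "invalid".
    """
    spec = raw.strip().replace("\\", "/")  # accept Windows separators
    if not spec:
        return ("invalid", raw)
    is_dir = spec.endswith("/")  # assumption: trailing slash marks a directory
    spec = posixpath.normpath(spec)  # drops "./" and redundant slashes
    if any(ch in GLOB_CHARS for ch in spec):
        return ("glob", spec)
    return ("dir" if is_dir else "file", spec)


for raw in [".\\docs\\intro.md", "docs/guide/", "docs/**/*.ipynb", "   "]:
    print(repr(raw), "->", sketch_classify(raw))
```

Returning an explicit "invalid" kind, instead of dropping the selector, lets a caller report it back to the user and exit with a non-zero code.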
@C-Achard C-Achard self-assigned this Apr 9, 2026
@C-Achard C-Achard added the enhancement, documentation, and CI labels Apr 9, 2026
@C-Achard C-Achard requested a review from Copilot April 9, 2026 14:21
Contributor

Copilot AI left a comment

Pull request overview

This PR updates the docs/notebooks staleness tooling and CI workflow to reduce unnecessary scanning and to add reporting/enforcement around syncing embedded deeplabcut.last_content_updated metadata with git-derived content dates.

Changes:

  • Update the GitHub Actions workflow to detect and scan only changed .md/.ipynb (and .markdown) files, skipping the job work when none changed.
  • Extend the staleness report to identify files needing metadata sync and provide suggested local commands to update and commit.
  • Add a policy option to fail the check when metadata sync is needed, and bump the report schema version.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tools/docs_and_notebooks_check.py | Adds metadata-sync detection/reporting, optional enforcement, and updates report schema/version typing. |
| .github/workflows/docs_and_notebooks_checks.yml | Optimizes CI by computing changed doc targets and running the tool only on those files. |

Comment thread tools/docs_and_notebooks_check.py
Comment thread tools/docs_and_notebooks_check.py
Comment thread tools/docs_and_notebooks_check.py
Comment thread .github/workflows/docs_and_notebooks_checks.yml Outdated
Comment thread .github/workflows/docs_and_notebooks_checks.yml Outdated
Treat target specs that fail normalization as kind "invalid" so they are not silently ignored. compile_target_specs now appends invalid specs, target_spec_matches_path returns False for invalid kinds, and validate_requested_targets records invalid raw selectors as unmatched (and safely iterates when specs may be None). This ensures malformed or non-normalizable CLI selectors are reported back to the user rather than dropped.
Introduce shared pytest fixtures (repo, cfg) and import Callable to reduce repetition of tmp_path/git init logic across tests. Refactor many tests to use the new fixtures and ToolConfig factory, and add/adjust tests covering target validation and scanning edge cases (several validate_requested_targets variations, scan_files with invalid only-targets, and main returning 2 for invalid selector). Overall this centralizes repo/config setup and adds coverage for target handling behavior.
tools/docs_and_notebooks_check.py: Return False from record_needs_metadata_sync when computed timestamp is None to avoid triggering sync for files with no computed last_content_updated. Also import shlex and quote paths in build_metadata_sync_command so generated shell commands are safe for paths with spaces/special chars.
Expand and clarify the --targets argument help strings in tools/docs_and_notebooks_check.py for the update and normalize subcommands. The updated messages document that --targets accepts exact files, directories, and glob patterns (with examples) and note that both '/' and '\\' path separators are accepted. This is a documentation-only change to improve user guidance; no functional behavior is altered.
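The path quoting mentioned in these commits can be illustrated with `shlex`. The sketch below assumes a command layout (script path, `update` subcommand, flag order) pieced together from the commit messages. It is not the tool's actual `build_metadata_sync_command`, and `sketch_sync_command` is a hypothetical name.

```python
import shlex


def sketch_sync_command(paths):
    """Build a copy-pasteable shell command for the given doc paths."""
    unique = sorted(set(paths))
    if not unique:
        return ""  # nothing to sync, so no command to suggest
    argv = [
        "python",
        "tools/docs_and_notebooks_check.py",
        "update",  # assumed subcommand
        "--targets",
        *unique,
        "--set-content-date-from-git",
        "--ack-meta-commit-marker",
    ]
    # shlex.join quotes each argument, so spaces and shell
    # metacharacters in paths stay inside a single argument.
    return shlex.join(argv)


cmd = sketch_sync_command(["docs/my guide.md", "docs/intro.md"])
print(cmd)
```

Because `shlex.split` reverses `shlex.join`, the printed command parses back into the same argument list in a POSIX shell.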
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.


Comment on lines +592 to +604
def record_needs_metadata_sync(rec: FileRecord) -> bool:
    if rec.kind not in {"md", "ipynb"}:
        return False
    if rec.meta and rec.meta.ignore:
        return False

    embedded = rec.meta.last_content_updated if rec.meta else None
    computed = rec.last_content_updated

    if computed is None:
        return False

    return embedded != computed

Copilot AI Apr 10, 2026

record_needs_metadata_sync() treats any record with rec.meta is None as needing sync (since embedded becomes None). That means files flagged as invalid_metadata will always be reported as metadata_sync_needed, even when deeplabcut.last_content_updated is present but other metadata fields are invalid. This conflates “invalid metadata” with “missing/out-of-sync last_content_updated” and can produce misleading remediation commands. Consider short-circuiting when "invalid_metadata" in rec.warnings (or deriving last_content_updated from the raw metadata/frontmatter when possible) so the sync signal only covers genuinely missing/out-of-sync content dates.
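One way to realize the first suggestion (short-circuiting on `invalid_metadata`) is sketched below. `SketchMeta` and `SketchRecord` are simplified, hypothetical stand-ins for the tool's `FileRecord` and metadata models, and the string dates are placeholders. Only the warning name and the field names come from this PR.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchMeta:
    last_content_updated: Optional[str] = None
    ignore: bool = False


@dataclass
class SketchRecord:
    kind: str
    last_content_updated: Optional[str] = None  # git-derived date
    meta: Optional[SketchMeta] = None
    warnings: List[str] = field(default_factory=list)


def sketch_needs_sync(rec: SketchRecord) -> bool:
    if rec.kind not in {"md", "ipynb"}:
        return False
    if "invalid_metadata" in rec.warnings:
        # Metadata failed validation: fixing it is a separate problem,
        # so do not also report the file as out of sync.
        return False
    if rec.meta and rec.meta.ignore:
        return False
    if rec.last_content_updated is None:
        return False
    embedded = rec.meta.last_content_updated if rec.meta else None
    return embedded != rec.last_content_updated


broken = SketchRecord("md", "2026-04-01", None, ["invalid_metadata"])
missing = SketchRecord("md", "2026-04-01", None)
print(sketch_needs_sync(broken), sketch_needs_sync(missing))
```

With this ordering, a file whose metadata is absent but not flagged as invalid still counts as needing sync.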

Comment thread tools/docs_and_notebooks_check.py
Comment thread tools/docs_and_notebooks_check.py
Comment on lines +731 to +733
if record_needs_metadata_sync(rec):
    rec.warnings.append("metadata_sync_needed")

Copilot AI Apr 10, 2026

New behavior adds the metadata_sync_needed warning and the associated report fields/command generation, but there are no tests exercising it (e.g., out-of-sync last_content_updated in frontmatter / notebook metadata should produce the warning, appear in Report.metadata_sync_targets, and generate a non-empty metadata_sync_command). Adding a focused unit test here would prevent regressions in CI reporting/enforcement.

Comment thread .github/workflows/docs_and_notebooks_checks.yml Outdated
C-Achard and others added 9 commits April 10, 2026 11:35
…m/DeepLabCut/DeepLabCut into cy/docs-versioning-tweaks"

This reverts commit 46fdb04, reversing
changes made to 6d3b1be.
Introduce TypedDict-based types for FileKind and TargetSpec (with TargetKind) and tighten type annotations across functions (compile_target_specs, target_spec_matches_path, target_matches). Replace PurePosixPath-based glob matching with fnmatch.fnmatchcase alone, to ensure consistent, shell-style pattern behavior across platforms, and remove unused imports. Minor cleanups: import TypedDict and update variable type hints for better static checking and readability.
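To illustrate why the choice of matcher matters, the short comparison below contrasts `fnmatch.fnmatchcase` with `PurePosixPath.match` on the same pattern. The path and pattern are made-up examples, not taken from the tool's tests.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

path = "docs/guide/intro.md"
pattern = "docs/*.md"

# fnmatchcase treats the path as a plain string: "*" also matches "/",
# and the comparison is case-sensitive on every platform.
shell_style = fnmatchcase(path, pattern)

# PurePosixPath.match compares path components from the right,
# so "*" does not cross a directory separator.
path_style = PurePosixPath(path).match(pattern)

# Case differences are never folded by fnmatchcase.
case_sensitive = fnmatchcase("Docs/intro.md", pattern)

print(shell_style, path_style, case_sensitive)
```

Using a single string-based matcher avoids having two sets of rules for the same selector, at the cost of `*` spanning directory boundaries.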
Base automatically changed from cy/docs-versioning-tweaks to main April 12, 2026 09:17

Labels

CI, documentation, enhancement

2 participants