Skip to content

fix if director is empty#1

Merged
phoenixthrush merged 1 commit intomainfrom
nfo
Jan 30, 2026
Merged

fix if director is empty#1
phoenixthrush merged 1 commit intomainfrom
nfo

Conversation

@phoenixthrush
Copy link
Copy Markdown
Owner

No description provided.

Copilot AI review requested due to automatic review settings January 30, 2026 16:37
@phoenixthrush phoenixthrush merged commit d5bcf9e into main Jan 30, 2026
3 checks passed
@phoenixthrush phoenixthrush deleted the nfo branch January 30, 2026 16:38
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request addresses an issue with empty Director fields and makes several related improvements to metadata extraction and output formatting.

Changes:

  • Modified _labeled_single function to extract only the first non-empty line after converting <br> tags to newlines, preventing empty fields from capturing extraneous multi-line content
  • Added special handling for the Director field to distinguish between "field not found" and "field found but empty", preventing incorrect fallback to alternative extraction methods
  • Added filtering to remove "genre", "genres", or "genre(s)" literal values from the genres list
  • Changed JSON output to use empty arrays instead of null for empty genres/actresses lists
  • Redirected JSON status messages to stderr to avoid interfering with JSON output on stdout

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
src/javdb/main.py Modified metadata extraction logic to handle empty Director fields, improved genre filtering, changed JSON array output behavior, and redirected status messages to stderr for JSON mode
pyproject.toml Bumped version from 0.1.0 to 0.1.1

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/javdb/__main__.py
Comment on lines +260 to +265
director = _labeled_single(html, "Director")
if director is None:
director = extract(["Director"], max_words=8)
elif not director:
director = None
meta["Director"] = director
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Director field now has special handling to distinguish between "field not found" (None) and "field found but empty" (empty string converted to None). This prevents fallback to the extract function when the Director field exists but is empty in the HTML. This same issue potentially affects other metadata fields (Studio, Series, etc.) that use _labeled_single(html, label) or extract(...) pattern, which would incorrectly fall back to extract even when the field exists but is empty. Consider whether this fix should be applied consistently to other fields, or document why Director requires special handling.

Copilot uses AI. Check for mistakes.
Comment thread src/javdb/__main__.py
genres_list.append(g)
if not g:
continue
if re.fullmatch(r"genres?|genre\(s\)?", g, flags=re.I):
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex pattern has redundancy: both genres? and genre\(s\)? match the string "genre", making the pattern more complex than necessary. A clearer pattern would be r"genre(s|\(s\))?" to match "genre", "genres", and "genre(s)" without redundancy.

Suggested change
if re.fullmatch(r"genres?|genre\(s\)?", g, flags=re.I):
if re.fullmatch(r"genre(s|\(s\))?", g, flags=re.I):

Copilot uses AI. Check for mistakes.
Comment thread src/javdb/__main__.py
Comment on lines +375 to +376
"genres": genres_list,
"actresses": actresses_list,
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change alters the JSON output format for empty genres and actresses from null to [] (empty array). While this makes the output more consistent with the preview_images field, it represents a breaking change in the API contract that may affect consumers of this JSON output. Consider whether this breaking change is intentional and documented, or if backward compatibility should be maintained.

Suggested change
"genres": genres_list,
"actresses": actresses_list,
"genres": genres_list or None,
"actresses": actresses_list or None,

Copilot uses AI. Check for mistakes.
Comment thread src/javdb/__main__.py
Comment on lines +402 to +404
print("JSON written ->", output_path, file=sys.stderr)
except Exception as e:
print("Failed saving JSON:", e)
print("Failed saving JSON:", e, file=sys.stderr)
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change to redirect JSON status messages to stderr prevents them from interfering with JSON output on stdout, which is good. However, this creates an inconsistency: NFO mode has the same issue with status messages (lines 503, 505, 524, 526) being printed to stdout before the NFO output (line 554), which would interfere with the NFO on stdout when using --download. For consistency, all status messages should be redirected to stderr, or both JSON and NFO outputs should be handled consistently.

Copilot uses AI. Check for mistakes.
Comment thread src/javdb/__main__.py
Comment on lines +57 to +59
block = re.sub(r"(?is)<br\s*/?>", "\n", block)
first_line = next((ln for ln in block.splitlines() if ln.strip()), "")
return _clean_html_text(first_line)
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new logic extracts only the first non-empty line after converting <br> tags to newlines. While this fixes cases where empty fields contain extraneous content on subsequent lines, it also changes behavior for all fields that use this function (DVD ID, Content ID, Release Date, Runtime, Studio, Director, Series). If any of these fields legitimately contain multi-line content separated by <br> tags, only the first line will now be captured. Verify that this behavior change is acceptable for all these fields, not just Director.

Suggested change
block = re.sub(r"(?is)<br\s*/?>", "\n", block)
first_line = next((ln for ln in block.splitlines() if ln.strip()), "")
return _clean_html_text(first_line)
# Preserve line structure: convert <br> to newlines, then keep all
# non-empty lines rather than only the first.
block = re.sub(r"(?is)<br\s*/?>", "\n", block)
raw_lines = [ln for ln in block.splitlines() if ln.strip()]
cleaned_lines = [_clean_html_text(ln) for ln in raw_lines]
cleaned_lines = [ln for ln in cleaned_lines if ln]
if not cleaned_lines:
return None
return "\n".join(cleaned_lines)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants