fix if director is empty by phoenixthrush · Pull Request #1 · phoenixthrush/javdb-python

phoenixthrush · 2026-01-30T16:37:22Z

No description provided.

Copilot

Pull request overview

This pull request addresses an issue with empty Director fields and makes several related improvements to metadata extraction and output formatting.

Changes:

Modified _labeled_single function to extract only the first non-empty line after converting <br> tags to newlines, preventing empty fields from capturing extraneous multi-line content
Added special handling for the Director field to distinguish between "field not found" and "field found but empty", preventing incorrect fallback to alternative extraction methods
Added filtering to remove "genre", "genres", or "genre(s)" literal values from the genres list
Changed JSON output to use empty arrays instead of null for empty genres/actresses lists
Redirected JSON status messages to stderr to avoid interfering with JSON output on stdout

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File	Description
src/javdb/main.py	Modified metadata extraction logic to handle empty Director fields, improved genre filtering, changed JSON array output behavior, and redirected status messages to stderr for JSON mode
pyproject.toml	Bumped version from 0.1.0 to 0.1.1

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-30T16:42:37Z

+    director = _labeled_single(html, "Director")
+    if director is None:
+        director = extract(["Director"], max_words=8)
+    elif not director:
+        director = None
+    meta["Director"] = director


The Director field now has special handling to distinguish between "field not found" (None) and "field found but empty" (empty string converted to None). This prevents fallback to the extract function when the Director field exists but is empty in the HTML. This same issue potentially affects other metadata fields (Studio, Series, etc.) that use _labeled_single(html, label) or extract(...) pattern, which would incorrectly fall back to extract even when the field exists but is empty. Consider whether this fix should be applied consistently to other fields, or document why Director requires special handling.

Copilot · 2026-01-30T16:42:37Z

-                genres_list.append(g)
+            if not g:
+                continue
+            if re.fullmatch(r"genres?|genre\(s\)?", g, flags=re.I):


The regex pattern has redundancy: both genres? and genre\(s\)? match the string "genre", making the pattern more complex than necessary. A clearer pattern would be r"genre(s|\(s\))?" to match "genre", "genres", and "genre(s)" without redundancy.

Suggested change

if re.fullmatch(r"genres?|genre\(s\)?", g, flags=re.I):

if re.fullmatch(r"genre(s|\(s\))?", g, flags=re.I):

Copilot · 2026-01-30T16:42:37Z

+        "genres": genres_list,
+        "actresses": actresses_list,


This change alters the JSON output format for empty genres and actresses from null to [] (empty array). While this makes the output more consistent with the preview_images field, it represents a breaking change in the API contract that may affect consumers of this JSON output. Consider whether this breaking change is intentional and documented, or if backward compatibility should be maintained.

Suggested change

"genres": genres_list,

"actresses": actresses_list,

"genres": genres_list or None,

"actresses": actresses_list or None,

Copilot · 2026-01-30T16:42:37Z

+                print("JSON written ->", output_path, file=sys.stderr)
            except Exception as e:
-                print("Failed saving JSON:", e)
+                print("Failed saving JSON:", e, file=sys.stderr)


The change to redirect JSON status messages to stderr prevents them from interfering with JSON output on stdout, which is good. However, this creates an inconsistency: NFO mode has the same issue with status messages (lines 503, 505, 524, 526) being printed to stdout before the NFO output (line 554), which would interfere with the NFO on stdout when using --download. For consistency, all status messages should be redirected to stderr, or both JSON and NFO outputs should be handled consistently.

Copilot · 2026-01-30T16:42:38Z

+    block = re.sub(r"(?is)<br\s*/?>", "\n", block)
+    first_line = next((ln for ln in block.splitlines() if ln.strip()), "")
+    return _clean_html_text(first_line)


The new logic extracts only the first non-empty line after converting <br> tags to newlines. While this fixes cases where empty fields contain extraneous content on subsequent lines, it also changes behavior for all fields that use this function (DVD ID, Content ID, Release Date, Runtime, Studio, Director, Series). If any of these fields legitimately contain multi-line content separated by <br> tags, only the first line will now be captured. Verify that this behavior change is acceptable for all these fields, not just Director.

Suggested change

block = re.sub(r"(?is)<br\s*/?>", "\n", block)

first_line = next((ln for ln in block.splitlines() if ln.strip()), "")

return _clean_html_text(first_line)

# Preserve line structure: convert <br> to newlines, then keep all

# non-empty lines rather than only the first.

block = re.sub(r"(?is)<br\s*/?>", "\n", block)

raw_lines = [ln for ln in block.splitlines() if ln.strip()]

cleaned_lines = [_clean_html_text(ln) for ln in raw_lines]

cleaned_lines = [ln for ln in cleaned_lines if ln]

if not cleaned_lines:

return None

return "\n".join(cleaned_lines)

fix if director is empty

2e66884

Copilot AI review requested due to automatic review settings January 30, 2026 16:37

phoenixthrush merged commit d5bcf9e into main Jan 30, 2026
3 checks passed

Copilot started reviewing on behalf of phoenixthrush January 30, 2026 16:37 View session

phoenixthrush deleted the nfo branch January 30, 2026 16:38

Copilot AI reviewed Jan 30, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix if director is empty#1

fix if director is empty#1
phoenixthrush merged 1 commit intomainfrom
nfo

phoenixthrush commented Jan 30, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Jan 30, 2026

Uh oh!

Copilot AI Jan 30, 2026

Uh oh!

Copilot AI Jan 30, 2026

Uh oh!

Copilot AI Jan 30, 2026

Uh oh!

Copilot AI Jan 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	if re.fullmatch(r"genres?\|genre\(s\)?", g, flags=re.I):
	if re.fullmatch(r"genre(s\|\(s\))?", g, flags=re.I):

-    block = re.sub(r"(?is)<br\s*/?>", "\n", block)
-    first_line = next((ln for ln in block.splitlines() if ln.strip()), "")
-    return _clean_html_text(first_line)
+    # Preserve line structure: convert <br> to newlines, then keep all
+    # non-empty lines rather than only the first.
+    block = re.sub(r"(?is)<br\s*/?>", "\n", block)
+    raw_lines = [ln for ln in block.splitlines() if ln.strip()]
+    cleaned_lines = [_clean_html_text(ln) for ln in raw_lines]
+    cleaned_lines = [ln for ln in cleaned_lines if ln]
+    if not cleaned_lines:
+        return None
+    return "\n".join(cleaned_lines)

Conversation

phoenixthrush commented Jan 30, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants