Refactor AI notes handling and version mapping extraction #24832
m-aliozkaya merged 1 commit into dev
Updated the AI response handling and improved the raw notes cleaning process. Enhanced the version mapping table extraction and file writing so that all content is preserved.
Pull request overview
This pull request refactors the GitHub Actions workflow that updates ABP Studio documentation. It enhances the handling of release notes when AI formatting is unavailable and improves the version mapping table extraction logic to preserve all file content.
Changes:
- Enhanced fallback release notes cleaning with better GitHub metadata removal, username sanitization, and bullet point normalization
- Improved version mapping table parsing to detect table boundaries and preserve content before and after the table
- Added comprehensive cleaning rules for GitHub-generated release notes including PR links and user mentions
        elif table_start > 0 and line.strip() and not line.strip().startswith('|'):
            table_end = i
            break

    if table_start == 0:
        print("❌ Could not find version mapping table")
        exit(1)

    # Extract header + separator + data rows
    header_lines = lines[:table_start+2]  # Keep everything before data rows
    data_rows = [l for l in lines[table_start+2:] if l.strip().startswith('|')]
    # If no end found, table goes to end of file
    if table_end == 0:
        table_end = len(lines)

    # Extract sections
    before_table = lines[:table_start]  # Everything before table
    table_header = lines[table_start:table_start+2]  # Header + separator
    data_rows = [l for l in lines[table_start+2:table_end] if l.strip().startswith('|')]  # Data rows
    after_table = lines[table_end:]  # Everything after table
The table end detection skips empty lines but doesn't preserve them. When table_end is set to line 45 (the first non-empty, non-table line), the slicing loses the empty line(s) between the table and the next section. In the version-mapping.md file, there's an empty line (44) between the table and the "Working with ABP's Preview Versions" heading. This empty line will be lost when reconstructing the file, potentially affecting formatting.
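One way to keep that blank line is to end the table at the first line that is not a table row, blank lines included, so the separator lands in the "after" section. The following is only a sketch under stated assumptions: `split_table` is a hypothetical helper, and it assumes the table starts at the first line beginning with `|`, which may not be how the workflow actually locates it. It also uses `None` instead of `0` as the "not found" marker, so a table on the very first line is still detected.

```python
def split_table(lines):
    """Split lines into (before, header, rows, after) around the first Markdown table.

    Hypothetical helper: assumes the table starts at the first line beginning with '|'.
    """
    table_start = None  # None rather than 0, so a table at index 0 is not mistaken for "not found"
    table_end = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        if table_start is None:
            if stripped.startswith('|'):
                table_start = i
        elif not stripped.startswith('|'):
            # First line that is not a table row; a blank line counts, so it stays in `after`
            table_end = i
            break

    if table_start is None:
        raise ValueError("Could not find version mapping table")
    if table_end is None:
        table_end = len(lines)  # table runs to the end of the file

    before = lines[:table_start]
    header = lines[table_start:table_start + 2]  # header row + separator row
    rows = lines[table_start + 2:table_end]
    after = lines[table_end:]
    return before, header, rows, after
```

With this split, concatenating the four sections reproduces the input exactly. The trade-off is that a blank line inside the table would now end it, whereas the loop in the PR skips over blank lines.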
    # Clean GitHub username mentions (@username -> username)
    line=$(echo "$line" | sed 's/@\([a-zA-Z0-9_-]*\)/\1/g')
The sed command here removes the '@' symbol from usernames but doesn't handle the edge case where a username might be immediately followed by punctuation (like '@username,' or '@username.'). This could leave unwanted punctuation in the cleaned notes. Consider using a word boundary or more precise pattern.
Suggested change:
    - # Clean GitHub username mentions (@username -> username)
    - line=$(echo "$line" | sed 's/@\([a-zA-Z0-9_-]*\)/\1/g')
    + # Clean GitHub username mentions (@username or @username, -> username)
    + line=$(echo "$line" | sed 's/@\([A-Za-z0-9_-][A-Za-z0-9_-]*\)[[:punct:]]\{0,1\}/\1/g')
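To see what the suggested pattern changes in practice, the two substitutions can be approximated in Python. This is a rough sketch, not the workflow's code: Python's `re` stands in for sed, `strip_at_original` and `strip_at_suggested` are hypothetical names, and `string.punctuation` is used as the ASCII equivalent of `[[:punct:]]`.

```python
import re
import string

USERNAME = r"[A-Za-z0-9_-]"
# ASCII punctuation, escaped so it is safe inside a character class
PUNCT = re.escape(string.punctuation)

def strip_at_original(text):
    # Mirrors s/@\([a-zA-Z0-9_-]*\)/\1/g
    return re.sub(rf"@({USERNAME}*)", r"\1", text)

def strip_at_suggested(text):
    # Mirrors the suggestion: at least one username character,
    # then at most one trailing punctuation character, which is dropped
    return re.sub(rf"@({USERNAME}+)[{PUNCT}]?", r"\1", text)
```

In this approximation the suggested pattern drops one punctuation character after every mention, so `"(@carol)"` becomes `"(carol"`. Whether that is desirable depends on how mentions appear in the generated notes, so it is worth checking against real release text before adopting it.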
    [[ "$line" =~ ^(Full\ Changelog|\*\*Full\ Changelog) ]] && continue

    # Clean GitHub username mentions (@username -> username)
    line=$(echo "$line" | sed 's/@\([a-zA-Z0-9_-]*\)/\1/g')
The regex pattern '[a-zA-Z0-9_-]*' uses the '*' quantifier, which matches zero or more characters. This means it can match an empty string after '@', which would be incorrect. Change '*' to '+' to require at least one character in the username.
Suggested change:
    - line=$(echo "$line" | sed 's/@\([a-zA-Z0-9_-]*\)/\1/g')
    + line=$(echo "$line" | sed 's/@\([a-zA-Z0-9_-]\+\)/\1/g')
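The difference between the two quantifiers shows up on a bare '@' that is not part of a mention. A small sketch, using Python's `re` as a stand-in for sed (the function names are hypothetical):

```python
import re

def strip_at_star(text):
    # Zero or more username characters: a lone '@' also matches and is removed
    return re.sub(r"@([A-Za-z0-9_-]*)", r"\1", text)

def strip_at_plus(text):
    # One or more username characters: a lone '@' is left untouched
    return re.sub(r"@([A-Za-z0-9_-]+)", r"\1", text)
```

One portability note: in basic regular expressions `\+` is a GNU sed extension. If the script may run under a non-GNU sed, `\{1,\}` or `sed -E` with a plain `+` expresses the same thing.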
    line=$(echo "$line" | sed 's#by [a-zA-Z0-9_-]* in##g')

    # Clean extra "by username" mentions
    line=$(echo "$line" | sed 's/by [a-zA-Z0-9_-]*$//g')
The sed pattern 's/by [a-zA-Z0-9_-]*$//' uses '*', which matches zero or more characters, allowing 'by $' to match even when there's no username. This could incorrectly remove a trailing 'by' from legitimate text. Use '+' instead to require at least one character.
Suggested change:
    - line=$(echo "$line" | sed 's/by [a-zA-Z0-9_-]*$//g')
    + line=$(echo "$line" | sed 's/by [a-zA-Z0-9_-]\+$//g')
    # Clean GitHub PR/issue links - extract just the description
    line=$(echo "$line" | sed 's#in https://github.com/[^[:space:]]*##g')
    line=$(echo "$line" | sed 's#by [a-zA-Z0-9_-]* in##g')
The sed pattern 's#by [a-zA-Z0-9_-]* in##g' uses '*' which matches zero or more characters. This could match 'by in' (with just spaces) and incorrectly remove it from legitimate text. Use '+' to require at least one character in the username.
Suggested change:
    - line=$(echo "$line" | sed 's#by [a-zA-Z0-9_-]* in##g')
    + line=$(echo "$line" | sed 's#by [a-zA-Z0-9_-]\+ in##g')
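The effect of requiring at least one username character in the two "by username" patterns can be checked with a short sketch. Python's `re` stands in for sed here, and `strip_by_star` and `strip_by_plus` are hypothetical names that apply both substitutions in sequence.

```python
import re

def strip_by_star(line):
    # Zero or more username characters: "by " with nothing after it also matches
    line = re.sub(r"by [A-Za-z0-9_-]* in", "", line)
    return re.sub(r"by [A-Za-z0-9_-]*$", "", line)

def strip_by_plus(line):
    # One or more username characters: a real username is required
    line = re.sub(r"by [A-Za-z0-9_-]+ in", "", line)
    return re.sub(r"by [A-Za-z0-9_-]+$", "", line)
```

Neither variant anchors "by" at a word boundary, so a word that merely ends in "by" (for example "nearby") followed by a name would still match. That is outside the scope of the suggestion but may be worth a look.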
    # Normalize to dash
    echo "$line" | sed 's/^[[:space:]]*\*/- /'
The regex pattern '^[[:space:]]*[-*]' will match lines starting with either a dash or an asterisk, but the normalization at line 231 only handles asterisks preceded by optional whitespace. If a line has significant leading whitespace and a dash (e.g., " - item"), the normalization won't trigger, and the whitespace will be preserved. Consider normalizing all bullet formats consistently, regardless of leading whitespace.
Suggested change:
    - # Normalize to dash
    - echo "$line" | sed 's/^[[:space:]]*\*/- /'
    + # Normalize to dash for both '*' and '-' bullets, and normalize spacing
    + echo "$line" | sed 's/^[[:space:]]*[-*][[:space:]]*/- /'
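A quick comparison of the two normalizations, again sketched with Python's `re` in place of sed (`\s` approximates `[[:space:]]` for single lines; the function names are hypothetical):

```python
import re

def normalize_original(line):
    # Mirrors s/^[[:space:]]*\*/- /: only '*' bullets, and any space after '*' is kept
    return re.sub(r"^\s*\*", "- ", line)

def normalize_suggested(line):
    # Mirrors s/^[[:space:]]*[-*][[:space:]]*/- /: both bullet styles, spacing collapsed
    return re.sub(r"^\s*[-*]\s*", "- ", line)
```

In this approximation the original pattern turns `"* item"` into `"-  item"` with two spaces, because the space after the asterisk survives the substitution. The suggested pattern avoids that, at the cost of flattening any indentation that nested bullets relied on.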