fix(converter): split consecutive paragraph lines into separate Text blocks #97
Merged
riba2534 merged 2 commits into riba2534:main on Apr 13, 2026
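@riba2534 Hi! This PR fixes consecutive Markdown lines being merged into a single paragraph when imported into Feishu. The change is small and all tests pass. Could you take a look when you have a moment? 🙏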
…blocks

Markdown lines separated only by a newline (no blank line) were merged into a single Text block when imported to Feishu. Common pattern:

`**A派**:xxx`
`**B派**:yyy`

Both lines ended up in one paragraph block, displayed as a single run.

Root cause: `extractTextElements` walked the paragraph AST without checking `SoftLineBreak()` on `*ast.Text` nodes.

Fix: introduce `extractParagraphLines` (mirrors the existing `extractQuoteLines` logic, which already handles this correctly for blockquotes) and update `convertParagraph` to emit one Text block per logical line. Single-line paragraphs and paragraphs separated by blank lines are unaffected.

Known limitation: a soft line break *inside* an inline container (e.g. `**line1\nline2**`) is not split because those nodes use `WalkSkipChildren`. This edge case is extremely rare in practice.

Add regression tests covering block count, content and bold-style preservation after the split.
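- When splitting paragraphs on soft line breaks, handle inline HTML nodes such as `<u>`/`<mark>`/`<br/>` correctly, so that RawHTML content in multi-line paragraphs is no longer silently dropped
- `<br>` is treated as a soft line break (starts a new line), `<u>`/`<mark>` toggle the underline style, and other HTML nodes follow the existing `extractChildElements` handling (`handleInlineHTMLTag` or discard)
- Call `unescapeMarkdownText` on `ast.Text`, consistent with `extractTextElements`, fixing leftover backslashes from escaped characters inside paragraphs
- Add two unit tests for `<u>` and `<mark>`

Addresses review findings on PR riba2534#97.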
Branch updated from 9d76995 to 40969a8.
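Problem
Markdown lines separated by a single newline (no blank line) are merged into one paragraph when imported into Feishu. A common case is `**A派**:xxx` on one line followed by `**B派**:yyy` on the next.
After import, both lines end up in the same Text block, and Feishu displays them as one continuous paragraph.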
Root cause
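`extractTextElements` walks the paragraph AST without checking the `SoftLineBreak()` flag on `*ast.Text` nodes. Per the CommonMark spec, a line ending inside a paragraph is only a soft line break and does not start a new paragraph, so the whole paragraph is a single AST Paragraph node and all of its lines are packed into one Text block. Interestingly, blockquotes already handle this correctly via `extractQuoteLines`, but plain paragraphs did not.
Fix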
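Added `extractParagraphLines`, which uses the same logic as `extractQuoteLines` (start a new line at each `SoftLineBreak`) and additionally post-processes inline images and inline equations. `convertParagraph` now creates a separate Text block for each line. Known limitation: a line break inside an inline container (e.g. `**line1\nline2**`) does not trigger a split; this pattern is extremely rare and is documented in a code comment.
Tests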
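Two new regression tests:
- `TestConvert_ConsecutiveLinesBecomeSeparateBlocks`: verifies the block count
- `TestConvert_ConsecutiveBoldLinesPreserveContent`: verifies that each block's text content and bold style are preserved after the split

Test plan
All tests in `go test ./internal/converter/...` pass.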