fix(converter): split consecutive paragraph lines into separate Text blocks#97

Merged
riba2534 merged 2 commits into riba2534:main from nengqi:fix/paragraph-softlinebreak
Apr 13, 2026


nengqi commented Apr 12, 2026

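Problem

Consecutive Markdown lines separated by a single newline (no blank line between them) are merged into one paragraph when imported into Feishu. A common case: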

**A派**:xxx
**B派**:yyy

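After import, both lines land in the same Text block, and Feishu renders them as one continuous run of text.

Root cause

extractTextElements walks the paragraph AST without checking the SoftLineBreak() flag on *ast.Text nodes. Under the CommonMark spec, a newline inside a paragraph is only a soft line break and does not start a new paragraph, so the whole paragraph is a single AST Paragraph node and every line is packed into one Text block.

Interestingly, blockquotes already handle this correctly through extractQuoteLines; ordinary paragraphs did not.

Fix

Add extractParagraphLines, which uses the same logic as extractQuoteLines (start a new line at each SoftLineBreak) and additionally supports post-processing of inline images and inline formulas. convertParagraph now creates a separate Text block for each line.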

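  • Single-line paragraphs behave as before
  • Multiple paragraphs separated by blank lines behave as before
  • Only the "soft line break inside one paragraph" case is affected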
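
The splitting idea can be sketched roughly as follows. This is a minimal, stdlib-only model rather than the code from this PR: inlineNode, splitParagraphLines and lineText are hypothetical names, and the real converter walks goldmark AST nodes and checks SoftLineBreak() on *ast.Text instead of a struct field.

```go
package main

import (
	"fmt"
	"strings"
)

// inlineNode is a hypothetical stand-in for a parsed inline node.
// The real converter works on goldmark AST nodes instead.
type inlineNode struct {
	Text          string
	Bold          bool
	SoftLineBreak bool // true when a soft line break follows this node
}

// splitParagraphLines groups inline nodes into logical lines, closing the
// current line after every node that carries a soft line break.
func splitParagraphLines(nodes []inlineNode) [][]inlineNode {
	var lines [][]inlineNode
	var current []inlineNode
	for _, n := range nodes {
		current = append(current, n)
		if n.SoftLineBreak {
			lines = append(lines, current)
			current = nil
		}
	}
	// Flush the last line, which has no trailing soft break.
	if len(current) > 0 {
		lines = append(lines, current)
	}
	return lines
}

// lineText concatenates the text of one logical line.
func lineText(line []inlineNode) string {
	var sb strings.Builder
	for _, n := range line {
		sb.WriteString(n.Text)
	}
	return sb.String()
}

func main() {
	// Models the two-line example above, parsed as a single paragraph.
	paragraph := []inlineNode{
		{Text: "A派", Bold: true},
		{Text: ":xxx", SoftLineBreak: true},
		{Text: "B派", Bold: true},
		{Text: ":yyy"},
	}
	for i, line := range splitParagraphLines(paragraph) {
		fmt.Printf("block %d: %s\n", i+1, lineText(line))
	}
}
```

Each returned line keeps its own nodes, so per-node styling such as bold can still be applied when the line is turned into a Text block.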

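Known limitation: a newline inside an inline container (e.g. **line1\nline2**) does not trigger a split. This pattern is extremely rare and is documented in a code comment.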
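
To illustrate why such a break is missed, here is a small stdlib-only sketch under stated assumptions. The node type and the splitTopLevel and containerText helpers are hypothetical, and keeping the inner break as a newline character is a choice made for this sketch, not a description of what the converter emits.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical inline node: a text leaf, or a container
// (for example bold) that holds child nodes.
type node struct {
	Text          string
	SoftLineBreak bool
	Children      []node // non-empty for containers
}

// containerText flattens a container's children into one string.
// Assumption for this sketch: an inner soft break is kept as "\n".
func containerText(n node) string {
	var sb strings.Builder
	for _, c := range n.Children {
		sb.WriteString(c.Text)
		if c.SoftLineBreak {
			sb.WriteString("\n")
		}
	}
	return sb.String()
}

// splitTopLevel splits only on soft breaks carried by top-level leaves.
// Containers are taken whole, so breaks inside them never start a new line.
func splitTopLevel(nodes []node) []string {
	var lines []string
	var current strings.Builder
	for _, n := range nodes {
		if len(n.Children) > 0 {
			current.WriteString(containerText(n))
			continue
		}
		current.WriteString(n.Text)
		if n.SoftLineBreak {
			lines = append(lines, current.String())
			current.Reset()
		}
	}
	if current.Len() > 0 {
		lines = append(lines, current.String())
	}
	return lines
}

func main() {
	// Models **line1\nline2**: the break sits inside the bold container.
	bold := node{Children: []node{
		{Text: "line1", SoftLineBreak: true},
		{Text: "line2"},
	}}
	fmt.Println(len(splitTopLevel([]node{bold})), "line(s)")
}
```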

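Tests

Two regression tests are added:

  • TestConvert_ConsecutiveLinesBecomeSeparateBlocks: verifies the number of blocks
  • TestConvert_ConsecutiveBoldLinesPreserveContent: verifies that each block keeps the correct text content and bold style after the split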

Test plan

  • go test ./internal/converter/... passes
  • Single-line, multi-line, and blank-line-separated cases are all verified
  • Bold style is preserved correctly after the split


nengqi commented Apr 13, 2026

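@riba2534 Hi! This PR fixes the issue where consecutive Markdown lines are merged into one paragraph when imported into Feishu. The change is small and all tests pass. Could you take a look when you have time? 🙏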

nengqi and others added 2 commits April 13, 2026 15:19
…blocks

Markdown lines separated only by a newline (no blank line) were merged
into a single Text block when imported to Feishu. Common pattern:

  **A派**:xxx
  **B派**:yyy

Both lines ended up in one paragraph block, displayed as a single run.

Root cause: `extractTextElements` walked the paragraph AST without
checking `SoftLineBreak()` on `*ast.Text` nodes.

Fix: introduce `extractParagraphLines` (mirrors the existing
`extractQuoteLines` logic, which already handles this correctly for
blockquotes) and update `convertParagraph` to emit one Text block per
logical line. Single-line paragraphs and paragraphs separated by blank
lines are unaffected.

Known limitation: a soft line break *inside* an inline container
(e.g. `**line1\nline2**`) is not split because those nodes use
WalkSkipChildren. This edge case is extremely rare in practice.

Add regression tests covering block count, content and bold-style
preservation after the split.
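- Handle inline HTML nodes such as <u>/<mark>/<br/> correctly when
  splitting a paragraph on soft line breaks, so RawHTML content is no
  longer silently dropped in multi-line paragraphs
- <br> triggers a soft line break, <u>/<mark> toggle the underline
  style, and other HTML nodes follow the extractChildElements logic
  (handleInlineHTMLTag or discard)
- Call unescapeMarkdownText on ast.Text, consistent with
  extractTextElements, to fix leftover backslashes from escaped
  characters inside paragraphs
- Add two unit tests for <u>/<mark>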

Addresses review findings on PR riba2534#97.
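
A rough sketch of how the points in this follow-up commit fit together, written against the Go standard library only. The token and run types and the unescape and splitLines functions are hypothetical stand-ins for illustration, not the repository's unescapeMarkdownText or handleInlineHTMLTag. The sketch assumes that a backslash is dropped only before ASCII punctuation (the CommonMark backslash-escape rule), that only <u> toggles underline, and that unknown tags are ignored.

```go
package main

import (
	"fmt"
	"strings"
)

// token is a hypothetical inline token: plain text or a raw inline HTML tag.
type token struct {
	Value string
	HTML  bool
}

// run is a piece of styled text inside one output line.
type run struct {
	Text      string
	Underline bool
}

// unescape drops the backslash from CommonMark-style backslash escapes,
// i.e. a backslash followed by an ASCII punctuation character.
func unescape(s string) string {
	const punct = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
	var sb strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] == '\\' && i+1 < len(s) && strings.IndexByte(punct, s[i+1]) >= 0 {
			i++ // skip the backslash, keep the escaped character
		}
		sb.WriteByte(s[i])
	}
	return sb.String()
}

// splitLines starts a new line at <br>, toggles underline at <u>/</u>,
// and unescapes plain text. Other tags are ignored in this sketch.
func splitLines(tokens []token) [][]run {
	var lines [][]run
	var current []run
	underline := false
	for _, t := range tokens {
		if !t.HTML {
			current = append(current, run{Text: unescape(t.Value), Underline: underline})
			continue
		}
		switch strings.ToLower(t.Value) {
		case "<br>", "<br/>", "<br />":
			// Sketch choice: a break with nothing before it adds no empty line.
			if len(current) > 0 {
				lines = append(lines, current)
				current = nil
			}
		case "<u>":
			underline = true
		case "</u>":
			underline = false
		}
	}
	if len(current) > 0 {
		lines = append(lines, current)
	}
	return lines
}

func main() {
	tokens := []token{
		{Value: `a\*b`},
		{Value: "<br/>", HTML: true},
		{Value: "<u>", HTML: true},
		{Value: "c"},
		{Value: "</u>", HTML: true},
		{Value: "d"},
	}
	for i, line := range splitLines(tokens) {
		fmt.Printf("line %d: %+v\n", i+1, line)
	}
}
```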
@riba2534 riba2534 force-pushed the fix/paragraph-softlinebreak branch from 9d76995 to 40969a8 Compare April 13, 2026 07:22
@riba2534 riba2534 merged commit 8090ad2 into riba2534:main Apr 13, 2026