fix: properly strip HTML tags and resolve entities in feed article summaries by stakeswky · Pull Request #149 · bewcloud/bewcloud

stakeswky · 2026-02-21T14:39:10Z

Fixes #146

Problem

The feed reader displays raw HTML tags and unresolved entities in article summaries instead of clean plain text.

Root Cause

parseTextFromHtml() in lib/feed.ts used document.textContent directly on the parsed HTML document object, which could include artifacts from the document wrapper and didn't handle all edge cases.

Fix

- Extract text from the <body> element specifically (where the actual content lives after parsing)
- Collapse multiple whitespace/newlines into single spaces for cleaner display
- Add early return for empty/whitespace-only input
- Use optional chaining for safer null handling
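
For illustration only, the body extraction, early return, and optional chaining described above could be sketched roughly as follows. This is not the code from lib/feed.ts: the name extractBodyText, the ParsedDocumentLike shape, and the injected parseHtml callback are all hypothetical, introduced so the sketch stays independent of whichever DOM parser the project uses.

```typescript
// Hypothetical minimal shape of a parsed HTML document.
// Real DOM implementations expose far more than this.
interface ParsedDocumentLike {
  body?: { textContent: string | null } | null;
}

// Hypothetical parser callback; in practice this would wrap a real DOM parser.
type ParseHtml = (html: string) => ParsedDocumentLike | null;

function extractBodyText(html: string, parseHtml: ParseHtml): string {
  // Early return for empty or whitespace-only input.
  if (!html || !html.trim()) {
    return '';
  }

  const parsed = parseHtml(html);

  // Optional chaining: a missing document, body, or textContent
  // all fall back to an empty string instead of throwing.
  return (parsed?.body?.textContent ?? '').trim();
}
```

Reading textContent from the body means tags are dropped and entities arrive already decoded by the parser, so no manual tag stripping is needed in this sketch.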

…mmaries

Fixes bewcloud#146

The parseTextFromHtml function was using document.textContent directly on the parsed HTML document, which could leave raw HTML tags and unresolved entities in feed article summaries.

Changes:
- Extract text from body element to avoid document wrapper artifacts
- Collapse multiple whitespace/newlines into single spaces for cleaner output
- Add early return for empty/whitespace-only input
- Use optional chaining for safer null handling

BrunoBernardino

Thank you for the suggested fix, @stakeswky! While your summary appears to be AI-generated, the code fix is too small to make that assessment, and it's simple enough for me not to worry too much about it.

I do have a request to either improve or remove a couple of lines of code in here.

Thanks, I hope that makes sense!

BrunoBernardino · 2026-02-22T18:05:41Z

+    // Collapse multiple whitespace/newlines into single spaces
+    .replace(/\s+/g, ' ')


I don't think this is necessary and it will break text-only summaries that are properly formatted with line breaks. That being said, it would make sense to remove/trim more than 2 newline or whitespace characters in a row.
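
To make the concern concrete, here is a small sketch of what the `\s+` replacement does to a text-only summary that already has meaningful line breaks. The function name collapseAllWhitespace and the sample string are hypothetical, chosen only for this illustration.

```typescript
// Mirrors the replacement under review: every run of whitespace,
// including newlines, becomes a single space.
function collapseAllWhitespace(text: string): string {
  return text.replace(/\s+/g, ' ');
}

// A text-only summary with a line break and a paragraph break.
const formattedSummary = 'First line\nSecond line\n\nNew paragraph';

// Both the single line break and the paragraph break are flattened.
const flattened = collapseAllWhitespace(formattedSummary);
console.log(flattened); // "First line Second line New paragraph"
```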

Good point — updated in 7a3bfc5. Now it only collapses runs of 2+ non-newline whitespace into a single space, and 3+ consecutive newlines into a double newline. Single line breaks are preserved.

    .replace(/[^\S\n]{2,}/g, ' ')
    .replace(/\n{3,}/g, '\n\n')

…pace

Address review feedback: the previous \s+ regex was too aggressive and broke text-only summaries with legitimate line breaks.

Now:
- Collapse runs of 2+ non-newline whitespace into a single space
- Collapse 3+ consecutive newlines into double newline (paragraph break)
- Single line breaks are preserved

stakeswky · 2026-02-23T02:20:14Z

Hi @BrunoBernardino, thanks for the feedback! I've updated the regex in the second commit:

[^\S\n]{2,} → collapses runs of 2+ non-newline whitespace into a single space
\n{3,} → collapses 3+ consecutive newlines into a double newline (paragraph break)

Single line breaks are now preserved. Let me know if this looks good!
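
As a quick check of the two patterns above, the following sketch applies them in the same order to a sample string. The wrapper name normalizeWhitespace and the sample inputs are hypothetical; only the two regular expressions come from the commit.

```typescript
// Applies the two replacements from the updated commit, in order.
function normalizeWhitespace(text: string): string {
  return text
    // Runs of 2+ whitespace characters other than "\n" become one space.
    .replace(/[^\S\n]{2,}/g, ' ')
    // Runs of 3+ newlines become a double newline (paragraph break).
    .replace(/\n{3,}/g, '\n\n');
}

const sample = 'a   b\nc\n\n\n\nd';
const normalized = normalizeWhitespace(sample);
console.log(JSON.stringify(normalized)); // "a b\nc\n\nd"
```

Note that `[^\S\n]` is a double negative: it matches any character that is whitespace but not a newline, so tabs and repeated spaces are collapsed while line breaks are left alone.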

BrunoBernardino

Thanks for the changes!

BrunoBernardino requested changes Feb 22, 2026


BrunoBernardino approved these changes Feb 23, 2026


BrunoBernardino merged commit 1aca444 into bewcloud:main Feb 23, 2026

BrunoBernardino merged 2 commits into bewcloud:main from stakeswky:fix/146-feed-html-processing
